μ (mu)  mean value, a measure of location 
 πΠ (pi)  lower-case: ratio of circumference to diameter for a circle; upper-case: product operator 
 σΣ (sigma)  as a variable: standard deviation (lower-case) or covariance matrix (upper-case), measuring dispersion; as an operator: summation 
 2D  Two-Dimensional
 ABM  Adaptive Basis-function Model 
 AdaBoost  Adaptive Boosting 
 adaline  adaptive linear element 
 ADF  Assumed Density Filter 
 AIC  Akaike Information Criterion 
 ALS  Alternating Least Squares 
 ANOVA  ANalysis Of VAriance 
 ARD  Automatic Relevance Determination 
 AUC  Area Under the Curve 
 AWS  Amazon Web Services 
 BART  Bayesian Additive Regression Trees
 BFGS  Broyden, Fletcher, Goldfarb, Shanno 
 BIC  Bayesian Information Criterion 
 BMA  Bayesian Model Averaging 
 BP  Belief Propagation 
 BUGS  Bayesian inference Using Gibbs Sampling
 C4.5  Classifier (tree) 4.5: successor to ID3 
 CART  Classification And Regression Trees 
 CD  Contrastive Divergence 
 CDF  Cumulative Distribution Function 
 CG  Conjugate Gradient 
 CI  Central Interval 
 CI  Confidence Interval 
 CI  Credible Interval 
 CIFAR  Canadian Institute For Advanced Research 
 CNN  Convolutional Neural Network 
 COLT  COmputational Learning Theory 
 CPD  Conditional Probability Distribution 
 CPU  Central Processing Unit 
 CRC  Canada Research Chair 
 CRF  Conditional Random Field 
 CUDA  Compute Unified Device Architecture 
 CV  Cross Validation 
 d-separation  dependence separation 
 DAG  Directed Acyclic Graph 
 DBM  Deep Boltzmann Machine 
 DBN  Deep Belief Network 
 DBN  Dynamic Bayesian Network 
 DCM  Dirichlet Compound Multinomial 
 DDN  Deep Directed Network 
 DGM  Directed Graphical Model 
 DNA  DeoxyriboNucleic Acid 
 DNN  Deep Neural Network 
 dof  degrees of freedom 
 DP  Dirichlet Process 
 EB  Empirical Bayes 
 EC2  Elastic Compute Cloud 
 ECOC  Error Correcting Output Code 
 EER  Equal Error Rate 
 EM  Expectation Maximization 
 EP  Expectation Propagation 
 ERM  Empirical Risk Minimization 
 exp  exponential function
 FA  Factor Analysis 
 FDR  False Discovery Rate 
 FLDA  Fisher's Linear Discriminant Analysis 
 FNR  False Negative Rate 
 FPR  False Positive Rate 
 GAM  Generalized Additive Model 
 GaP  Gamma Poisson 
 GCV  Generalized Cross Validation 
 GDA  Gaussian Discriminant Analysis 
 GGM  Gaussian Graphical Model 
 GLM  Generalized Linear Model 
 GLMM  Generalized Linear Mixed Model
 GM  Graphical Model 
 GMM  Gaussian Mixture Model 
 GP  Gaussian Process 
 GPU  Graphics Processing Unit 
 HDI  Highest Density Interval 
 HLDA  Heteroscedastic Linear Discriminant Analysis 
 HME  Hierarchical Mixture of Experts 
 HMM  Hidden Markov Model 
 HPD  Highest Posterior Density 
 HTTP  HyperText Transfer Protocol
 ICA  Independent Component Analysis 
 ICML  International Conference on Machine Learning 
 ID3  Iterative Dichotomiser (tree) 3
 iff  if and only if 
 IID  Independent, Identically Distributed 
 IP  Imputation Posterior 
 IPF  Iterative Proportional Fitting 
 IRLS  Iteratively Reweighted Least Squares 
 JAGS  Just Another Gibbs Sampler 
 JTA  Junction Tree Algorithm 
 k  a popular variable name for a count; e.g. the number of nearest neighbors or the number of clusters 
 KDE  Kernel Density Estimate 
 KL  Kullback-Leibler
 KNN  "k" Nearest Neighbor, where "k" is the number of neighbors 
 l0, l1, l2  Lebesgue space, where the norm is defined as the "p"-th root of the sum of absolute values raised to the "p"-th power (for l0, by convention, the number of non-zero elements)
 L1VM  l1 regularized Vector Machine 
 LARS  Least Angle Regression 
 LASSO  Least Absolute Shrinkage and Selection Operator 
 LaTeX  Lamport TeX typesetting system [TeX: pronounced "tech"; an abbreviated form of tau, epsilon, chi] 
 L-BFGS  Limited-memory Broyden Fletcher Goldfarb Shanno 
 LBP  Loopy Belief Propagation 
 LDA  Latent Dirichlet Allocation 
 LDA  Linear Discriminant Analysis 
 LeNet5  LeCun convolutional neural Network 5
 LG-SSM  Linear Gaussian State Space Model 
 LMS  Least Mean Squares 
 log  logarithm 
 LOOCV  Leave One Out Cross Validation 
 LSTM  Long Short-Term Memory 
 LVM  Latent Variable Model 
 MAP  Maximum A Posteriori 
 MAP  Mean Average Precision 
 MAR  Missing At Random 
 MARS  Multivariate Adaptive Regression Splines
 MART  Multiple Additive Regression Trees 
 MATLAB  MATrix LABoratory
 MC  Monte Carlo 
 MCAR  Missing Completely At Random 
 MCMC  Markov Chain Monte Carlo 
 MDL  Minimum Description Length 
 MEMM  Maximum Entropy Markov Model 
 MH  Metropolis Hastings 
 MI  Mutual Information 
 MIT  Massachusetts Institute of Technology 
 ML  Machine Learning 
 ML  Maximum Likelihood 
 MLE  Maximum Likelihood Estimate 
 MLP  Multi-Layer Perceptron 
 MNIST  Modified National Institute of Standards and Technology data 
 MPCA  Multinomial Principal Component Analysis 
 MRF  Markov Random Field 
 MSE  Mean Squared Error 
 MVN  Multi-Variate Normal 
 NaN  Not a Number 
 NB  Nota Bene 
 NBC  Naive Bayes Classifier 
 NHST  Null Hypothesis Significance Testing 
 NIG  Normal Inverse Gamma
 NIPS  Neural Information Processing Systems conference 
 NLL  Negative Log Likelihood 
 NMAR  Not Missing At Random 
 NP  Non-deterministic Polynomial time 
 NSERC  Natural Sciences and Engineering Research Council 
 OLS  Ordinary Least Squares 
 P  Polynomial time 
 p-value  the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one actually observed
 PAC  Probably Approximately Correct 
 PCA  Principal Component Analysis 
 PDF  Probability Density Function 
 PMI  Pointwise Mutual Information 
 PMTK  Probabilistic Modeling ToolKit 
 PPCA  Probabilistic Principal Component Analysis 
 PR  Precision Recall 
 QDA  Quadratic Discriminant Analysis 
 QQ  Quantile-Quantile 
 RBF  Radial Basis Function 
 RBM  Restricted Boltzmann Machine 
 RBPF  Rao-Blackwellized Particle Filtering
 RKHS  Reproducing Kernel Hilbert Space
 RNN  Recurrent Neural Network
 ROC  Receiver Operating Characteristic 
 RRM  Regularized Risk Minimization 
 RSS  Residual Sum of Squares 
 RVM  Relevance Vector Machine 
 SAT  Scholastic Aptitude Test 
 SBL  Sparse Bayesian Learning 
 SdA  Stacked denoising Autoencoder 
 SGD  Stochastic Gradient Descent 
 SIR  Sampling Importance Resampling 
 SLAM  Simultaneous Localization And Mapping 
 SLT  Statistical Learning Theory 
 SpAM  Sparse Additive Model 
 SSE  Sum of Squared Errors 
 SSM  State Space Model 
 SSVM  Structural Support Vector Machine 
 SVD  Singular Value Decomposition 
 t  a popular variable name for a test statistic; as in "t" test or "t" distribution 
 TNR  True Negative Rate
 TPR  True Positive Rate 
 UCB  Upper Confidence Bound 
 UGM  Undirected Graphical Model 
 UKF  Unscented Kalman Filter 
 VB  Variational Bayes 
 VC  Vapnik-Chervonenkis 
 VE  Variable Elimination 
 VIBES  Variational Inference for BayESian networks
 XOR  eXclusive OR