| μ (mu) | mean value, a measure of location |
| π, Π (pi) | lower-case: ratio of circumference to diameter for a circle; upper-case: product operator |
| σ, Σ (sigma) | as a variable: standard deviation (lower-case) or covariance matrix (upper-case), measuring dispersion; as an operator: summation |
| 2D | 2 Dimensional |
| ABM | Adaptive Basis-function Model |
| AdaBoost | Adaptive Boosting |
| adaline | adaptive linear element |
| ADF | Assumed Density Filter |
| AIC | Akaike Information Criterion |
| ALS | Alternating Least Squares |
| ANOVA | ANalysis Of VAriance |
| ARD | Automatic Relevance Determination |
| AUC | Area Under the Curve |
| AWS | Amazon Web Services |
| BART | Bayesian Adaptive Regression Trees |
| BFGS | Broyden, Fletcher, Goldfarb, Shanno |
| BIC | Bayesian Information Criterion |
| BMA | Bayesian Model Averaging |
| BP | Belief Propagation |
| BUGS | Bayesian inference Using Gibbs Sampling |
| C4.5 | Classifier (tree) 4.5: successor to ID3 |
| CART | Classification And Regression Trees |
| CD | Contrastive Divergence |
| CDF | Cumulative Distribution Function |
| CG | Conjugate Gradient |
| CI | Central Interval |
| CI | Confidence Interval |
| CI | Credible Interval |
| CIFAR | Canadian Institute For Advanced Research |
| CNN | Convolutional Neural Network |
| COLT | COmputational Learning Theory |
| CPD | Conditional Probability Distribution |
| CPU | Central Processing Unit |
| CRC | Canada Research Chair |
| CRF | Conditional Random Field |
| CUDA | Compute Unified Device Architecture |
| CV | Cross Validation |
| d-separation | dependence separation |
| DAG | Directed Acyclic Graph |
| DBM | Deep Boltzmann Machine |
| DBN | Deep Belief Network |
| DBN | Dynamic Bayesian Network |
| DCM | Dirichlet Compound Multinomial |
| DDN | Deep Directed Network |
| DGM | Directed Graphical Model |
| DNA | DeoxyriboNucleic Acid |
| DNN | Deep Neural Network |
| dof | degrees of freedom |
| DP | Dirichlet Process |
| EB | Empirical Bayes |
| EC2 | Elastic Compute Cloud |
| ECOC | Error Correcting Output Code |
| EER | Equal Error Rate |
| EM | Expectation Maximization |
| EP | Expectation Propagation |
| ERM | Empirical Risk Minimization |
| exp | exponential function |
| FA | Factor Analysis |
| FDR | False Discovery Rate |
| FLDA | Fisher's Linear Discriminant Analysis |
| FNR | False Negative Rate |
| FPR | False Positive Rate |
| GAM | Generalized Additive Model |
| GaP | Gamma Poisson |
| GCV | Generalized Cross Validation |
| GDA | Gaussian Discriminant Analysis |
| GGM | Gaussian Graphical Model |
| GLM | Generalized Linear Model |
| GLMM | Generalized Linear Mixed Model |
| GM | Graphical Model |
| GMM | Gaussian Mixture Model |
| GP | Gaussian Process |
| GPU | Graphics Processing Unit |
| HDI | Highest Density Interval |
| HLDA | Heteroscedastic Linear Discriminant Analysis |
| HME | Hierarchical Mixture of Experts |
| HMM | Hidden Markov Model |
| HPD | Highest Posterior Density |
| HTTP | Hyper Text Transfer Protocol |
| ICA | Independent Component Analysis |
| ICML | International Conference on Machine Learning |
| ID3 | Iterative Dichotomiser (tree) 3 |
| iff | if and only if |
| IID | Independent, Identically Distributed |
| IP | Imputation Posterior |
| IPF | Iterative Proportional Fitting |
| IRLS | Iteratively Reweighted Least Squares |
| JAGS | Just Another Gibbs Sampler |
| JTA | Junction Tree Algorithm |
| k | a popular variable name for a count; e.g. the number of nearest neighbors or the number of clusters |
| KDE | Kernel Density Estimate |
| KL | Kullback-Leibler |
| KNN | "k" Nearest Neighbor, where "k" is the number of neighbors |
| l0, l1, l2 | Lebesgue space, where the norm is defined as the "p"-th root of the sum of absolute values raised to the "p"-th power (see the sketch after this table) |
| L1VM | l1 regularized Vector Machine |
| LARS | Least Angle Regression |
| LASSO | Least Absolute Shrinkage and Selection Operator |
| LaTeX | Lamport TeX typesetting system [TeX: pronounced "tech"; an abbreviated form of tau, epsilon, chi] |
| L-BFGS | Limited-memory Broyden Fletcher Goldfarb Shanno |
| LBP | Loopy Belief Propagation |
| LDA | Latent Dirichlet Allocation |
| LDA | Linear Discriminant Analysis |
| LeNet5 | LeCun convolutional neural Network 5 |
| LG-SSM | Linear Gaussian State Space Model |
| LMS | Least Mean Squares |
| log | logarithm |
| LOOCV | Leave One Out Cross Validation |
| LSTM | Long Short-Term Memory |
| LVM | Latent Variable Model |
| MAP | Maximum A Posteriori |
| MAP | Mean Average Precision |
| MAR | Missing At Random |
| MARS | Multivariate Adaptive Regression Splines |
| MART | Multiple Additive Regression Trees |
| MatLab | Matrix Laboratory |
| MC | Monte Carlo |
| MCAR | Missing Completely At Random |
| MCMC | Markov Chain Monte Carlo |
| MDL | Minimum Description Length |
| MEMM | Maximum Entropy Markov Model |
| MH | Metropolis Hastings |
| MI | Mutual Information |
| MIT | Massachusetts Institute of Technology |
| ML | Machine Learning |
| ML | Maximum Likelihood |
| MLE | Maximum Likelihood Estimate |
| MLP | Multi-Layer Perceptron |
| MNIST | Modified National Institute of Standards and Technology data |
| MPCA | Multinomial Principal Component Analysis |
| MRF | Markov Random Field |
| MSE | Mean Squared Error |
| MVN | Multi-Variate Normal |
| NaN | Not a Number |
| NB | Nota Bene (Latin: "note well") |
| NBC | Naive Bayes Classifier |
| NHST | Null Hypothesis Significance Testing |
| NIG | Normal Inverse Gamma |
| NIPS | Neural Information Processing Systems conference |
| NLL | Negative Log Likelihood |
| NMAR | Not Missing At Random |
| NP | Non-deterministic Polynomial time |
| NSERC | Natural Sciences and Engineering Research Council |
| OLS | Ordinary Least Squares |
| P | Polynomial time |
| p-value | the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one actually observed |
| PAC | Probably Approximately Correct |
| PCA | Principal Component Analysis |
| PDF | Probability Density Function |
| PMI | Pointwise Mutual Information |
| PMTK | Probabilistic Modeling ToolKit |
| PPCA | Probabilistic Principal Component Analysis |
| PR | Precision Recall |
| QDA | Quadratic Discriminant Analysis |
| QQ | Quantile-Quantile |
| RBF | Radial Basis Function |
| RBM | Restricted Boltzmann Machine |
| RBPF | Rao-Blackwellized Particle Filtering |
| RKHS | Reproducing Kernel Hilbert Space |
| RNN | Recurrent Neural Network |
| ROC | Receiver Operating Characteristic |
| RRM | Regularized Risk Minimization |
| RSS | Residual Sum of Squares |
| RVM | Relevance Vector Machine |
| SAT | Scholastic Aptitude Test |
| SBL | Sparse Bayesian Learning |
| SdA | Stacked denoising Autoencoder |
| SGD | Stochastic Gradient Descent |
| SIR | Sampling Importance Resampling |
| SLAM | Simultaneous Localization And Mapping |
| SLT | Statistical Learning Theory |
| SpAM | Sparse Additive Model |
| SSE | Sum of Squared Errors |
| SSM | State Space Model |
| SSVM | Structural Support Vector Machine |
| SVD | Singular Value Decomposition |
| t | a popular variable name for a test statistic; as in "t" test or "t" distribution |
| TNR | True Negative Rate |
| TPR | True Positive Rate |
| UCB | Upper Confidence Bound |
| UGM | Undirected Graphical Model |
| UKF | Unscented Kalman Filter |
| VB | Variational Bayes |
| VC | Vapnik-Chervonenkis |
| VE | Variable Elimination |
| VIBES | Variational Inference for Bayesian networks |
| XOR | eXclusive OR |
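
As a brief sketch for the l0, l1, l2 entry above, the general "p"-norm of a vector x in n dimensions can be written as follows (note that the l0 "norm", which simply counts the nonzero entries, is the limiting case and is not a true norm):

```latex
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}, \qquad
\|x\|_1 = \sum_{i=1}^{n} |x_i|, \qquad
\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}, \qquad
\|x\|_0 = \#\{\, i : x_i \neq 0 \,\}
```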