absolute error (aka Laplacian error): a loss function that can be used to measure error for a regression model [mean(abs(actual - prediction))]
argument: an input value for a function
bagging: an ensemble model, where a bootstrap sample of the training data is used to construct each member of the ensemble [the term "bag" is a contraction of "bootstrap aggregation"]
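As an illustration, a minimal sketch of bootstrap aggregation for a regression model, written in Python/NumPy (the glossary's bracketed snippets are R). The data, the seed, and the choice of a simple linear fit for each ensemble member are all assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up training data with a known relationship: y = 2x + 1 + noise
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 100)

def bagged_predict(x_train, y_train, x_new, n_bags=25):
    """Average the predictions of models fit on bootstrap samples."""
    preds = []
    for _ in range(n_bags):
        # bootstrap sample: draw n observations with replacement
        idx = rng.integers(0, len(x_train), len(x_train))
        slope, intercept = np.polyfit(x_train[idx], y_train[idx], 1)
        preds.append(slope * x_new + intercept)
    return np.mean(preds, axis=0)

print(bagged_predict(x, y, np.array([5.0])))  # should be close to 2*5 + 1 = 11
```

Averaging over bootstrap fits mainly reduces the variance term of test error; each member sees a slightly different sample, and their errors partially cancel.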
Bayes classifier: a classifier that uses prior, likelihood, and evidence values to estimate a posterior probability, where the predicted class is the class that has the largest posterior probability
Bayes decision boundary: the set of input values that partitions the input space into two or more distinct regions (where the maximum posterior probability is equal for two or more classes)
Bayes error rate: the error rate for a Bayes classifier
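A minimal Python sketch of the prior/likelihood/evidence arithmetic behind the three entries above. The class priors and Gaussian likelihood parameters are assumed values for the demo, not from any real data set:

```python
import math

# hypothetical two-class problem (assumed priors and Gaussian likelihoods)
priors = {"A": 0.6, "B": 0.4}
means = {"A": 0.0, "B": 2.0}
sd = 1.0

def likelihood(x, mean):
    """Gaussian density Pr(X = x | class)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior(x):
    """Posterior for each class: prior * likelihood / evidence."""
    unnorm = {k: priors[k] * likelihood(x, means[k]) for k in priors}
    evidence = sum(unnorm.values())  # Pr(X = x)
    return {k: v / evidence for k, v in unnorm.items()}

def bayes_classify(x):
    """Predict the class with the largest posterior probability."""
    post = posterior(x)
    return max(post, key=post.get)

print(bayes_classify(0.0), bayes_classify(3.0))
```

The Bayes decision boundary is the input value where the two posteriors are equal; inputs on either side of it are assigned to different classes.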
bias: for Y = f(X) + epsilon, the difference between f(x) and the expected prediction of the model (the error introduced by approximating f with a simpler model)
bias-variance trade-off: modifying the complexity of a model to minimize overall test error (where overall test error is composed of both bias and variance terms); decreasing bias increases variance, while decreasing variance increases bias
binary: a qualitative variable with two possible output values (e.g. positive or negative, "1" or "0", "1" or "-1")
boosting: an ensemble model, where each model added to the ensemble reduces the error produced by existing ensemble members (improving the performance of the overall ensemble)
boxplot: a one-dimensional graph that uses a box, whiskers, and outlier points to characterize the distribution of values for a quantitative variable [the lower bound of the box is the 25th percentile; the middle of the box is the 50th percentile (median); the upper bound of the box is the 75th percentile; the length of the whisker is a multiple of the interquartile range (75th - 25th percentile)]
categorical: another name for qualitative
class: a label for a group
classification: predicting a qualitative output value
cluster analysis: constructing a model to map input observations to groups (e.g. customer segmentation)
conditional probability: a posterior probability (a probability conditioned on an "evidence" expression); e.g. Pr(Y = j | X = x) [read as "the probability that the value of variable Y is equal to j given that the value of variable X is equal to x"]
contour plot: a two-dimensional graph of three-dimensional data, where a (contour) line indicates that the connected points have the same value for the third dimension
cross validation: a form of resampling used for model selection
data frame: a data set organized into rows (observations) and columns (variables); may contain both quantitative and qualitative variables
degrees of freedom: a quantity that summarizes the flexibility of a model
dependent variable: another name for an output variable
endogenous variable: another name for an output variable
error rate: the proportion of classification model predictions that are incorrect
error term: in Y = f(X) + epsilon, this is the epsilon (irreducible error)
exogenous variable: another name for an input variable
expected test Mean Squared Error (MSE): another name for test MSE
expected value: the average value for a variable [mean(x)]
feature: another name for an input variable
fit: another name for constructing (training) a model
flexible: able to model non-linear input-to-output mappings
function: a mapping from input values to output values
generalized additive model: a model where the output is the sum of component functions, one (possibly non-linear) function per input variable
heatmap: a two-dimensional graph of three-dimensional data, where points with the same color have the same value for the third dimension (e.g. blue for smaller values, red for larger values)
histogram: a bar chart for a quantitative variable, where the width of the bar identifies an interval for values and the height of the bar identifies the quantity of observations within the interval
independent variable: another name for an input variable
indicator variable: a variable that takes on the value 1 if an expression is true and 0 if the expression is false
input variable: a variable that is passed to a model to produce output
irreducible error: error that cannot be reduced by improving a predictive model
k-nearest neighbors: a model that makes a prediction for an observation using the "k" observations in the training data which are closest to it (e.g. the majority class for classification, the average output value for regression)
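A minimal Python/NumPy sketch of k-nearest-neighbors classification; the toy data and the choice of Euclidean distance are assumptions for the demo:

```python
import numpy as np

def knn_classify(X_train, y_train, x_new, k=3):
    """Predict the majority class among the k nearest training observations."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# made-up data: two well-separated clusters
X = np.array([[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_classify(X, y, np.array([0.8])))  # → 0
```

Smaller k gives a more flexible (higher-variance) fit; larger k gives a smoother (higher-bias) fit, which ties this entry back to the bias-variance trade-off.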
least absolute shrinkage and selection operator (lasso): a form of regularization using an l1 penalty on model coefficients
least squares: a simple algorithm for constructing a linear regression model [solve(t(X) %*% X, diag(ncol(X))) %*% t(X) %*% y]
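The bracketed R snippet above solves the normal equations. The same computation in Python/NumPy, on made-up data with a known linear relationship (y = 1 + 2x, no noise, so the recovered coefficients are exact):

```python
import numpy as np

# toy data with a known linear relationship: y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # prepend an intercept column
y = 1.0 + 2.0 * x

# normal equations: beta = (X'X)^-1 X'y, same as the R snippet
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # approximately [1. 2.]
```

In practice `np.linalg.lstsq` (or R's `lm`) is preferred over forming X'X explicitly, for numerical stability.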
linear model: a vector of coefficients, used to map an input vector to an output value via an inner product operation
logistic regression: a generalized linear model used for classification, where the linear model predicts the log odds of class membership (which is then mapped to the probability of class membership, using the logistic function)
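A minimal Python sketch of the log-odds-to-probability mapping described above; the linear model coefficients (-1 and 2) are hypothetical, not fitted to any data:

```python
import math

def logistic(log_odds):
    """Map log odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def prob_positive(x):
    """A hypothetical fitted linear model for the log odds: -1 + 2x."""
    log_odds = -1.0 + 2.0 * x
    return logistic(log_odds)

print(prob_positive(0.5))  # log odds = 0, so probability = 0.5
```

The decision boundary of such a model is linear in the inputs: it is the set of x where the log odds equal zero (probability 0.5).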
machine learning: using data to create a model that maps one or more input values to one or more output values
matrix: an array with two indices
mean squared error (aka Gaussian error): a loss function that can be used to measure error for a regression model [mean((actual - prediction)^2)]
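The two bracketed loss formulas in this glossary (squared error here, absolute error under "absolute error") side by side in Python/NumPy, on made-up actual and predicted values:

```python
import numpy as np

actual = np.array([1.0, 2.0, 3.0])
prediction = np.array([1.5, 2.0, 2.0])

mse = np.mean((actual - prediction) ** 2)   # squared (Gaussian) error
mae = np.mean(np.abs(actual - prediction))  # absolute (Laplacian) error
print(mse, mae)
```

Squaring penalizes large errors more heavily than the absolute value does, so MSE is more sensitive to outliers than MAE.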
noise: another name for irreducible error
non-parametric: a model where the size of the model is variable; i.e. the "size" of the model can grow with the size of the training data
output variable: a variable that is produced by a model
overfitting: fitting "noise" in the training data; i.e. decreasing training set error while increasing testing set error
parametric: a model where the size of the model is fixed; i.e. the "size" of the model does not grow with the size of the training data
predictor: another name for an input variable
qualitative: a categorical value that identifies a quality (e.g. gender)
quantitative: a numeric value that measures quantity (e.g. height)
reducible error: error that can be reduced by improving a predictive model (e.g. adding useful input variables)
regression: predicting a quantitative output value
response: another name for an output variable
scalar: a single numeric value
scatterplot: a two-dimensional graph that plots points for coordinates observed in a data set
scatterplot matrix: a matrix that consists of the pairwise scatterplots for a set of quantitative variables
semi-supervised learning: using both labeled and unlabeled observations to construct a model (output values are provided for only a subset of the training data)
smoothing spline: a spline regression model that supports non-linear regression, by penalizing model complexity
supervised: a learning algorithm is provided both input and output values for training the model (e.g. regression, classification)
support vector machine: a type of classification or regression model, where prototypical observations (support vectors) from the training data are used to make predictions
systematic: in Y = f(X) + epsilon, this is the f()
target: another name for an output variable
tensor: an array with more than two indices
test data: data that is used to evaluate a model, but was not used to construct the model
test Mean Squared Error (MSE): MSE for test data
testing error: the error rate for the test data for a classification model
thin-plate spline: a form of smoothing spline that supports non-linear regression
training: the process of constructing a model
training data: data that is used to construct a model
training error: the error rate for the training data for a classification model
training Mean Squared Error (MSE): MSE for the training data
unsupervised: a learning algorithm is provided only input values (not output values) for training the model (e.g. dimensionality reduction, clustering)
variable: a symbol that represents an attribute with more than one possible value
variance: the expected value of the squared deviation from the mean for a variable [mean((x - mean(x))^2)]
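The bracketed formula checked numerically in Python/NumPy (the values of x are made up for the demo):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# matches the bracketed formula: mean((x - mean(x))^2)
variance = np.mean((x - np.mean(x)) ** 2)
print(variance)  # → 1.25
```

Note this is the population variance; the sample variance divides by n - 1 instead of n (R's `var`, or `np.var(x, ddof=1)` in NumPy).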
vector: an array with one index
workspace: the set of variables currently defined for your R environment