Least squares is a term that can be used to express different concepts:
- Least squares criterion
- Least squares estimators
- Least squares method
Classically, the three concepts were treated as one, and this bundled meaning of least squares is still sometimes used in statistics.
It was first formalized by Adrien-Marie Legendre in 1805.
Least squares criterion
The least squares criterion, least squares objective or least squares loss function is a criterion for choosing the parameters that minimize the sum of squared residuals.
It is also referred to as the squared error loss.
Pointwise loss formula:
$$\ell(y_i, f(x_i;\beta)) = (y_i - f(x_i;\beta))^2$$
Empirical loss criterion formula:
$$L_n(\beta) = \sum_{i=1}^n \ell(y_i, f(x_i;\beta)) = \sum_{i=1}^n (y_i - f(x_i;\beta))^2$$
Estimator definition formula:
$$\hat{\beta} = \arg\min_{\beta \in \Theta} L_n(\beta) = \arg\min_{\beta \in \Theta} \sum_{i=1}^n (y_i - f(x_i;\beta))^2$$
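For illustration, here is a minimal Python sketch of the pointwise and empirical losses for a simple linear model; the data and function names are hypothetical:

```python
import numpy as np

def f(x, beta):
    """Hypothetical linear model f(x; beta) = beta_0 + beta_1 * x."""
    return beta[0] + beta[1] * x

def least_squares_loss(beta, x, y):
    """Empirical loss L_n(beta): the sum of squared residuals."""
    residuals = y - f(x, beta)
    return np.sum(residuals ** 2)

# Hypothetical data
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 7.1])

print(least_squares_loss(np.array([1.0, 2.0]), x, y))
```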
The least squares criterion can be minimized using different methods.
In linear models the minimizer has a closed form, while non-linear models require iterative optimization.
This meaning of “least squares” is the default one in machine learning.
It is the basis for linear regression. You can read this post about linear regression.
Least squares estimators
The least squares estimator is the solution to the least squares criterion: the parameter value that minimizes the sum of squared residuals.
Least squares types:
- Ordinary least squares (OLS)
- Linear least squares (LLS)
- Non-linear least squares (NLS)
- Generalized least squares (GLS)
Ordinary least squares
Ordinary least squares (OLS) is least squares estimation applied to linear regression.
Its formula is:
$$\hat{\beta} = \arg\min_{\beta} L(\beta)$$
that is, it minimizes the squared error of the linear model:
$$f(x; \beta) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
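As a minimal sketch on hypothetical synthetic data, OLS can be computed with NumPy's built-in least squares solver:

```python
import numpy as np

# Hypothetical synthetic data: y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)

# Design matrix with a column of ones for the intercept beta_0
X = np.column_stack([np.ones_like(x), x])

# OLS estimate: argmin_beta ||y - X beta||^2
beta_hat, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # approximately [2.0, 3.0]
```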
Linear least squares (LLS) is least squares applied to models that are linear in their parameters; OLS, weighted least squares and GLS are all linear least squares problems.
Non-linear least squares (NLS) is least squares applied to models that are non-linear in their parameters; it has no closed-form solution and is minimized iteratively (see the sketch below).
Generalized least squares (GLS) is an extension of OLS that handles correlated or heteroscedastic errors by weighting the residuals with the inverse of the error covariance matrix.
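As an illustration of non-linear least squares, here is a minimal sketch using SciPy's curve_fit; the model, data and starting values are hypothetical:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, beta0, beta1):
    """Hypothetical model, non-linear in beta1: exponential decay."""
    return beta0 * np.exp(-beta1 * x)

# Hypothetical synthetic data
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 40)
y = model(x, 2.5, 1.3) + rng.normal(scale=0.05, size=x.size)

# curve_fit minimizes the sum of squared residuals iteratively
# (Levenberg-Marquardt by default for unconstrained problems)
beta_hat, cov = curve_fit(model, x, y, p0=[1.0, 1.0])
print(beta_hat)  # approximately [2.5, 1.3]
```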
Least squares variants:
- Ordinary
- Penalized
- Regularized
- Weighted / Generalized
- Iterative / Robust
- Constrained / Specialized
- Dynamic / Non-linear
Penalized least squares adds a penalty term on the parameters to the least squares criterion; it is the objective minimized by the ridge algorithm.
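A minimal sketch of the ridge closed form \((X^T \, X + \alpha I)^{-1} X^T y\) on hypothetical data; for simplicity the intercept column is penalized too, which production implementations avoid:

```python
import numpy as np

def ridge(X, y, alpha):
    """Penalized least squares: argmin ||y - X beta||^2 + alpha ||beta||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Hypothetical data with an intercept column
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.1, 4.9, 7.2])
print(ridge(X, y, alpha=0.1))
```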
Partial least squares
Partial least squares regression [online]. Wikipedia. Available at: https://en.wikipedia.org/wiki/Partial_least_squares_regression
Least squares method
A least squares method is a way to estimate or compute the least squares estimator.
A training procedure or fitting method in linear regression is an algorithm that finds the parameter values minimizing an objective function (typically, the least squares error).
It is used to establish the values of the parameters in the linear regression formula.
Training procedures or fitting methods used in linear regression (OLS):
- Closed-form
  - Normal equation
- Numerical optimization
  - Gradient descent
  - Conjugate gradient
  - Gauss–Newton (sometimes)
  - Levenberg–Marquardt
- Numerical linear algebra
  - QR decomposition
  - SVD
The training procedure usually associated with linear regression is least squares.
Normal equation
The normal equation is a closed-form expression for the least squares estimator. Its formula is:
$$\hat{\beta} = (X^T \, X)^{-1} \, X^T y$$
The normal equation is the solution to the least squares minimization problem when the objective is quadratic.
The Gram matrix is \(X^T \, X\).
For the normal equation to yield a unique solution, \(X^T \, X\) must be invertible.
It is not invertible when there is perfect multicollinearity. You can read more about this concept on this post about statistical collinearity.
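A minimal sketch of the normal equation on hypothetical synthetic data, using a linear solve rather than an explicit matrix inverse for numerical stability:

```python
import numpy as np

# Hypothetical synthetic data: y = 4 + 3x + noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 2, size=100)
y = 4.0 + 3.0 * x + rng.normal(size=100)

X = np.column_stack([np.ones_like(x), x])  # intercept column

# beta_hat = (X^T X)^{-1} X^T y, computed as a linear system
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # approximately [4.0, 3.0]
```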
Gradient descent
You can read this post about gradient descent.
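As a rough sketch, gradient descent can minimize the least squares loss directly; the learning rate and iteration count below are hypothetical choices, not tuned recommendations:

```python
import numpy as np

def fit_gradient_descent(X, y, lr=0.1, n_iters=2000):
    """Batch gradient descent on the mean squared error."""
    beta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_iters):
        # Gradient of (1/n) ||y - X beta||^2 with respect to beta
        gradient = -(2.0 / n) * X.T @ (y - X @ beta)
        beta -= lr * gradient
    return beta

# Hypothetical synthetic data: y = 4 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=100)
y = 4.0 + 3.0 * x + rng.normal(scale=0.1, size=100)
X = np.column_stack([np.ones_like(x), x])  # intercept column

print(fit_gradient_descent(X, y))  # approximately [4.0, 3.0]
```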
SVD
Singular value decomposition (SVD) is a standard matrix factorization technique that can decompose the training set matrix X into the matrix multiplication of three matrices.
The Moore–Penrose pseudoinverse using SVD is computed as:
$$X^{+} = V \Sigma^{+} U^T$$
To compute the matrix \(\Sigma^{+}\), the algorithm takes \(\Sigma\) and sets to zero all values smaller than a tiny threshold value, then it replaces all the nonzero values with their inverses, and finally it transposes the resulting matrix.
The framework Scikit-learn implements this approach in its LinearRegression class.
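A minimal sketch of this procedure on hypothetical data; NumPy's np.linalg.pinv implements the same idea with more careful defaults:

```python
import numpy as np

def pinv_via_svd(X, threshold=1e-10):
    """Moore-Penrose pseudoinverse X^+ = V Sigma^+ U^T via SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Zero out tiny singular values, invert the rest
    s_plus = np.where(s > threshold, 1.0 / s, 0.0)
    return Vt.T @ np.diag(s_plus) @ U.T

# Hypothetical synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

beta_hat = pinv_via_svd(X) @ y
print(beta_hat)  # approximately [1.0, -2.0, 0.5]
```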
Reference
- Least squares [online]. Wikipedia. Available at: https://en.wikipedia.org/wiki/Least_squares
Related entries
- Frequentist inferential statistics
- Linear regression