Least squares is a term that can be used to express different concepts:
- Least squares criterion
- Least squares estimators
- Least squares method
Classically, the three concepts were treated as one, and this bundled meaning of least squares is still sometimes used in statistics.
It was first formalized by Adrien-Marie Legendre in 1805.
Least squares criterion
The least squares criterion, least squares objective or least squares loss function is a criterion for choosing the parameters that minimize the sum of squared residuals.
It is also referred to as the squared error loss.
Pointwise loss formula:
$$\ell(y_i, f(x_i;\beta)) = (y_i - f(x_i;\beta))^2$$
Empirical loss criterion formula:
$$L_n(\beta) = \sum_{i=1}^n \ell(y_i, f(x_i;\beta)) = \sum_{i=1}^n (y_i - f(x_i;\beta))^2$$
Estimator definition formula:
$$\hat{\beta} = \arg\min_{\beta \in \Theta} L_n(\beta) = \arg\min_{\beta \in \Theta} \sum_{i=1}^n (y_i - f(x_i;\beta))^2$$
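For illustration, here is a minimal Python sketch of the pointwise and empirical losses for a simple linear model; the data and function names are hypothetical:

```python
import numpy as np

def f(x, beta):
    """Hypothetical linear model f(x; beta) = beta_0 + beta_1 * x."""
    return beta[0] + beta[1] * x

def least_squares_loss(beta, x, y):
    """Empirical loss L_n(beta): the sum of squared residuals."""
    residuals = y - f(x, beta)
    return np.sum(residuals ** 2)

# Hypothetical data
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 7.1])

print(least_squares_loss(np.array([1.0, 2.0]), x, y))
```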
The least squares criterion can be minimized using different methods.
In linear models the minimizer has a closed form, while non-linear models require iterative optimization.
This meaning of “least squares” is the default one in machine learning.
It is the basis for linear regression. You can read this post about linear regression.
Least squares estimators
The least squares estimator is the solution to the least squares criterion: the parameter value that minimizes the sum of squared residuals.
Least squares types:
- Ordinary least squares (OLS)
- Linear least squares (LLS)
- Non-linear least squares (NLS)
- Generalized least squares (GLS)
Ordinary least squares
Ordinary least squares (OLS) is least squares estimation applied to linear regression.
Its formula is:
$$\hat{\beta} = \arg\min_{\beta} L(\beta)$$
that is, it minimizes the squared error of the linear model:
$$f(x; \beta) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
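As a minimal sketch on hypothetical synthetic data, OLS can be computed with NumPy's built-in least squares solver:

```python
import numpy as np

# Hypothetical synthetic data: y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)

# Design matrix with a column of ones for the intercept beta_0
X = np.column_stack([np.ones_like(x), x])

# OLS estimate: argmin_beta ||y - X beta||^2
beta_hat, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # approximately [2.0, 3.0]
```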
Linear least squares (LLS) is least squares applied to models that are linear in their parameters; OLS, weighted least squares and GLS are all linear least squares problems.
Non-linear least squares (NLS) is least squares applied to models that are non-linear in their parameters; it has no closed-form solution and is minimized iteratively (see the sketch below).
Generalized least squares (GLS) is an extension of OLS that handles correlated or heteroscedastic errors by weighting the residuals with the inverse of the error covariance matrix.
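As an illustration of non-linear least squares, here is a minimal sketch using SciPy's curve_fit; the model, data and starting values are hypothetical:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, beta0, beta1):
    """Hypothetical model, non-linear in beta1: exponential decay."""
    return beta0 * np.exp(-beta1 * x)

# Hypothetical synthetic data
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 40)
y = model(x, 2.5, 1.3) + rng.normal(scale=0.05, size=x.size)

# curve_fit minimizes the sum of squared residuals iteratively
# (Levenberg-Marquardt by default for unconstrained problems)
beta_hat, cov = curve_fit(model, x, y, p0=[1.0, 1.0])
print(beta_hat)  # approximately [2.5, 1.3]
```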
Least squares variants:
- Ordinary
- Penalized
- Regularized
- Weighted / Generalized
- Iterative / Robust
- Constrained / Specialized
- Dynamic / Non-linear
Penalized least squares adds a penalty term on the parameters to the least squares criterion; it is the objective minimized by the ridge algorithm.
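A minimal sketch of the ridge closed form \((X^T \, X + \alpha I)^{-1} X^T y\) on hypothetical data; for simplicity the intercept column is penalized too, which production implementations avoid:

```python
import numpy as np

def ridge(X, y, alpha):
    """Penalized least squares: argmin ||y - X beta||^2 + alpha ||beta||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Hypothetical data with an intercept column
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.1, 4.9, 7.2])
print(ridge(X, y, alpha=0.1))
```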
Partial least squares
Partial least squares regression [online]. Wikipedia. Available at: https://en.wikipedia.org/wiki/Partial_least_squares_regression
Least squares method
A least squares method is a way to estimate or compute the least squares estimator.
A training procedure or fitting method in linear regression is an algorithm that finds the parameter values minimizing an objective function (typically, the least squares error).
It is used to establish the values of the parameters in the linear regression formula.
Training procedures or fitting methods used in linear regression (OLS):
- Closed-form
  - Normal equation
- Numerical optimization
  - Gradient descent
  - Conjugate gradient
  - Gauss–Newton (sometimes)
  - Levenberg–Marquardt
- Numerical linear algebra
  - QR decomposition
  - SVD
The training procedure usually associated with linear regression is least squares.
Normal equation
The normal equation is a closed-form expression for the least squares estimator. Its formula is:
$$\hat{\beta} = (X^T \, X)^{-1} \, X^T y$$
The normal equation is the solution to the least squares minimization problem when the objective is quadratic.
The Gram matrix is \(X^T \, X\).
For the normal equation to yield a unique solution, \(X^T \, X\) must be invertible.
It is not invertible when there is perfect multicollinearity. You can read more about this concept on this post about statistical collinearity.
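A minimal sketch of the normal equation on hypothetical synthetic data, using a linear solve rather than an explicit matrix inverse for numerical stability:

```python
import numpy as np

# Hypothetical synthetic data: y = 4 + 3x + noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 2, size=100)
y = 4.0 + 3.0 * x + rng.normal(size=100)

X = np.column_stack([np.ones_like(x), x])  # intercept column

# beta_hat = (X^T X)^{-1} X^T y, computed as a linear system
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # approximately [4.0, 3.0]
```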
Gradient descent
You can read this post about gradient descent.
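As a rough sketch, gradient descent can minimize the least squares loss directly; the learning rate and iteration count below are hypothetical choices, not tuned recommendations:

```python
import numpy as np

def fit_gradient_descent(X, y, lr=0.1, n_iters=2000):
    """Batch gradient descent on the mean squared error."""
    beta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_iters):
        # Gradient of (1/n) ||y - X beta||^2 with respect to beta
        gradient = -(2.0 / n) * X.T @ (y - X @ beta)
        beta -= lr * gradient
    return beta

# Hypothetical synthetic data: y = 4 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=100)
y = 4.0 + 3.0 * x + rng.normal(scale=0.1, size=100)
X = np.column_stack([np.ones_like(x), x])  # intercept column

print(fit_gradient_descent(X, y))  # approximately [4.0, 3.0]
```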
SVD
Singular value decomposition (SVD) is a standard matrix factorization technique that can decompose the training set matrix X into the matrix multiplication of three matrices.
The Moore–Penrose pseudoinverse using SVD is computed as:
$$X^{+} = V \Sigma^{+} U^T$$
To compute the matrix \(\Sigma^{+}\), the algorithm takes \(\Sigma\) and sets to zero all values smaller than a tiny threshold value, then it replaces all the nonzero values with their inverses, and finally it transposes the resulting matrix.
The framework Scikit-learn implements this approach in its LinearRegression class.
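A minimal sketch of this procedure on hypothetical data; NumPy's np.linalg.pinv implements the same idea with more careful defaults:

```python
import numpy as np

def pinv_via_svd(X, threshold=1e-10):
    """Moore-Penrose pseudoinverse X^+ = V Sigma^+ U^T via SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Zero out tiny singular values, invert the rest
    s_plus = np.where(s > threshold, 1.0 / s, 0.0)
    return Vt.T @ np.diag(s_plus) @ U.T

# Hypothetical synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

beta_hat = pinv_via_svd(X) @ y
print(beta_hat)  # approximately [1.0, -2.0, 0.5]
```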
Reference
- Least squares [online]. Wikipedia. Available at: https://en.wikipedia.org/wiki/Least_squares
Related entries
- Frequentist inferential statistics
- Linear regression