Statistical deviance

In statistics, deviance (desviación in Spanish) is a measure of how poorly a statistical model fits the data. In other words, it quantifies the “lack of fit”.

The deviance is defined as -2 times the log-likelihood ratio of the fitted model compared to a saturated model (a theoretical model that fits the data perfectly). The smaller the deviance, the better the fit.

It is used in logistic regression to assess models in the same way RSS (Residual Sum of Squares) is used in linear regression models.

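In general, with \(\ell\) denoting the likelihood:

$$\text{deviance} = -2 \log \left( \frac{\ell(\hat{\theta}_{\text{model}})}{\ell(\hat{\theta}_{\text{saturated}})} \right)$$

For a binary outcome with one observation per row, the saturated model assigns probability 1 to every observed class, so \(\ell(\hat{\theta}_{\text{saturated}}) = 1\) and the expression reduces to:

$$\text{deviance} = -2 \log \ell(\hat{\theta}_{\text{model}})$$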
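As a rough illustration of the definition, the sketch below computes the deviance of a binary classifier from its predicted probabilities. It assumes a 0/1 outcome with one observation per row, in which case the saturated likelihood equals 1 and the deviance is simply -2 times the model's log-likelihood. The function name `binomial_deviance` and the data are made up for the example.

```python
import math

def binomial_deviance(y, p_hat):
    """Deviance for 0/1 outcomes, assuming the saturated log-likelihood is 0."""
    log_lik = 0.0
    for yi, pi in zip(y, p_hat):
        # Log-probability the model assigned to the class that was observed.
        log_lik += math.log(pi) if yi == 1 else math.log(1.0 - pi)
    return -2.0 * log_lik

# Illustrative data (hypothetical): observed classes and predicted P(y = 1).
y = [1, 0, 1, 1, 0]
p_hat = [0.9, 0.2, 0.7, 0.6, 0.4]

dev = binomial_deviance(y, p_hat)
print(f"deviance = {dev:.4f}")
```

Moving the predicted probabilities closer to the observed classes lowers the value returned, which matches the "smaller is better" reading of the deviance.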

Deviance types

Two deviances are usually reported for a fitted model:

  • Null deviance
  • Residual deviance

The null deviance is the deviance of a model that contains only the intercept (no predictors). It represents the total variation in the data and is used as a baseline reference.

The residual deviance is the deviance of the model that includes your predictors. It represents the variation that remains unexplained after fitting the model.

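The difference between them shows how much the model improved by adding predictors. Under the null hypothesis that the predictors have no effect, this difference approximately follows a chi-squared distribution with \(k\) degrees of freedom, where \(k\) is the number of predictors added and LR is the ratio of the null model's likelihood to the fitted model's likelihood:

$$\text{Null deviance} - \text{Residual deviance} = -2 \log \text{LR} \sim \chi^2_k$$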
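The comparison can be sketched end to end in plain Python. The example below is only an illustration under several assumptions: the data are invented, the helper `fit_logistic_1d` is a hypothetical minimal Newton-Raphson fit for a single predictor, and the p-value formula used (via `math.erfc`) is valid only for a chi-squared distribution with 1 degree of freedom, which matches adding exactly one predictor.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def deviance(y, p):
    # -2 * log-likelihood for 0/1 outcomes (saturated log-likelihood is 0).
    return -2.0 * sum(math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p))

def fit_logistic_1d(x, y, n_iter=25):
    """Hypothetical helper: Newton-Raphson for logit(p) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 matrix X^T W X (negative Hessian).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented, non-separable data: one predictor and a 0/1 outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

# Null model: intercept only, so every prediction is the observed proportion.
p_null = [sum(y) / len(y)] * len(y)
null_dev = deviance(y, p_null)

# Fitted model: intercept plus one predictor.
b0, b1 = fit_logistic_1d(x, y)
p_fit = [sigmoid(b0 + b1 * xi) for xi in x]
resid_dev = deviance(y, p_fit)

# Likelihood-ratio statistic and its p-value (1 degree of freedom only).
lr_stat = null_dev - resid_dev
p_value = math.erfc(math.sqrt(lr_stat / 2.0))

print(f"null deviance     = {null_dev:.4f}")
print(f"residual deviance = {resid_dev:.4f}")
print(f"difference        = {lr_stat:.4f}, p-value = {p_value:.4f}")
```

With more than one added predictor, the p-value would need the chi-squared survival function for the corresponding degrees of freedom, which a statistics library would normally provide.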

Deviance residuals

In classification, residuals are not calculated by simple subtraction because the outcome is categorical (classes). However, we do have an expected value (the predicted probability \(\hat{p}\)) and an observed value (the actual class \(y\)).

The deviance residual calculates how much each individual observation contributes to the total deviance based on the probability the model assigned to the correct class.

For a single observation \(i\) it is calculated as follows, where the sign is positive when the observed class is 1 and negative when it is 0:

$$\text{deviance residual}_i = \pm \sqrt{-2 \log{\hat{p}_{i,\text{correct class}}}}$$

A value close to zero means the observed value is well explained by the model (the model gave the correct class a high probability).

A high absolute value means the observed value is not well explained (the model was “surprised” by the actual outcome).
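The behaviour described above can be checked with a small sketch. It assumes a binary 0/1 outcome, takes the sign as positive for class 1 and negative for class 0, and uses invented probabilities. The name `deviance_residuals` is chosen for the example only.

```python
import math

def deviance_residuals(y, p_hat):
    """Signed deviance residuals for 0/1 outcomes and predicted P(y = 1)."""
    residuals = []
    for yi, pi in zip(y, p_hat):
        # Probability the model gave to the class that actually occurred.
        p_correct = pi if yi == 1 else 1.0 - pi
        sign = 1.0 if yi == 1 else -1.0
        residuals.append(sign * math.sqrt(-2.0 * math.log(p_correct)))
    return residuals

# Invented data: the first two observations are well predicted, the last two are not.
y = [1, 0, 1, 0]
p_hat = [0.95, 0.10, 0.30, 0.80]

res = deviance_residuals(y, p_hat)
for yi, pi, ri in zip(y, p_hat, res):
    print(f"y = {yi}, p_hat = {pi:.2f}, deviance residual = {ri:+.4f}")

# The squared residuals add up to the total deviance of the model.
total_deviance = sum(r * r for r in res)
print(f"sum of squared residuals = {total_deviance:.4f}")
```

The two well-predicted observations produce residuals near zero, while the two poorly predicted ones produce residuals with a large absolute value.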

Related entries

  • Logistic regression
