# 5.2.3. Over-dispersion in GLMM

Over-dispersion is “the polite statistician’s version of Murphy’s law: if something can go wrong, it will” (Crawley, 2013). It is particularly relevant when working with count or proportion data where the variation of the response variable does not strictly conform to the Poisson or binomial distribution, respectively. Fundamentally, over-dispersion results in a poor model fit, in which the differences between the observed values and those predicted by the tested model are larger than would be predicted by the error structure. To identify possible over-dispersion in the data for a given model, divide the deviance (−2 times the log-likelihood ratio of the reduced model, e.g. a model with only a term for the intercept, compared to the full model; see McCullagh and Nelder, 1989) by its degrees of freedom: this ratio is called the dispersion parameter. If the deviance is reasonably close to the degrees of freedom (i.e. the dispersion or scale parameter is approximately 1), then evidence of over-dispersion is lacking; values well above 1 point to over-dispersion.
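The dispersion check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a substitute for a fitted GLMM: the counts are hypothetical, the model is an intercept-only Poisson model (so every fitted value equals the sample mean), and the helper names `poisson_deviance` and `dispersion_ratio` are invented for this example. The deviance is computed against the saturated model, which is the usual residual deviance for a Poisson model.

```python
import math


def poisson_deviance(observed, fitted):
    """Residual deviance of a Poisson model: 2 * sum(y*log(y/mu) - (y - mu))."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        # The y*log(y/mu) term is taken as 0 when y == 0 (its limiting value).
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total


def dispersion_ratio(observed, fitted, n_params):
    """Deviance divided by the residual degrees of freedom (n - n_params)."""
    residual_df = len(observed) - n_params
    return poisson_deviance(observed, fitted) / residual_df


# Hypothetical counts for illustration only (not data from the text).
counts = [0, 0, 1, 0, 7, 2, 0, 12, 1, 0, 3, 9]

# Intercept-only Poisson model: the fitted value is the sample mean throughout.
mean_count = sum(counts) / len(counts)
fitted = [mean_count] * len(counts)

ratio = dispersion_ratio(counts, fitted, n_params=1)
print(f"dispersion = {ratio:.2f}")
```

A ratio close to 1 gives no evidence of over-dispersion; for these made-up counts the ratio is well above 1. With a real model, the fitted values and the number of estimated parameters would come from the fitting software rather than from the sample mean.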

Causes of over-dispersion can be apparent or real. Apparent over-dispersion is due to
model misspecification, i.e. missing covariates or interactions, outliers in
the response variable, non-linear effects of covariates entered as linear
effects, the wrong link function, etc. Real over-dispersion occurs when model
misspecification can be ruled out and the extra variation in the data is genuine, arising from
too many zeros, clustering of observations, or correlation between observations
(Zuur *et al.*, 2009). Solutions to over-dispersion can include: i)
adding covariates or interactions, ii) including individual-level random
effects, e.g. using bee as a random effect, where multiple bees are observed
per cage, iii) using alternative distributions: if the model includes no random
effect, consider quasi-binomial and quasi-Poisson; if it does, consider replacing Poisson with
negative binomial, and iv) using a zero-inflated GLMM (a model that allows for
numerous zeros in the dataset, i.e. the frequency of zeros is inflated relative to the assumed distribution)
if appropriate. Over-dispersion cannot occur for normally distributed response
variables because the variance is estimated independently of the mean. However, residuals often have “heavy tails”,
i.e. more outlying observations than expected for a normal distribution;
some software packages can nevertheless accommodate these.
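As a rough screen for the "too many zeros" cause mentioned above, the observed share of zeros can be compared with the share a Poisson distribution with the same mean would produce, alongside the variance-to-mean ratio (which is 1 for a Poisson variable). The sketch below assumes hypothetical counts and ignores covariates and random effects entirely, so it can only hint at whether a zero-inflated or negative binomial model deserves a closer look; the function names `zero_excess` and `variance_to_mean` are invented for this example.

```python
import math


def zero_excess(counts):
    """Return (observed, expected) proportion of zeros under a Poisson model."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for y in counts if y == 0) / n
    # P(Y = 0) for a Poisson variable with the same mean as the data.
    expected = math.exp(-mean)
    return observed, expected


def variance_to_mean(counts):
    """Sample variance divided by sample mean; equals 1 in expectation for Poisson data."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((y - mean) ** 2 for y in counts) / (n - 1)
    return variance / mean


# Hypothetical counts for illustration only (not data from the text).
counts = [0, 0, 1, 0, 7, 2, 0, 12, 1, 0, 3, 9]

observed_zeros, expected_zeros = zero_excess(counts)
vmr = variance_to_mean(counts)

print(f"zeros observed: {observed_zeros:.2f}, expected under Poisson: {expected_zeros:.2f}")
print(f"variance-to-mean ratio: {vmr:.2f}")
```

An observed share of zeros far above the Poisson expectation, or a variance-to-mean ratio far above 1, suggests the alternatives listed above are worth considering. Because this comparison uses the marginal mean only, apparent excess may still disappear once relevant covariates or random effects are included.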