10.4.2. Dispersion in statistical models
For a binomial distribution, the variance np(1-p) depends on the mean np. When the variance in the
observations is larger or smaller than this expected variance, the data are said
to show over- or under-dispersion, respectively. Both types of dispersion are
indicated in goodness-of-fit tests by the ratio of the residual deviance of the
fitted model to its residual degrees of freedom: values appreciably larger than
1 indicate over-dispersion and values appreciably lower than 1 indicate
under-dispersion. Both types can strongly affect, and even invalidate, the
inference drawn from a model (standard errors, confidence intervals and
p-values). See
Twisk (2010), Zuur et al. (2009),
Hardin and Hilbe (2007) and Myers et al.
(2002) for examples. Causes of under- or over-dispersion can be related to the
frequency characteristics of the data, with relatively small and large
beekeepers/operations present in different numbers (heterogeneity of the sample
population). In addition, an important assumption of the binomial distribution,
namely independence of observations (independent Bernoulli trials), may be
violated when losses are not independent but clustered through an unknown
factor (e.g. the effects of a certain location or the incidence of pathogens)
that cannot be (properly) included in the model.
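
The dispersion check described above can be illustrated with a minimal sketch. It assumes the simplest possible model, an intercept-only binomial model in which every operation shares one loss probability, so that the residual degrees of freedom equal the number of operations minus one. The function names and the data are hypothetical and serve only as illustration; in practice the deviance and degrees of freedom would be taken from the fitted model reported by the statistical software.

```python
import math


def binomial_deviance(losses, colonies, fitted_p):
    """Residual deviance of a binomial model, given fitted loss probabilities."""
    total = 0.0
    for y, n, p in zip(losses, colonies, fitted_p):
        mu = n * p  # expected number of lost colonies for this operation
        # Convention 0 * log(0) = 0 covers operations with no loss or total loss.
        if y > 0:
            total += y * math.log(y / mu)
        if y < n:
            total += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * total


def dispersion_ratio(losses, colonies):
    """Residual deviance divided by residual df for an intercept-only model."""
    p_hat = sum(losses) / sum(colonies)  # pooled loss rate (the only parameter)
    if not 0.0 < p_hat < 1.0:
        raise ValueError("pooled loss rate must lie strictly between 0 and 1")
    deviance = binomial_deviance(losses, colonies, [p_hat] * len(losses))
    residual_df = len(losses) - 1  # observations minus one estimated parameter
    return deviance / residual_df


# Hypothetical data: colonies kept and colonies lost per beekeeping operation.
colonies = [10, 12, 50, 8, 200, 15, 30, 9]
losses = [0, 6, 5, 4, 20, 0, 15, 1]

ratio = dispersion_ratio(losses, colonies)
print("deviance / df =", round(ratio, 2))
```

A ratio appreciably above 1, as for these deliberately heterogeneous example data, points to over-dispersion; a ratio well below 1 would point to under-dispersion.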
When under- or over-dispersion is not reduced by including the most significant model factors derived from the data and/or by stratifying the available data according to binomial trial size, a solution is to use a different distribution for the dependent variable. A suitable candidate is the quasi-binomial distribution, in which the variance is characterised by an additional parameter added to the binomial distribution, so that hypothesis testing can be corrected for the extra-binomial variance. The form of the quasi-binomial probability distribution is:
See the manual available online by Kindt and Coe (2005) for an excellent example of the use of a quasi-binomial distribution and how it differs from the standard binomial distribution. An excess of zero values (no loss) can also be a cause of over-dispersion. To investigate the relation between predictor variables and the presence of zero values (no loss), zero-inflation techniques can be used (for example, Hall, 2000).
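
To make the quasi-binomial correction of hypothesis tests more concrete, the sketch below uses the quasi-likelihood formulation common in generalised linear model software, in which the variance is taken as phi * n * p * (1 - p) and the dispersion parameter phi is estimated as the Pearson chi-square statistic divided by the residual degrees of freedom. This is an assumption about the formulation and is not necessarily the form intended in the text. The model is again intercept-only, and the function name and data are hypothetical.

```python
import math


def quasi_binomial_intercept(losses, colonies):
    """Intercept-only fit: pooled loss rate, estimated dispersion, and SEs.

    Returns (p_hat, phi, se_binomial, se_quasi), with both standard errors
    on the logit scale.
    """
    total_colonies = sum(colonies)
    p_hat = sum(losses) / total_colonies
    if not 0.0 < p_hat < 1.0:
        raise ValueError("pooled loss rate must lie strictly between 0 and 1")

    # Pearson chi-square: squared residuals scaled by the binomial variance.
    pearson = sum(
        (y - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
        for y, n in zip(losses, colonies)
    )
    residual_df = len(losses) - 1
    phi = pearson / residual_df  # estimated dispersion parameter

    # Binomial standard error of logit(p_hat), from the Fisher information.
    se_binomial = 1.0 / math.sqrt(total_colonies * p_hat * (1.0 - p_hat))
    # Quasi-binomial: same point estimate, standard error scaled by sqrt(phi).
    se_quasi = se_binomial * math.sqrt(phi)
    return p_hat, phi, se_binomial, se_quasi


# Hypothetical data: colonies kept and colonies lost per beekeeping operation.
colonies = [10, 12, 50, 8, 200, 15, 30, 9]
losses = [0, 6, 5, 4, 20, 0, 15, 1]

p_hat, phi, se_binomial, se_quasi = quasi_binomial_intercept(losses, colonies)
print("estimated dispersion:", round(phi, 2))
print("SE (binomial):", round(se_binomial, 3))
print("SE (quasi-binomial):", round(se_quasi, 3))
```

The estimated loss rate is unchanged by the correction; only the standard errors, and hence the confidence intervals and p-values, widen when phi exceeds 1 and narrow when it is below 1.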