10.3.2. Loss calculations and confidence intervals
(1) For loss rates, the quantities of interest are not the raw numbers of colonies kept and colonies lost, but the rates calculated from them. The overall loss rate is the proportion calculated as the total number of colonies lost in the sample of beekeepers divided by the total number of colonies at risk of loss in the sample. VanEngelsdorp et al. (2013) refer to this as "total loss"; as this suggests to us the total number of colonies lost rather than any kind of rate or proportion, we prefer the terms overall loss rate or overall proportion of colonies lost. Adjustments can be made to this calculation to take account of colony management (VanEngelsdorp et al., 2012). The overall loss rate is influenced disproportionately by the larger beekeepers, who are fewer in number. With this approach, the appropriate confidence interval is one for a proportion, and there are several ways to calculate it.
Alternatively, the average loss rate is the average of the individual loss rates (number of colonies lost divided by number of colonies at risk) experienced by the different beekeepers in the sample. With this approach, confidence intervals should be those for an average, not a proportion. However, a difficulty with the average loss rate is that the loss rates experienced by beekeepers with different sizes of operation are not equally variable, yet they are weighted equally in the calculation of this average. While the loss rates can only range between 0 and 1 (0 to 100%), larger scale beekeepers have many more colonies which can be lost, and can experience a much larger set of possible loss rates within this range; therefore, their loss rates are subject to greater variation. The average individual loss rate is often higher than the overall loss rate, owing to the larger number of small scale beekeepers present in many populations of beekeepers, who can suffer extreme individual loss rates. Also, there are many ties in the individual loss rates, for example due to the large number of beekeepers with no losses, so that the median individual loss rate could well be zero. For this reason, the use of medians and Kruskal-Wallis tests to compare loss rates should be avoided. Owing to these various difficulties, we recommend use of the overall loss rate.
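The difference between the overall, average and median loss rates can be illustrated with a minimal sketch in Python. The data are hypothetical and chosen only for illustration: six small scale beekeepers, one of whom loses all colonies, and one large operation. The function and variable names are our own and are not taken from any cited work.

```python
from statistics import mean, median

# Hypothetical sample: (colonies at risk, colonies lost) for each beekeeper.
beekeepers = [(2, 0), (3, 0), (4, 4), (5, 0), (6, 0), (10, 1), (200, 20)]

def overall_loss_rate(data):
    """Total colonies lost divided by total colonies at risk."""
    total_at_risk = sum(at_risk for at_risk, _ in data)
    total_lost = sum(lost for _, lost in data)
    return total_lost / total_at_risk

def individual_loss_rates(data):
    """Loss rate of each beekeeper, all weighted equally afterwards."""
    return [lost / at_risk for at_risk, lost in data]

overall = overall_loss_rate(beekeepers)
rates = individual_loss_rates(beekeepers)
average = mean(rates)
med = median(rates)

print(f"overall loss rate: {overall:.3f}")
print(f"average loss rate: {average:.3f}")
print(f"median loss rate:  {med:.3f}")
```

In this made-up sample the single small scale beekeeper with a 100% loss pulls the average loss rate above the overall loss rate, while the median is zero because most beekeepers report no losses.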
(2) Another difficulty is that the usual procedure to calculate standard errors and confidence intervals for the overall loss rate (the proportion of colonies lost) is based on the binomial distribution, as the number of losses is limited by the number of colonies at risk. This assumes that each bee colony is lost or not independently of any other colony, and also that the probability of loss is the same for all colonies. Neither assumption is likely to hold. Within apiaries, whether or not a colony is lost is likely to depend on whether or not neighbouring colonies are lost. Furthermore, the probabilities of losing a colony are likely to differ between beekeepers. One way to account for this extra variation in the data (overdispersion relative to the binomial distribution) is to model the data using a generalisation of the binomial distribution. There are different ways to do this. One approach uses generalised linear modelling with a quasi-binomial distribution and a logit link function, and derives a confidence interval for the overall loss rate from the standard error of the estimated intercept in an intercept-only model (see VanEngelsdorp et al. (2012) and below).
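For an intercept-only model the quasi-binomial calculation has a closed form, which the following Python sketch uses in place of a full model-fitting routine. It rests on several assumptions of our own: each beekeeper contributes one (colonies at risk, colonies lost) observation, the dispersion parameter is estimated as the Pearson chi-square statistic divided by its degrees of freedom, and a normal quantile is used for the interval (statistical packages may instead use a t quantile or a profile-based interval, giving somewhat different limits for small samples). The data and the function name quasi_binomial_ci are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical sample: (colonies at risk, colonies lost) for each beekeeper.
beekeepers = [(2, 0), (3, 0), (4, 4), (5, 0), (6, 0), (10, 1), (200, 20)]

def quasi_binomial_ci(data, level=0.95):
    """Interval for the overall loss rate from an intercept-only
    quasi-binomial model with a logit link (closed-form sketch)."""
    k = len(data)
    total_at_risk = sum(n for n, _ in data)
    total_lost = sum(y for _, y in data)
    p = total_lost / total_at_risk  # fitted proportion = overall loss rate
    if k < 2 or not 0 < p < 1:
        raise ValueError("need at least 2 beekeepers and 0 < loss rate < 1")

    # Dispersion: Pearson chi-square divided by (k - 1) degrees of freedom.
    pearson = sum((y - n * p) ** 2 / (n * p * (1 - p)) for n, y in data)
    dispersion = pearson / (k - 1)

    # Intercept on the logit scale and its dispersion-inflated standard error.
    intercept = math.log(p / (1 - p))
    se = math.sqrt(dispersion / (total_at_risk * p * (1 - p)))

    # Normal-quantile interval on the logit scale (an approximation).
    z = NormalDist().inv_cdf(0.5 + level / 2)

    def expit(x):
        return 1 / (1 + math.exp(-x))

    return p, dispersion, expit(intercept - z * se), expit(intercept + z * se)

rate, dispersion, lower, upper = quasi_binomial_ci(beekeepers)
print(f"overall loss rate {rate:.3f}, dispersion {dispersion:.2f}")
print(f"approximate 95% CI: ({lower:.3f}, {upper:.3f})")
```

The interval is calculated on the logit scale and then back-transformed, so its limits always lie between 0 and 1. A dispersion estimate well above 1, as in this made-up sample, indicates that an interval based on the plain binomial distribution would be too narrow.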
(3) Another approach to calculating confidence intervals, when it is felt that formulae based on parametric models are not appropriate, is to use the nonparametric bootstrap approach, based on resampling the data (Efron and Tibshirani, 1994). This avoids the need to specify any particular model for the data. This is easy to implement in a software package such as R.
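A percentile bootstrap interval for the overall loss rate can be sketched as follows, here in Python using only the standard library. The choices made are assumptions for illustration rather than a prescribed procedure: beekeepers (not individual colonies) are resampled with replacement, since colonies belonging to the same beekeeper are unlikely to be independent; 2000 resamples are drawn; and the simple percentile method is used. The data and the function name bootstrap_ci are hypothetical.

```python
import math
import random

# Hypothetical sample: (colonies at risk, colonies lost) for each beekeeper.
beekeepers = [(2, 0), (3, 0), (4, 4), (5, 0), (6, 0), (10, 1), (200, 20)]

def bootstrap_ci(data, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for the overall loss rate,
    resampling whole beekeepers with replacement."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    k = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(k)] for _ in range(k)]
        at_risk = sum(n for n, _ in resample)
        lost = sum(y for _, y in resample)
        estimates.append(lost / at_risk)
    estimates.sort()

    # Positions of the lower and upper percentiles in the sorted estimates.
    alpha = 1 - level
    lower_index = math.floor((alpha / 2) * (n_boot - 1))
    upper_index = math.ceil((1 - alpha / 2) * (n_boot - 1))
    return estimates[lower_index], estimates[upper_index]

overall = sum(y for _, y in beekeepers) / sum(n for n, _ in beekeepers)
lower, upper = bootstrap_ci(beekeepers)
print(f"overall loss rate: {overall:.3f}")
print(f"bootstrap 95% CI:  ({lower:.3f}, {upper:.3f})")
```

With so few beekeepers the resulting interval is wide and depends heavily on whether the one large operation is included in a given resample; real surveys with many respondents give much more stable results.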