# 2.2.3. Extrapolating from sample to colony

A confidence interval for a population parameter, for example the mean detection rate in brood or the prevalence in the population/colony, can be estimated in a variety of ways (Reiczigel, 2003), most of which are available in modern statistical software. We do not recommend the (asymptotic) normal approximation to the binomial; it gives unreasonable results for low and high prevalence (e.g. confidence limits below 0 or above 1). We show here Wilson’s score method (Reiczigel, 2003), defined as:

Equation V.

$$\frac{2N\hat{p} + z^{2} \pm z\sqrt{z^{2} + 4N\hat{p}(1 - \hat{p})}}{2(N + z^{2})},$$

where $N$ is the sample size; $\hat{p}$ is the observed proportion, written with a hat as in Reiczigel (2003) to indicate that it is an estimated quantity; and $z$ is the $1 - \alpha/2$ quantile of the standard normal distribution, which serves as the critical value (e.g. $z \approx 1.96$ for a 95% confidence interval).

A shortcoming of all these methods, not only Wilson’s, is that they assume the bees in a sample are independent of each other (i.e. there is no over-dispersion, discussed below in section 5.2.). This is typically not true, especially given the transmission routes of bee parasites and pathogens (for a detailed discussion of the shortcomings of all methods of confidence interval calculation, see Reiczigel (2003)). If the degree of over-dispersion can be estimated, it can be used to adjust the confidence limits, most easily by replacing the actual sample size with the effective sample size (if bees are not independent, the effective sample size is smaller than the actual sample size). The effective sample size is calculated by dividing the actual sample size by the over-dispersion parameter (see section 5.2.3., design effect or deff, and Madden and Hughes (1999) for a complete explanation).

The over-dispersion parameter can be estimated by assuming the data are beta-binomial distributed, but it is obtained more easily in software by assuming a quasi-binomial distribution. The beta-binomial is a true statistical distribution and the quasi-binomial is not, but these theoretical differences are probably less important to practitioners than the practical differences when using software. Fitting a model based on the beta-binomial distribution requires simultaneously estimating the linear predictor (e.g. regression-type and treatment-type effects) and the other parameters characterising the distribution, which is typically difficult in today’s software. In contrast, there are standard algorithms for estimating these quantities if one assumes the data are generated by a quasi-binomial distribution. Essentially, the quasi-binomial includes a multiplier (not a true parameter) that scales the theoretical variance, as determined by a function of the linear predictor, to match the observed variance. This multiplier may be labelled the over-dispersion parameter in software output.

The quasi-binomial distribution is typically found in the part of the software that estimates generalised linear models. It requires that bees are grouped in logical categories (e.g. based on age or location in a colony), and there must be replication (e.g. two groups that get treatment A, two that get treatment B, etc.). In this kind of analysis, the dependent variable is given as the number of positive bees and the total number of bees for each category (for some software, e.g. R, as the number of positive bees and the number of negative bees for each category).

Prevalence ($\hat{p}$, the estimated proportion positive in the population, as in sections 2.2.1. and 2.2.2.) and a 95% confidence interval based on Wilson’s score method are given in Fig. 4 for sample sizes ($N$) of 15, 30, and 60 bees. Note that, for the usual sample size of 30, there is still considerable uncertainty about the true infection prevalence: the 95% interval is over 30 percentage points wide (roughly 33% to 67%) if half the bees are estimated to be infected.

Fig. 4. Estimated proportion of infected bees in a population as a function of the number of bees diagnosed as positive for various sample sizes ($N$ = 15, 30, 60). Lower and upper limits for a 95% confidence interval are based on Wilson’s score method.
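Limits of the kind shown in Fig. 4 can be computed directly from Equation V. The sketch below is a minimal Python version, assuming only the standard library; the function name `wilson_interval` and its arguments are chosen for this example and do not come from the source. The sample size is allowed to be non-integer so that an effective sample size can be passed in place of the actual one.

```python
import math
from statistics import NormalDist


def wilson_interval(p_hat, n, confidence=0.95):
    """Wilson score confidence limits for an observed proportion p_hat."""
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    # z is the 1 - alpha/2 quantile of the standard normal distribution.
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    centre = 2.0 * n * p_hat + z ** 2
    spread = z * math.sqrt(z ** 2 + 4.0 * n * p_hat * (1.0 - p_hat))
    denominator = 2.0 * (n + z ** 2)
    return (centre - spread) / denominator, (centre + spread) / denominator


# Half of the bees diagnosed as positive, for the three sample sizes in Fig. 4.
for n in (15, 30, 60):
    lower, upper = wilson_interval(0.5, n)
    print(f"N = {n}: {lower:.3f} to {upper:.3f}")
```

Unlike the normal approximation, these limits stay within 0 and 1 even when no bees, or all bees, test positive.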