9. Choice of sample size
In a probability-based sample, the sample size can be calculated statistically in order to achieve a required level of precision of estimates from the data collected, where these estimates have been identified in advance as being of interest. The formulae required depend on the sampling scheme to be used. Schaeffer et al. (1990) give details.
For example, in a simple random sample, to estimate a mean, e.g. the average number of colonies kept per beekeeper, to within a distance or error bound $B$ of the correct value with approximately 95% confidence, the formula for the sample size is $n = \frac{N\sigma^2}{(N-1)D + \sigma^2}$, where $D = B^2/4$ and $\sigma^2$ is the variance in the population of the quantity of interest, e.g. the number of colonies kept, and $N$ is the population size. In the case of a very large population of beekeepers, where $N$ is not known exactly, an approximation to this sample size is given by $n = \sigma^2/D = 4\sigma^2/B^2$. The population variance may be estimated from the variance calculated from data in a previous survey of the same population, or from a pilot survey. To estimate a total (by the population size times the sample average) with the same precision, the same formula is used but with $D = B^2/(4N^2)$. Box 10 provides an example of the calculations.
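The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard simple random sampling formula $n = N\sigma^2 / ((N-1)D + \sigma^2)$ with $D = B^2/4$ for a mean and $D = B^2/(4N^2)$ for a total. The function name, argument names and the numerical values (a variance of 400, a bound of 2 colonies, 5000 beekeepers) are hypothetical and chosen only for illustration.

```python
import math


def srs_sample_size_mean(variance, bound, population_size=None, for_total=False):
    """Sample size for a mean (or total) under simple random sampling.

    variance        -- assumed population variance (sigma^2), e.g. from a pilot survey
    bound           -- error bound B (approximately 95% confidence)
    population_size -- N, or None to use the large-population approximation
    for_total       -- if True, B is the bound on the population total
    """
    if population_size is None:
        if for_total:
            raise ValueError("estimating a total requires a known population size")
        n = 4 * variance / bound**2  # n = sigma^2 / D with D = B^2 / 4
    else:
        N = population_size
        D = bound**2 / (4 * N**2) if for_total else bound**2 / 4
        n = N * variance / ((N - 1) * D + variance)
    # Round away floating-point noise before rounding up to a whole unit.
    return math.ceil(round(n, 9))


# Hypothetical inputs: variance 400 (sd of 20 colonies), bound B = 2 colonies.
print(srs_sample_size_mean(400, 2))                        # 400
print(srs_sample_size_mean(400, 2, population_size=5000))  # 371
```

With a known, finite population the required sample is smaller than the large-population approximation suggests, which is why the approximation is a safe choice when $N$ is uncertain.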
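To estimate a proportion $p$, the same formulae are used with $\sigma^2 = p(1-p)$. If no prior estimate of $p$ is available, $p = 0.5$ can be used to maximise the required sample size. Box 11 shows the calculations.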
Box 11. Sample size calculation for a survey to estimate a proportion.
For example, using a simple random sampling approach, to estimate an overall proportion of losses which was 20% last year (so $p = 0.20$ approximately), to within a margin of error of 5% ($B = 0.05$) of the true value with an approximate confidence level of 95%, the sample size is calculated as follows. The population size is assumed large, but is unknown, so we use the large population version of the sample size formula for estimation of a proportion, given by $n = 4p(1-p)/B^2$. Here this gives $n = 4 \times 0.20 \times 0.80 / 0.05^2 = 0.64/0.0025$, giving $n = 256$ exactly. So the sample should be composed of at least 256 individuals to achieve the required level of precision.
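The Box 11 calculation can be checked with a short Python sketch. It assumes the large-population formula $n = 4p(1-p)/B^2$ and, when a population size is supplied, the corresponding finite-population version with $\sigma^2 = p(1-p)$. The function name and the population size of 2000 in the last call are hypothetical.

```python
import math


def srs_sample_size_proportion(p, bound, population_size=None):
    """Sample size to estimate a proportion under simple random sampling."""
    variance = p * (1 - p)  # sigma^2 for a 0/1 variable
    if population_size is None:
        n = 4 * variance / bound**2
    else:
        N = population_size
        n = N * variance / ((N - 1) * bound**2 / 4 + variance)
    # Round away floating-point noise before rounding up.
    return math.ceil(round(n, 9))


print(srs_sample_size_proportion(0.20, 0.05))  # 256, as in Box 11
print(srs_sample_size_proportion(0.50, 0.05))  # 400, the most conservative choice of p
print(srs_sample_size_proportion(0.20, 0.05, population_size=2000))  # 228
```

Because $p(1-p)$ is largest at $p = 0.5$, using that value gives the largest sample size that could be required for a given bound.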
If there is more than one quantity to be estimated, as there will be in surveys of beekeepers, the larger of the relevant calculated sample sizes can be used, where this is feasible, or it can be decided to focus on one more important estimator, e.g. the proportion of beekeepers experiencing winter colony loss or the proportion experiencing CDS losses. It is then accepted that any other estimates requiring a larger sample size will be estimated with lower precision than is desirable.
For a stratified sample, which takes simple random samples from each stratum, similar calculations may be done to obtain the overall sample size required to estimate the mean or total or proportion to within an error bound B of the true value with approximately 95% confidence. See Schaeffer et al. (1990), for example, for details.
Various approaches are possible to divide the chosen sample size $n$ between the strata, including the proportional method, which takes the sample size $n_i$ in the $i$th stratum proportional to $N_i/N$, where $N_i$ is the size of the $i$th stratum and $N$ is the population size. This means taking $n_i = nW_i$, where $W_i = N_i/N$ is the $i$th stratum weight or the proportion of the population belonging to stratum $i$.
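As a small illustration of proportional allocation, the sketch below computes $n_i = nN_i/N$ for each stratum. The function name, the three stratum sizes and the overall sample size of 250 are hypothetical.

```python
def proportional_allocation(n, stratum_sizes):
    """Split an overall sample size n between strata in proportion to stratum size."""
    N = sum(stratum_sizes)  # population size
    return [n * N_i / N for N_i in stratum_sizes]


# Hypothetical strata of 500, 1500 and 3000 beekeepers, overall sample of 250.
print(proportional_allocation(250, [500, 1500, 3000]))  # [25.0, 75.0, 150.0]
```

The allocations are returned unrounded. In practice they need rounding to whole numbers, and the rounded values may not sum exactly to $n$, so a small adjustment may be required.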
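Neyman allocation is a more complex method which splits the sample between strata in order to minimise the variance of the unbiased estimator of the population mean (given by $\bar{y}_{st} = \sum_i W_i \bar{y}_i$, where $W_i = N_i/N$ is the weight of stratum $i$, of size $N_i$, and $\bar{y}_i$ is the mean of the sample from stratum $i$) or of the total (taken as $N$ times the estimator for the mean) by taking the $i$th stratum sample size proportional to $N_i\sigma_i$ or $W_i\sigma_i$, where $\sigma_i^2$ is the variance within stratum $i$ and $\sigma_i$ is the standard deviation within stratum $i$. So $n_i = n\,\frac{N_i\sigma_i}{\sum_k N_k\sigma_k} = n\,\frac{W_i\sigma_i}{\sum_k W_k\sigma_k}$.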
The within stratum variances may be estimated from previous experience or a pilot survey.
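Neyman allocation can be sketched in the same way, taking each stratum's share proportional to $N_i\sigma_i$. The function name, stratum sizes and within-stratum standard deviations below are hypothetical. In a real survey the standard deviations would come from previous experience or a pilot survey.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Split n between strata in proportion to N_i * sigma_i."""
    products = [N_i * sd_i for N_i, sd_i in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * prod / total for prod in products]


# Hypothetical strata: the smallest stratum is also the most variable.
sizes = [500, 1500, 3000]
sds = [40, 20, 10]
print(neyman_allocation(240, sizes, sds))  # [60.0, 90.0, 90.0]
```

For comparison, proportional allocation of the same 240 units would give 24, 72 and 144, so Neyman allocation moves sample towards the more variable stratum.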
To estimate a proportion (by $\hat{p}_{st} = \sum_i W_i \hat{p}_i$, where $\hat{p}_i$ is the sample proportion in stratum $i$), the same formula can be used for allocation as for estimating a mean, but the within-stratum variance $\sigma_i^2$ is replaced by $p_i(1-p_i)$, where $p_i$ is the value of the population proportion in stratum $i$ (and in practice an estimate of this is used).
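For a proportion, the allocation can be sketched by using $\sqrt{p_i(1-p_i)}$ as the within-stratum standard deviation. The function name, stratum sizes and anticipated stratum proportions are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def neyman_allocation_proportion(n, stratum_sizes, stratum_proportions):
    """Split n between strata in proportion to N_i * sqrt(p_i * (1 - p_i))."""
    products = [
        N_i * math.sqrt(p_i * (1 - p_i))
        for N_i, p_i in zip(stratum_sizes, stratum_proportions)
    ]
    total = sum(products)
    return [n * prod / total for prod in products]


# Two hypothetical strata of equal size with anticipated loss proportions 0.5 and 0.1.
allocation = neyman_allocation_proportion(160, [1000, 1000], [0.5, 0.1])
print([round(n_i, 6) for n_i in allocation])  # [100.0, 60.0]
```

The stratum with the proportion closer to 0.5 receives the larger share, because its within-stratum variance is larger.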
The Neyman approach can also be modified, if required, to incorporate different sampling costs for each stratum. More complex modified Neyman allocation schemes are also possible (Särndal et al., 1992).
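One common textbook form of the cost-adjusted allocation, for a fixed overall sample size, takes the $i$th stratum sample size proportional to $N_i\sigma_i/\sqrt{c_i}$, where $c_i$ is the cost of sampling one unit in stratum $i$. The sketch below assumes that form, which is not spelled out in the text above. The function name and all numerical values are hypothetical.

```python
import math


def cost_adjusted_allocation(n, stratum_sizes, stratum_sds, unit_costs):
    """Split n between strata in proportion to N_i * sigma_i / sqrt(c_i)."""
    products = [
        N_i * sd_i / math.sqrt(c_i)
        for N_i, sd_i, c_i in zip(stratum_sizes, stratum_sds, unit_costs)
    ]
    total = sum(products)
    return [n * prod / total for prod in products]


# Hypothetical case: sampling in the first stratum costs four times as much per unit.
print(cost_adjusted_allocation(140, [500, 1500, 3000], [40, 20, 10], [4, 1, 1]))
# [20.0, 60.0, 60.0]
```

With equal costs in every stratum this reduces to the ordinary Neyman allocation.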
More generally it may be decided, in order to achieve a suitable coverage of the population, that a fixed percentage of the population should be sampled. For some of the COLOSS surveys, a guideline for acceptable coverage has been that, where possible, at least 5% of beekeepers should be surveyed. This is a simple way to choose sample size, especially in a non-probability sample for which sample size calculations are not valid.
Another concern in a smaller population which may be surveyed repeatedly is not to overburden individuals, but to maintain goodwill. This may mean taking a smaller sample than is ideal. Data processing concerns may also limit the sample size.
If the level of non-response can be anticipated, for example, from recent experience, the calculated or chosen sample size can be increased accordingly, in order still to give a sample of the required size, as $n' = n/(1-r)$, where $n$ is the original sample size, $n'$ is the new size, and $r$ is the expected non-response rate as a proportion.
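The non-response adjustment is a one-line calculation, sketched below under the assumption that the inflated size is $n' = n/(1-r)$. The function name and the non-response rate of 0.2 are hypothetical. The starting size of 256 is the figure from Box 11.

```python
import math


def inflate_for_nonresponse(n, nonresponse_rate):
    """Number of individuals to approach so that about n responses are expected."""
    if not 0 <= nonresponse_rate < 1:
        raise ValueError("non-response rate must be a proportion in [0, 1)")
    # Round away floating-point noise before rounding up.
    return math.ceil(round(n / (1 - nonresponse_rate), 9))


# Required sample of 256 with an assumed 20% non-response rate.
print(inflate_for_nonresponse(256, 0.2))  # 320
```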
Obtaining standard errors of estimates, or confidence intervals, as part of the data analysis indicates how precisely the various quantities of interest have been estimated (see sections 4.1.2. and 10.).