# 10.4.5. Example of advanced analysis

The analysis below uses the Dutch data collected
with the full 2011 COLOSS questionnaire, as an example of how to estimate
overall loss rates, calculate confidence intervals and fit GZLMs. It uses the
quasi-binomial family of GZLMs to account for any extra-binomial variation
(overdispersion) in the data. It is a simple illustration of how model fitting can be done in R,
with factors and covariates, rather than a procedure for determining a best
fitting model. Guidance on model building may be found, for example, in Dobson
(2002) and Zuur *et al.* (2009).

The data were “cleaned” prior to use to remove some inconsistent values.
The “glm” procedure in R is sensitive to invalid values in the data, and will
generate error messages rather than omit the cases with invalid data values, so
it is best to deal with these before attempting model fitting (or any other
kind of analysis). The analysis below uses the variable ColOct10, the number
of colonies kept on 1st October 2010, and Loss1011, the stated
number of colonies lost over winter 2010/2011, rather than the calculated
population at risk or calculated colonies lost. Even so, in one case Loss1011
was missing and in six other cases Loss1011 was greater than ColOct10, which
would give negative values of a new calculated variable, NotLost, the number of
colonies surviving. In some cases, though not all, this was due to winter
management of colonies (increasing or decreasing colony numbers). These few
cases were also removed before carrying out the analysis shown below.
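
The cleaning step described above can be sketched in R as follows. This is a minimal illustration only: the data frame `bees` and its values are invented, and only the variable names ColOct10, Loss1011 and NotLost are taken from the text.

```r
# Hypothetical toy data; the values are invented for illustration only.
bees <- data.frame(
  ColOct10 = c(10, 5, 8, 12, 3, 20),
  Loss1011 = c(2, NA, 9, 0, 1, 4)
)

# Keep only cases where the stated loss is present and does not exceed
# the number of colonies kept on 1st October 2010.
valid <- !is.na(bees$Loss1011) & bees$Loss1011 <= bees$ColOct10
clean <- bees[valid, ]

# Number of colonies surviving winter; non-negative after cleaning.
clean$NotLost <- clean$ColOct10 - clean$Loss1011

print(clean)
```

Filtering before computing NotLost means that no negative counts reach the model-fitting step.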

The analysis does not show all available options for the “glm” procedure; for example, several diagnostic plots are also available.

a) Calculation of overall loss rate and confidence interval from a null model (Boxes 12-14).
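
As a hedged sketch of this kind of calculation (not a reproduction of the boxes), an intercept-only quasi-binomial model can be fitted to hypothetical cleaned data and its intercept back-transformed to the probability scale. The data values and the object names `clean`, `null_fit` and `overall` are assumptions made for illustration. The interval uses a t quantile on the logit scale, which is one common choice when the dispersion is estimated.

```r
# Hypothetical cleaned data (values invented); variable names follow the text.
clean <- data.frame(
  ColOct10 = c(10, 12, 3, 20, 7, 15),
  Loss1011 = c(2, 0, 1, 4, 3, 2)
)
clean$NotLost <- clean$ColOct10 - clean$Loss1011

# Null (intercept-only) quasi-binomial model for the proportion of colonies lost.
null_fit <- glm(cbind(Loss1011, NotLost) ~ 1,
                family = quasibinomial(link = "logit"), data = clean)

# Intercept and standard error on the logit scale; the standard error
# already incorporates the estimated dispersion parameter.
coefs <- coef(summary(null_fit))
est <- coefs["(Intercept)", "Estimate"]
se  <- coefs["(Intercept)", "Std. Error"]

# 95% interval on the logit scale, back-transformed to a probability.
tcrit <- qt(0.975, df = df.residual(null_fit))
overall <- plogis(c(est, est - tcrit * se, est + tcrit * se))
names(overall) <- c("estimate", "lower", "upper")

print(round(overall, 3))
```

For a null model the back-transformed intercept equals the total number of colonies lost divided by the total number of colonies kept, so the point estimate can be checked by hand.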

b) Fitting a GZLM with an explanatory term.

The second step in model building is the use of explanatory variables. Explanation of the methods for evaluating model fit and determining optimal models is outside the scope of this document. For this example analysis, the variable Region is used. The region variable is one that is largely outside of the beekeeper’s control, rather like pesticide use by farmers, yet for various reasons may be associated with the loss rate. In some countries, region may be a substitute for meteorological variables. Boxes 15 to 18 and Fig. 3 show the analysis.
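
A minimal sketch of fitting such a model with Region as a factor is given below. The data, the region labels "A" to "C", and the object names `region_fit` and `per_region` are invented for illustration and do not reproduce the Dutch data or the boxes. Per-region estimates are obtained on the logit scale and back-transformed, with t-based 95% intervals as an assumed choice.

```r
# Hypothetical cleaned data with a Region factor (values and labels invented).
clean <- data.frame(
  Region   = factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C")),
  ColOct10 = c(10, 12, 3, 20, 7, 15, 8, 6, 11),
  Loss1011 = c(2, 0, 1, 4, 3, 2, 5, 2, 4)
)
clean$NotLost <- clean$ColOct10 - clean$Loss1011

# Quasi-binomial GZLM with Region as the single explanatory factor.
region_fit <- glm(cbind(Loss1011, NotLost) ~ Region,
                  family = quasibinomial(link = "logit"), data = clean)

# F test for the Region term, suitable when the dispersion is estimated.
print(anova(region_fit, test = "F"))

# Predictions per region on the logit scale with standard errors.
new_regions <- data.frame(
  Region = factor(levels(clean$Region), levels = levels(clean$Region))
)
pred <- predict(region_fit, newdata = new_regions,
                type = "link", se.fit = TRUE)

# Back-transform the estimates and 95% limits to the probability scale.
tcrit <- qt(0.975, df = df.residual(region_fit))
per_region <- data.frame(
  Region   = new_regions$Region,
  estimate = plogis(pred$fit),
  lower    = plogis(pred$fit - tcrit * pred$se.fit),
  upper    = plogis(pred$fit + tcrit * pred$se.fit)
)

print(per_region)
```

Computing the interval on the link scale and then back-transforming keeps the limits between 0 and 1, which would not be guaranteed if the interval were built directly on the probability scale.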

**Fig. 3.** Estimated probability of loss and 95% confidence interval per region.