5.1.1 Tests for normality and homogeneity of variances

The flow diagram in Fig. 6 gives a simple decision tree for choosing the right test; for more examples, see Table 6. Starting at the top, the first decision depends on the kind of data at hand. If two variables are categorical, then a chi-square test could be applicable. When investigating the relationship between two continuous variables, a correlation will be suitable. If one wants to compare two or more groups and test whether they are different, one follows the pathway “difference”. The next question is how many variables (factors) one wants to compare. Is it one variable (for example, the effect of a new varroa treatment on brood development in a honey bee colony), or is it the effect of both varroa treatment and supplementary feeding on brood development? For the latter, one could conduct a 2-way ANOVA or an even more complex model, depending on the actual data set. For the former, the next question would be “how many treatments?”; sticking with the example, does the experiment consist of two groups (control and treatment) or more (control and different dosages of the treatment)?

In both cases, the next decision depends on whether the data sets are independent or dependent. Relating back to the example, one could design the experiment so that some of the colonies are in the treatment group and some in the control group, in which case the groups are independent. However, one could equally compare colonies before and after the application of the varroa agent, in which case all colonies would be in both the before (control) and the after (treatment) group. Here it is easy to see that the before measurement might affect the after measurement, i.e. that the two groups are not independent. A classical example of dependent data is weight loss in humans before and after the start of a diet; clearly, weight loss depends on starting weight.
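The pathway described above can be sketched as a small Python function. This is only an illustrative sketch: the function name `suggest_test` and its parameters are hypothetical, and the tests returned for the one-factor branches (t-tests, one-way and repeated-measures ANOVA) are conventional choices assumed here, since the text does not name them and Fig. 6 may differ.

```python
def suggest_test(question, n_factors=1, n_groups=2, dependent=False):
    """Walk the decision tree described in the text (hypothetical helper).

    question  -- "categorical", "relationship" or "difference"
    n_factors -- number of explanatory variables being compared
    n_groups  -- number of treatment groups (only used for one factor)
    dependent -- True if the same units are measured in every group
    """
    if question == "categorical":
        # Two categorical variables
        return "chi-square test"
    if question == "relationship":
        # Two continuous variables
        return "correlation"
    if question != "difference":
        raise ValueError(f"unknown question: {question!r}")

    if n_factors >= 2:
        # e.g. varroa treatment AND supplementary feeding
        return "2-way ANOVA or a more complex model"

    if n_groups < 2:
        raise ValueError("a comparison needs at least two groups")

    # Leaves below are ASSUMED conventional choices, not taken from the text.
    if n_groups == 2:
        return "paired t-test" if dependent else "independent-samples t-test"
    return "repeated-measures ANOVA" if dependent else "one-way ANOVA"


# Control vs. treatment colonies, different colonies in each group
print(suggest_test("difference", n_factors=1, n_groups=2, dependent=False))
# Same colonies measured before and after the varroa treatment
print(suggest_test("difference", n_factors=1, n_groups=2, dependent=True))
```

Whichever test the tree points to, its assumptions still have to be checked against the data before the result is interpreted.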

To arrive at an informed decision about the extent of non-normality or heterogeneity of variances in your data, a critical first step is to plot your data: i) for correlational analyses, as in regression, use a scatterplot; ii) for ‘groups’ (e.g. levels of a treatment factor), use a histogram or box plot. Such plots provide an immediate indication of your data’s distribution, especially of whether variances are homogeneous. The next step is to test objectively for departures from normality and homoscedasticity. The Shapiro-Wilk W test, particularly for sample sizes < 50, or the Lilliefors test can be used to test for normality, and the Anderson-Darling test is of similar if not better value (Stephens, 1974). Similarly, for groups of data, Levene’s test tests the null hypothesis that different groups have equal variances. If the tests are significant, the assumption that a distribution is normal, or that its variances are equal, must be rejected, and either the data have to be transformed or non-parametric tests have to be conducted.
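As a minimal sketch of the testing step, the following Python example runs the Shapiro-Wilk test on each group and Levene's test across groups using SciPy (`scipy.stats.shapiro` and `scipy.stats.levene`). The helper name `check_assumptions`, the 0.05 threshold and the brood-area numbers are assumptions made purely for illustration; they are not data from any study.

```python
from scipy import stats


def check_assumptions(groups, alpha=0.05):
    """Test each group for normality and all groups for equal variances.

    groups -- list of sequences of measurements, one sequence per group
    alpha  -- significance threshold (0.05 is an assumed convention)
    """
    # Shapiro-Wilk: null hypothesis is that the sample is normally distributed
    normality_p = []
    for g in groups:
        _, p = stats.shapiro(g)
        normality_p.append(float(p))

    # Levene: null hypothesis is that all groups have equal variances
    _, levene_p = stats.levene(*groups, center="median")
    levene_p = float(levene_p)

    all_normal = all(p > alpha for p in normality_p)
    equal_variances = levene_p > alpha
    return {
        "normality_p": normality_p,
        "levene_p": levene_p,
        "all_normal": all_normal,
        "equal_variances": equal_variances,
        # If either assumption is rejected: transform or go non-parametric
        "parametric_ok": all_normal and equal_variances,
    }


# Made-up brood-area measurements for two groups of colonies
control = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1]
treated = [5.0, 15.0, 2.0, 18.0, 8.0, 12.0, 1.0, 19.0]

result = check_assumptions([control, treated])
print(result["equal_variances"], result["parametric_ok"])
```

A significant result only signals that an assumption is violated; it does not say which transformation or non-parametric alternative is most appropriate, so the plots remain the first point of reference.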

Fig. 6. A basic decision tree on how to select the appropriate statistical test is shown.

Table 6.
Guideline to statistical analyses in honey bee research, including examples of and suggestions for tests and graphical representation. Blank fields indicate that a wide variety of options are possible and that all have pros and cons.

| Subject | Variable | Short description | Fields of research where it is used | Synthetic representation | Measure of dispersion | Statistical test | Graphical representation | Notes |
|---|---|---|---|---|---|---|---|---|
| Honey bee | Morphometric variables (e.g. fore-wing angles) | Measures related to body size. Other data can be included here such as, for example, cuticular hydrocarbons | Taxonomic studies | Average | Standard deviation | Parametric tests such as ANOVA. Multivariate analysis such as PCA and DA | Bar charts for single variables, scatterplots for PC, DA | Please note that some morphometric data are ratios; consider possible deviations from normality |
| Honey bee | Physiological parameters (e.g. concentration of a certain compound in the haemolymph) | Measures related to the functioning of honey bee systems | | Average | Standard deviation | | Bar charts or lines | |
| Honey bee | Survival | | | Median | Range | Kaplan-Meier, Cox hazard | Bar charts or lines, scatterplots | |
| Pathogens (e.g. DWV, Nosema) | Prevalence | Proportion of infected individuals | Epidemiological studies | Average | Standard deviation can be used but transformation is necessary due to non-normal distribution | Fisher exact solution or chi-square according to sample size | Bar charts, pie charts | |
| Pathogens (e.g. DWV, Nosema) | Infection level | Number of pathogens (e.g. viral particles) | Epidemiological studies, studies on bee-parasite interaction | Average | | Parametric tests (e.g. t-test/ANOVA) can be used after log transformation; otherwise non-parametric tests can be used (e.g. Mann-Whitney/Kruskal-Wallis) | | |
| Parasites (e.g. Varroa destructor) | Fertility | Proportion of reproducing females | Factors of tolerance, biology of parasites | Average | Range | Fisher exact solution or chi-square according to sample size | | |
| Parasites (e.g. Varroa destructor) | Fecundity | Number of offspring per female | Factors of tolerance, biology of parasites | Average | Standard deviation | | | |