
# 5.1. How to choose a simple statistical test

Before addressing how to choose a test, we describe the differences between parametric and non-parametric statistics. As stated in the introduction, one has to know what kind of data one has or will obtain. In the discussion below, we use the traditional definition of “parametric” versus “non-parametric” tests. In all statistical tests, parameters of one kind or another (means, medians, etc.) are estimated. The distinction has grown murkier over the years as more and more statistical distributions have become available for use in contexts where previously only the normal distribution was allowed (e.g. regression, ANOVA). “Parametric” tests assume either (1) models in which the residuals, after fitting a linear predictor of some kind, are normally distributed, or (2) data that follow a Poisson, multinomial, or hypergeometric distribution. Here the residuals are the variation that is not explained by the explanatory variables one is testing, i.e. the inherent biological variation of the experimental units. This definition holds for simple models only. Parametric models are actually a large class of models in which all essential attributes of the data can be captured by a finite number of parameters (estimated from the data); the class therefore includes many distributions and both linear and non-linear models, but the distribution(s) must be specified when analysing the data. The complete definition is quite mathematical. A non-parametric test does not require that the data be samples from any particular distribution (i.e. it is distribution-free). This is the feature that makes such tests so popular.

For models based on the normal distribution, the normality assumption does not mean that the dependent variable itself is normally distributed; in fact one hopes it is multimodal, with a different mode for each treatment. However, if one subtracts (or conditions on) the linear predictor (e.g. subtracts each treatment mean from its group of observations), each resulting group of residuals (and all groups combined) follows the same normal distribution. Also, the discussion below pertains only to “simple” statistical tests in which observations are independent.
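The point about residuals can be illustrated with a small simulation. This is a minimal sketch using only the Python standard library; the treatment names, means, residual standard deviation, and sample size are hypothetical values chosen for illustration, not data from this chapter.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical treatment means; every group shares the same residual SD.
treatment_means = {"control": 10.0, "low_dose": 14.0, "high_dose": 20.0}
residual_sd = 1.0
n_per_group = 200

# Simulate observations: group mean (the linear predictor) + normal residual.
data = {
    name: [random.gauss(mu, residual_sd) for _ in range(n_per_group)]
    for name, mu in treatment_means.items()
}

# Residuals: subtract each group's own sample mean from its observations.
residuals = {
    name: [y - statistics.fmean(values) for y in values]
    for name, values in data.items()
}

pooled_raw = [y for values in data.values() for y in values]
pooled_residuals = [r for values in residuals.values() for r in values]

# The raw response is spread across three modes, so its SD is large;
# the pooled residuals are centred on zero with an SD close to residual_sd.
raw_sd = statistics.stdev(pooled_raw)
resid_sd = statistics.stdev(pooled_residuals)
print(f"SD of raw response:     {raw_sd:.2f}")
print(f"SD of pooled residuals: {resid_sd:.2f}")
```

The raw response is deliberately multimodal, yet once each treatment mean is removed the groups collapse onto one common distribution. That common distribution is what the normality assumption refers to.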

Note that chi-square and related tests are often considered “non-parametric” tests. This is incorrect; they are very distribution dependent (data must be drawn from Poisson, multinomial, or hypergeometric distributions), and observations must be independent. While “non-parametric” tests may not require that one samples from a particular distribution, they do require that each set of samples come from the same general distribution. That is, one sample cannot come from a right-skewed distribution and the other from a left-skewed distribution; both must have the same degree of skew, in the same direction. Note that when one has dichotomous (Yes/No) or categorical data, non-parametric tests will be required if we stay in the realm of “simple” statistical tests (Fig. 4).

For parametric statistics based on the normal distribution, an important second assumption is that the variance among groups of residuals is similar (homogeneous variances, also called homoscedasticity, Fig. 5a) rather than heterogeneous (heteroscedasticity, Fig. 5b). If even one of these assumptions is violated, a parametric statistic is not applicable. The alternative in such a case is either to transform the data (see Table 4 and section 5.2) so that the transformed data no longer violate the assumptions, or to use non-parametric statistics. The advantage of non-parametric statistics is that they do not assume a specific distribution of the data; the disadvantage is that their power (1 − β, see section 1) is lower than that of their parametric counterparts (Wasserman, 2006), though the differences may not be great. Power itself is not of such great concern, because a well-designed experiment should detect biologically relevant effects, provided the effect size is large enough. Table 3 provides a comparison between parametric and non-parametric statistics.
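One way to see what “distribution-free” means in practice is that a rank-based statistic depends only on the ordering of the observations. The sketch below computes the Mann-Whitney U statistic by direct pair counting, before and after a log transformation. The two samples are hypothetical, and the helper `mann_whitney_u` is written for illustration only; it returns the statistic, not a p-value.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return U for sample_a: the number of (a, b) pairs with a > b,
    counting each tie as one half."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical right-skewed measurements for two independent groups.
group_a = [1.2, 3.5, 2.8, 40.0, 5.1, 7.7]
group_b = [0.9, 1.1, 2.0, 2.5, 3.0, 12.0]

u_raw = mann_whitney_u(group_a, group_b)
u_log = mann_whitney_u(
    [math.log(x) for x in group_a],
    [math.log(x) for x in group_b],
)

# A monotone transformation keeps the ordering, so U is unchanged.
print(u_raw, u_log)
```

A t-test on the same data would give different results before and after the transformation, which is why transformation is a meaningful remedy for parametric tests but has no effect on rank-based ones.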

Fig. 5. (a) Two similar distributions with different means, where the variances of the two groups are homogeneous; (b) three different distributions where the means are the same but the variances of the three groups are heterogeneous.

Table 3. Comparison between parametric and non-parametric statistics.

| | Parametric | Non-parametric |
|---|---|---|
| Distribution | Normal | Any |
| Variance | Homogeneous | Any |
| General data type | Interval or ratio (continuous) | Interval, ratio, ordinal or nominal |
| Power | Higher | Lower |
| **Example tests** | | |
| Correlation | Pearson | Spearman |
| Independent data | t-test for independent samples | Mann-Whitney U test |
| Independent data, more than 2 groups | One way ANOVA | Kruskal Wallis ANOVA |
| Two repeated measures, 2 groups | Matched pair t-test | Wilcoxon paired test |
| Two repeated measures, >2 groups | Repeated measures ANOVA | Friedman ANOVA |
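The example tests in Table 3 can be read as a simple lookup. The following is a minimal sketch of that reading, not a complete decision procedure: the design labels, the dictionary `TESTS`, and the function `choose_test` are hypothetical names introduced here, and the sketch ignores the option of transforming the data before falling back to a non-parametric test.

```python
# Example tests from Table 3, keyed by study design:
# (parametric test, non-parametric counterpart).
TESTS = {
    "correlation": ("Pearson", "Spearman"),
    "independent, 2 groups": (
        "t-test for independent samples",
        "Mann-Whitney U test",
    ),
    "independent, >2 groups": ("One way ANOVA", "Kruskal Wallis ANOVA"),
    "repeated measures, 2 groups": (
        "Matched pair t-test",
        "Wilcoxon paired test",
    ),
    "repeated measures, >2 groups": (
        "Repeated measures ANOVA",
        "Friedman ANOVA",
    ),
}


def choose_test(design, residuals_normal, variances_homogeneous):
    """Return the parametric test only if both assumptions hold."""
    parametric, non_parametric = TESTS[design]
    if residuals_normal and variances_homogeneous:
        return parametric
    return non_parametric


print(choose_test("independent, 2 groups", True, True))
print(choose_test("independent, 2 groups", True, False))
```

Whether the two assumptions hold still has to be judged by the analyst, for example from residual plots; the function only encodes the rule that violating either one rules out the parametric test.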

Table 4. Links for GLMM models for analyses of data from cage experiments.

| Distribution | Canonical link |
|---|---|
| Gaussian | identity (no transformation) |
| Poisson | log |
| Binomial | logit |
| Gamma | inverse |
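For readers less familiar with link functions, the entries of Table 4 can be written out explicitly. This is a minimal sketch in plain Python, assuming the usual textbook forms of each link (identity, natural log, log-odds, reciprocal); the dictionary `LINKS` and the example means are hypothetical and not tied to any particular GLMM package.

```python
import math

# Each entry: (link g, inverse link g^-1), where eta = g(mu) is the
# linear predictor and mu is the mean on the original data scale.
LINKS = {
    "Gaussian": (lambda mu: mu, lambda eta: eta),                 # identity
    "Poisson": (math.log, math.exp),                              # log
    "Binomial": (
        lambda mu: math.log(mu / (1.0 - mu)),                     # logit
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "Gamma": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),        # inverse
}

# Hypothetical means on the data scale, valid for each distribution.
example_means = {"Gaussian": -2.5, "Poisson": 4.0, "Binomial": 0.2, "Gamma": 3.0}

for name, (link, inverse_link) in LINKS.items():
    mu = example_means[name]
    eta = link(mu)             # value on the linear-predictor scale
    back = inverse_link(eta)   # back-transformed to the data scale
    print(f"{name:9s} mu={mu:5.2f}  eta={eta:7.3f}  back={back:5.2f}")
```

Model coefficients are estimated on the scale of the linear predictor, so fitted values have to be passed through the inverse link before they can be interpreted on the original scale of the data.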