1.1. Types of data

Several points must be considered when selecting a statistical analysis, including sample size, the distribution of the data, and the type of data. These points, and the statistical analysis in general, should be considered before conducting an experiment or collecting data. One should know beforehand what kind of measurement one is making and what type of data one is collecting. The dependent variable is the variable that may be affected by the treatment a subject is given (e.g. control vs. treated, an ANOVA framework), or that may vary as a function of some other measured variable (e.g. age, a regression framework). Data normally include all measured quantities of an experiment (dependent and independent/predictor/factor variables).

The dependent variable can be one of several types: nominal, ordinal, interval or ratio, or combinations thereof. Nominal data are categorical with no inherent order, for example bee location (A/B/C, where the location of a bee is influenced by some explanatory variable, such as age) or dichotomous responses (yes/no). Ordinal data are also categorical, but the categories can be ordered sequentially. For example, the five stages of ovarian activation (Hess, 1942; Schäfer et al., 2006; Pirk et al., 2010; Carreck et al., 2013) are ordinal data because undeveloped ovaries are smaller than intermediate ovaries, which are smaller than fully developed ovaries. However, one cannot say that intermediate is half of fully developed. If one assigned numbers to the ranked categories, one could calculate a mean, but it would most likely be a biologically meaningless value. The third and fourth data types are interval and ratio; both carry information about the order of data points and the size of the intervals between values. For example, temperature in Celsius is on an interval scale, whereas temperature in Kelvin is on a ratio scale. The difference is that the former has an arbitrary “zero point”, so that negative values occur, whereas the latter has an absolute origin of zero. Ratios of values are therefore meaningful only on a ratio scale. Other examples of data with an absolute zero point are length, mass, angle, and duration.
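The distinction between the scales can be illustrated with a small numerical sketch. The ovarian activation scores and the two temperatures below are hypothetical values chosen only for illustration; they are not data from any of the cited studies. The sketch shows that a median uses only the rank order of ordinal scores, whereas a mean implicitly assumes equal spacing between stages, and that differences agree between Celsius and Kelvin while ratios do not.

```python
from statistics import mean, median

# Hypothetical ovarian activation scores (stages 1-5) for seven workers.
# These numbers are invented for illustration only.
stages = [1, 1, 2, 2, 3, 5, 5]

# The median relies only on the rank order of the stages.
stage_median = median(stages)
# The mean treats the stages as equally spaced, which ordinal data do not guarantee.
stage_mean = mean(stages)


def celsius_to_kelvin(t_c):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return t_c + 273.15


# Two hypothetical temperatures in degrees Celsius.
t_low_c, t_high_c = 10.0, 20.0

# Differences are meaningful on both interval and ratio scales.
diff_c = t_high_c - t_low_c
diff_k = celsius_to_kelvin(t_high_c) - celsius_to_kelvin(t_low_c)

# Ratios are meaningful only on the ratio scale (Kelvin).
ratio_c = t_high_c / t_low_c  # 2.0, but 20 degrees C is not "twice as hot"
ratio_k = celsius_to_kelvin(t_high_c) / celsius_to_kelvin(t_low_c)

print(f"median stage = {stage_median}, mean stage = {stage_mean:.2f}")
print(f"difference: {diff_c:.2f} C, {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius), {ratio_k:.3f} (Kelvin)")
```

The temperature difference is the same on both scales, but the apparent doubling in Celsius corresponds to an increase of only a few percent in Kelvin.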

The type of the dependent variable is important because it determines which statistical analyses can and cannot be used. For example, a common linear regression analysis would not be appropriate if the dependent variable is categorical. (Note: in such a case a logistic regression, discussed below, may work.)
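As a rough illustration of why the data type matters, the following sketch fits both a straight line and a logistic curve to a dichotomous response. The ages and forager status (0 = no, 1 = yes) are hypothetical, and the simple gradient ascent fit is an assumption made to keep the example self-contained; it is not the procedure used by the authors, and in practice a statistics package would be used. It covers only the dichotomous case, not nominal data with more than two categories.

```python
import math

# Hypothetical data: worker age in days and whether the bee was a forager.
ages = [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
forager = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

n = len(ages)
mean_age = sum(ages) / n
mean_y = sum(forager) / n

# Ordinary least squares line fitted to the 0/1 response.
sxx = sum((a - mean_age) ** 2 for a in ages)
sxy = sum((a - mean_age) * (y - mean_y) for a, y in zip(ages, forager))
slope = sxy / sxx
intercept = mean_y - slope * mean_age
linear_pred = [intercept + slope * a for a in ages]

# Standardise age so that a fixed step size behaves well.
sd_age = math.sqrt(sxx / n)
z = [(a - mean_age) / sd_age for a in ages]


def sigmoid(v):
    """Logistic function; maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


# Logistic regression fitted by gradient ascent on the log-likelihood.
b0, b1 = 0.0, 0.0
rate = 0.1
for _ in range(5000):
    p = [sigmoid(b0 + b1 * zi) for zi in z]
    g0 = sum(y - pi for y, pi in zip(forager, p))
    g1 = sum((y - pi) * zi for y, pi, zi in zip(forager, p, z))
    b0 += rate * g0 / n
    b1 += rate * g1 / n

logistic_pred = [sigmoid(b0 + b1 * zi) for zi in z]

print(f"largest linear prediction:   {max(linear_pred):.2f}")
print(f"largest logistic prediction: {max(logistic_pred):.2f}")
```

With these invented data the straight line predicts a value above 1 for the oldest bees, which cannot be interpreted as a probability, whereas the logistic predictions always remain between 0 and 1.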