# 1.1. Types of data

There are several points to consider in selecting a statistical analysis, including sample size, distribution of the data, and type of data. These points, and the statistical analysis in general, should be considered **before** conducting an experiment or collecting data. One should know beforehand what kind of measurement and what type of data one is collecting. The dependent variable is the variable that may be affected by which treatment a subject is given (e.g. control *vs.* treated, an ANOVA framework), or that varies as a function of some other measured variable (e.g. age, a regression framework). Data normally include all measured quantities of an experiment (dependent and independent/predictor/factor variables).

The dependent variable can be one of several types: nominal, ordinal, interval or ratio, or combinations thereof. Nominal data are categorical with no inherent order (e.g. bee location A/B/C, where the location of a bee is influenced by some explanatory variable, such as age) or dichotomous (yes/no responses). Ordinal data are also categorical, but the categories can be ordered sequentially. For example, the five stages of ovarian activation (Hess, 1942; Schäfer *et al.*, 2006; Pirk *et al.*, 2010; Carreck *et al.*, 2013) are ordinal data because undeveloped ovaries are smaller than intermediate ovaries, which are smaller than fully developed ovaries. However, one cannot say that intermediate is half of fully developed. If one assigned numbers to the ranked categories, one could calculate a mean, but it would most likely be a biologically meaningless value.

The third and fourth data types are interval and ratio; both carry information about the order of data points and the size of intervals between values. For example, temperature in Celsius is on an interval scale, but temperature in Kelvin is on a ratio scale. The difference is that the former has an arbitrary “zero point” and negative values are used, whereas the latter has an absolute origin of zero. Other examples of data with an absolute zero point are length, mass, angle, and duration.
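The distinction between scales can be made concrete with a short sketch. The Python example below is illustrative only: the ovary activation scores are invented, and the variable names are assumptions made for this example. It shows that a mean of ordinal ranks can be computed but does not correspond to any stage, and that a ratio of two temperatures depends on the unit when the scale has an arbitrary zero.

```python
from statistics import mean, median

# Hypothetical activation stages for nine workers
# (1 = undeveloped ... 5 = fully developed); invented for illustration.
stages = [1, 1, 2, 2, 2, 3, 3, 4, 5]

# The median respects the ordering and returns an actual stage.
ordinal_median = median(stages)

# The mean is computable, but it falls between stages and assumes
# equal spacing between categories, which ordinal data do not guarantee.
ordinal_mean = mean(stages)


def celsius_to_kelvin(temp_c):
    """Convert a Celsius temperature to Kelvin."""
    return temp_c + 273.15


# On the interval (Celsius) scale the ratio is an artefact of the
# arbitrary zero point: 20 degrees C is not "twice as warm" as 10 degrees C.
ratio_celsius = 20.0 / 10.0

# On the ratio (Kelvin) scale the same two temperatures differ by
# only a few percent.
ratio_kelvin = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

print("median stage:", ordinal_median)
print("mean stage:", round(ordinal_mean, 2))
print("Celsius ratio:", ratio_celsius)
print("Kelvin ratio:", round(ratio_kelvin, 3))
```

Differences between temperatures, by contrast, are the same on both scales, which is why interval data still support means and standard deviations even though ratios are not meaningful.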

The type of dependent variable is important because it determines which statistical analyses can and cannot be used. For example, a common linear regression analysis would not be appropriate if the dependent variable is categorical (in such a case a logistic regression, discussed below, may work).
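One way to see why linear regression is a poor fit for a categorical response is to compare predictions from the two models. The sketch below uses invented data (age in days and a yes/no outcome) and hypothetical helper names (`fit_line`, `fit_logistic`). The logistic model is fitted by plain gradient ascent only to keep the example dependency-free; this is an assumption of the sketch, not a recommended procedure, and an established statistics package would normally be used instead.

```python
import math

# Hypothetical data: worker age in days and whether the bee was
# observed foraging (1 = yes, 0 = no). Invented for illustration.
ages = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
forager = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, rate=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(intercept + slope * x) by gradient ascent
    on the mean log-likelihood; x is centred to aid convergence."""
    n = len(x)
    mean_x = sum(x) / n
    b0 = b1 = 0.0
    for _ in range(steps):
        p = [sigmoid(b0 + b1 * (xi - mean_x)) for xi in x]
        b0 += rate * sum(yi - pi for yi, pi in zip(y, p)) / n
        b1 += rate * sum((yi - pi) * (xi - mean_x)
                         for xi, yi, pi in zip(x, y, p)) / n
    # Convert back from the centred to the original age scale.
    return b0 - b1 * mean_x, b1


lin_intercept, lin_slope = fit_line(ages, forager)
log_intercept, log_slope = fit_logistic(ages, forager)

linear_pred = [lin_intercept + lin_slope * a for a in ages]
logistic_pred = [sigmoid(log_intercept + log_slope * a) for a in ages]

# The straight line predicts values below 0 for the youngest bees and
# above 1 for the oldest, which cannot be probabilities.
print("linear:  ", [round(p, 2) for p in linear_pred])
# The logistic predictions always stay strictly between 0 and 1.
print("logistic:", [round(p, 2) for p in logistic_pred])
```

With these invented data the fitted line leaves the 0 to 1 range at both ends of the age range, whereas the logistic curve cannot, which is the practical reason for matching the model to the type of the dependent variable.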