# 3.3.2.1. Data analysis

Assignment testing is an alternative to F-statistics and phylogenetic trees,
and it can be applied in several variations (Manel *et al*., 2005). Two main
types of assignment test can be distinguished:

In deterministic assignment, individuals are first grouped according to
sampling location or another plausible category. The analysis then computes,
for each sampled genotype, the probability of it being drawn at random from its
own group or from one or more alternative groups, based on the allele
frequencies of each group. The population of origin is the group with the
highest probability; however, it is also possible to reject the hypothesis that
any of the reference populations is the source, based on the calculated
probabilities. The software
package GENECLASS is the most advanced tool for this task (Piry *et al*.,
2004). For small samples of fewer than 30 individuals, it is best to
keep the individual genotypes in their populations (as is); for
larger samples, it is better to remove the genotype being tested from all
subgroups (“leave-one-out” approach) to avoid self-assignment.
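
The frequency-based comparison described above can be sketched in a few lines of Python. This is a minimal illustration only, not a reproduction of GENECLASS: the function names, the toy genotypes and the floor value `EPS` used for alleles absent from a reference group are all assumptions made for the example. Hardy-Weinberg proportions and independence between loci are also assumed when the per-locus probabilities are combined.

```python
import math
from collections import Counter

EPS = 0.01  # assumed floor for alleles absent from a reference group


def allele_counts(genotypes):
    """Per-locus allele counts for a list of diploid multilocus genotypes."""
    counts = [Counter() for _ in genotypes[0]]
    for genotype in genotypes:
        for locus, (a, b) in enumerate(genotype):
            counts[locus][a] += 1
            counts[locus][b] += 1
    return counts


def log_likelihood(genotype, counts):
    """Log probability of drawing the genotype from a group with these counts."""
    total = 0.0
    for (a, b), locus_counts in zip(genotype, counts):
        n = sum(locus_counts.values())
        p = max(locus_counts[a] / n, EPS) if n else EPS
        q = max(locus_counts[b] / n, EPS) if n else EPS
        # Hardy-Weinberg genotype probability: p^2 or 2pq
        total += math.log(p * p if a == b else 2 * p * q)
    return total


def assign(groups, leave_one_out=True):
    """Return (sampled group, most likely group, scores) for every individual."""
    reference = {name: allele_counts(g) for name, g in groups.items()}
    results = []
    for name, genotypes in groups.items():
        for genotype in genotypes:
            scores = {}
            for ref_name, counts in reference.items():
                if leave_one_out and ref_name == name:
                    # Remove the tested genotype from its own group first.
                    counts = [c.copy() for c in counts]
                    for locus, (a, b) in enumerate(genotype):
                        counts[locus][a] -= 1
                        counts[locus][b] -= 1
                scores[ref_name] = log_likelihood(genotype, counts)
            results.append((name, max(scores, key=scores.get), scores))
    return results


# Toy data (hypothetical): two loci, alleles coded as integers.
groups = {
    "north": [((1, 1), (3, 3)), ((1, 1), (3, 4)), ((1, 2), (3, 3)), ((1, 1), (3, 3))],
    "south": [((2, 2), (4, 4)), ((2, 2), (4, 3)), ((2, 1), (4, 4)), ((2, 2), (4, 4))],
}
results = assign(groups)
for sampled, assigned, _ in results:
    print(sampled, "->", assigned)
```

Calling `assign(groups, leave_one_out=False)` keeps every genotype in its group, which corresponds to the "as is" option. The sketch does not include the exclusion step, in which all reference populations are rejected as the source.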

As an alternative to the classical or deterministic assignment test, Bayesian
assignment works without prior knowledge of the number of populations. Instead,
it searches for the best partition of the genotypes while varying the
number of clusters into which the individuals are sorted. The microsatellite
data are entered raw, without designation of population of origin, and the
software determines not only the number of clusters but also, for each
individual, the cluster from which it most likely originates. The program
STRUCTURE (Pritchard *et al*., 2000) is the
most commonly used, but several other options exist. Ideally, the
number of clusters found resembles the number of populations expected by the
investigator. However, more objective methods exist to determine the optimal
number of clusters for a given data set, based on the posterior probability
calculated (Evanno *et al*., 2005). The Bayesian method is sensitive and
can resolve populations at various levels, from closely related subspecies to
more distantly related branches. However, it is important to avoid genotyping
related individuals, as the software readily picks up resemblance due to
common ancestry and may report family groups as separate clusters. An example of this
method in honey bees is a study of various levels of introgression of *A. m.
ligustica* into populations of *A. m. mellifera* (Jensen *et al*.,
2005). Assignment tests have also been used to detect recent hybrids with the
software NewHybrids, because individuals with intermediate assignment
probabilities are likely to be of mixed origin (Soland-Reckeweg *et al*., 2009).
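
The ΔK statistic of Evanno *et al*. (2005), cited above as a more objective way of choosing the number of clusters, can be computed from the log probability of the data reported by replicate runs at each value of K. The sketch below is a simplified illustration under stated assumptions: it takes the absolute second-order difference of the mean log probability across successive K and divides it by the standard deviation among replicate runs at K. The function name and the log probability values are invented placeholders, not output from a real analysis.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Delta K for every K that has both neighbours K - 1 and K + 1.

    lnp_by_k maps K to the list of log probabilities from replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])  # needs at least two replicate runs
            result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


# Hypothetical log probabilities from three replicate runs per K.
runs = {
    1: [-5200.0, -5201.0, -5199.0],
    2: [-4800.0, -4802.0, -4798.0],
    3: [-4790.0, -4805.0, -4780.0],
    4: [-4795.0, -4830.0, -4770.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, "best K:", best_k)
```

Because ΔK needs both neighbouring values, it cannot be evaluated for the smallest and largest K tested. In particular it cannot indicate K = 1, so the absence of structure has to be judged by other means.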

Spatial methods have been developed for use with DNA
microsatellites. Studies are currently underway with the methods in
GENELAND (Guillot, 2005) and TESS (Durand, 2009), two software packages based
on Bayesian assignment, and in ADEGENET (Jombart, 2008), a PCA-based software
package for the statistical language R. We recommend analysis with ADEGENET,
which produces informative results rapidly, even without geographical data
attached to the genotypes (Uzunov *et al*., 2013), and has fewer
underlying assumptions than the other methods.
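
The idea behind a PCA-based analysis of genotypes without geographical data can be illustrated with a small Python sketch. It is not ADEGENET's implementation, which is written in R and offers considerably more: it only shows how diploid genotypes can be recoded as allele counts and projected onto the first principal component. The function names and toy genotypes are assumptions made for the example, the first component is found by simple power iteration, and missing data are not handled.

```python
def allele_table(genotypes):
    """Recode diploid genotypes as allele counts (0, 1 or 2) per individual."""
    n_loci = len(genotypes[0])
    columns = sorted(
        {(locus, allele) for g in genotypes for locus in range(n_loci) for allele in g[locus]}
    )
    table = [[g[locus].count(allele) for locus, allele in columns] for g in genotypes]
    return columns, table


def first_pc_scores(table, n_iter=200):
    """Scores of each individual on the first principal component."""
    n, m = len(table), len(table[0])
    col_means = [sum(row[j] for row in table) / n for j in range(m)]
    x = [[row[j] - col_means[j] for j in range(m)] for row in table]  # centred data
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        scores = [sum(row[j] * v[j] for j in range(m)) for row in x]       # X v
        v = [sum(x[i][j] * scores[i] for i in range(n)) for j in range(m)]  # X^T X v
        norm = sum(c * c for c in v) ** 0.5
        if norm == 0:
            raise ValueError("no variation along the start vector")
        v = [c / norm for c in v]
    return [sum(row[j] * v[j] for j in range(m)) for row in x]


# Toy data (hypothetical): two loci, alleles coded as integers.
north = [((1, 1), (3, 3)), ((1, 1), (3, 4)), ((1, 2), (3, 3)), ((1, 1), (3, 3))]
south = [((2, 2), (4, 4)), ((2, 2), (4, 3)), ((2, 1), (4, 4)), ((2, 2), (4, 4))]

columns, table = allele_table(north + south)
scores = first_pc_scores(table)
north_scores, south_scores = scores[:len(north)], scores[len(north):]
print(north_scores, south_scores)
```

On these toy genotypes the two sampling groups fall on opposite sides of the first axis, although no group labels or coordinates were passed to the analysis.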