2.2. Quality filtering and data analysis
Analyzing the large data sets produced by NGS may appear intimidating at first, but several software pipelines are available to assist you in these analyses. The most commonly used pipelines (based on the number of citations) are QIIME (Caporaso et al., 2010b) and mothur (Schloss et al., 2009), but other useful software, such as the Joint Genome Institute’s PyroTagger (Kunin and Hugenholtz, 2010), also exists. Both QIIME and mothur are freely available and, with a couple of exceptions that we note below, both include all of the analyses described in this section. Both packages take NGS sequence files as input, together with user-defined files (mapping files in QIIME or oligonucleotide files in mothur) that contain:
- primer sequences.
- sample names.
- barcode sequences specific to each sample.
- optional metadata (QIIME only).
These files allow the multiplexed reads to be assigned to the sample from which they came and make alpha and beta diversity analyses possible downstream in the pipeline. Some popular analyses, such as UniFrac (Lozupone and Knight, 2005; Hamady et al., 2009), are newer methods designed explicitly for 16S amplicon community surveys, while other commonly used analyses have a long history of use in community ecology.
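The assignment of multiplexed reads to samples can be illustrated with a minimal Python sketch. It is not the QIIME or mothur implementation: the barcodes, sample names, primer, and the function name `demultiplex` are all hypothetical, and the sketch assumes equal-length barcodes at the 5' end of each read and exact matching. Real pipelines typically also tolerate mismatches and apply quality filters.

```python
from collections import defaultdict

# Hypothetical mapping data (invented for illustration): barcode -> sample name.
BARCODE_TO_SAMPLE = {
    "ACGTAC": "soil_1",
    "TGCATG": "soil_2",
}
FORWARD_PRIMER = "GTGCCAGC"  # placeholder primer sequence, an assumption


def demultiplex(reads, barcode_to_sample, primer):
    """Assign each read to a sample by exact barcode match at the 5' end.

    Returns (assigned, unassigned). `assigned` maps each sample name to its
    reads with barcode and primer removed; `unassigned` holds reads whose
    barcode or primer was not recognized.
    """
    # Assumes every barcode has the same length.
    barcode_len = len(next(iter(barcode_to_sample)))
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, rest = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None or not rest.startswith(primer):
            unassigned.append(read)
            continue
        # Keep only the amplicon sequence that follows the primer.
        assigned[sample].append(rest[len(primer):])
    return dict(assigned), unassigned


reads = [
    "ACGTAC" + "GTGCCAGC" + "TTGACC",
    "TGCATG" + "GTGCCAGC" + "AAGGTT",
    "ACGTAC" + "GTGCCAGC" + "CCGGAA",
    "GGGGGG" + "GTGCCAGC" + "TTTTTT",  # barcode not in the mapping
]
assigned, unassigned = demultiplex(reads, BARCODE_TO_SAMPLE, FORWARD_PRIMER)
for sample, seqs in sorted(assigned.items()):
    print(sample, len(seqs))
print("unassigned", len(unassigned))
```

Once reads are grouped by sample in this way, per-sample sequence sets can be passed to downstream diversity analyses.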