11.5. Mapping and methylation assessment
- Trim the above sequence data using the Illumina Data Analysis Pipeline.
- Align bisulfite-converted sequencing reads to the honey bee genome using the BSSeeker software (http://pellegrini.mcdb.ucla.edu/BS_Seeker/BS_Seeker.html) as described in Foret et al. (2012); http://www.pnas.org/content/suppl/2012/03/12/1202392109.DCSupplemental/pnas.201202392SI.pdf#nameddest=STXT.
- Reads containing consecutive CHN nucleotides are the product of incomplete bisulfite conversion and must first be discarded.
- To increase the accuracy of methylation calls, only those cytosines fulfilling neighbourhood quality standards are counted (bases of quality 20 or more, flanked by at least three perfectly matching bases with a PHRAP quality score of 15 or more).
- The methylation status of each cytosine base can be modelled by a binomial distribution with the number of trial equal to the number of mapping reads and the probability equal to the conversion rate.
- A base is called methylated if the number of reads supporting a methylated status departed from this null model significantly at the 5% level after correcting for multiple testing.
- Differentially methylated genes are identified using generalized linear models of the binomial family; the response vector CpGmeth (number of methylated and non-methylated reads for each CpG in a gene) was modelled as a function of two discrete categorical variables, the caste and the CpG position: CpGmeth = caste * CpGi.
- P-values are corrected for multiple testing using the Benjamini and Hochberg method. These tests are carried out using the R statistical environment (http://www.r-project.org).
- Honey bee ESTs and predicted genes are loaded into a Mysql database and visualized with Gbrowse, where CpG methylation levels in queens and workers are added as separate tracks.