Curated reference database of SSU rRNA for northern marine and freshwater communities of Archaea, Bacteria and microbial eukaryotes
Abstract
High-throughput sequencing technologies, such as Roche 454 pyrosequencing and Illumina, enable semi-quantitative study of communities of single-celled organisms by generating hundreds of thousands of short sequence reads from a single environmental sample. However, identifying the taxa to which these reads belong requires a reliable database of reference sequences.
We maintain databases of taxa from all three domains of life found in marine and freshwater samples from the Canadian Arctic and subarctic, along with an accompanying FASTA file of the quality-checked reference sequences. These files are suitable for next-generation sequencing data-processing pipelines built on open-source software such as QIIME, mothur, or UPARSE, when the user wishes to assign taxonomic identities to short reads by sequence similarity.
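To illustrate what assigning a taxonomic identity to a short read by sequence similarity involves, the Python sketch below implements a toy nearest-reference classifier: it scores each reference by the fraction of the read's k-mers (overlapping substrings of length k) that the reference also contains, and reports the lineage of the best match. This is not the algorithm used by QIIME, mothur, or UPARSE. It assumes a hypothetical taxonomy file in which each line holds a sequence ID, a tab, and a semicolon-delimited lineage, so the layout of the downloaded files should be checked before adapting it. The function names, the default k, and the similarity cutoff are illustrative choices.

```python
"""Toy nearest-reference taxonomy assignment by k-mer containment.

Assumption (hypothetical layout): references are in FASTA format and the
taxonomy file has one 'sequence_id<TAB>lineage' entry per line.
"""
from typing import Dict, Iterable, List, Optional, Set, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence_id: sequence}; the ID is the first token after '>'."""
    seqs: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                seqs[current_id] = "".join(chunks).upper()
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        seqs[current_id] = "".join(chunks).upper()
    return seqs


def parse_taxonomy(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence_id: lineage} from tab-separated lines."""
    taxonomy: Dict[str, str] = {}
    for raw in lines:
        if not raw.strip():
            continue
        seq_id, lineage = raw.rstrip("\n").split("\t", 1)
        taxonomy[seq_id.strip()] = lineage.strip()
    return taxonomy


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign(
    read: str,
    references: Dict[str, str],
    taxonomy: Dict[str, str],
    k: int = 8,
    min_similarity: float = 0.8,
) -> Tuple[Optional[str], float]:
    """Return (lineage of best-matching reference, score), or (None, score)
    when no reference reaches min_similarity."""
    read_kmers = kmers(read.upper(), k)
    if not read_kmers:  # read is shorter than k
        return None, 0.0
    best_id, best_score = None, 0.0
    for ref_id, ref_seq in references.items():
        # Fraction of the read's k-mers that also occur in this reference.
        score = len(read_kmers & kmers(ref_seq, k)) / len(read_kmers)
        if score > best_score:
            best_id, best_score = ref_id, score
    if best_id is None or best_score < min_similarity:
        return None, best_score
    return taxonomy.get(best_id), best_score
```

Scoring by containment rather than a symmetric measure keeps a short read from being penalized for covering only part of a longer reference. For many reads, each reference's k-mer set would normally be computed once and reused; the loop above recomputes it for readability.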
Table 1. Number of sequences and sequence lengths in the three taxonomic databases
| Domain | Number of sequences | Mean sequence length, bp (range) |
|---|---|---|
| Eukarya | 766 | 440 (216–657) |
| Bacteria | 33,293 | 435 (304–571) |
| Archaea | 2,288 | 557 (532–591) |
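The columns of Table 1 can be recomputed from a reference FASTA file, for example to check that a downloaded copy is complete. The Python sketch below counts records and reports the mean (rounded to the nearest base pair) and the range of sequence lengths. It assumes unaligned sequences, but ignores `-` and `.` gap characters in case an aligned file is used; the function names are illustrative and not part of the dataset.

```python
"""Recompute Table 1-style summary statistics from FASTA-formatted lines."""
from statistics import mean
from typing import Dict, Iterable, List, Optional


def sequence_lengths(lines: Iterable[str]) -> List[int]:
    """Length of each FASTA record, ignoring '-' and '.' gap characters."""
    lengths: List[int] = []
    current: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.replace("-", "").replace(".", ""))
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Number of sequences, mean length (rounded) and length range."""
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "n": len(lengths),
        "mean": round(mean(lengths)),
        "min": min(lengths),
        "max": max(lengths),
    }
```

Passing an open file handle, as in `summarize(sequence_lengths(handle))`, would be expected to give values close to the matching row of Table 1 if the file is the same database version. Python's `round` rounds halves to the nearest even integer, which can differ by 1 bp from other rounding conventions.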
The creation of these databases has been described in Comeau et al. (2011, 2012). Briefly, we targeted the V4 variable region of the 18S rRNA gene for Eukarya, and the V6–V8 and V3–V5 variable regions of the 16S rRNA gene for Bacteria and Archaea, respectively. Bacterial and archaeal reference sequences were originally imported from the Greengenes and SILVA databases, respectively, and are labeled with their original accession numbers from these databases; the Eukarya database was assembled de novo from taxa found in our studies. We have updated the taxonomic identifications to reflect recent developments in the literature, and we have included high-quality sequences from environmental clone libraries alongside cultured representatives when these clones represent clades that are widespread in Arctic and subarctic aquatic environments. Taxonomic identifications of uncultured clones are based on well-supported phylogenetic trees, and these clone sequences were rigorously screened for potential chimeras using UCHIME (Edgar et al. 2011).
Because our focus is on single-celled organisms, the Eukarya database covers Metazoa, Fungi, and Streptophyta (land plants) well enough to identify and remove these sequences from a sample, but it should not be used for detailed taxonomic analysis within these groups. Similarly, chloroplast reference sequences are included in the Bacteria database primarily so that these sequences can be identified and removed from analyses.
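Removing these non-target reads after classification amounts to filtering on the assigned lineage. The Python sketch below splits read assignments into kept and removed groups whenever any rank of a lineage matches Metazoa, Fungi, Streptophyta, or Chloroplast. The exact label strings, the semicolon-delimited lineage format, and the function names are assumptions; Greengenes-style rank prefixes such as `c__` are stripped before matching in case they are present.

```python
"""Split taxonomy assignments into target and non-target (removed) reads."""
import re
from typing import Dict, Iterable, Tuple

# Hypothetical labels; confirm the exact strings used in the taxonomy files.
EXCLUDED_TAXA = ("Metazoa", "Fungi", "Streptophyta", "Chloroplast")


def is_excluded(lineage: str, excluded: Iterable[str] = EXCLUDED_TAXA) -> bool:
    """True if any rank of a semicolon-delimited lineage is an excluded taxon."""
    wanted = {name.lower() for name in excluded}
    for rank in lineage.split(";"):
        # Drop Greengenes-style prefixes such as 'p__' or 'c__' before comparing.
        name = re.sub(r"^[a-z]__", "", rank.strip().lower())
        if name in wanted:
            return True
    return False


def split_assignments(
    assignments: Dict[str, str], excluded: Iterable[str] = EXCLUDED_TAXA
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (kept, removed) dictionaries of read_id -> lineage."""
    excluded = tuple(excluded)  # materialize so it can be reused for every read
    kept: Dict[str, str] = {}
    removed: Dict[str, str] = {}
    for read_id, lineage in assignments.items():
        if is_excluded(lineage, excluded):
            removed[read_id] = lineage
        else:
            kept[read_id] = lineage
    return kept, removed
```

Matching whole rank names rather than substrings avoids discarding lineages whose names merely contain one of the excluded words.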
These databases have been used in studies of microbial communities in high-latitude coastal and offshore marine environments (e.g. Comeau et al. 2011, Monier et al. 2014) and in high-latitude lakes and ponds (Comeau et al. 2012, Negandhi et al. 2014, Crevecoeur et al. 2015).
References
Edgar, R.C., B.J. Haas, J.C. Clemente, C. Quince, R. Knight, 2011. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, 27: 2194–2200. DOI: 10.1093/bioinformatics/btr381
Data citation
Lovejoy, C., Comeau, A., Thaler, M. 2016. Curated reference database of SSU rRNA for northern marine and freshwater communities of Archaea, Bacteria and microbial eukaryotes, v. 1.1. Nordicana D23, doi: 10.5885/45409XD-79A199B76BCC4110.
Key references
- Comeau, A.M., T. Harding, P.E. Galand, W.F. Vincent, C. Lovejoy, 2012. Vertical distribution of microbial communities in a perennially stratified Arctic lake with saline, anoxic bottom waters. Scientific Reports, 2: 604. DOI: 10.1038/srep00604
- Comeau, A.M., W.K.W. Li, J.-É. Tremblay, E.C. Carmack, C. Lovejoy, 2011. Arctic Ocean microbial community structure before and after the 2007 record sea ice minimum. PLoS One, 6: e27492. DOI: 10.1371/journal.pone.0027492
- Crevecoeur, S., W.F. Vincent, J. Comte, C. Lovejoy, 2015. Bacterial community structure across environmental gradients in permafrost thaw ponds: methanotroph-rich ecosystems. Frontiers in Microbiology. DOI: 10.3389/fmicb.2015.00192
- Monier, A., J. Comte, M. Babin, A. Forest, A. Matsuoka, C. Lovejoy, 2014. Oceanographic structure drives the assembly processes of microbial eukaryotic communities. ISME Journal. DOI: 10.1038/ismej.2014.197
- Negandhi, K., I. Laurion, C. Lovejoy, 2014. Bacterial communities and greenhouse gas emissions of shallow ponds in the High Arctic. Polar Biology. DOI: 10.1007/s00300-014-1555-1
Contributors
Comte, Jérôme
Université Laval
Crevecoeur, Sophie
Université Laval
Monier, Adam
University of Exeter
Onda, Deo
Université Laval
Potvin, Marianne
Université Laval
Version history
- Version 1.1 (2002–2008), updated March 1, 2016
- Version 1.0 (2002–2008), updated December 11, 2015
You can request an older version by contacting nordicana@cen.ulaval.ca
Supplementary material
Download
Data available for download are provided in ZIP format. Please cite the data properly when using them.
