Curated reference database of SSU rRNA for northern marine and freshwater communities of Archaea, Bacteria and microbial eukaryotes

Connie Lovejoy 1, André Comeau 2, Mary Thaler 1

1 Université Laval
2 Dalhousie University

Abstract

High-throughput sequencing technologies, such as Roche 454 pyrosequencing and Illumina, enable semi-quantitative study of communities of single-celled organisms by generating hundreds of thousands of short sequence reads from a single environmental sample. However, identifying the taxa to which these reads belong requires a reliable database of reference sequences.
We maintain databases of taxa from all three domains of life found in marine and freshwater samples in the Canadian Arctic and subarctic, along with an accompanying file in Fasta format of the quality-checked reference sequences. These files are suitable for use in data-processing pipelines for next-generation sequencing using open-source software such as QIIME, mothur, or UPARSE, when the user wishes to assign taxonomic identities by sequence similarity to short reads.

Table 1. Number of sequences and sequence length for the three taxonomic databases
Domain     Number of sequences   Mean sequence length in base pairs (range)
Eukarya    766                   440 (216–657)
Bacteria   33,293                435 (304–571)
Archaea    2,288                 557 (532–591)
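
As an illustration, the summary values in Table 1 could be recomputed from one of the reference Fasta files along the following lines. This is a minimal Python sketch, not the procedure used to build the table; the function names and the example file name in the usage note are hypothetical, and it assumes a plain Fasta layout in which header lines start with ">".

```python
from statistics import mean


def read_fasta_lengths(lines):
    """Yield the length of each sequence found in an iterable of Fasta lines."""
    length = None  # None until the first header line has been seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            # Sequences may be wrapped over several lines.
            length += len(line)
    if length is not None:
        yield length


def summarize_lengths(lines):
    """Return the sequence count, rounded mean length, and length range."""
    lengths = list(read_fasta_lengths(lines))
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "count": len(lengths),
        "mean": round(mean(lengths)),
        "min": min(lengths),
        "max": max(lengths),
    }
```

For a file extracted from one of the ZIP archives, the function could be called as `summarize_lengths(open("reference.fasta"))`, where `reference.fasta` stands in for the actual file name.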


The creation of these databases has been described in Comeau et al. 2011 and 2012. Briefly, we targeted the V4 variable region of the 18S rRNA gene for Eukarya and the V6-V8 and V3-V5 variable regions of the 16S rRNA gene for Bacteria and Archaea respectively. Reference sequences were originally imported from the SILVA database for Archaea and the Greengenes database for Bacteria, and are labeled with the original accession numbers from these databases, while the Eukarya database was assembled de novo, based on taxa found in our studies. We have edited the taxonomic identifications to reflect recent developments in the literature and included high-quality sequences from environmental clone libraries alongside cultured representatives when the former represent clades that are widespread in arctic and subarctic aquatic environments. Taxonomic identification of uncultured clones is based on well-supported phylogenetic trees, and they have been rigorously screened for potential chimeras using UCHIME (Edgar et al. 2011).
Because our focus is on single-celled organisms, the Eukarya database covers Metazoa, Fungi, and Streptophyta (land plants) only well enough to identify and remove these sequences from a sample; it should not be used for detailed taxonomic analysis within these groups. By the same token, chloroplast reference sequences are included in the Bacteria database primarily so that these sequences can be identified and removed from analysis.
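
The removal step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes that each read has already been assigned a semicolon-delimited taxonomy string by the user's pipeline, and that the non-target groups appear as exact rank labels ("Metazoa", "Fungi", "Streptophyta", "Chloroplast"). The actual labels in the taxonomy files may differ, and the function names are hypothetical.

```python
# Lineages that are present in the databases mainly so they can be removed.
# The exact label spellings are an assumption; check them against the files.
EXCLUDED_TAXA = ("Metazoa", "Fungi", "Streptophyta", "Chloroplast")


def is_excluded(taxonomy, excluded=EXCLUDED_TAXA):
    """Return True if any rank of a semicolon-delimited taxonomy is excluded."""
    ranks = [rank.strip().lower() for rank in taxonomy.split(";") if rank.strip()]
    targets = {name.lower() for name in excluded}
    return any(rank in targets for rank in ranks)


def split_assignments(assignments, excluded=EXCLUDED_TAXA):
    """Split a {read_id: taxonomy} mapping into kept and removed reads."""
    kept, removed = {}, {}
    for read_id, taxonomy in assignments.items():
        if is_excluded(taxonomy, excluded):
            removed[read_id] = taxonomy
        else:
            kept[read_id] = taxonomy
    return kept, removed
```

Returning the removed reads alongside the kept ones makes it easy to report how much of a sample was discarded as non-target sequence.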
These databases have been successfully used in numerous studies of microbial communities in high-latitude coastal and offshore marine environments (e.g. Comeau et al. 2011, Monier et al. 2014), as well as high-latitude lakes and ponds (Comeau et al. 2012, Negandhi et al. 2014, Crevecoeur et al. 2015).

References
Edgar, R.C., B.J. Haas, J.C. Clemente, C. Quince, R. Knight, 2011. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics. doi: 10.1093/bioinformatics/btr381

Data citation

Lovejoy, C., Comeau, A., Thaler, M. 2016. Curated reference database of SSU rRNA for northern marine and freshwater communities of Archaea, Bacteria and microbial eukaryotes, v. 1.1. Nordicana D23, doi: 10.5885/45409XD-79A199B76BCC4110.

Key references

  • Comeau, A.M., T. Harding, P.E. Galand, W.F. Vincent, C. Lovejoy, 2012. Vertical distribution of microbial communities in a perennially stratified Arctic lake with saline, anoxic bottom waters. Scientific Reports, 2: 604.
    DOI: 10.1038/srep00604
  • Comeau, A.M., W.K.W. Li, J.-É. Tremblay, E.C. Carmack, C. Lovejoy, 2011. Arctic Ocean microbial community structure before and after the 2007 record sea ice minimum. PLoS One, 6: e27492.
    DOI: 10.1371/journal.pone.0027492
  • Crevecoeur, S., W.F. Vincent, J. Comte, C. Lovejoy, 2015. Bacterial community structure across environmental gradients in permafrost thaw ponds: methanotroph-rich ecosystems. Frontiers in Microbiology.
    DOI: 10.3389/fmicb.2015.00192
  • Monier, A., J. Comte, M. Babin, A. Forest, A. Matsuoka, C. Lovejoy, 2014. Oceanographic structure drives the assembly processes of microbial eukaryotic communities. ISME Journal.
    DOI: 10.1038/ismej.2014.197
  • Negandhi, K., I. Laurion, C. Lovejoy, 2014. Bacterial communities and greenhouse gas emissions of shallow ponds in the High Arctic. Polar Biology.
    DOI: 10.1007/s00300-014-1555-1

Contributors

Comte, Jérôme        Université Laval
Crevecoeur, Sophie   Université Laval
Monier, Adam         University of Exeter
Onda, Deo            Université Laval
Potvin, Marianne     Université Laval

Version history

You can request an older version by contacting nordicana@cen.ulaval.ca

Measurement sites

Site      Latitude   Longitude   Altitude (m)
AO-NW01   75.99      156.87      -5
Lake A    83.03      -75.43      5

Supplementary material

Download

Data available for download are in ZIP format. Please properly cite the data when using it.

Dataset                          File                   Dates
rRNA gene sequences (Eukarya)    d_000012943_base.zip
Taxonomy (Eukarya)               d_000012944_base.zip   08/2002 – 05/2008
rRNA gene sequences (Bacteria)   d_000012942_base.zip
Taxonomy (Bacteria)              d_000012945_base.zip   08/2002 – 05/2008
rRNA gene sequences (Archaea)    d_000012940_base.zip
Taxonomy (Archaea)               d_000012941_base.zip   08/2002 – 05/2008