Curated reference database of SSU rRNA for northern marine and freshwater communities of Archaea, Bacteria and microbial eukaryotes

High-throughput sequencing technologies, such as Roche 454 pyrosequencing and Illumina sequencing, enable semi-quantitative study of communities of single-celled organisms by generating hundreds of thousands of short sequence reads from a single environmental sample. However, identifying the taxa to which these reads belong requires a reliable database of reference sequences.

We maintain databases of taxa from all three domains of life found in marine and freshwater samples in the Canadian Arctic and subarctic, along with an accompanying file, in FASTA format, of the quality-checked reference sequences. These files are suitable for use in data-processing pipelines for next-generation sequencing built on open-source software such as QIIME, mothur, or UPARSE, when the user wishes to assign taxonomic identities to short reads by sequence similarity.
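To make "assigning taxonomic identities by sequence similarity" concrete, here is a toy sketch in Python. It is not how QIIME, mothur, or UPARSE work internally; it simply scores a read against each reference by the fraction of the read's k-mers found in that reference and reports the best hit. The function names, the k-mer size, the score threshold, and the example sequences are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(read_kmers, ref_kmers):
    """Fraction of the read's k-mers that also occur in the reference."""
    if not read_kmers:
        return 0.0
    return len(read_kmers & ref_kmers) / len(read_kmers)


def best_hit(read, references, k=8, min_score=0.5):
    """Return (reference_id, score) for the most similar reference.

    The reference_id is None when no reference reaches min_score.
    """
    read_kmers = kmers(read, k)
    best_id, best_score = None, 0.0
    for ref_id, ref_seq in references.items():
        score = containment(read_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_id, best_score = ref_id, score
    if best_score < min_score:
        return None, best_score
    return best_id, best_score


# Hypothetical reference sequences, for illustration only.
references = {
    "refA": "ACGTACGTTAGCTAGC",
    "refB": "TTTTGGGGCCCCAAAA",
}

matched = best_hit("ACGTTAGC", references, k=4)           # substring of refA
unmatched = best_hit("GATTACAGATTACA", references, k=4)   # shares no 4-mers
print(matched, unmatched)
```

In practice the reference FASTA file would be passed to the taxonomy-assignment step of the chosen pipeline, which uses alignment or a dedicated classifier rather than this simple score.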

Table 1. Number of sequences and sequence length for the three taxonomic databases

Domain      Number of sequences    Mean sequence length (range), base pairs
Eukarya     766                    440 (216–657)
Bacteria    33,293                 435 (304–571)
Archaea     2,288                  557 (532–591)
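The kind of summary shown in Table 1 (sequence count, mean length, and length range) can be recomputed from any FASTA file. The sketch below is a minimal pure-Python illustration; the function names and the two demo records are hypothetical, and a real run would open the downloaded reference file instead of the in-memory example.

```python
import io
from statistics import mean


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_summary(handle):
    """Return count, rounded mean, minimum, and maximum sequence length."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "count": len(lengths),
        "mean": round(mean(lengths)),
        "min": min(lengths),
        "max": max(lengths),
    }


# Hypothetical two-record FASTA text standing in for a reference file.
demo = io.StringIO(">ref1 example\nACGTACGT\nACGT\n>ref2 example\nACGTAC\n")
summary = length_summary(demo)
print(summary)
```

To summarise a file on disk, pass an open text handle, for example `with open(path) as handle: length_summary(handle)`, where `path` is wherever the FASTA file was saved.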

The creation of these databases is described in Comeau et al. (2011, 2012). Briefly, we targeted the V4 variable region of the 18S rRNA gene for Eukarya, and the V6-V8 and V3-V5 variable regions of the 16S rRNA gene for Bacteria and Archaea, respectively. Reference sequences for Archaea were originally imported from the SILVA database and those for Bacteria from the Greengenes database; they are labeled with the original accession numbers from these databases. The Eukarya database was assembled de novo, based on taxa found in our studies. We have edited the taxonomic identifications to reflect recent developments in the literature, and we have included high-quality sequences from environmental clone libraries alongside cultured representatives when the former represent clades that are widespread in Arctic and subarctic aquatic environments. Taxonomic identification of uncultured clones is based on well-supported phylogenetic trees, and these clone sequences have been rigorously screened for potential chimeras using UCHIME (Edgar et al. 2011).

Because our focus is on single-celled organisms, our coverage of Metazoa, Fungi, and Streptophyta (land plants) in the Eukarya database is sufficient to identify and remove these sequences from a sample, but it should not be used for detailed taxonomic analysis within these groups. Likewise, chloroplast reference sequences are included in the Bacteria database primarily so that chloroplast reads can be identified and removed from analysis.
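As an illustration of this removal step, the sketch below splits taxonomy assignments into reads to keep and reads to discard. It assumes, hypothetically, that each read has been assigned a semicolon-delimited taxonomy string; the actual label format in the database files and in a given pipeline's output may differ, and the read identifiers and lineages shown are invented for the example.

```python
# Groups covered only well enough to be filtered out (see the text above).
EXCLUDED_TAXA = {"Metazoa", "Fungi", "Streptophyta", "Chloroplast"}


def is_excluded(taxonomy, excluded=EXCLUDED_TAXA):
    """True if any rank in a ';'-delimited taxonomy string is an excluded group."""
    ranks = {rank.strip().lower() for rank in taxonomy.split(";")}
    return any(name.lower() in ranks for name in excluded)


def split_reads(assignments):
    """Split {read_id: taxonomy} into (kept, removed) dictionaries."""
    kept, removed = {}, {}
    for read_id, taxonomy in assignments.items():
        target = removed if is_excluded(taxonomy) else kept
        target[read_id] = taxonomy
    return kept, removed


# Hypothetical assignments; the lineage strings are illustrative only.
assignments = {
    "read_1": "Eukaryota;Opisthokonta;Metazoa;Arthropoda",
    "read_2": "Eukaryota;Stramenopiles;Bacillariophyta",
    "read_3": "Bacteria;Cyanobacteria;Chloroplast",
    "read_4": "Bacteria;Proteobacteria;Alphaproteobacteria",
}

kept, removed = split_reads(assignments)
print(sorted(kept), sorted(removed))
```

Matching on whole rank names, rather than on substrings, avoids discarding a read whose lineage merely contains an excluded name as part of a longer label.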

These databases have been successfully used in numerous studies of microbial communities in high-latitude coastal and offshore marine environments (e.g. Comeau et al. 2011, Monier et al. 2014), as well as high-latitude lakes and ponds (Comeau et al. 2012, Negandhi et al. 2014, Crevecoeur et al. 2015).

References

Edgar, R.C., B.J. Haas, J.C. Clemente, C. Quince, R. Knight. 2011. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics. doi: 10.1093/bioinformatics/btr381

Additional Info

Field                      Value
Source                     https://nordicana.cen.ulaval.ca/en/publication.php?doi=45409XD-79A199B76BCC4110
Version                    1.1
Citation                   Lovejoy, C., Comeau, A., Thaler, M. 2016. Curated reference database of SSU rRNA for northern marine and freshwater communities of Archaea, Bacteria and microbial eukaryotes, v. 1.1 (2002-2008). Nordicana D23, doi: 10.5885/45409XD-79A199B76BCC4110.
Temporal coverage 1
Temporal coverage start    2002-08-26
Temporal coverage end      2008-05-01
Spatial coverage           { "coordinates": [ 156.87, 75.99 ], "type": "Point" }
Station                    cen-whapmagoostui-kuujuarapik-research-station
Collaborator               nordicana-d
Variable measured 1
Variable name              rRNA gene sequences (Eukarya)
Variable description       Reference sequences of the variable region of the small-subunit ribosomal RNA gene for taxa found in the Arctic. Sequences are in FASTA format.
Variable unit
Variable URL               https://nordicana.cen.ulaval.ca/en/infodonnees.php?id=1272
Measurement Technique
Date published
Status
Publisher 1
Publisher name             Nordicana D
Publisher URL              https://nordicana.cen.ulaval.ca/en/
Provider 1
Provider name              Centre for Nordic Studies (CEN)
Provider URL               https://www.cen.ulaval.ca/