Overview
The data set provided here is a 3.5 year, approximately monthly
time series of microbial eukaryote (or protist) relative sequence
abundances collected by the Plumes and Blooms program. Plumes and
Blooms samples seven stations on a North-South transect in the
Santa Barbara Channel.
Sample collection and laboratory methods
Discrete seawater samples for amplicon sequencing analysis were
collected in acid-washed polycarbonate bottles from 5 L Niskin
bottles deployed on a rosette. Samples were transported to the
laboratory in a cooler, and particulate DNA was sampled within ~10
hr of collection. Approximately 1 L (for samples collected at
depths ≤ 75 m) or 2 L (for samples collected at depths > 75 m)
of seawater was filtered under gentle peristaltic pressure through
47 mm, 1.2 μm pore-size mixed cellulose ester or nylon filters. Filters were
stored frozen in 5 mL cryovials in 1.8 mL sucrose lysis buffer
(750 mmol L⁻¹ sucrose, 20 mmol L⁻¹ EDTA, 400 mmol L⁻¹ NaCl,
50 mmol L⁻¹ Tris-HCl; pH 8.0) at -80 °C.
Genomic DNA was extracted following the phenol-chloroform method
described in Catlett et al. (2020). The V9 region of the 18S rRNA
gene was amplified with a one-step PCR using custom dual-indexed
primers (Kozich et al. 2013) designed from the 1391F and EukB
primers (Stoeck et al. 2010) following the “Standard” method
evaluated in Catlett et al. (2020). Following purification,
normalization, and pooling of PCR products, sequencing was
performed with a MiSeq PE150 v2 kit (Illumina) at the DNA
Technologies Core of the UC Davis Genome Center. Each sequencing
run included technical PCR/sequencing triplicates of a mock
community consisting of 22 evenly represented full-length
protistan 18S amplicons, at least one no-template control PCR, and
multiple DNA extraction blanks. Data from samples amplified with
certain index primers that were found to reduce precision in our
DNA meta-barcoding workflow, and data from one sequencing run
where one negative control showed signs of contamination, were
discarded (Catlett et al. 2020).
Bioinformatic methods
We used the DADA2 method (Callahan et al. 2016; v1.14.1) to
determine amplicon sequence variants (ASVs) from raw MiSeq data.
Demultiplexed sequence reads were obtained from the UC Davis
Genome Center, and forward and reverse reads were trimmed to 140
nt and 120 nt, respectively, filtered (maxEE = 2, truncQ = 2, maxN
= 0), and denoised using the DADA algorithm. The DADA error model
was parameterized for each MiSeq run using at least 10⁸ bases.
Paired reads were then merged, overhanging sequences were trimmed,
and chimeras were removed using the “consensus” method (Callahan
et al. 2016). ASVs less than 90 nt or greater than 180 nt in
length (target amplicon is 120–130 nt) were discarded.
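The read-level thresholds and the ASV length filter described above can be illustrated with a short Python sketch. This is not the DADA2 implementation (DADA2 is an R package); the function names and example inputs are hypothetical, and the sketch assumes reads have already been truncated and that quality scores are Phred-scaled.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities, 10^(-Q/10), for Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_read_filter(seq, quals, max_ee=2.0, max_n=0):
    """Apply the maxEE and maxN thresholds to one (already truncated) read."""
    too_many_n = seq.upper().count("N") > max_n
    return (not too_many_n) and expected_errors(quals) <= max_ee


def filter_asvs_by_length(asvs, min_len=90, max_len=180):
    """Discard ASVs shorter than min_len or longer than max_len (in nt)."""
    return [seq for seq in asvs if min_len <= len(seq) <= max_len]


# Hypothetical sequences used only to exercise the length filter.
example_asvs = ["A" * 85, "ACGT" * 31, "G" * 181]
kept_asvs = filter_asvs_by_length(example_asvs)
```

In this example only the 124 nt sequence is retained; the 85 nt and 181 nt sequences fall outside the 90–180 nt window.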
Initial taxonomic assignments were predicted with the RDP Bayesian
classifier (Wang et al. 2007), the DECIPHER idtaxa algorithm
(Murali et al. 2018), and the Lowest Common Ancestor algorithm
implemented in MEGAN6 (Huson et al. 2007), which analyzes BLASTN
(Altschul et al. 1990) results. The Bayesian classifier and idtaxa
algorithms used bootstrap cutoffs of 60% and 50%, respectively,
and the Lowest Common Ancestor algorithm was implemented with
default parameters. All three algorithms were implemented against
both the Protistan Ribosomal Reference database (v4.12.0; Guillou
et al. 2012) and the Silva SSU reference database (v138; Quast et
al. 2012) available for the DADA2 pipeline
(https://benjjneb.github.io/dada2/training.html).
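To show how a bootstrap cutoff shapes a taxonomic assignment, the following Python sketch truncates an assignment at the first rank whose support falls below the cutoff. It is an illustration under stated assumptions, not the code of either classifier: the function name is hypothetical, support values are assumed to be percentages listed from the highest to the lowest rank, and a rank is assumed to be retained when its support is at or above the cutoff.

```python
def apply_bootstrap_cutoff(ranks, bootstraps, cutoff):
    """Replace ranks below the cutoff, and all lower ranks, with None.

    ranks:      taxonomic names ordered from highest to lowest rank.
    bootstraps: support (percent) for each rank, in the same order.
    cutoff:     minimum support needed to keep a rank (assumed inclusive).
    """
    result = []
    truncated = False
    for name, support in zip(ranks, bootstraps):
        if truncated or support < cutoff:
            truncated = True
            result.append(None)  # non-assignment at this rank
        else:
            result.append(name)
    return result


# Hypothetical assignment and support values for one ASV.
lineage = ["Eukaryota", "Alveolata", "Dinoflagellata"]
at_60 = apply_bootstrap_cutoff(lineage, [100, 80, 45], cutoff=60)
at_50 = apply_bootstrap_cutoff(lineage, [100, 40, 90], cutoff=50)
```

Ranks set to None here correspond to the non-assignments that are excluded when ensemble assignments are computed.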
The ensembleTax R package (v1.1.1; Catlett et al. 2021) was used
to determine ensemble taxonomic assignments based on the six
individual taxonomic assignment methods. We first mapped the
taxonomic assignments generated using the Silva reference database
and the Lowest Common Ancestor algorithm onto the Protistan
Ribosomal Reference database taxonomic nomenclature. We then
computed two sets of ensemble taxonomic assignments: the first was
used to identify prokaryotic ASVs, while the second was used for
the remainder of our analyses. All ensemble taxonomic assignments
were determined by finding the highest frequency assignment across
the (mapped, if necessary) individual taxonomic assignment
methods, excluding non-assignments. If conflicting taxonomic
assignments were found at equivalent maximum frequencies across
the six individual methods, assignments predicted by the idtaxa
algorithm were prioritized; if none of the highest-frequency
assignments was predicted by the idtaxa algorithm, those predicted
by the Bayesian classifier were prioritized. To identify
prokaryotic ASVs, the Bayesian
classifier-Protistan Ribosomal Reference taxonomic assignments
were omitted from ensemble determinations, and taxonomic
assignments predicted with the Silva database were prioritized in
the event multiple assignments were found at equivalent maximum
frequencies. After discarding prokaryotic ASVs, a second set of
ensemble taxonomic assignments was computed following the same
procedure but considering all taxonomy predictions from all six
individual methods and prioritizing those determined with the
Protistan Ribosomal Reference over those determined with Silva.
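The highest-frequency rule with priority-based tie-breaking can be sketched in Python for a single taxonomic rank. This is a simplified illustration and not the ensembleTax implementation (an R package); the function name, the method labels, and the example assignments are hypothetical, and the sketch assumes all assignments have already been mapped onto a common nomenclature.

```python
from collections import Counter


def ensemble_assignment(assignments, priority):
    """Pick the most frequent assignment at one rank across methods.

    assignments: dict of method label -> assigned name, or None for a
                 non-assignment (non-assignments are excluded from voting).
    priority:    method labels in decreasing priority, consulted only
                 when several names tie at the maximum frequency.
    """
    votes = Counter(name for name in assignments.values() if name is not None)
    if not votes:
        return None
    top = max(votes.values())
    winners = {name for name, count in votes.items() if count == top}
    if len(winners) == 1:
        return next(iter(winners))
    for method in priority:
        if assignments.get(method) in winners:
            return assignments[method]
    return None  # tie that no prioritized method can resolve


# Hypothetical assignments for one ASV from six method-database pairs.
example = {
    "idtaxa-pr2": "Dinophyceae",
    "idtaxa-silva": None,
    "bayes-pr2": "Syndiniales",
    "bayes-silva": "Syndiniales",
    "lca-pr2": "Dinophyceae",
    "lca-silva": None,
}
order = ["idtaxa-pr2", "idtaxa-silva", "bayes-pr2", "bayes-silva"]
result = ensemble_assignment(example, order)
```

Here two names tie with two votes each, so the idtaxa assignment is returned. Changing the order of the priority list changes which database or algorithm wins a tie, which is how the two sets of ensemble assignments described above would differ in such a sketch.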
ASVs assigned as Bacteria, Archaea, Metazoa, Fungi, Streptophyta,
Rhodophyta, Ulvophyceae, or Phaeophyceae, and those that were not
assigned to a kingdom or supergroup or were assigned as
Eukaryota_XX, Opisthokonta_X, Opisthokonta, or Archaeplastida with
unknown taxonomy at lower ranks, were discarded. Sequence counts of
each protistan ASV were
normalized to the total protistan sequence counts within each
sample to determine ASV relative sequence abundances. Where
duplicate or triplicate samples were available, mean relative
sequence abundance values were computed.
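The normalization and replicate averaging can be sketched in Python as follows. The function names and the example counts are hypothetical, and the sketch assumes the input counts contain only protistan ASVs (that is, non-protistan ASVs have already been discarded).

```python
def relative_abundances(counts):
    """Normalize a dict of ASV -> sequence count to fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {asv: 0.0 for asv in counts}
    return {asv: n / total for asv, n in counts.items()}


def mean_over_replicates(replicates):
    """Average relative abundances across replicate samples, ASV by ASV.

    An ASV missing from a replicate is treated as having abundance 0 there.
    """
    asvs = set().union(*replicates)
    return {
        asv: sum(rep.get(asv, 0.0) for rep in replicates) / len(replicates)
        for asv in asvs
    }


# Hypothetical duplicate samples with protistan sequence counts per ASV.
rep1 = relative_abundances({"asv1": 30, "asv2": 70})
rep2 = relative_abundances({"asv1": 50, "asv2": 50})
mean_abund = mean_over_replicates([rep1, rep2])
```

In this example the mean relative sequence abundances are 0.4 for asv1 and 0.6 for asv2, and they still sum to 1 because each replicate was normalized before averaging.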
Trophic mode and phytoplankton classifications
We classified ASVs into one of four trophic modes (phototroph,
heterotroph, and constitutive or non-constitutive mixotroph) based
on their ensemble taxonomic assignments. We compiled a collection
of taxonomic names with corresponding trophic modes using the
information available in Adl et al. (2019) and following the
definitions of Mitra et al. (2016). Where a trophic mode was not
clearly defined for a particular lineage in Adl et al. (2019), we
considered additional published compilations of protistan trophic
modes and traits (Dumack et al. 2020; Ramond et al. 2019;
Schneider et al. 2020). Additional searches of both refereed
(Burki et al. 2009; Chomerat and Bilien 2014; Glücksman 2011;
Okamoto and Inouye 2005; Riisberg et al. 2009; Skovgaard et al.
2012) and non-refereed (UC Santa Cruz Ocean Data Center,
http://oceandatacenter.ucsc.edu/PhytoGallery/phytolist.html;
AlgaeBase, Guiry and Guiry 2021; and Wikipedia) sources allowed
an additional 29 lineages in our data set to be assigned to
trophic functional groups.
ASVs assigned to taxonomic groups that only include
photoautotrophs, constitutive mixotrophs, or both, were assigned
as phytoplankton, while ASVs assigned to lineages composed of
heterotrophs and/or non-constitutive mixotrophs (protists that
acquire photosynthetic capabilities through symbiosis or
horizontal transfer of chloroplasts) were assigned as
non-phytoplankton. Those lineages thought to contain
representatives of both phytoplankton and non-phytoplankton were
assigned “unknown”.
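The phytoplankton rule above can be expressed as a small Python sketch. The function name and the trophic mode labels are hypothetical stand-ins for the compiled lookup table, which is not reproduced here; a lineage with no recorded trophic mode is assumed to be "unknown".

```python
PHYTO_MODES = {"phototroph", "constitutive mixotroph"}
NON_PHYTO_MODES = {"heterotroph", "non-constitutive mixotroph"}


def classify_lineage(trophic_modes):
    """Classify a lineage from the set of trophic modes it is thought to contain."""
    modes = set(trophic_modes)
    if modes and modes <= PHYTO_MODES:
        return "phytoplankton"
    if modes and modes <= NON_PHYTO_MODES:
        return "non-phytoplankton"
    # Mixed lineages, or lineages with no recorded mode, stay unresolved.
    return "unknown"


# Hypothetical lineages used only to exercise each branch.
only_phyto = classify_lineage(["phototroph", "constitutive mixotroph"])
only_hetero = classify_lineage(["heterotroph"])
mixed = classify_lineage(["phototroph", "heterotroph"])
```

The subset tests make the rule strict: a single heterotrophic or non-constitutive mixotrophic representative is enough to move a lineage out of the phytoplankton class and into "unknown".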
References
Adl, S. M., D. Bass, C. E. Lane, J. Lukeš, C. L. Schoch, A.
Smirnov, S. Agatha, C. Berney, and others. 2019. Revisions to the
classification, nomenclature, and diversity of eukaryotes. J.
Eukaryotic Microbiol. 66: 4–119.
https://doi.org/10.1111/jeu.12691.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J.
Lipman. 1990. Basic local alignment search tool. J. Mol. Biol.
215: 403–410. https://doi.org/10.1016/S0022-2836(05)80360-2.
Callahan, B. J., P. J. McMurdie, M. J. Rosen, A. W. Han, A. J. A.
Johnson, and S. P. Holmes. 2016. DADA2: high-resolution sample
inference from Illumina amplicon data. Nat. Methods. 13: 581.
https://doi.org/10.1038/nmeth.3869.
Catlett, D., P. G. Matson, C. A. Carlson, E. G. Wilbanks, D. A.
Siegel, and M. D. Iglesias‐Rodriguez. 2020. Evaluation of accuracy
and precision in an amplicon sequencing workflow for marine
protist communities. Limnol. Oceanogr.: Methods. 18(1): 20-40.
https://doi.org/10.1002/lom3.10343.
Catlett, D., K. Son, and C. Liang. 2021. ensembleTax: an R package
for determinations of ensemble taxonomic assignments of
phylogenetically-informative marker gene
sequences. PeerJ. 9:e11865. https://doi.org/10.7717/peerj.11865.
Chomerat, N., and G. Bilien. 2014. Madanidinium loirii gen. et sp.
nov. (Dinophyceae), a new marine benthic dinoflagellate from
Martinique Island, Eastern Caribbean. Eur. J. Phycol. 49(2):
165-178. https://doi.org/10.1080/09670262.2014.898797.
Dumack, K., A. M. Fiore‐Donno, D. Bass, and M. Bonkowski. 2020.
Making sense of environmental sequencing data: ecologically
important functional traits of the protistan groups Cercozoa and
Endomyxa (Rhizaria). Mol. Ecol. Resour. 20(2): 398-403.
https://doi.org/10.1111/1755-0998.13112.
Glücksman, E. 2011. Taxonomy, biodiversity, and ecology of
Apusozoa (Protozoa). Doctoral dissertation. Oxford University, UK.
Guillou, L., D. Bachar, S. Audic, D. Bass, C. Berney, L. Bittner,
C. Boutte, G. Burgaud and others. 2012. The Protist Ribosomal
Reference database (PR2): a catalog of unicellular eukaryote small
sub-unit rRNA sequences with curated taxonomy. Nucleic Acids Res.
41: D597–D604. https://doi.org/10.1093/nar/gks1160.
Guiry, M. D., and G. M. Guiry. 2021. AlgaeBase.
World-wide electronic publication, National University of Ireland,
Galway. http://www.algaebase.org; searched on 14 June 2021.
Huson, D. H., A. F. Auch, J. Qi, and S. C. Schuster. 2007. MEGAN
analysis of metagenomic data. Genome Res. 17: 377–386.
https://doi.org/10.1101/gr.5969107.
Kozich, J. J., S. L. Westcott, N. T. Baxter, S. K. Highlander, and
P. D. Schloss. 2013. Development of a dual-index sequencing
strategy and curation pipeline for analyzing amplicon sequence
data on the MiSeq Illumina sequencing platform. Appl. Environ.
Microbiol. 79: 5112–5120.
Mitra, A., K.J. Flynn, U. Tillmann, J.A. Raven, D. Caron, D.K.
Stoecker, F. Not, P.J. Hansen, and others. 2016. Defining
planktonic protist functional groups on mechanisms for energy and
nutrient acquisition: incorporation of diverse mixotrophic
strategies. Protist. 167(2): 106-120.
https://doi.org/10.1016/j.protis.2016.01.003.
Murali, A., A. Bhargava, and E. S. Wright. 2018. IDTAXA: a novel
approach for accurate taxonomic classification of microbiome
sequences. Microbiome. 6(1): 1-14.
https://doi.org/10.1186/s40168-018-0521-5.
Okamoto, N., and I. Inouye. 2005. The katablepharids are a distant
sister group of the Cryptophyta: a proposal for
Katablepharidophyta divisio nova/Kathablepharida phylum novum
based on SSU rDNA and beta-tubulin phylogeny. Protist. 156(2):
163-179. https://doi.org/10.1016/j.protis.2004.12.003.
Quast, C., E. Pruesse, P. Yilmaz, J. Gerken, T. Schweer, P. Yarza,
J. Peplies, and F. O. Glöckner. 2012. The SILVA ribosomal RNA gene
database project: improved data processing and web-based tools.
Nucleic Acids Res. 41: D590–D596.
https://doi.org/10.1093/nar/gks1219.
Ramond, P., M. Sourisseau, N. Simon, S. Romac, S. Schmitt, F.
Rigaut‐Jalabert, N. Henry, C. De Vargas, and R. Siano. 2019.
Coupling between taxonomic and functional diversity in protistan
coastal communities. Environ. Microbiol. 21(2): 730-749.
https://doi.org/10.1111/1462-2920.14537.
Riisberg, I., R.J. Orr, R. Kluge, K. Shalchian-Tabrizi, H.A.
Bowers, V. Patil, B. Edvardsen, and K.S. Jakobsen. 2009. Seven
gene phylogeny of heterokonts. Protist. 160(2): 191-204.
https://doi.org/10.1016/j.protis.2008.11.004.
Schneider, L.K., K. Anestis, J. Mansour, A.A. Anschütz, N. Gypens,
P.J. Hansen, U. John, K. Klemm, and others. 2020. A dataset on
trophic modes of aquatic protists. Biodiversity Data Journal. 8.
https://doi.org/10.3897/BDJ.8.e56648.
Skovgaard, A., S. A. Karpov, and L. Guillou. 2012. The parasitic
dinoflagellates Blastodinium spp. inhabiting the gut of marine,
planktonic copepods: morphology, ecology, and unrecognized species
diversity. Front. Microbiol. 3: 305.
https://doi.org/10.3389/fmicb.2012.00305.
Stoeck, T., D. Bass, M. Nebel, R. Christen, M. D. Jones, H.
Breiner, and T. A. Richards. 2010. Multiple marker parallel tag
environmental DNA sequencing reveals a highly complex eukaryotic
community in marine anoxic water. Mol. Ecol. 19: 21–31.
Wang, Q., G. M. Garrity, J. M. Tiedje, and J. R. Cole. 2007. Naive
Bayesian classifier for rapid assignment of rRNA sequences into
the new bacterial taxonomy. Appl. Environ. Microbiol. 73:
5261–5267. https://doi.org/10.1128/AEM.00062-07.