Central Valley Project – Genetic Determination of Population of Origin
Chinook Salmon Tissue Collection
The U.S. Bureau of Reclamation (Reclamation) screens fish at the Central Valley Project's Jones Pumping Plant using the Tracy Fish Collection Facility. The National Marine Fisheries Service requires Reclamation to monitor and calculate “salvage” and “loss” at the Tracy Fish Collection Facility for winter-run Chinook salmon, Central Valley spring-run Chinook salmon, Central Valley fall-run Chinook salmon, and Central Valley late fall-run Chinook salmon. To meet these requirements, Reclamation collects tissue samples from natural-origin salmonids at the Tracy Fish Collection Facility for genetic analysis.
The procedures Reclamation uses for fish handling and data collection are described in “Standard Operating Procedures for Fish Handling Related to the Collection, Sampling, Transport, and Release of Salvaged Fish at the Central Valley Project’s Tracy Fish Collection Facility”:
https://www.usbr.gov/mp/bdo/docs/lto/2020/appendix-f-tracy-fish-collection-facility-sop.pdf
Genotyping
Over the duration of this project (2011-ongoing), genotyping has been performed on two different hardware platforms using three configurations of genetic loci.
2011-2017
Genotypes consisted of multi-locus single nucleotide polymorphisms (SNPs), determined using allele-specific polymerase chain reaction (ASP). Locus-specific assays were developed by the NOAA Southwest Fisheries Science Center (Clemento et al. 2011), and SNPType™ assays for ASP were obtained from Fluidigm Corp. (South San Francisco, CA). The genetic loci used were predominantly the markers that make up the reference baseline constructed by the NOAA Southwest Fisheries Science Center (Clemento et al. 2011; 2014); in total, 91 genetic loci overlap between the SNPType™ marker set and the published reference population genetic baselines. Pre-amplification was performed for each locus following the manufacturer's recommendations. ASP was conducted following the manufacturer's protocols using the FC1 Cycler (Fluidigm), which is specially designed for thermal cycling of Fluidigm Integrated Fluidic Circuit (IFC) arrays. IFC arrays were visualized using the BioMark (Fluidigm), and BioMark output was analyzed using Fluidigm genotyping analysis software. SNP designations from the SNPType™ assays were standardized to the reference baselines, and all genotypes were translated into HapMap nucleotide standards (A=1, C=2, G=3, T=4, insertion/deletion=5, and no data=0).
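The numeric nucleotide coding above can be illustrated with a small Python sketch. Only the code values (A=1, C=2, G=3, T=4, insertion/deletion=5, no data=0) come from this section; the call format ('A/G', '-' for an indel allele, 'NA' or '?' for a failed call) and the function name encode_genotype are hypothetical.

```python
from typing import Tuple

# Numeric codes listed in this section: A=1, C=2, G=3, T=4,
# insertion/deletion=5, no data=0.
HAPMAP_CODE = {"A": 1, "C": 2, "G": 3, "T": 4, "-": 5}
NO_DATA = 0
MISSING_CALLS = {"", "?", "NA"}  # hypothetical spellings of a failed call


def encode_genotype(call: str) -> Tuple[int, int]:
    """Translate a diploid call such as 'A/G' into two numeric codes.

    Hypothetical input format: alleles separated by '/', an indel allele
    written as '-'. Unrecognized alleles are coded as no data (0).
    """
    cleaned = call.strip().upper()
    if cleaned in MISSING_CALLS:
        return (NO_DATA, NO_DATA)
    alleles = cleaned.split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected a diploid call like 'A/G', got {call!r}")
    return (HAPMAP_CODE.get(alleles[0], NO_DATA),
            HAPMAP_CODE.get(alleles[1], NO_DATA))


if __name__ == "__main__":
    for c in ["A/G", "T/T", "C/-", "NA"]:
        print(c, encode_genotype(c))
```

Mapping unrecognized alleles to 0 rather than raising is a design choice of this sketch; a production pipeline may prefer to flag them.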
2017-2019
Genotyping was conducted using amplicon sequencing procedures (e.g., GT-seq; Campbell et al. 2014) on Illumina hardware (Illumina, San Diego, CA). Genotypes consisted of multi-locus single nucleotide polymorphisms (SNPs). The assays used for population assignment were those approved by the National Marine Fisheries Service for that purpose in California, namely the SNP panel developed by the NOAA Southwest Fisheries Science Center for population assignment (Clemento et al. 2011; 2014). Note that some published loci were modified for the amplicon sequencing process. All genotypes were translated into HapMap nucleotide standards (A=1, C=2, G=3, T=4, insertion/deletion=5, and no data=0).
2020-2021
Genotyping was conducted using amplicon sequencing procedures (e.g., GT-seq; Campbell et al. 2014) on Illumina hardware (Illumina, San Diego, CA). Genotypes consisted of multi-locus single nucleotide polymorphisms (SNPs). The SNP panel included the loci developed by the NOAA Southwest Fisheries Science Center (Clemento et al. 2011; 2014) plus additional loci associated with adult return time (to freshwater), located on chromosome 28, following Koch and Narum (2020) (see Adult Return Time section below). The genotyping panel also includes a sex determination locus based on a Y-chromosome pseudogene described by Brunelli et al. (2008). All genotypes were translated into HapMap nucleotide standards (A=1, C=2, G=3, T=4, insertion/deletion=5, and no data=0).
Population Assignment
Individuals of unknown origin were assigned to known populations by comparing each individual's genotype to the genotypes of reference populations (i.e., a genetic baseline). The likelihood that an individual's genotype originated from a given reference population is evaluated probabilistically by comparing the genotype of the unknown individual to the baseline. The probability of the genotype, conditioned on the allele frequencies of each reference population, was derived following Rannala and Mountain (1997). The composition of mixed collections was estimated using a partial Bayesian procedure based on the likelihood that unknown-origin genotypes were derived from the reference populations of the Clemento et al. (2011) genetic baseline, given the reference population allele frequencies. The mixed stock analysis (MSA) procedure yields a maximum likelihood solution for stock composition (Millar 1987). Posterior assignment probabilities for a given genotype are estimated for each reference collection and reported by population aggregations representing Evolutionarily Significant Units (ESUs; i.e., Winter, Spring, Fall/LateFall). ESU assignment is accomplished by extracting the assignment data from the MSA and summing the final posterior probabilities over the reference populations within each reporting group. Population assignment was conducted using the ONCOR software (Steven Kalinowski, unpublished, Montana State University).
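The likelihood, mixture, and reporting-group steps described above can be sketched in Python. This is not ONCOR's code. It assumes the Rannala and Mountain (1997) genotype probability with a Dirichlet prior of 1/J per allele (J = number of alleles at a locus), treats loci as independent, estimates stock proportions with a standard expectation-maximization (EM) iteration toward the maximum likelihood composition (Millar 1987; ONCOR's exact numerical routine may differ), and then sums individual posteriors within reporting groups. Population names, allele counts, and function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Call = Optional[Tuple[int, int]]  # allele indices at one locus; None = no data


def locus_log_prob(counts: Sequence[int], a: int, b: int) -> float:
    """Log P(genotype a/b | reference allele counts at one locus).

    Assumes a Dirichlet prior of 1/J per allele (J alleles at the locus), so the
    genotype probability is the Dirichlet posterior-predictive probability.
    """
    j, n = len(counts), sum(counts)
    ca, cb = counts[a] + 1.0 / j, counts[b] + 1.0 / j
    denom = (n + 1.0) * (n + 2.0)
    if a == b:  # homozygote
        return math.log(ca * (ca + 1.0) / denom)
    return math.log(2.0 * ca * cb / denom)  # heterozygote


def genotype_log_lik(genotype: Sequence[Call], pop_counts: List[List[int]]) -> float:
    """Multi-locus log likelihood; loci treated as independent, missing loci skipped."""
    return sum(locus_log_prob(pop_counts[i], *call)
               for i, call in enumerate(genotype) if call is not None)


def posterior_row(loglik: Sequence[float], pi: Sequence[float]) -> List[float]:
    """P(population | genotype) using mixture proportions pi as the prior."""
    logs = [(math.log(p) if p > 0 else -math.inf) + ll for p, ll in zip(pi, loglik)]
    m = max(logs)  # log-sum-exp shift for numerical stability
    w = [math.exp(x - m) for x in logs]
    s = sum(w)
    return [x / s for x in w]


def mixture_em(loglik: List[List[float]], iters: int = 1000, tol: float = 1e-9) -> List[float]:
    """Maximum likelihood mixture proportions by EM, starting from equal proportions."""
    k = len(loglik[0])
    pi = [1.0 / k] * k
    for _ in range(iters):
        totals = [0.0] * k
        for row in loglik:  # E-step: expected membership of each individual
            for j, p in enumerate(posterior_row(row, pi)):
                totals[j] += p
        new_pi = [t / len(loglik) for t in totals]  # M-step
        done = max(abs(x - y) for x, y in zip(new_pi, pi)) < tol
        pi = new_pi
        if done:
            break
    return pi


def group_posteriors(post: Sequence[float], pops: Sequence[str],
                     groups: Dict[str, str]) -> Dict[str, float]:
    """Sum population posteriors within each reporting group (e.g., ESU)."""
    out: Dict[str, float] = {}
    for p, pop in zip(post, pops):
        out[groups[pop]] = out.get(groups[pop], 0.0) + p
    return out


if __name__ == "__main__":
    # Hypothetical two-locus, two-population baseline (allele counts per locus).
    pops = ["winter_A", "fall_B"]
    groups = {"winter_A": "Winter", "fall_B": "Fall/LateFall"}
    baseline = {"winter_A": [[90, 10], [5, 95]], "fall_B": [[10, 90], [80, 20]]}
    fish = [[(0, 0), (1, 1)], [(1, 1), (0, 0)], [(0, 1), None]]
    L = [[genotype_log_lik(g, baseline[p]) for p in pops] for g in fish]
    pi = mixture_em(L)
    for row in L:
        print(group_posteriors(posterior_row(row, pi), pops, groups))
```

Using the estimated mixture proportions as the prior in posterior_row is what makes the individual posteriors depend on the whole collection, which is one way to read the "partial Bayesian" description above.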
The posterior probability is a relative measure of the likelihood of the “unknown” genotype when compared to the baseline of individuals of “known” origin. While a higher posterior probability may denote a higher “quality” assignment (when the baseline is comprehensive), there is currently no agreed-upon absolute probability threshold for “correct” assignment. Rather, a tolerable error rate is generally determined for each application. For population assignment, a stringent probability threshold of 0.90 was used for winter-run Chinook Salmon. To date, no assignment threshold for spring-run Chinook Salmon has been agreed upon; a threshold of 0.80 is currently used for spring run, because 0.80 represents a statistically significant result when comparing the “best” to the “second best” assignment. Of true spring-run fish that assign “most likely” to the spring reporting group (i.e., posterior probability > 0.50), approximately 80% have a posterior probability > 0.80. Assignments to any ESU with posterior probabilities between 0.50 and 0.80 were retained if the ESU and the reference baseline collection were of the same run. Otherwise, or if the posterior probability was < 0.50, the sample was classified as “Unassigned” to ESU.
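The threshold rules above can be expressed as a small decision function. The 0.90 (winter) and 0.80 (spring) thresholds and the 0.50 lower bound come from this section. The 0.80 threshold for Fall/LateFall, the hypothetical top_collection_group input, and the reading of the 0.50 to 0.80 rule (applied only below 0.80, so a winter posterior between 0.80 and 0.90 stays Unassigned) are assumptions, not the documented procedure.

```python
from typing import Dict

# Thresholds stated in this section: 0.90 for winter run, 0.80 for spring run.
# ASSUMPTION: Fall/LateFall also uses 0.80; the text does not state it.
THRESHOLDS = {"Winter": 0.90, "Spring": 0.80, "Fall/LateFall": 0.80}
LOWER_BOUND = 0.50
RETAIN_BAND_TOP = 0.80


def classify(group_post: Dict[str, float], top_collection_group: str) -> str:
    """Return an ESU call or 'Unassigned' from reporting-group posteriors.

    top_collection_group (hypothetical input): the reporting group of the single
    reference collection with the highest posterior for this fish.
    """
    esu, p = max(group_post.items(), key=lambda kv: kv[1])
    if p >= THRESHOLDS[esu]:
        return esu
    # ASSUMPTION: one reading of the 0.50-0.80 rule; retain only when the
    # best-matching reference collection belongs to the same run as the ESU.
    if LOWER_BOUND <= p < RETAIN_BAND_TOP and top_collection_group == esu:
        return esu
    return "Unassigned"
```

Keeping the thresholds in a dictionary makes it easy to revise them if an agreed-upon spring-run threshold is adopted later.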
Adult Return Time
Chromosome 28 (Ots28) appears to contain a region closely associated with migration timing in the genus Oncorhynchus (Pacific salmon and trout), including two gene regions (greb1L and rock1) that have been studied extensively for associations between genotype (SNP alleles) and phenotype (behavior). In this line of research, genetic variation has been observed to differ substantially between adults that return from the marine environment to freshwater seasonally “early” (i.e., winter-run Chinook Salmon, spring-run Chinook Salmon) and those that return “late” (i.e., fall-run Chinook Salmon) (Quinn et al. 2015; Prince et al. 2017; Narum et al. 2018; Meek et al. 2019; Thompson et al. 2019; Koch and Narum 2020; Thompson et al. 2020). Including Chromosome 28 loci in the assignment process (see Population Assignment section above) to improve data quality would require updating the reference baselines, and to date there has not been institutional support for doing so. Therefore, an independent analysis of Ots28 data, run in parallel with standard population assignment, has been implemented to report the Ots28 status of each individual analyzed. This independent analysis does not alter the population assignment procedure described above; it is intended to facilitate discussion about appropriate use of adult return timing information.
Adult return time genotypes consisted of eighteen Ots28 SNP loci developed by Koch and Narum (2020) (genome assembly NCBI accession GCA_002831465.1) whose allele frequencies diverge between adults (stocks) that return seasonally “early” (e.g., winter run) and “late” (e.g., fall run). Discriminant analysis of principal components (DAPC) was used to describe the genetic data and infer group membership (i.e., early vs. late). The analysis was conducted in R using the adegenet package, with the k-means clustering procedure constrained to k=2 because the interest was genetic variation related to early versus late genotypes. Membership probabilities were based on the retained discriminant functions stored in the adegenet dapc object. A posterior probability > 0.80 was required for assignment to a cluster; individuals with a cluster probability < 0.80 were designated “unknown”.
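The early/late analysis above was run in R with adegenet; the following is not adegenet but a NumPy-only approximation of the same idea, shown to make the steps concrete: PCA, k-means with k = 2 on the retained components, linear discriminant analysis on those components, and the 0.80 membership-probability rule. Assumptions, all hypothetical: genotypes are coded as allele dosage (0, 1, or 2 copies of the allele associated with early return), missing calls are mean-imputed, n_pca = 5 is arbitrary, k-means uses a farthest-point start, and the cluster with the higher mean dosage is named “early”. adegenet's own defaults and choices may differ.

```python
import numpy as np


def dapc_like(X, n_pca: int = 5, iters: int = 100):
    """Sketch of a DAPC-style early/late classification with k = 2.

    X: individuals x loci allele dosage (0/1/2 copies of the allele assumed to
    mark early return); np.nan marks missing calls.
    Returns (labels, max_posterior); labels are 'early', 'late' or 'unknown'.
    """
    X = np.array(X, dtype=float)
    X = np.where(np.isnan(X), np.nanmean(X, axis=0), X)  # mean-impute missing calls
    Xc = X - X.mean(axis=0)

    # 1) PCA via SVD; keep the first n_pca component scores.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    n_pca = min(n_pca, len(S))
    Z = U[:, :n_pca] * S[:n_pca]

    # 2) k-means with k = 2 (Lloyd's algorithm, farthest-point start).
    i0 = int(np.argmax((Z ** 2).sum(axis=1)))
    i1 = int(np.argmax(((Z - Z[i0]) ** 2).sum(axis=1)))
    centers = Z[[i0, i1]].copy()
    for _ in range(iters):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        lab = d.argmin(axis=1)
        new = np.array([Z[lab == k].mean(axis=0) if np.any(lab == k) else centers[k]
                        for k in range(2)])
        if np.allclose(new, centers):
            break
        centers = new

    # 3) Linear discriminant analysis on the PC scores (pooled covariance,
    #    cluster proportions as priors), then posterior membership probabilities.
    priors = np.array([np.mean(lab == k) for k in range(2)])
    means = np.array([Z[lab == k].mean(axis=0) for k in range(2)])
    resid = Z - means[lab]
    inv = np.linalg.pinv(resid.T @ resid / max(len(Z) - 2, 1))
    scores = Z @ inv @ means.T - 0.5 * np.sum(means @ inv * means, axis=1) + np.log(priors)
    scores -= scores.max(axis=1, keepdims=True)
    post = np.exp(scores)
    post /= post.sum(axis=1, keepdims=True)

    # 4) Name clusters (assumption: higher mean dosage = 'early') and apply 0.80 rule.
    early = int(np.argmax([X[lab == k].mean() for k in range(2)]))
    names = np.where(np.arange(2) == early, "early", "late")
    pmax = post.max(axis=1)
    labels = np.where(pmax > 0.80, names[post.argmax(axis=1)], "unknown")
    return labels, pmax
```

The sketch assumes both clusters end up non-empty; with real data, the number of retained components and the naming rule would need to be checked against the adegenet results.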
Literature Cited
Brunelli, J.P., Wertzler, K.J., Sundin, K. and G.H. Thorgaard. 2008. Y-specific sequences and polymorphisms in rainbow trout and Chinook salmon. Genome, 51(9), pp.739-748.
Campbell, N.R., S.A. Harmon, and S.R. Narum. 2014. Genotyping-in-Thousands by sequencing (GT-seq): A cost effective SNP genotyping method based on custom amplicon sequencing. Mol. Ecol. Resour. 15(4): 855–867. doi: 10.1111/1755-0998.12357.
Clemento, A.J., A. Abadia-Cardoso, H.A. Starks, and J.C. Garza. 2011. Discovery and characterization of single nucleotide polymorphisms in Chinook salmon, Oncorhynchus tshawytscha. Mol. Ecol. Resour. 11: 50–66. doi: 10.1111/j.1755-0998.2010.02972.x.
Clemento, A.J., E.D. Crandall, J.C. Garza, and E.C. Anderson. 2014. Evaluation of a single nucleotide polymorphism baseline for genetic stock identification of Chinook Salmon (Oncorhynchus tshawytscha) in the California Current large marine ecosystem. Fish. Bull. 112(2–3): 112–130. doi: 10.7755/FB.112.2-3.2.
Koch, I.J. and S.R. Narum. 2020. Validation and association of candidate markers for adult migration timing and fitness in Chinook Salmon. Evolutionary Applications, 13(9), pp.2316-2332.
Meek, M. H., M. R. Stephens, A. Goodbla, B. May, and M. R. Baerwald. 2019. Identifying hidden biocomplexity and genomic diversity in Chinook salmon, an imperiled species with a history of anthropogenic influence. Canadian Journal of Fisheries and Aquatic Sciences 77(3):534–547. NRC Research Press.
Millar, R.B. 1987. Maximum likelihood estimation of mixed stock fishery composition. Can. J. Fish. Aquat. Sci. 44: 583–590.
Narum, S. R., A. Di Genova, S. J. Micheletti, and A. Maass. 2018. Genomic variation underlying complex life-history traits revealed by genome sequencing in Chinook salmon. Proceedings of the Royal Society B: Biological Sciences 285(1883):20180935.
Prince, D. J., S. M. O’Rourke, T. Q. Thompson, O. A. Ali, H. S. Lyman, I. K. Saglam, T. J. Hotaling, A. P. Spidle, and M. R. Miller. 2017. The evolutionary basis of premature migration in Pacific salmon highlights the utility of genomics for informing conservation. Science Advances 3(8):e1603198.
Quinn, T. P., P. McGinnity, and T. E. Reed. 2015. The paradox of “premature migration” by adult anadromous salmonid fishes: patterns and hypotheses. Canadian Journal of Fisheries and Aquatic Sciences 73(7):1015–1030.
Rannala, B., and J.L. Mountain. 1997. Detecting immigration by using multilocus genotypes. Proceedings of the National Academy of Sciences of the United States of America 94: 9197–9201.
Thompson, T. Q., M. R. Bellinger, S. M. O’Rourke, D. J. Prince, A. E. Stevenson, A. T. Rodrigues, M. R. Sloat, C. F. Speller, D. Y. Yang, V. L. Butler, M. A. Banks, and M. R. Miller. 2019. Anthropogenic habitat alteration leads to rapid loss of adaptive variation and restoration potential in wild salmon populations. Proceedings of the National Academy of Sciences 116(1):177.
Thompson, N.F., Anderson, E.C., Clemento, A.J., Campbell, M.A., Pearse, D.E., Hearsey, J.W., Kinziger, A.P. and J.C. Garza. 2020. A complex phenotype in salmon controlled by a simple change in migratory timing. Science, 370(6516), pp.609-613.