ResearchPad - population-and-evolutionary-genetics https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[Space is the Place: Effects of Continuous Spatial Structure on Analysis of Population Genetic Data]]> https://www.researchpad.co/article/N56825d15-18d8-4537-b87e-f85c12d5e5b8

Real geography is continuous, but standard models in population genetics are based on discrete, well-mixed populations. As a result, many methods of analyzing genetic data assume that samples are a random draw from a well-mixed population, but are applied to clustered samples from populations that are structured clinally over space. Here, we use simulations of populations living in continuous geography to study the impacts of dispersal and sampling strategy on population genetic summary statistics, demographic inference, and genome-wide association studies (GWAS). We find that most common summary statistics have distributions that differ substantially from those seen in well-mixed populations, especially when Wright’s neighborhood size is < 100 and sampling is spatially clustered. “Stepping-stone” models reproduce some of these effects, but discretizing the landscape introduces artifacts that in some cases are exacerbated at higher resolutions. The combination of low dispersal and clustered sampling causes demographic inference from the site frequency spectrum to infer more turbulent demographic histories, but averaged results across multiple simulations revealed surprisingly little systematic bias. We also show that the combination of spatially autocorrelated environments and limited dispersal causes GWAS to identify spurious signals of genetic association with purely environmentally determined phenotypes, and that this bias is only partially corrected by regressing out principal components of ancestry. Last, we discuss the relevance of our simulation results for inference from genetic variation in real organisms.

]]>
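
The "Space is the Place" abstract uses Wright's neighborhood size as its threshold (< 100). As a minimal sketch, the textbook definition N = 4πρσ² can be computed as below, where ρ is population density and σ is the dispersal distance (standard deviation along one axis). The function name and the example values are illustrative assumptions, not taken from the article.

```python
import math


def neighborhood_size(density: float, sigma: float) -> float:
    """Wright's neighborhood size, N = 4 * pi * density * sigma**2.

    density: individuals per unit area (hypothetical input).
    sigma:   dispersal distance in the same length units (hypothetical input).
    """
    if density < 0 or sigma < 0:
        raise ValueError("density and sigma must be non-negative")
    return 4.0 * math.pi * density * sigma ** 2


# Illustrative values only: 5 individuals per unit area, sigma = 1.
size = neighborhood_size(5.0, 1.0)
print(f"neighborhood size = {size:.1f}, below 100: {size < 100}")
```

Because the size grows with the square of σ, halving the dispersal distance cuts the neighborhood size to a quarter, which is why low dispersal pushes a population into the regime the abstract flags.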
<![CDATA[Complex History and Differentiation Patterns of the t-Haplotype, a Mouse Meiotic Driver]]> https://www.researchpad.co/article/5c3544bfd5eed0c484d8efab

The t-haplotype, a mouse meiotic driver found on chromosome 17, has been a model for autosomal segregation distortion for close to a century, but several questions remain regarding its biology and evolutionary history. A recently published set of population genomics resources for wild mice includes several individuals heterozygous for the t-haplotype, which we use to characterize this selfish element at the genomic and transcriptomic level. Our results show that large sections of the t-haplotype have been replaced by standard homologous sequences, possibly due to occasional events of recombination, and that this complicates the inference of its history. As expected for a long genomic segment of very low recombination, the t-haplotype carries an excess of fixed nonsynonymous mutations compared to the standard chromosome. This excess is stronger for regions that have not undergone recent recombination, suggesting that occasional gene flow between the t and the standard chromosome may provide a mechanism to regenerate coding sequences that have accumulated deleterious mutations. Finally, we find that t-complex genes with altered expression largely overlap with deleted or amplified regions, and that carrying a t-haplotype alters the testis expression of genes outside of the t-complex, providing new leads into the pathways involved in the biology of this segregation distorter.

]]>
<![CDATA[Performing Parentage Analysis in the Presence of Inbreeding and Null Alleles]]> https://www.researchpad.co/article/5c22c6f8d5eed0c484aa2f9d

Parentage analysis is an important method that is used widely in zoological and ecological studies. Current mathematical models of parentage analyses usually assume that a population has a uniform genetic structure and that mating is panmictic. In a natural population, the geographic or social structure of a population, and/or nonrandom mating, usually leads to a genetic structure and results in genotypic frequencies deviating from those expected under the Hardy-Weinberg equilibrium (HWE). In addition, in the presence of null alleles, an observed genotype represents one of several possible true genotypes. The true father of a given offspring may thus be erroneously excluded in parentage analyses, or may have a low or negative LOD score. Here, we present a new mathematical model to estimate parentage that simultaneously includes the effects of inbreeding, null alleles, and negative amplification. The influences of these three factors on previous models are evaluated by Monte-Carlo simulations and empirical data, and the performance of our new model is compared under controlled conditions. We found that, for both simulated and empirical data, our new model outperformed other methods in many situations. We make our methods available in a new, free software package named parentage. This can be downloaded via http://github.com/huangkang1987/parentage.

]]>
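
The parentage abstract rests on two effects: inbreeding shifts genotype frequencies away from HWE, and a null allele makes several true genotypes look identical. The sketch below illustrates both with the textbook inbreeding model (homozygotes at F·p + (1 − F)·p², heterozygotes at 2(1 − F)·p·q). It is not the likelihood model of the parentage package; the function names, the `"null"` label, and the frequencies are assumptions for illustration.

```python
from itertools import combinations_with_replacement


def genotype_freqs(allele_freqs, f):
    """True genotype frequencies under inbreeding coefficient f."""
    freqs = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        if a == b:
            freqs[(a, b)] = f * pa + (1 - f) * pa * pa
        else:
            freqs[(a, b)] = 2 * (1 - f) * pa * pb
    return freqs


def observed_freqs(allele_freqs, f, null="null"):
    """Collapse true genotypes into what a genotyping assay would show.

    A visible allele paired with the null allele is scored as a homozygote;
    a null/null individual yields no product and is scored as missing.
    """
    observed = {}
    for (a, b), p in genotype_freqs(allele_freqs, f).items():
        visible = tuple(x for x in (a, b) if x != null)
        if not visible:
            key = "missing"
        elif len(visible) == 1:
            key = (visible[0], visible[0])
        else:
            key = visible
        observed[key] = observed.get(key, 0.0) + p
    return observed


# Hypothetical locus: two visible alleles plus a null allele, F = 0.1.
print(observed_freqs({"A": 0.5, "B": 0.4, "null": 0.1}, f=0.1))
```

In this example the apparent A/A class mixes true A/A with A/null individuals, so a true father carrying the null allele can appear to mismatch his offspring.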
<![CDATA[Exploring Evolutionary Relationships Across the Genome Using Topology Weighting]]> https://www.researchpad.co/article/5bfe7b1cd5eed0c48493557a

We introduce the concept of topology weighting, a method for quantifying relationships between taxa that are not necessarily monophyletic, and visualizing how these relationships change across the genome. A given set of taxa can be related in a limited number of ways, but if each taxon is represented by multiple sequences, the number of possible topologies becomes very large. Topology weighting reduces this complexity by quantifying the contribution of each taxon topology to the full tree. We describe our method for topology weighting by iterative sampling of subtrees (Twisst), and test it on both simulated and real genomic data. Overall, we show that this is an informative and versatile approach, suitable for exploring relationships in almost any genomic dataset. Scripts to implement the method described are available at http://github.com/simonhmartin/twisst.

]]>
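
The topology-weighting abstract describes quantifying how much each taxon topology contributes to a full tree. A simplified sketch follows; it is not the Twisst implementation. It assumes a rooted, strictly binary tree written as nested tuples, exactly three taxa, and full enumeration of one-sample-per-taxon subtrees rather than iterative sampling. All names are hypothetical.

```python
from itertools import product


def leaf_paths(tree, path=()):
    """Map each leaf label to its path of child indices from the root."""
    if isinstance(tree, str):
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths


def shared_depth(p, q):
    """Depth of the most recent common ancestor of two leaves."""
    depth = 0
    for x, y in zip(p, q):
        if x != y:
            break
        depth += 1
    return depth


def topology_weights(tree, taxa):
    """Weight of each rooted three-taxon topology, keyed by the sister pair.

    taxa maps a taxon name to the leaf labels sampled from it.
    """
    names = sorted(taxa)
    if len(names) != 3:
        raise ValueError("this sketch handles exactly three taxa")
    paths = leaf_paths(tree)
    pairs = [(0, 1), (0, 2), (1, 2)]
    counts = {(names[i], names[j]): 0 for i, j in pairs}
    total = 0
    for combo in product(*(taxa[n] for n in names)):
        depths = {
            (i, j): shared_depth(paths[combo[i]], paths[combo[j]])
            for i, j in pairs
        }
        # In a binary tree exactly one pair has the deepest common ancestor.
        i, j = max(depths, key=depths.get)
        counts[(names[i], names[j])] += 1
        total += 1
    return {pair: count / total for pair, count in counts.items()}


tree = ((("A1", "B1"), "A2"), ("B2", ("C1", "C2")))
taxa = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]}
print(topology_weights(tree, taxa))
```

Here neither A nor B is monophyletic, yet the weights still summarize the tree: half of the sampled subtrees group A with B and half group B with C.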
<![CDATA[The Relationship Between FST and the Frequency of the Most Frequent Allele]]> https://www.researchpad.co/article/5ac97a58463d7e11b1efdbf3

FST is frequently used as a summary of genetic differentiation among groups. It has been suggested that FST depends on the allele frequencies at a locus, as it exhibits a variety of peculiar properties related to genetic diversity: higher values for biallelic single-nucleotide polymorphisms (SNPs) than for multiallelic microsatellites, low values among high-diversity populations viewed as substantially distinct, and low values for populations that differ primarily in their profiles of rare alleles. A full mathematical understanding of the dependence of FST on allele frequencies, however, has been elusive. Here, we examine the relationship between FST and the frequency of the most frequent allele, demonstrating that the range of values that FST can take is restricted considerably by the allele-frequency distribution. For a two-population model, we derive strict bounds on FST as a function of the frequency M of the allele with highest mean frequency between the pair of populations. Using these bounds, we show that for a value of M chosen uniformly between 0 and 1 at a multiallelic locus whose number of alleles is left unspecified, the mean maximum FST is ∼0.3585. Further, FST is restricted to values much less than 1 when M is low or high, and the contribution to the maximum FST made by the most frequent allele is on average ∼0.4485. Using bounds on homozygosity that we have previously derived as functions of M, we describe strict bounds on FST in terms of the homozygosity of the total population, finding that the mean maximum FST given this homozygosity is 1 − ln 2 ≈ 0.3069. Our results provide a conceptual basis for understanding the dependence of FST on allele frequencies and genetic diversity and for interpreting the roles of these quantities in computations of FST from population-genetic data. 
Further, our analysis suggests that many unusual observations of FST, including the relatively low FST values in high-diversity human populations from Africa and the relatively low estimates of FST for microsatellites compared to SNPs, can be understood not as biological phenomena associated with different groups of populations or classes of markers but rather as consequences of the intrinsic mathematical dependence of FST on the properties of allele-frequency distributions.

]]>
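
To make the dependence of FST on allele frequencies in the FST abstract concrete, the sketch below computes a two-population FST as (HT − HS)/HT from known allele frequencies, with the two populations weighted equally. This is the standard parametric definition and is offered as an illustration; the function names and example frequencies are assumptions, and no sample-size correction is applied.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst_two_pops(p1, p2):
    """FST = (HT - HS) / HT for two equally weighted populations."""
    if len(p1) != len(p2):
        raise ValueError("both populations need the same allele list")
    mean = [(a + b) / 2.0 for a, b in zip(p1, p2)]
    h_total = heterozygosity(mean)
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    if h_total == 0:
        raise ValueError("FST is undefined at a monomorphic locus")
    return (h_total - h_within) / h_total


# Biallelic locus with a fixed difference: FST reaches 1.
print(fst_two_pops([1.0, 0.0], [0.0, 1.0]))

# Four alleles, none shared between populations, yet FST is only 1/3.
print(fst_two_pops([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]))
```

The second case shows the effect the abstract describes: two populations with completely disjoint alleles still have a low FST because within-population diversity is high.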
<![CDATA[Population Genetics Inference for Longitudinally-Sampled Mutants Under Strong Selection]]> https://www.researchpad.co/article/5bc38ad840307c2419d7d231

Longitudinal allele frequency data are becoming increasingly prevalent. Such samples permit statistical inference of the population genetics parameters that influence the fate of mutant variants. To infer these parameters by maximum likelihood, the mutant frequency is often assumed to evolve according to the Wright–Fisher model. For computational reasons, this discrete model is commonly approximated by a diffusion process that requires the assumption that the forces of natural selection and mutation are weak. This assumption is not always appropriate. For example, mutations that impart drug resistance in pathogens may evolve under strong selective pressure. Here, we present an alternative approximation to the mutant-frequency distribution that does not make any assumptions about the magnitude of selection or mutation and is much more computationally efficient than the standard diffusion approximation. Simulation studies are used to compare the performance of our method to that of the Wright–Fisher and Gaussian diffusion approximations. For large populations, our method is found to provide a much better approximation to the mutant-frequency distribution when selection is strong, while all three methods perform comparably when selection is weak. Importantly, maximum-likelihood estimates of the selection coefficient are severely attenuated when selection is strong under the two diffusion models, but not when our method is used. This is further demonstrated with an application to mutant-frequency data from an experimental study of bacteriophage evolution. We therefore recommend our method for estimating the selection coefficient when the effective population size is too large to utilize the discrete Wright–Fisher model.

]]>
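
The longitudinal-sampling abstract takes the discrete Wright-Fisher model as its starting point. The sketch below builds one row of its exact transition matrix under selection, which shows why the discrete model becomes expensive for large populations: each row has N + 1 entries. It assumes a haploid population, relative fitness 1 + s for the mutant, and no mutation; it is not the approximation proposed in the article, and the function names are hypothetical.

```python
import math


def post_selection_freq(p, s):
    """Mutant frequency after selection, with mutant fitness 1 + s (s > -1)."""
    return p * (1.0 + s) / (1.0 + s * p)


def wf_transition_row(i, n, s):
    """P(j mutants next generation | i mutants now) for j = 0..n.

    Selection acts first, then n offspring are drawn binomially.
    """
    p = post_selection_freq(i / n, s)
    return [
        math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
        for j in range(n + 1)
    ]


# Illustrative values: 10 mutants among 100 individuals, strong selection.
row = wf_transition_row(10, 100, s=0.5)
expected = sum(j * prob for j, prob in enumerate(row))
print(f"expected mutant count next generation: {expected:.2f}")
```

With s = 0.5 the expected count rises from 10 to about 14 in a single generation, a jump that weak-selection diffusion approximations are not designed to capture.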
<![CDATA[MicroRNAs Influence Reproductive Responses by Females to Male Sex Peptide in Drosophila melanogaster]]> https://www.researchpad.co/article/5adb18e8463d7e61f0a22fd7

Across taxa, female behavior and physiology change significantly following the receipt of ejaculate molecules during mating. For example, receipt of sex peptide (SP) in female Drosophila melanogaster significantly alters female receptivity, egg production, lifespan, hormone levels, immunity, sleep, and feeding patterns. These changes are underpinned by distinct tissue- and time-specific changes in diverse sets of mRNAs. However, little is yet known about the regulation of these gene expression changes, and hence the potential role of microRNAs (miRNAs), in female postmating responses. A preliminary screen of genomic responses in females to receipt of SP suggested that there were changes in the expression of several miRNAs. Here we tested directly whether females lacking four of the candidate miRNAs highlighted (miR-279, miR-317, miR-278, and miR-184) showed altered fecundity, receptivity, and lifespan responses to receipt of SP, when mated once or continually to SP null or control males. The results showed that miRNA-lacking females mated to SP null males exhibited altered receptivity, but not reproductive output, in comparison to controls. However, these effects interacted significantly with the genetic background of the miRNA-lacking females. No significant survival effects were observed in miRNA-lacking females housed continually with SP null or control males. However, continual exposure to control males that transferred SP resulted in significantly higher variation in miRNA-lacking female lifespan than did continual exposure to SP null males. The results provide the first insight into the effects and importance of miRNAs in regulating postmating responses in females.

]]>
<![CDATA[A Maximum-Likelihood Method to Correct for Allelic Dropout in Microsatellite Data with No Replicate Genotypes]]> https://www.researchpad.co/article/5acbcdd9463d7e2b3df485a2

Allelic dropout is a commonly observed source of missing data in microsatellite genotypes, in which one or both allelic copies at a locus fail to be amplified by the polymerase chain reaction. Especially for samples with poor DNA quality, this problem causes a downward bias in estimates of observed heterozygosity and an upward bias in estimates of inbreeding, owing to mistaken classifications of heterozygotes as homozygotes when one of the two copies drops out. One general approach for avoiding allelic dropout involves repeated genotyping of homozygous loci to minimize the effects of experimental error. Existing computational alternatives often require replicate genotyping as well. These approaches, however, are costly and are suitable only when enough DNA is available for repeated genotyping. In this study, we propose a maximum-likelihood approach together with an expectation-maximization algorithm to jointly estimate allelic dropout rates and allele frequencies when only one set of nonreplicated genotypes is available. Our method considers estimates of allelic dropout caused by both sample-specific factors and locus-specific factors, and it allows for deviation from Hardy–Weinberg equilibrium owing to inbreeding. Using the estimated parameters, we correct the bias in the estimation of observed heterozygosity through the use of multiple imputations of alleles in cases where dropout might have occurred. With simulated data, we show that our method can (1) effectively reproduce patterns of missing data and heterozygosity observed in real data; (2) correctly estimate model parameters, including sample-specific dropout rates, locus-specific dropout rates, and the inbreeding coefficient; and (3) successfully correct the downward bias in estimating the observed heterozygosity. We find that our method is fairly robust to violations of model assumptions caused by population structure and by genotyping errors from sources other than allelic dropout. 
Because the data sets imputed under our model can be investigated in additional subsequent analyses, our method will be useful for preparing data for applications in diverse contexts in population genetics and molecular ecology.

]]>
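
The allelic-dropout abstract attributes the downward bias in observed heterozygosity to heterozygotes being scored as homozygotes. A minimal sketch of that bias follows, assuming each allelic copy drops out independently with the same probability `gamma`. That assumption is a simplification made here for illustration; the article's model separates sample-specific and locus-specific rates and estimates them by expectation-maximization, which this sketch does not attempt.

```python
def observed_genotype_probs(true_het, gamma):
    """Probabilities of scoring a genotype as het, hom, or missing.

    true_het: true proportion of heterozygotes.
    gamma:    per-copy dropout probability (assumed independent and equal).
    """
    het = true_het * (1.0 - gamma) ** 2   # both copies must amplify
    missing = gamma ** 2                  # both copies drop out
    hom = 1.0 - het - missing             # true homozygotes plus masked hets
    return {"het": het, "hom": hom, "missing": missing}


def expected_observed_heterozygosity(true_het, gamma):
    """Heterozygosity among non-missing genotypes; requires gamma < 1."""
    probs = observed_genotype_probs(true_het, gamma)
    return probs["het"] / (1.0 - probs["missing"])


# Illustrative values: true heterozygosity 0.8, 20% per-copy dropout.
print(expected_observed_heterozygosity(0.8, 0.2))
```

Under this simplified model the observed value equals the true heterozygosity multiplied by (1 − γ)/(1 + γ), so even moderate dropout produces a substantial underestimate.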
<![CDATA[The Recombination Landscape in Wild House Mice Inferred Using Population Genomic Data]]> https://www.researchpad.co/article/5b43165e463d7e2266e75d2c

Characterizing variation in the rate of recombination across the genome is important for understanding several evolutionary processes. Previous analysis of the recombination landscape in laboratory mice has revealed that the different subspecies have different suites of recombination hotspots. It is unknown, however, whether hotspots identified in laboratory strains reflect the hotspot diversity of natural populations or whether broad-scale variation in the rate of recombination is conserved between subspecies. In this study, we constructed fine-scale recombination rate maps for a natural population of the Eastern house mouse, Mus musculus castaneus. We performed simulations to assess the accuracy of recombination rate inference in the presence of phase errors, and we used a novel approach to quantify phase error. The spatial distribution of recombination events is strongly positively correlated between our castaneus map and a map constructed using inbred lines derived predominantly from M. m. domesticus. Recombination hotspots in wild castaneus show little overlap, however, with the locations of double-strand breaks in wild-derived house mouse strains. Finally, we find that genetic diversity in M. m. castaneus is positively correlated with the rate of recombination, consistent with pervasive natural selection operating in the genome. Our study suggests that recombination rate variation is conserved at broad scales between house mouse subspecies but is not strongly conserved at fine scales.

]]>