ResearchPad - genome-evolution https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Comparative analysis of plastid genomes within the Campanulaceae and phylogenetic implications]]> https://www.researchpad.co/article/elastic_article_14639 Phylogenies of Campanulaceae based on nuclear ITS sequences conflict with those based on plastid markers, particularly in the subdivision of Cyanantheae (Campanulaceae). In addition, varied and complex plastid genome structures occur in species of the Campanulaceae. However, the limited availability of genomic information largely hinders studies of the molecular evolution and phylogeny of Campanulaceae. We reported the complete plastid genomes of three Cyanantheae species and compared them to eight published Campanulaceae plastomes to gain a deeper understanding of the applicability of plastomes. We found obvious differences in gene order, GC content, gene composition and the IR junctions at LSC/IRa. Almost all protein-coding genes and amino acid sequences showed obvious codon preferences. We identified 14 genes with highly positively selected sites, and the branch-site model identified 96 sites under potentially positive selection on the three lineages of the phylogenetic tree. Phylogenetic analyses showed that Cyananthus was more closely related to Codonopsis than to Cyclocodon and also clearly illustrated the relationships among the Cyanantheae species. We also found six coding regions with high nucleotide divergence values. These hotspot regions were considered to be useful molecular markers for resolving phylogenetic relationships and for species authentication in Campanulaceae.

]]>
<![CDATA[Chloroplast genomes of Rubiaceae: Comparative genomics and molecular phylogeny in subfamily Ixoroideae]]> https://www.researchpad.co/article/elastic_article_11231 In Rubiaceae phylogenetics, the number of markers has often proved a limitation, with authors failing to provide well-supported trees at tribal and generic levels. A robust phylogeny is a prerequisite for studying the evolutionary patterns of traits at different taxonomic levels. Advances in next-generation sequencing technologies have revolutionized biology by providing, at reduced cost, huge amounts of data for an increased number of species. Due to their highly conserved structure, generally recombination-free nature, and mostly uniparental inheritance, chloroplast DNA sequences have long been used as markers of choice for plant phylogeny reconstruction. The main objectives of this study are: 1) to gain insight into chloroplast genome evolution in the Rubiaceae (Ixoroideae) through an efficient methodology for de novo assembly of plastid genomes; and, 2) to test the efficiency of mining SNPs in the nuclear genome of Ixoroideae based on the use of a coffee reference genome to produce well-supported nuclear trees. We assembled whole chloroplast genome sequences for 27 species of the Rubiaceae subfamily Ixoroideae using next-generation sequences. Analysis of the plastid genome structure reveals a relatively good conservation of gene content and order. Generally, low variation was observed between taxa in the boundary regions, with the exception of the inverted repeat at both the large and short single copy junctions for some taxa. An average of 79% of the SNPs determined in the Coffea genus are transferable to Ixoroideae, with variation ranging from 35% to 96%. In general, the plastid and the nuclear genome phylogenies are congruent with each other. They are well-resolved with well-supported branches. Generally, the tribes form well-identified clades but the tribe Sherbournieae is shown to be polyphyletic.
The results are discussed relative to the methodology used and the chloroplast genome features in Rubiaceae and compared to previous Rubiaceae phylogenies.

]]>
<![CDATA[The draft mitochondrial genome of Magnolia biondii and mitochondrial phylogenomics of angiosperms]]> https://www.researchpad.co/article/N1f661d3e-d0c0-407e-92c0-bb72cd78029d

The mitochondrial genomes of flowering plants are well known for their large size, variable coding-gene set and fluid genome structure. The available mitochondrial genomes of the early angiosperms show extreme genetic diversity in genome size, structure, and sequences, such as rampant HGTs in Amborella mt genome, numerous repeated sequences in Nymphaea mt genome, and conserved gene evolution in Liriodendron mt genome. However, currently available early angiosperm mt genomes are still limited, hampering us from obtaining an overall picture of the mitogenomic evolution in angiosperms. Here we sequenced and assembled the draft mitochondrial genome of Magnolia biondii Pamp. from Magnoliaceae (magnoliids) using Oxford Nanopore sequencing technology. We recovered a single linear mitochondrial contig of 967,100 bp with an average read coverage of 122 × and a GC content of 46.6%. This draft mitochondrial genome contains a rich 64-gene set, similar to those of Liriodendron and Nymphaea, including 41 protein-coding genes, 20 tRNAs, and 3 rRNAs. Twenty cis-spliced and five trans-spliced introns break ten protein-coding genes in the Magnolia mt genome. Repeated sequences account for 27% of the draft genome, with 17 out of the 1,145 repeats showing recombination evidence. Although partially assembled, the approximately 1-Mb mt genome of Magnolia is still among the largest in angiosperms, which is possibly due to the expansion of repeated sequences, retention of ancestral mtDNAs, and the incorporation of nuclear genome sequences. Mitochondrial phylogenomic analysis of the concatenated datasets of 38 conserved protein-coding genes from 91 representatives of angiosperm species supports the sister relationship of magnoliids with monocots and eudicots, which is congruent with plastid evidence.

]]>
<![CDATA[Parallelism in eco-morphology and gene expression despite variable evolutionary and genomic backgrounds in a Holarctic fish]]> https://www.researchpad.co/article/N4fc7d71e-6de4-4251-8df9-22327ccf5952

Understanding the extent to which ecological divergence is repeatable is essential for predicting responses of biodiversity to environmental change. Here we test the predictability of evolution, from genotype to phenotype, by studying parallel evolution in a salmonid fish, Arctic charr (Salvelinus alpinus), across eleven replicate sympatric ecotype pairs (benthivorous-planktivorous and planktivorous-piscivorous) and two evolutionary lineages. We found considerable variability in eco-morphological divergence, with several traits related to foraging (eye diameter, pectoral fin length) being highly parallel even across lineages. This suggests repeated and predictable adaptation to environment. Consistent with ancestral genetic variation, hundreds of loci were associated with ecotype divergence within lineages of which eight were shared across lineages. This shared genetic variation was maintained despite variation in evolutionary histories, ranging from postglacial divergence in sympatry (ca. 10-15kya) to pre-glacial divergence (ca. 20-40kya) with postglacial secondary contact. Transcriptome-wide gene expression (44,102 genes) was highly parallel across replicates, involved biological processes characteristic of ecotype morphology and physiology, and revealed parallelism at the level of regulatory networks. This expression divergence was not only plastic but in part genetically controlled by parallel cis-eQTL. Lastly, we found that the magnitude of phenotypic divergence was largely correlated with the genetic differentiation and gene expression divergence. In contrast, the direction of phenotypic change was mostly determined by the interplay of adaptive genetic variation, gene expression, and ecosystem size. Ecosystem size further explained variation in putatively adaptive, ecotype-associated genomic patterns within and across lineages, highlighting the role of environmental variation and stochasticity in parallel evolution. 
Together, our findings demonstrate the parallel evolution of eco-morphology and gene expression within and across evolutionary lineages, which is controlled by the interplay of environmental stochasticity and evolutionary contingencies, largely overcoming variable evolutionary histories and genomic backgrounds.

]]>
<![CDATA[Designing and running an advanced Bioinformatics and genome analyses course in Tunisia]]> https://www.researchpad.co/article/5c58d660d5eed0c484031d37

Genome data, with underlying new knowledge, are accumulating at an exponential rate thanks to ever-improving sequencing technologies and the parallel development of dedicated, efficient Bioinformatics methods and tools. Advanced Education in Bioinformatics and Genome Analyses is to a large extent not accessible to students in developing countries, where endeavors to set up Bioinformatics courses most often concern only basic levels. Here, we report a pioneering pilot experience concerning the design and implementation, from scratch, of a three-month advanced and extensive course in Bioinformatics and Genome Analyses at the Institut Pasteur de Tunis. Most significantly, the outcome of the course was upgrading the participants’ skills in Bioinformatics and Genome Analyses to recognized international standards. Here we detail the different steps involved in the implementation of this course as well as the topics covered in the program. The description of this pilot experience might be helpful for the implementation of other similar educational projects, notably in developing countries, aiming to go beyond basics and provide young researchers with high-level skills.

]]>
<![CDATA[Detecting useful genetic markers and reconstructing the phylogeny of an important medicinal resource plant, Artemisia selengensis, based on chloroplast genomics]]> https://www.researchpad.co/article/5c61e90ed5eed0c48496f746

Artemisia selengensis is not only a health food, but also a well-known traditional Chinese medicine. Although chloroplast genomic data are widely used in genomic evolution studies, molecular marker development, and phylogenetic analysis, only a fraction of the chloroplast (cp) genome data of the genus Artemisia has been reported, which makes evolutionary studies, genetic improvement, and phylogenetic identification very difficult. In this study, the complete chloroplast genome of A. selengensis was compared with that of other species within Artemisia, and phylogenetic analyses were conducted with other genera in the Asteraceae family. The results showed that the chloroplast genome of A. selengensis is AT-rich and has a typical quadripartite structure that is 151,215 bp in length. Comparative genome analyses demonstrated that the available chloroplast genomes of species of Artemisia were well conserved in terms of genomic length, GC content, and gene organization and order. However, some differences, which may indicate evolutionary events, were found, such as a re-inversion event within the Artemisia genus, an unequal duplication of the ycf1 gene caused by the expansion and contraction of the IR region, and the fast-evolving regions. Repeated sequence analysis showed that Artemisia chloroplast genomes presented a highly similar pattern of SSR or LDR distribution. A total of 257 SSRs and 42 LDRs were identified in the A. selengensis chloroplast genome. The phylogenetic analysis showed that A. selengensis was sister to A. gmelinii. The findings of this study will be valuable in further studies to understand the genetic diversity and evolutionary history of Asteraceae.

]]>
<![CDATA[A likelihood approach to testing hypotheses on the co-evolution of epigenome and genome]]> https://www.researchpad.co/article/5c2d2ebfd5eed0c484d9b67f

Central questions to epigenome evolution include whether interspecies changes of histone modifications are independent of evolutionary changes of DNA, and if there is dependence whether they depend on any specific types of DNA sequence changes. Here, we present a likelihood approach for testing hypotheses on the co-evolution of genome and histone modifications. The gist of this approach is to convert evolutionary biology hypotheses into probabilistic forms, by explicitly expressing the joint probability of multispecies DNA sequences and histone modifications, which we refer to as a class of Joint Evolutionary Model for the Genome and the Epigenome (JEMGE). JEMGE can be summarized as a mixture model of four components representing four evolutionary hypotheses, namely dependence and independence of interspecies epigenomic variations to underlying sequence substitutions and to underlying sequence insertions and deletions (indels). We implemented a maximum likelihood method to fit the models to the data. Based on comparison of likelihoods, we inferred whether interspecies epigenomic variations depended on substitution or indels in local genomic sequences based on DNase hypersensitivity and spermatid H3K4me3 ChIP-seq data from human and rhesus macaque. Approximately 5.5% of homologous regions in the genomes exhibited H3K4me3 modification in either species, among which approximately 67% homologous regions exhibited local-sequence-dependent interspecies H3K4me3 variations. Substitutions accounted for less local-sequence-dependent H3K4me3 variations than indels. Among transposon-mediated indels, ERV1 insertions and L1 insertions were most strongly associated with H3K4me3 gains and losses, respectively. By initiating probabilistic formulation on the co-evolution of genomes and epigenomes, JEMGE helps to bring evolutionary biology principles to comparative epigenomic studies.

]]>
<![CDATA[Homology and linkage in crossover for linear genomes of variable length]]> https://www.researchpad.co/article/5c37b7bdd5eed0c484490b2f

The use of variable-length genomes in evolutionary computation has applications in optimisation when the size of the search space is unknown, and provides a unique environment to study the evolutionary dynamics of genome structure. Here, we revisit crossover for linear genomes of variable length, identifying two crucial attributes of successful recombination algorithms: the ability to retain homologous structure, and to reshuffle variant information. We introduce direct measures of these properties—homology score and linkage score—and use them to review existing crossover algorithms, as well as two novel ones. In addition, we measure the performance of these crossover methods on three different benchmark problems, and find that variable-length genomes out-perform fixed-length variants in all three cases. Our homology and linkage scores successfully explain the difference in performance between different crossover methods, providing a simple and insightful framework for crossover in a variable-length setting.
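To make the setting concrete, the following is a minimal sketch of a generic "cut-and-splice" crossover for linear genomes of variable length. It is a textbook-style operator chosen for illustration only, not one of the algorithms reviewed or introduced in the article; the function name `cut_and_splice` and its parameters are hypothetical. Because the two cut points are drawn independently, the children can differ in length from both parents and positional homology is not retained, which is the kind of behaviour a homology score would be expected to penalise.

```python
import random


def cut_and_splice(parent_a, parent_b, rng=random):
    """Cross two linear genomes of possibly different lengths.

    Each parent is cut at its own, independently chosen point and the
    tails are swapped. Hypothetical illustration, not the paper's method.
    """
    cut_a = rng.randint(0, len(parent_a))  # inclusive bounds: empty head or tail allowed
    cut_b = rng.randint(0, len(parent_b))
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so repeated runs give the same children
    kids = cut_and_splice(list("AAAAAA"), list("BBB"), rng)
    print(kids)
```

The operator conserves the total amount of genetic material across the two children, but nothing ties a gene's position in a child to its position in either parent.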

]]>
<![CDATA[Microtubules in Bacteria: Ancient Tubulins Build a Five-Protofilament Homolog of the Eukaryotic Cytoskeleton]]> https://www.researchpad.co/article/5989db0fab0ee8fa60bcbb67

The unequivocal identification of microtubules in bacteria throws light on the evolution of modern eukaryotic microtubules from a primordial structure.

]]>
<![CDATA[Nucleosomes Shape DNA Polymorphism and Divergence]]> https://www.researchpad.co/article/5989da16ab0ee8fa60b7b2e6

An estimated 80% of genomic DNA in eukaryotes is packaged as nucleosomes, which, together with the remaining interstitial linker regions, generate higher order chromatin structures [1]. Nucleosome sequences isolated from diverse organisms exhibit ∼10 bp periodic variations in AA, TT and GC dinucleotide frequencies. These sequence elements generate intrinsically curved DNA and help establish the histone-DNA interface. We investigated an important unanswered question concerning the interplay between chromatin organization and genome evolution: do the DNA sequence preferences inherent to the highly conserved histone core exert detectable natural selection on genomic divergence and polymorphism? To address this hypothesis, we isolated nucleosomal DNA sequences from Drosophila melanogaster embryos and examined the underlying genomic variation within and between species. We found that divergence along the D. melanogaster lineage is periodic across nucleosome regions with base changes following preferred nucleotides, providing new evidence for systematic evolutionary forces in the generation and maintenance of nucleosome-associated dinucleotide periodicities. Further, Single Nucleotide Polymorphism (SNP) frequency spectra show striking periodicities across nucleosomal regions, paralleling divergence patterns. Preferred alleles occur at higher frequencies in natural populations, consistent with a central role for natural selection. These patterns are stronger for nucleosomes in introns than in intergenic regions, suggesting selection is stronger in transcribed regions where nucleosomes undergo more displacement, remodeling and functional modification. In addition, we observe a large-scale (∼180 bp) periodic enrichment of AA/TT dinucleotides associated with nucleosome occupancy, while GC dinucleotide frequency peaks in linker regions. 
Divergence and polymorphism data also support a role for natural selection in the generation and maintenance of these super-nucleosomal patterns. Our results demonstrate that nucleosome-associated sequence periodicities are under selective pressure, implying that structural interactions between nucleosomes and DNA sequence shape sequence evolution, particularly in introns.

]]>
<![CDATA[Clonal Architecture of Secondary Acute Myeloid Leukemia Defined by Single-Cell Sequencing]]> https://www.researchpad.co/article/5989da79ab0ee8fa60b97da4

Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions—the population frequency of individual clones, their genetic composition, and their evolutionary relationships—which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.

]]>
<![CDATA[An Immunity-Triggering Effector from the Barley Smut Fungus Ustilago hordei Resides in an Ustilaginaceae-Specific Cluster Bearing Signs of Transposable Element-Assisted Evolution]]> https://www.researchpad.co/article/5989daa7ab0ee8fa60ba8115

The basidiomycete smut fungus Ustilago hordei was previously shown to comprise isolates that are avirulent on various barley host cultivars. Through genetic crosses we had revealed that a dominant avirulence locus UhAvr1 which triggers immunity in barley cultivar Hannchen harboring resistance gene Ruh1, resided within an 80-kb region. DNA sequence analysis of this genetically delimited region uncovered the presence of 7 candidate secreted effector proteins. Sequence comparison of their coding sequences among virulent and avirulent parental and field isolates could not distinguish UhAvr1 candidates. Systematic deletion and complementation analyses revealed that UhAvr1 is UHOR_10022 which codes for a small effector protein of 171 amino acids with a predicted 19 amino acid signal peptide. Virulence in the parental isolate is caused by the insertion of a fragment of 5.5 kb with similarity to a common U. hordei transposable element (TE), interrupting the promoter of UhAvr1 and thereby changing expression and hence recognition of UhAVR1p. This rearrangement is likely caused by activities of TEs and variation is seen among isolates. Using GFP-chimeric constructs we show that UhAvr1 is induced only in mated dikaryotic hyphae upon sensing and infecting barley coleoptile cells. When infecting Hannchen, UhAVR1p causes local callose deposition and the production of reactive oxygen species and necrosis indicative of the immune response. UhAvr1 does not contribute significantly to overall virulence. UhAvr1 is located in a cluster of ten effectors with several paralogs and over 50% of TEs. This cluster is syntenous with clusters in closely-related U. maydis and Sporisorium reilianum. In these corn-infecting species, these clusters harbor however more and further diversified homologous effector families but very few TEs. This increased variability may have resulted from past selection pressure by resistance genes since U. maydis is not known to trigger immunity in its corn host.

]]>
<![CDATA[8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage]]> https://www.researchpad.co/article/5989db01ab0ee8fa60bc6d2b

Ten years on from the finishing of the human reference genome sequence, it remains unclear what fraction of the human genome confers function, where this sequence resides, and how much is shared with other mammalian species. When addressing these questions, functional sequence has often been equated with pan-mammalian conserved sequence. However, functional elements that are short-lived, including those contributing to species-specific biology, will not leave a footprint of long-lasting negative selection. Here, we address these issues by identifying and characterising sequence that has been constrained with respect to insertions and deletions for pairs of eutherian genomes over a range of divergences. Within noncoding sequence, we find increasing amounts of mutually constrained sequence as species pairs become more closely related, indicating that noncoding constrained sequence turns over rapidly. We estimate that half of present-day noncoding constrained sequence has been gained or lost in approximately the last 130 million years (half-life in units of divergence time, d1/2 = 0.25–0.31). While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequences we identify are not detected by models optimized for wider pan-mammalian conservation. Constrained DNase 1 hypersensitivity sites, promoters and untranslated regions have been more evolutionarily stable than long noncoding RNA loci which have turned over especially rapidly. By contrast, protein coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1–5.0). From extrapolations we estimate that 8.2% (7.1–9.2%) of the human genome is presently subject to negative selection and thus is likely to be functional, while only 2.2% has maintained constraint in both human and mouse since these species diverged. 
These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction.
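The half-life figures quoted above can be read through a simple worked calculation. The sketch below assumes plain exponential turnover, under which the fraction of constrained sequence still shared after divergence d is 0.5 raised to d / d1/2; this model and the function name `fraction_retained` are assumptions made for illustration, and the divergence value of 0.5 in the usage lines is arbitrary rather than taken from the article.

```python
def fraction_retained(d, d_half):
    """Fraction of constrained sequence retained after divergence d.

    Assumes simple exponential turnover with half-life d_half, both in
    the same units of divergence. Illustrative model only.
    """
    if d_half <= 0:
        raise ValueError("d_half must be positive")
    return 0.5 ** (d / d_half)


if __name__ == "__main__":
    d = 0.5  # arbitrary divergence chosen for illustration
    # Half-life ranges quoted in the abstract: noncoding 0.25-0.31, coding 2.1-5.0
    for label, d_half in [("noncoding, d1/2=0.25", 0.25),
                          ("noncoding, d1/2=0.31", 0.31),
                          ("coding, d1/2=2.1", 2.1),
                          ("coding, d1/2=5.0", 5.0)]:
        print(label, round(fraction_retained(d, d_half), 3))
```

At a divergence equal to one half-life the function returns exactly one half, and for any fixed divergence the coding half-lives leave far more sequence shared than the noncoding ones.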

]]>
<![CDATA[Comparative Phylogenomics Uncovers the Impact of Symbiotic Associations on Host Genome Evolution]]> https://www.researchpad.co/article/5989db49ab0ee8fa60bd981e

Mutualistic symbioses between eukaryotes and beneficial microorganisms of their microbiome play an essential role in nutrition, protection against disease, and development of the host. However, the impact of beneficial symbionts on the evolution of host genomes remains poorly characterized. Here we used the independent loss of the most widespread plant–microbe symbiosis, arbuscular mycorrhization (AM), as a model to address this question. Using a large phenotypic approach and phylogenetic analyses, we present evidence that loss of AM symbiosis correlates with the loss of many symbiotic genes in the Arabidopsis lineage (Brassicales). Then, by analyzing the genome and/or transcriptomes of nine other phylogenetically divergent non-host plants, we show that this correlation occurred in a convergent manner in four additional plant lineages, demonstrating the existence of an evolutionary pattern specific to symbiotic genes. Finally, we use a global comparative phylogenomic approach to track this evolutionary pattern among land plants. Based on this approach, we identify a set of 174 highly conserved genes and demonstrate enrichment in symbiosis-related genes. Our findings are consistent with the hypothesis that beneficial symbionts maintain purifying selection on host gene networks during the evolution of entire lineages.

]]>
<![CDATA[Eukaryotic Evolutionary Transitions Are Associated with Extreme Codon Bias in Functionally-Related Proteins]]> https://www.researchpad.co/article/5989d9ecab0ee8fa60b6cc78

Codon bias in the genome of an organism influences its phenome by changing the speed and efficiency of mRNA translation and hence protein abundance. We hypothesized that differences in codon bias, either between-species differences in orthologous genes, or within-species differences between genes, may play an evolutionary role. To explore this hypothesis, we compared the genome-wide codon bias in six species that occupy vital positions in the Eukaryotic Tree of Life. We acquired the entire protein coding sequences for these organisms, computed the codon bias for all genes in each organism and explored the output for relationships between codon bias and protein function, both within- and between-lineages. We discovered five notable coordinated patterns, with extreme codon bias most pronounced in traits considered highly characteristic of a given lineage. Firstly, the Homo sapiens genome had stronger codon bias for DNA-binding transcription factors than the Saccharomyces cerevisiae genome, whereas the opposite was true for ribosomal proteins – perhaps underscoring transcriptional regulation in the origin of complexity. Secondly, both mammalian species examined possessed extreme codon bias in genes relating to hair – a tissue unique to mammals. Thirdly, Arabidopsis thaliana showed extreme codon bias in genes implicated in cell wall formation and chloroplast function – which are unique to plants. Fourthly, Gallus gallus possessed strong codon bias in a subset of genes encoding mitochondrial proteins – perhaps reflecting the enhanced bioenergetic efficiency in birds that co-evolved with flight. And lastly, the G. gallus genome had extreme codon bias for the Ciliary Neurotrophic Factor – which may help to explain their spontaneous recovery from deafness. We propose that extreme codon bias in groups of genes that encode functionally related proteins has a pathway-level energetic explanation.
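The abstract does not say which codon bias statistic was computed. As one possible illustration, the sketch below calculates relative synonymous codon usage (RSCU), a commonly used measure in which a value of 1 means a codon is used as often as expected if all synonymous codons were used equally. The choice of RSCU, the function name `rscu`, and the use of the standard genetic code are assumptions for this example and are not taken from the article.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for a coding sequence given as a DNA string.

    RSCU = observed count * (number of synonymous codons) / (total count
    for that amino acid). Stop codons and unused amino acids are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue
        for c in family:
            result[c] = counts[c] * len(family) / total
    return result


if __name__ == "__main__":
    # Three lysine codons: AAA twice, AAG once
    print(rscu("AAAAAAAAG"))
```

Comparing such per-codon values between genes, or between orthologs in different species, is one way the within-species and between-species contrasts described in the abstract could be explored.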

]]>
<![CDATA[Molecular Evolution and Spatial Transmission of Severe Fever with Thrombocytopenia Syndrome Virus Based on Complete Genome Sequences]]> https://www.researchpad.co/article/5989da8bab0ee8fa60b9e0be

Severe fever with thrombocytopenia syndrome virus (SFTSV) is a novel tick-borne bunyavirus that causes hemorrhagic fever with a high fatality rate in East Asia. In this study we analyzed the complete genome sequences of 122 SFTSV strains to determine the phylogeny, evolution and reassortment of the virus. We found that the evolutionary rates of the three genome segments differed, being highest in the S segment and lowest in the L segment. The SFTSV strains were phylogenetically classified into 5 lineages (A, B, C, D and E) with each genome segment. SFTSV strains from China were classified into all 5 lineages, strains from South Korea were classified into 3 lineages (A, D, and E), and all strains from Japan were classified into lineage E only. Using the average evolutionary rate of the three genome segments, we found that the extant SFTSV originated 20–87 years ago in the Dabie Mountain area in central China. The viruses were then transmitted to other areas of China, Japan and South Korea. We also found that six SFTSV strains were reassortants. Selection pressure analysis of the four genes (RNA-dependent RNA polymerase, glycoprotein, nucleocapsid protein, non-structural protein) suggested that SFTSV was under purifying selection, and two sites (37, 1033) of the glycoprotein were identified as being under strong positive selection. We concluded that SFTSV originated in central China and spread to other places recently, and that the virus was under purifying selection with a high frequency of reassortment.

]]>
<![CDATA[The Genome Sequence of the Leaf-Cutter Ant Atta cephalotes Reveals Insights into Its Obligate Symbiotic Lifestyle]]> https://www.researchpad.co/article/5989dadeab0ee8fa60bbaf02

Leaf-cutter ants are one of the most important herbivorous insects in the Neotropics, harvesting vast quantities of fresh leaf material. The ants use leaves to cultivate a fungus that serves as the colony's primary food source. This obligate ant-fungus mutualism is one of the few occurrences of farming by non-humans and likely facilitated the formation of their massive colonies. Mature leaf-cutter ant colonies contain millions of workers ranging in size from small garden tenders to large soldiers, resulting in one of the most complex polymorphic caste systems within ants. To begin uncovering the genomic underpinnings of this system, we sequenced the genome of Atta cephalotes using 454 pyrosequencing. One prediction from this ant's lifestyle is that it has undergone genetic modifications that reflect its obligate dependence on the fungus for nutrients. Analysis of this genome sequence is consistent with this hypothesis, as we find evidence for reductions in genes related to nutrient acquisition. These include extensive reductions in serine proteases (which are likely unnecessary because proteolysis is not a primary mechanism used to process nutrients obtained from the fungus), a loss of genes involved in arginine biosynthesis (suggesting that this amino acid is obtained from the fungus), and the absence of a hexamerin (which sequesters amino acids during larval development in other insects). Following recent reports of genome sequences from other insects that engage in symbioses with beneficial microbes, the A. cephalotes genome provides new insights into the symbiotic lifestyle of this ant and advances our understanding of host–microbe symbioses.

]]>
<![CDATA[Selective Constraint on the Upstream Open Reading Frames That Overlap with Coding Sequences in Animals]]> https://www.researchpad.co/article/5989d9f1ab0ee8fa60b6e77c

Upstream open reading frames (uORFs) are translational regulatory elements located in 5′ untranslated regions. They can significantly repress the translation of the downstream coding sequences (CDS), and participate in the spatio-temporal regulations of protein translation. Notwithstanding this biological significance, the selective constraint on uORFs remains underexplored. Particularly, the uORFs that partially overlap with CDS with a different reading frame (overlapping uORFs, or “VuORFs”) may lead to strong translational inhibition or N-terminal truncation of the peptides encoded by the affected CDS. By analyzing VuORF-containing transcripts (designated as “VuORF transcripts”) in human, mouse, and fruit fly, we demonstrate that VuORFs are in general slightly deleterious - the proportion of genes that encode at least one VuORF transcript is significantly smaller than expected in all of the three examined species. In addition, this proportion is significantly smaller in fruit fly than in mammals, indicating a higher efficiency of removing VuORFs in the former species because of its larger effective population size. Furthermore, the deleterious effect of a VuORF depends on the sequence context of its start codon (VuAUG). VuORFs with an optimal VuAUG context are more strongly disfavored than those with a suboptimal context in all of the three examined species. And the propensity to remove optimal-context VuAUGs is stronger in fruit fly than in mammals. Intriguingly, however, the currently observable optimal-context VuAUGs (but not suboptimal-context VuAUGs) are more conserved than expected. These observations suggest that the regulatory functions of VuORFs may have been gained fortuitously in organisms with a small effective population size because the slightly deleterious effect of these elements can be better tolerated in these organisms, thus allowing opportunities for the development of novel biological functions. 
Nevertheless, once the functions of VuORFs were established, they became subject to negative selection.

]]>
<![CDATA[Mutagen-Specific Mutation Signature Determines Global microRNA Binding]]> https://www.researchpad.co/article/5989da25ab0ee8fa60b80686

Micro-RNAs (miRNAs) are small non-coding RNAs that regulate gene products at the post-transcriptional level. It is thought that loss of cell regulation by miRNAs supports cancer development. Based on whole genome sequencing of a melanoma tumor, we predict, using three different computational algorithms, that the melanoma somatic mutations globally reduce binding of miRNAs to the mutated 3′UTRs. This phenomenon reflects the nature of the characteristic UV-induced mutation, C-to-T. Furthermore, we show that seed regions are enriched with guanine, thus rendering miRNAs prone to reduced binding to UV-mutated 3′UTRs. Accordingly, mutation patterns in non-UV-induced malignancies, e.g. lung cancer and leukemia, do not yield similar predictions. It is suggested that UV-induced disruption of miRNA-mediated gene regulation plays a carcinogenic role. Remarkably, dark-skinned populations have significantly higher GC content in 3′UTR SNPs than light-skinned populations, which implies evolutionary pressure to preserve regulation by trans-acting oligonucleotides under conditions of excess UV radiation.

]]>
<![CDATA[Coevolution in RNA Molecules Driven by Selective Constraints: Evidence from 5S rRNA]]> https://www.researchpad.co/article/5989d9e0ab0ee8fa60b6945b

Understanding intra-molecular coevolution helps to elucidate various structural and functional constraints acting on molecules and might have practical applications in predicting molecular structure and interactions. In this study, we used 5S rRNA as a template to investigate how selective constraints have shaped the RNA evolution. We have observed the nonrandom occurrence of paired differences along the phylogenetic trees, the high rate of compensatory evolution, and the high TIR scores (the ratio of the numbers of terminal to intermediate states), all of which indicate that significant positive selection has driven the evolution of 5S rRNA. We found three mechanisms of compensatory evolution: Watson-Crick interaction (the primary one), complex interactions between multiple sites within a stem, and interplay of stems and loops. Coevolutionary interactions between sites were observed to be highly dependent on the structural and functional environment in which they occurred. Coevolution occurred mostly in those sites closest to loops or bulges within structurally or functionally important helices, which may be under weaker selective constraints than other stem positions. Breaking these pairs would directly increase the size of the adjoining loop or bulge, causing a partial or total structural rearrangement. In conclusion, our results indicate that sequence coevolution is a direct result of maintaining optimal structural and functional integrity.

]]>