ResearchPad - gene-prediction https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Mitochondrial genome sequence of <i>Phytophthora sansomeana</i> and comparative analysis of <i>Phytophthora</i> mitochondrial genomes]]> https://www.researchpad.co/article/elastic_article_14567 Phytophthora sansomeana infects soybean and causes root rot. It was recently separated from the species complex P. megasperma sensu lato. In this study, we sequenced and annotated its complete mitochondrial genome and compared it to those of nine other Phytophthora species. The genome was assembled into a circular molecule of 39,618 bp with a G+C content of 22.03%. Forty-two protein-coding genes, 25 tRNA genes and two rRNA genes were annotated in this genome. The protein-coding genes include 14 genes in the respiratory complexes, four ATP synthase genes, 16 ribosomal protein genes, a tatC translocase gene, six conserved ORFs and a unique orf402. The tRNA genes encode tRNAs for 19 amino acids. Comparison of the mitochondrial genomes of 10 Phytophthora species revealed three inversions, each covering multiple genes. With few exceptions, these genomes were conserved in gene content; for example, a 3'-truncated atp9 gene was found in P. nicotianae. All 10 Phytophthora species, as well as other oomycetes and stramenopiles, lacked tRNA genes for threonine in their mitochondria. Phylogenomic analysis using the mitochondrial genomes supported or strengthened previous findings on the phylogeny of Phytophthora spp.
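G+C content is the share of G and C among the bases of the assembled sequence. The following is a minimal sketch, assuming that ambiguous bases such as N are excluded from the denominator (the abstract does not state a convention); the input string is a toy sequence, not Phytophthora data.

```python
def gc_content(seq: str) -> float:
    """Return G+C content as a percentage of unambiguous A/C/G/T bases."""
    seq = seq.upper()
    # Count only unambiguous nucleotides; N and other IUPAC codes are ignored.
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Toy sequence: 2 of its 8 unambiguous bases are G or C.
toy_gc = gc_content("ATATGCATNN")
```

For a genome of 39,618 bp at 22.03% G+C, this corresponds to roughly 8,700 G or C positions.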

]]>
<![CDATA[Unusual genome expansion and transcription suppression in ectomycorrhizal Tricholoma matsutake by insertions of transposable elements]]> https://www.researchpad.co/article/Nd7412b83-0508-48a9-959e-b3aa8ede7a25

Genome sequencing of Tricholoma matsutake revealed an unusually large genome of 189.0 Mbp, a consequence of its extraordinarily high transposable element (TE) content. We found that 702 genes were surrounded by TEs and that 83.2% of these genes were not transcribed at any developmental stage, indicating that TE insertion alters the transcription of neighboring genes. Repeat-induced point mutation (RIP), i.e., C-to-T hypermutation with a bias toward CpG dinucleotides, was also detected in this genome; RIP represents a typical evolutionary defense mechanism against TEs. Many transcription factor genes were activated in both the primordium and fruiting body stages, indicating that many regulatory processes are shared across these developmental stages. Small secreted protein genes (<300 aa) were predominantly transcribed in the hyphae, where symbiotic interactions with the hosts occur. Comparative analysis with 37 Agaricomycetes genomes revealed that IstB-like domains (PF01695) were conserved across taxonomically diverse mycorrhizal genomes, and the T. matsutake genome contains four copies of this domain. Three of the IstB-like genes were overexpressed in the hyphae. As in other ectomycorrhizal genomes, the CAZyme gene set was reduced in T. matsutake, including losses of glycoside hydrolase genes. The T. matsutake genome sequence provides insight into the causes and consequences of genome size inflation.
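RIP leaves C-to-T transitions between otherwise similar repeat copies, and a CpG bias shows up as an excess of those transitions at C positions followed by G. The sketch below is a simplified illustration, not the study's pipeline. It assumes two pre-aligned, gap-free repeat copies of equal length and tallies C-to-T changes (and their reverse-strand counterpart, G-to-A) by dinucleotide context. The sequences are hypothetical.

```python
from collections import Counter


def ct_transitions_by_context(ref: str, mut: str) -> Counter:
    """Tally C->T (and reverse-strand G->A) changes between two aligned copies,
    split by whether the affected site lies in a CpG dinucleotide of `ref`."""
    if len(ref) != len(mut):
        raise ValueError("copies must be aligned to the same length")
    ref, mut = ref.upper(), mut.upper()
    tally = Counter()
    for i, (r, m) in enumerate(zip(ref, mut)):
        if r == "C" and m == "T":
            # Forward strand: the mutated C is followed by G in the reference copy.
            in_cpg = i + 1 < len(ref) and ref[i + 1] == "G"
        elif r == "G" and m == "A":
            # Reverse strand: a G preceded by C is the complement of a CpG C.
            in_cpg = i > 0 and ref[i - 1] == "C"
        else:
            continue
        tally["CpG" if in_cpg else "non-CpG"] += 1
    return tally


# Hypothetical aligned repeat copies (not T. matsutake sequence).
example = ct_transitions_by_context("ACGTTCAACGA", "ATGTTTAATGA")
```

Real RIP analyses usually work on multiple alignments of many repeat copies and normalize by how often each context occurs, so raw counts like these are only a starting point.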

]]>
<![CDATA[Protein composition of the occlusion bodies of Epinotia aporema granulovirus]]> https://www.researchpad.co/article/5c6c75e6d5eed0c4843d0423

Within the family Baculoviridae, members of the genus Betabaculovirus are employed as biocontrol agents against lepidopteran pests, either alone or in combination with selected members of the genus Alphabaculovirus. Epinotia aporema granulovirus (EpapGV) is a fast-killing betabaculovirus that infects the bean shoot borer (E. aporema) and is a promising biopesticide. Because occlusion bodies (OBs) play a key role in baculovirus horizontal transmission, we investigated the composition of EpapGV OBs. Using mass spectrometry-based proteomics, we identified 56 proteins that are included in the OBs during the final stages of larval infection. Our data provide experimental validation of several annotated hypothetical coding sequences. Proteogenomic mapping against the genomic sequence detected a previously unannotated ac110-like core gene and a putative translational fusion product of ORFs epap48 and epap49. Comparative studies of the proteomes available for the family Baculoviridae highlight the conservation of core gene products as parts of the occluded virion. Two proteins specific to betabaculoviruses (Epap48 and Epap95) are incorporated into OBs. Moreover, quantification based on emPAI values showed that Epap95 is one of the most abundant components of EpapGV OBs.
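emPAI (exponentially modified protein abundance index) is conventionally computed as 10^PAI − 1, where PAI is the number of observed peptides divided by the number of observable peptides for a protein. Normalizing emPAI values within a sample gives approximate molar percentages. The sketch below illustrates that calculation. The peptide counts and protein names are hypothetical and are not the EpapGV data.

```python
def empai(observed_peptides: int, observable_peptides: int) -> float:
    """Exponentially modified protein abundance index: 10**PAI - 1,
    where PAI = observed / observable peptides."""
    if observable_peptides <= 0:
        raise ValueError("observable_peptides must be positive")
    return 10 ** (observed_peptides / observable_peptides) - 1


def molar_percent(empai_values: dict[str, float]) -> dict[str, float]:
    """Convert emPAI values to approximate mol% within one sample."""
    total = sum(empai_values.values())
    return {name: 100 * v / total for name, v in empai_values.items()}


# Hypothetical peptide counts, for illustration only.
values = {"protein_A": empai(10, 10), "protein_B": empai(5, 10)}
shares = molar_percent(values)
```

Because of the exponential, a protein with all of its observable peptides detected (PAI = 1) scores 9. A protein with half of them detected scores only about 2.2, which is why emPAI ranks abundant OB components well above minor ones.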

]]>
<![CDATA[A data-driven interactome of synergistic genes improves network-based cancer outcome prediction]]> https://www.researchpad.co/article/5c648d3fd5eed0c484c82364

Robustly predicting outcome for cancer patients from gene expression is an important challenge on the road to better personalized treatment. Network-based outcome predictors (NOPs), which consider the cellular wiring diagram in the classification, hold much promise to improve the performance, stability and interpretability of identified marker genes. Problematically, reports on the efficacy of NOPs are conflicting and, for instance, suggest that random networks perform on par with networks that describe biologically relevant interactions. In this paper we turn the prediction problem around: instead of using a given biological network in the NOP, we aim to identify the network of genes that truly improves outcome prediction. To this end, we propose SyNet, a gene network constructed ab initio from synergistic gene pairs derived from survival-labelled gene expression data. To obtain SyNet, we evaluate synergy for all 69 million pairwise combinations of genes, resulting in a network that is specific to the dataset and phenotype under study and can be used in a NOP model. We evaluated SyNet and 11 other networks on a compendium dataset of >4000 survival-labelled breast cancer samples, using cross-study validation, which more closely emulates real-world application of these outcome predictors. We find that SyNet is the only network that truly improves performance, stability and interpretability in several existing NOPs. We show that SyNet overlaps significantly with existing gene networks and can be confidently predicted (~85% AUC) from graph-topological descriptions of these networks, in particular the breast tissue-specific network. Due to its data-driven nature, SyNet is not biased toward well-studied genes and thus facilitates post-hoc interpretation. We find that SyNet is highly enriched for known breast cancer genes and genes related to, e.g., histological grade and tamoxifen resistance, suggesting a role in determining breast cancer outcome.
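The scale of an exhaustive pairwise search follows from n(n−1)/2, the number of unordered pairs among n genes. The abstract does not give the number of genes profiled. The sketch below simply inverts the formula as a back-of-the-envelope check, suggesting that roughly 11,700 to 11,800 genes would yield about 69 million pairs. Treat this estimate as an illustration rather than a figure from the study.

```python
import math


def n_pairs(n_genes: int) -> int:
    """Number of unordered gene pairs, n choose 2."""
    return math.comb(n_genes, 2)


def genes_for_pairs(target_pairs: int) -> int:
    """Smallest gene count whose number of pairs reaches target_pairs."""
    # Start from the integer root of n(n-1)/2 = p, then step up if needed.
    n = (1 + math.isqrt(1 + 8 * target_pairs)) // 2
    while n_pairs(n) < target_pairs:
        n += 1
    return n


estimate = genes_for_pairs(69_000_000)
```

The quadratic growth is what makes an ab initio synergy network costly. Doubling the gene count roughly quadruples the number of pairs to evaluate.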

]]>
<![CDATA[Genomic content of a novel yeast species Hanseniaspora gamundiae sp. nov. from fungal stromata (Cyttaria) associated with a unique fermented beverage in Andean Patagonia, Argentina]]> https://www.researchpad.co/article/5c5b5290d5eed0c4842bcb8e

A novel yeast species was isolated from the sugar-rich stromata of Cyttaria hariotii collected from two different Nothofagus tree species in the Andean forests of Patagonia, Argentina. Phylogenetic analyses of the concatenated rRNA gene sequences and the protein-coding genes for actin and translational elongation factor-1α indicated that the novel species belongs to the genus Hanseniaspora. De novo genome assembly of the strain CRUB 1928T yielded a 10.2-Mbp genome assembly predicted to encode 4452 protein-coding genes. The genome sequence data were compared to the genomes of other Hanseniaspora species using three methods: an alignment-free distance measure, Kr, and two model-based estimations of DNA-DNA homology values, all of which provided values useful for delineating Hanseniaspora species. Given the species' potential role in a rare indigenous alcoholic beverage, in which yeasts ferment sugars extracted from the stromata of Cyttaria sp., we searched for genes that may indicate adaptation of the novel Hanseniaspora species to fermenting communities. The SSU1-like gene encoding a sulfite efflux pump, which among Hanseniaspora is present only in close relatives of the new species, was detected and analyzed, suggesting that this gene might be one factor that characterizes this novel species. We also discuss several candidate genes that likely underlie the physiological traits used for traditional taxonomic identification. Based on these results, a novel yeast species with the name Hanseniaspora gamundiae sp. nov. is proposed with CRUB 1928T (ex-types: ZIM 2545T = NRRL Y-63793T = PYCC 7262T; MycoBank number MB 824091) as the type strain. Furthermore, we propose the transfer of the Kloeckera species K. hatyaiensis, K. lindneri and K. taiwanica to the genus Hanseniaspora as Hanseniaspora hatyaiensis comb. nov. (MB 828569), Hanseniaspora lindneri comb. nov. (MB 828566) and Hanseniaspora taiwanica comb. nov. (MB 828567).

]]>
<![CDATA[Integrating predicted transcriptome from multiple tissues improves association detection]]> https://www.researchpad.co/article/5c50c43bd5eed0c4845e8359

Integration of genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is needed to improve our understanding of the biological mechanisms underlying GWAS hits and our ability to identify therapeutic targets. Gene-level association methods such as PrediXcan can prioritize candidate targets. However, limited eQTL sample sizes and the absence of relevant developmental and disease contexts restrict our ability to detect associations. Here we propose an efficient statistical method (MultiXcan) that leverages the substantial sharing of eQTLs across tissues and contexts to improve our ability to identify potential target genes. MultiXcan integrates evidence across multiple panels using multivariate regression, which naturally accounts for the correlation structure among them. We apply our method to simulated and real traits from the UK Biobank and show that, in realistic settings, we can detect a larger set of significantly associated genes than by using each panel separately. To improve applicability, we developed a summary-result-based extension called S-MultiXcan, which we show yields results highly concordant with the individual-level version when LD is well matched. Our multivariate model-based approach allowed us to use the individual-level results as a gold standard to calibrate and develop a robust implementation of the summary-based extension. Results from our analysis, as well as software and the resources necessary to apply our method, are publicly available.
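A joint test across tissues asks whether the trait depends on predicted expression from any tissue at all. One standard way to pose that question is an F test comparing an OLS model with all tissue columns against an intercept-only model. The sketch below shows that generic test on simulated data. It is not the MultiXcan implementation, which additionally handles collinearity among tissues. The variable names and simulation settings are assumptions made for illustration.

```python
import numpy as np


def joint_f_statistic(y: np.ndarray, X: np.ndarray) -> float:
    """F statistic for H0: all k tissue coefficients are zero, from an OLS fit
    of y on an intercept plus the k columns of X (predicted expression)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss_full = float(np.sum((y - design @ beta) ** 2))
    rss_null = float(np.sum((y - y.mean()) ** 2))  # intercept-only model
    return ((rss_null - rss_full) / k) / (rss_full / (n - k - 1))


# Simulated example: three correlated "tissues" sharing one expression signal.
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=n)
X = np.column_stack([shared + 0.5 * rng.normal(size=n) for _ in range(3)])
y_signal = 0.5 * shared + rng.normal(size=n)  # trait driven by the shared signal
y_null = rng.normal(size=n)                   # trait unrelated to expression
f_signal = joint_f_statistic(y_signal, X)
f_null = joint_f_statistic(y_null, X)
```

Because all tissues enter one model, correlated predictors share the explained variance instead of being tested repeatedly. This is the intuition behind the statement that multivariate regression accounts for the correlation structure.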

]]>
<![CDATA[A computational knowledge-base elucidates the response of Staphylococcus aureus to different media types]]> https://www.researchpad.co/article/5c3fa568d5eed0c484ca3f80

S. aureus is classified as a serious threat pathogen and is a priority pathogen for the discovery and development of new antibiotics. Despite growing knowledge of S. aureus metabolic capabilities, our understanding of its systems-level responses to different media types remains incomplete. Here, we develop a manually reconstructed genome-scale model of metabolism with 3D protein structures (GEM-PRO) for S. aureus USA300 str. JE2, containing 854 genes, 1,440 reactions, 1,327 metabolites and 673 three-dimensional protein structures. Computations were in 85% agreement with gene essentiality data from random barcode transposon site sequencing (RB-TnSeq) and in 68% agreement with experimental physiological data. Comparisons of computational predictions with experimental observations highlight: 1) cases of non-essential biomass precursors; 2) metabolic genes involved in staphyloxanthin biosynthesis that are subject to transcriptional regulation; 3) the essentiality of purine and amino acid biosynthesis in synthetic physiological media; and 4) a switch to aerobic fermentation upon exposure to extracellular glucose, revealed by integrating time-course quantitative exo-metabolomics data. An up-to-date GEM-PRO thus serves as a knowledge-based platform to elucidate S. aureus’ metabolic response to its environment.

]]>
<![CDATA[Network-based features enable prediction of essential genes across diverse organisms]]> https://www.researchpad.co/article/5c1c0b06d5eed0c484427271

Machine learning approaches to predict essential genes have gained considerable traction in recent years. These approaches predominantly use sequence and network-based features to predict essential genes. However, the scope of network-based features used by existing approaches is very narrow. Further, many of these studies focus on predicting essential genes within a single organism, and such models cannot readily be used to predict essential genes across organisms. There is therefore a clear need for a method that can predict essential genes across organisms by leveraging network-based features. In this study, we extract several sets of network-based features from protein–protein association networks available from the STRING database. Our network features include some common measures of centrality and also some novel recursive measures recently proposed in the social network literature. We extract hundreds of network-based features from networks of 27 diverse organisms to predict the essentiality of 87,000+ genes. Our results show that network-based features are statistically significantly better at classifying essential genes across diverse bacterial species than the current state-of-the-art methods, which mostly use sequence and a few ‘conventional’ network-based features. Our diverse set of network properties gave an AUROC of 0.847 and a precision of 0.320 across 27 organisms. When we augmented the complete set of network features with sequence-derived features, we achieved an improved AUROC of 0.857 and a precision of 0.335. We also constructed a reduced set of 100 sequence and network features, which gave comparable performance. Further, we show that our features are useful for predicting essential genes in new organisms by using leave-one-species-out validation.
Our network features capture the local, global and neighbourhood properties of the network and are hence effective for predicting essential genes across diverse organisms, even in the absence of other complex biological knowledge. Our approach can be readily exploited to predict essentiality for organisms in interactome databases such as STRING, where both network and sequence data are readily available. All code is available at https://github.com/RamanLab/nbfpeg.
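AUROC can be read as the probability that a randomly chosen essential gene receives a higher score than a randomly chosen non-essential gene, with ties counted as one half. The sketch below computes it directly from that definition. Degree centrality, one of the common centrality measures mentioned above, serves as the score. The toy network, gene names and labels are hypothetical. The study itself combines hundreds of features in a trained classifier rather than ranking by degree alone.

```python
from collections import defaultdict
from itertools import product


def degree_centrality(edges):
    """Degree of each node in an undirected edge list, normalised by n - 1."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


def auroc(scores, labels):
    """Probability that a positive outranks a negative; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical toy network and essentiality labels.
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g4", "g5")]
essential = {"g1": True, "g2": True, "g3": False, "g4": False, "g5": False}
centrality = degree_centrality(edges)
genes = sorted(centrality)
score = auroc([centrality[g] for g in genes], [essential[g] for g in genes])
```

The pairwise definition is quadratic in the number of genes. For tens of thousands of genes, a rank-based (Mann-Whitney) computation gives the same value more efficiently.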

]]>
<![CDATA[RNA virus evasion of nonsense-mediated decay]]> https://www.researchpad.co/article/5bfc6240d5eed0c484ec7b2b

Nonsense-mediated decay (NMD) is a host RNA control pathway that removes aberrant transcripts with long 3’ untranslated regions (UTRs) resulting from premature termination codons (PTCs) that arise through mutation or defective splicing. To maximize coding potential, RNA viruses often contain internally located stop codons that should also be prime targets for NMD. Using an agroinfiltration-based NMD assay in Nicotiana benthamiana, we identified two segments conferring NMD resistance in the genome of the carmovirus Turnip crinkle virus (TCV). The ribosome readthrough structure just downstream of the TCV p28 termination codon stabilized an NMD-sensitive reporter, as did a frameshifting element from the umbravirus Pea enation mosaic virus. In addition, a 51-nt unstructured region (USR) at the beginning of the TCV 3’ UTR increased NMD resistance 3-fold when inserted into an unrelated NMD-sensitive 3’ UTR. Several additional carmovirus 3’ UTRs also conferred varying levels of NMD resistance, depending on the construct, despite lacking sequence similarity in the analogous region. Instead, these regions displayed a marked lack of RNA structure immediately following the NMD-targeted stop codon. NMD resistance was only slightly reduced by conversion of 19 pyrimidines in the USR to purines, but resistance was abolished when a 2-nt mutation introduced downstream of the USR substantially increased secondary structure in the USR through formation of a stable hairpin. The same 2-nt mutation also enhanced the NMD susceptibility of a subgenomic RNA expressed independently of the genomic RNA. The conserved lack of RNA structure at the 5’ end of the 3’ UTR in most carmoviruses could serve to enhance subgenomic RNA stability, which would increase expression of the encoded capsid protein that also functions as the RNA silencing suppressor.
These results demonstrate that the TCV genome has features that are inherently NMD-resistant and these strategies could be widespread among RNA viruses and NMD-resistant host mRNAs with long 3’ UTRs.

]]>
<![CDATA[Developmental Transcriptomic Features of the Carcinogenic Liver Fluke, Clonorchis sinensis]]> https://www.researchpad.co/article/5989d9f0ab0ee8fa60b6e20a

Clonorchis sinensis is the causative agent of clonorchiasis, a life-threatening disease endemic to China, Korea, and Vietnam. An estimated 15 million people are infected with this fluke. C. sinensis provokes inflammation, epithelial hyperplasia, and periductal fibrosis in bile ducts, and may cause cholangiocarcinoma in chronically infected individuals. The large amount of biological information accumulated about the adult stage of this liver fluke in recent years has advanced our understanding of the pathological interplay between this parasite and its hosts. However, no developmental gene expression profiles of C. sinensis have been published. In this study, we generated gene expression profiles of three developmental stages of C. sinensis by analyzing expressed sequence tags (ESTs). Complementary DNA libraries were constructed from the adult, metacercaria, and egg developmental stages of C. sinensis. A total of 52,745 ESTs were generated and assembled into 12,830 C. sinensis assembled EST sequences, which were further categorized according to biological function and developmental stage. Most of the genes that were differentially expressed between stages were consistent with the biological and physical features of the particular developmental stage: high energy metabolism, motility and reproduction genes were differentially expressed in adults; minimal metabolism and final host adaptation genes in metacercariae; and embryonic genes in eggs. The higher expression of glucose transporters, proteases, and antioxidant enzymes in the adults accounts for active uptake of nutrients and defense against host immune attacks. The types of ion channels present in C. sinensis are consistent with its parasitic nature and phylogenetic placement in the tree of life.
We anticipate that the transcriptomic information presented in this study on essential regulators of development, bile chemotaxis, and physico-metabolic pathways in C. sinensis will guide further studies to identify novel drug targets and diagnostic antigens.

]]>
<![CDATA[Characterization of Chemically Induced Liver Injuries Using Gene Co-Expression Modules]]> https://www.researchpad.co/article/5989daa8ab0ee8fa60ba86f2

Liver injuries due to ingestion of, or exposure to, chemicals and industrial toxicants pose a serious health risk that may be hard to assess due to a lack of non-invasive diagnostic tests. Mapping chemical injuries to organ-specific damage and clinical outcomes via biomarkers or biomarker panels will provide the foundation for highly specific and robust diagnostic tests. Here, we used DrugMatrix, a toxicogenomics database containing organ-specific gene expression data matched to dose-dependent chemical exposures and adverse clinical pathology assessments in Sprague Dawley rats, to identify groups of co-expressed genes (modules) specific to injury endpoints in the liver. We identified 78 such gene co-expression modules associated with 25 diverse injury endpoints categorized from clinical pathology, organ weight changes, and histopathology. Using gene expression data associated with an injury condition, we showed that these modules exhibited distinct activation patterns characteristic of each injury. We further showed that specific module genes mapped to 1) known biochemical pathways associated with liver injuries and 2) clinically used diagnostic tests for liver fibrosis. As such, the gene modules have characteristics of both generalized and specific toxic response pathways. Using these results, we proposed three gene signature sets, characteristic of liver fibrosis, steatosis, and general liver injury, based on genes from the co-expression modules. Of the 92 identified genes, 18 (20%) have well-documented relationships with liver disease, whereas the rest are novel and have not previously been associated with liver disease. In conclusion, identifying gene co-expression modules associated with chemically induced liver injuries aids in generating testable hypotheses and has the potential to identify putative biomarkers of adverse health effects.
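A co-expression module is a group of genes whose expression rises and falls together across samples. The abstract does not describe how the modules were built. The sketch below illustrates the idea with a deliberately simple stand-in rather than the study's method: it links genes whose absolute Pearson correlation meets a threshold and takes connected components as modules. The threshold of 0.8 and the toy expression matrix are assumptions.

```python
import numpy as np


def coexpression_modules(expr: np.ndarray, threshold: float = 0.8) -> list[list[int]]:
    """Group genes (rows of expr) into modules: connected components of the
    graph linking gene pairs whose Pearson |r| >= threshold."""
    corr = np.corrcoef(expr)
    adjacent = np.abs(corr) >= threshold
    seen, modules = set(), []
    for start in range(corr.shape[0]):
        if start in seen:
            continue
        # Depth-first search over the thresholded correlation graph.
        stack, component = [start], []
        seen.add(start)
        while stack:
            gene = stack.pop()
            component.append(gene)
            for other in np.flatnonzero(adjacent[gene]):
                other = int(other)
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        modules.append(sorted(component))
    return modules


# Toy matrix: 4 hypothetical genes (rows) x 5 samples (columns).
expr = np.array([
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],   # perfectly correlated with gene 0
    [5, 4, 3, 2, 1],    # perfectly anti-correlated with gene 0
    [3, 1, 4, 1, 5],    # weakly related to the others
], dtype=float)
modules = coexpression_modules(expr)
```

Using the absolute correlation groups anti-correlated genes together, as in unsigned co-expression networks. A signed variant would compare `corr` itself against the threshold.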

]]>
<![CDATA[The Regulatory Roles of MicroRNA in the Effects of 2,2',4,4'-Tetrabromodiphenyl Ether (BDE47) on the Transcriptome of Zebrafish Larvae]]> https://www.researchpad.co/article/5989dad9ab0ee8fa60bb93e2

Developmental neurotoxicity caused by environmental pollutants has received considerable attention; however, little is known about the underlying toxic mechanisms, especially the influence of regulatory factors such as microRNA (miRNA). A representative flame retardant, 2,2′,4,4′-tetrabromodiphenyl ether (BDE47), was found in a previous study to disrupt visual perception and bone formation in developing zebrafish, so here we used deep sequencing to investigate its effects on the miRNA expression profile of zebrafish larvae at 6 days post fertilization (dpf). To overcome the limited annotation of zebrafish miRNAs, we adopted multiple data-processing approaches, in particular a network constructed from the interactions between miRNAs and enrichment terms. These approaches helped us identify several validated zebrafish miRNAs and two novel miRNAs involved in BDE47-induced effects, together with their corresponding biological processes. According to our analysis, miR-735 may play essential roles in larval sensory development. Our study also provides an effective strategy for analyzing the biological effects of non-mammalian miRNAs with limited basic information.

]]>
<![CDATA[Evaluation of Candidate Genes from Orphan FEB and GEFS+ Loci by Analysis of Human Brain Gene Expression Atlases]]> https://www.researchpad.co/article/5989daedab0ee8fa60bbfc0b

Febrile seizures, or febrile convulsions (FEB), represent the most common form of childhood seizures and are believed to be influenced by variations in several susceptibility genes. Most of the associated loci, however, remain ‘orphan’, i.e., the susceptibility genes they contain have yet to be identified. Further orphan loci have been mapped for a related disorder, genetic (generalized) epilepsy with febrile seizures plus (GEFS+).

We show that both spatially mapped and ‘traditional’ gene expression data from the human brain can be successfully employed to predict the most promising candidate genes for FEB and GEFS+; we then apply our prediction method to the remaining orphan loci and discuss the validity of the predictions. For several of the orphan FEB/GEFS+ loci, we propose excellent, and not always obvious, candidates for mutation screening to help gain a better understanding of the genetic origin of seizure susceptibility.

]]>
<![CDATA[Genome Analysis of Bacillus amyloliquefaciens Subsp. plantarum UCMB5113: A Rhizobacterium That Improves Plant Growth and Stress Management]]> https://www.researchpad.co/article/5989db00ab0ee8fa60bc678f

The Bacillus amyloliquefaciens subsp. plantarum strain UCMB5113 is a Gram-positive rhizobacterium that can colonize plant roots and stimulate plant growth and defense through unknown mechanisms. This reinforcement of plants may provide protection against various forms of biotic and abiotic stress. To determine the genetic traits involved in the plant-bacteria association, the genome sequence of UCMB5113 was obtained by assembling paired-end Illumina reads. The assembled chromosome of 3,889,532 bp was predicted to encode 3,656 proteins. Genes that potentially contribute to plant growth promotion, such as those for indole-3-acetic acid (IAA) biosynthesis, acetoin synthesis and siderophore production, were identified. Moreover, annotation identified putative genes responsible for non-ribosomal synthesis of secondary metabolites and genes supporting the environmental fitness of UCMB5113, including drug and metal resistance. A large number of genes encoding a diverse set of secretory proteins, enzymes of primary and secondary metabolism and carbohydrate-active enzymes were found, reflecting a high capacity to degrade various rhizosphere macromolecules. Additionally, many predicted membrane transporters provide the bacterium with efficient uptake of several nutrients. Although UCMB5113 can potentially produce antibiotics and biosurfactants, its protection of plants against pathogens seems to be indirect, owing to priming of plant induced systemic resistance. The availability of the genome enables identification of the genes, and their functions, underpinning beneficial interactions of UCMB5113 with plants.

]]>
<![CDATA[Genetic Diversity, Morphological Uniformity and Polyketide Production in Dinoflagellates (Amphidinium, Dinoflagellata)]]> https://www.researchpad.co/article/5989dafcab0ee8fa60bc528b

Dinoflagellates are an intriguing group of eukaryotes, showing many unusual morphological and genetic features. Some groups of dinoflagellates are morphologically highly uniform, despite indications of genetic diversity. The species Amphidinium carterae is abundant and cosmopolitan in marine environments, grows easily in culture, and has therefore been used as a ‘model’ dinoflagellate in research into dinoflagellate genetics, polyketide production and photosynthesis. We have investigated the diversity of ‘cryptic’ species of Amphidinium that are morphologically similar to A. carterae, including the very similar species Amphidinium massartii, based on light and electron microscopy, two nuclear gene regions (LSU rDNA and ITS rDNA) and one mitochondrial gene region (cytochrome b). We found that six genetically distinct cryptic species (clades) exist within the species A. massartii and four within A. carterae, and that these clades differ from one another in molecular sequences at levels comparable to other dinoflagellate species, genera or even families. Using primers based on an alignment of alveolate ketosynthase sequences, we isolated partial ketosynthase genes from several Amphidinium species. We compared these genes to known dinoflagellate ketosynthase genes and investigated the evolution and diversity of the strains of Amphidinium that produce them.

]]>
<![CDATA[Exploring Pandora's Box: Potential and Pitfalls of Low Coverage Genome Surveys for Evolutionary Biology]]> https://www.researchpad.co/article/5989d9daab0ee8fa60b6724f

High-throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained quickly and at reasonable cost, even for unknown genomes. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g., micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when such datasets come from non-model species, DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries, with estimated coverage of about 0.02–25%, many suitable micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from various biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryotic endosymbionts contributed substantially (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests.
In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also call for more sophisticated filtering of problematic and non-target genome sequences prior to developing markers.
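Coverage values below 100% mean that, on average, each genomic position is sequenced less than once. The sketch below shows the usual arithmetic. It assumes that coverage here means expected depth (reads × read length ÷ genome size) and uses the Lander-Waterman approximation 1 − e^(−c) for the fraction of positions hit at least once. The read count, read length and genome size are hypothetical, not values from the 14 libraries.

```python
import math


def expected_coverage(n_reads: int, mean_read_length: float, genome_size: float) -> float:
    """Expected depth c = N * L / G (as a fraction; multiply by 100 for %)."""
    return n_reads * mean_read_length / genome_size


def fraction_genome_sampled(coverage: float) -> float:
    """Lander-Waterman expectation of the fraction of bases hit at least once."""
    return 1 - math.exp(-coverage)


# Hypothetical survey: 100,000 reads of 400 bp on a 1 Gbp genome.
c = expected_coverage(100_000, 400, 1e9)
sampled = fraction_genome_sampled(c)
```

At such low depth the two quantities nearly coincide. About 4% expected coverage samples roughly 3.9% of the genome, which is why survey data suit marker discovery but not full assembly.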

]]>
<![CDATA[Linkage Mapping and Comparative Genomics Using Next-Generation RAD Sequencing of a Non-Model Organism]]> https://www.researchpad.co/article/5989da0dab0ee8fa60b785f6

Restriction-site associated DNA (RAD) sequencing is a powerful new method for targeted sequencing across the genomes of many individuals. This approach has broad potential for genetic analysis of non-model organisms, including genotype-phenotype association mapping, phylogeography, population genetics and scaffolding genome assemblies through linkage mapping. We constructed a RAD library using genomic DNA from a Plutella xylostella (diamondback moth) backcross that segregated for resistance to the insecticide spinosad. Twenty-four individuals were sequenced on a single Illumina GAIIx lane (51-base paired-end reads). Taking advantage of the absence of crossing over between homologous chromosomes in female Lepidoptera, 3,177 maternally inherited RAD alleles were assigned to the 31 chromosomes, enabling identification of the spinosad resistance chromosome and the W/Z sex chromosomes. Paired-end reads for each RAD allele were assembled into contigs and compared to the genome of Bombyx mori (n = 28) using BLAST, revealing 28 homologous matches plus 3 expected fusion/breakage events, which account for the difference in chromosome number. A genome-wide linkage map (1292 cM) was inferred from 2,878 segregating RAD alleles inherited from the backcross father, producing chromosome- and location-specific sequenced RAD markers. Here we have used RAD sequencing to construct a genetic linkage map de novo for an organism with no previous genome data. Comparative analysis of P. xylostella linkage groups with B. mori chromosomes shows, for the first time, that genetic synteny appears common beyond the Macrolepidoptera. RAD sequencing is a powerful system capable of rapidly generating chromosome-specific data for non-model organisms.
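Map lengths in centimorgans, such as the 1292 cM reported above, come from converting recombination fractions between adjacent markers with a mapping function. In a backcross, each recombination fraction would be estimated as the proportion of recombinant progeny. The abstract does not say which mapping function was used. The sketch below shows the two standard choices: Haldane, which assumes no crossover interference, and Kosambi, which allows partial interference. The recombination fractions are hypothetical.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM: d = -50 * ln(1 - 2r)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM: d = 25 * ln((1 + 2r) / (1 - 2r))."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# Hypothetical recombination fractions between adjacent markers on one linkage group.
fractions = [0.05, 0.10, 0.20]
group_length_kosambi = sum(kosambi_cm(r) for r in fractions)
group_length_haldane = sum(haldane_cm(r) for r in fractions)
```

Both functions give about 100·r cM for small r but diverge as r grows, so total map length depends on the function chosen.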

]]>
<![CDATA[Chemical-Induced Read-Through at Premature Termination Codons Determined by a Rapid Dual-Fluorescence System Based on S. cerevisiae]]> https://www.researchpad.co/article/5989da83ab0ee8fa60b9b880

Nonsense mutations generate in-frame stop codons in mRNA, leading to premature arrest of translation. Functional consequences of premature termination codons (PTCs) include the synthesis of truncated proteins with loss of function, causing severe inherited or acquired diseases. A therapeutic approach has recently been developed that uses chemical agents able to suppress PTCs (read-through), restoring the synthesis of a functional full-length protein. The search for compounds able to induce read-through requires an efficient, high-throughput, large-scale screening system. We present a rapid, sensitive and quantitative method based on a dual-fluorescence reporter expressed in the yeast Saccharomyces cerevisiae to monitor and quantitate read-through at PTCs. We have shown that our novel system works equally well in detecting read-through at all three PTCs: UGA, UAG and UAA.
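Dual-reporter read-through assays are commonly quantified by comparing the downstream-to-upstream signal ratio of a PTC-containing construct with the same ratio for an in-frame control that lacks the stop codon. The abstract does not describe the reporter layout or normalization. The sketch below therefore assumes that common scheme, with optional background subtraction. All readings and parameter names are hypothetical.

```python
def readthrough_percent(test_down: float, test_up: float,
                        ctrl_down: float, ctrl_up: float,
                        background_down: float = 0.0,
                        background_up: float = 0.0) -> float:
    """Read-through (%) = (downstream/upstream ratio of the PTC construct)
    / (same ratio for an in-frame sense-codon control) * 100,
    after subtracting autofluorescence backgrounds."""
    test_ratio = (test_down - background_down) / (test_up - background_up)
    ctrl_ratio = (ctrl_down - background_down) / (ctrl_up - background_up)
    return 100.0 * test_ratio / ctrl_ratio


# Hypothetical fluorescence readings (arbitrary units).
untreated = readthrough_percent(120, 1000, 2000, 1000, background_down=100)
treated = readthrough_percent(300, 1000, 2000, 1000, background_down=100)
fold_induction = treated / untreated
```

Normalizing to the upstream reporter corrects for differences in expression level between cells or compounds. This matters in a screen because a compound that simply boosts transcription should not score as a read-through inducer.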

]]>
<![CDATA[Cross-Species Rhesus Cytomegalovirus Infection of Cynomolgus Macaques]]> https://www.researchpad.co/article/5989db3aab0ee8fa60bd4a8e

Cytomegaloviruses (CMV) are highly species-specific due to millennia of co-evolution and adaptation to their host, with no successful experimental cross-species infection in primates reported to date. Accordingly, full genome phylogenetic analysis of multiple new CMV field isolates derived from two closely related nonhuman primate species, Indian-origin rhesus macaques (RM) and Mauritian-origin cynomolgus macaques (MCM), revealed distinct and tight lineage clustering according to the species of origin, with MCM CMV isolates mirroring the limited genetic diversity of their primate host that underwent a population bottleneck 400 years ago. Despite the ability of Rhesus CMV (RhCMV) laboratory strain 68–1 to replicate efficiently in MCM fibroblasts and potently inhibit antigen presentation to MCM T cells in vitro, RhCMV 68–1 failed to productively infect MCM in vivo, even in the absence of host CD8+ T and NK cells. In contrast, RhCMV clone 68–1.2, genetically repaired to express the homologues of the HCMV anti-apoptosis gene UL36 and epithelial cell tropism genes UL128 and UL130 absent in 68–1, efficiently infected MCM as evidenced by the induction of transgene-specific T cells and virus shedding. Recombinant variants of RhCMV 68–1 and 68–1.2 revealed that expression of either UL36 or UL128 together with UL130 enabled productive MCM infection, indicating that multiple layers of cross-species restriction operate even between closely related hosts. Cumulatively, these results implicate cell tropism and evasion of apoptosis as critical determinants of CMV transmission across primate species barriers, and extend the macaque model of human CMV infection and immunology to MCM, a nonhuman primate species with uniquely simplified host immunogenetics.

]]>
<![CDATA[Multitask Learning of Signaling and Regulatory Networks with Application to Studying Human Response to Flu]]> https://www.researchpad.co/article/5989db0bab0ee8fa60bca384

Reconstructing regulatory and signaling response networks is one of the major goals of systems biology. While several successful methods have been suggested for this task, some of them integrating large and diverse datasets, these methods have so far been applied to reconstruct a single response network at a time, even when studying and modeling related conditions. To improve network reconstruction, we developed MT-SDREM, a multi-task learning method that jointly models networks for several related conditions. In MT-SDREM, parameters are jointly constrained across the networks while still allowing for condition-specific pathways and regulation. We formulate the multi-task learning problem and discuss methods for optimizing the joint target function. We applied MT-SDREM to reconstruct dynamic human response networks for three flu strains: H1N1, H5N1 and H3N2. Our multi-task learning method was able to identify known and novel factors and genes, improving upon prior methods that model each condition independently. The MT-SDREM networks were also better at identifying proteins whose removal affects viral load, indicating that joint learning can still lead to accurate, condition-specific networks. Supporting website with MT-SDREM implementation: http://sb.cs.cmu.edu/mtsdrem

]]>