ResearchPad - systems-biology https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[Robust landscapes of ribosome dwell times and aminoacyl-tRNAs in response to nutrient stress in liver]]> https://www.researchpad.co/article/Nda03592f-cc78-44ee-ae52-ab18aa18c815

Translation depends on messenger RNA (mRNA)-specific initiation, elongation, and termination rates. While translation elongation is well studied in bacteria and yeast, less is known in higher eukaryotes. Here we combined ribosome and transfer RNA (tRNA) profiling to investigate the relations between translation elongation rates, (aminoacyl-) tRNA levels, and codon usage in mammals. We modeled codon-specific ribosome dwell times from ribosome profiling, considering codon pair interactions between ribosome sites. In mouse liver, the model revealed site- and codon-specific dwell times that differed from those in yeast, as well as pairs of adjacent codons in the P and A site that markedly slow down or speed up elongation. While translation efficiencies vary across diurnal time and feeding regimen, codon dwell times were highly stable and conserved in human. Measured tRNA levels correlated with codon usage and several tRNAs showed reduced aminoacylation, which was conserved in fasted mice. Finally, we uncovered that the longest codon dwell times could be explained by aminoacylation levels or high codon usage relative to tRNA abundance.

]]>
<![CDATA[Causal network perturbations for instance-specific analysis of single cell and disease samples]]> https://www.researchpad.co/article/N1cc2695a-94a2-4308-a8ed-78dbffed8cb0

Abstract

Motivation

Complex diseases involve perturbations in multiple pathways, and a major challenge in clinical genomics is characterizing pathway perturbations in individual samples. This can lead to patient-specific identification of the underlying mechanism of disease, thereby improving diagnosis and personalizing treatment. Existing methods rely on external databases to quantify pathway activity scores. This ignores dependencies in the data and the fact that pathways are incomplete or condition-specific.

Results

ssNPA is a new approach for subtyping samples based on deregulation of their gene networks. ssNPA learns a causal graph directly from control data. Sample-specific network neighborhood deregulation is quantified via the error incurred in predicting the expression of each gene from its Markov blanket. We evaluate the performance of ssNPA on liver development single-cell RNA-seq data, where the correct cell timing is recovered; and two TCGA datasets, where ssNPA patient clusters have significant survival differences. In all analyses ssNPA consistently outperforms alternative methods, highlighting the advantage of network-based approaches.
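
The scoring idea in this abstract (predict each gene from its network neighborhood, then use the prediction error as a deregulation score) can be illustrated with a minimal sketch. This is not the ssNPA implementation: it assumes the Markov blanket of each gene is already known and passed in as a dictionary, and it uses ordinary least squares as a stand-in predictor. The function names `fit_neighborhood_models` and `deregulation_scores` are hypothetical.

```python
import numpy as np

def fit_neighborhood_models(controls, neighbors):
    """Fit one least-squares model per gene using control samples only.

    controls:  array of shape (n_samples, n_genes)
    neighbors: dict mapping a gene index to the list of gene indices in its
               neighborhood (assumed given here; ssNPA learns it from data)
    """
    models = {}
    for gene, nbrs in neighbors.items():
        # Design matrix: intercept column plus the neighbor expression values.
        X = np.column_stack([np.ones(len(controls)), controls[:, nbrs]])
        coef, *_ = np.linalg.lstsq(X, controls[:, gene], rcond=None)
        models[gene] = (nbrs, coef)
    return models

def deregulation_scores(sample, models):
    """Squared prediction error per gene for a single new sample."""
    scores = {}
    for gene, (nbrs, coef) in models.items():
        predicted = coef[0] + sample[nbrs] @ coef[1:]
        scores[gene] = float((sample[gene] - predicted) ** 2)
    return scores
```

A sample whose gene values follow the relationships seen in controls gets scores near zero, while a sample in which a gene departs from what its neighbors predict gets a large score for that gene. The resulting score vectors could then be clustered to subtype samples.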

Availability and implementation

http://www.benoslab.pitt.edu/Software/ssnpa/.

Supplementary information

Supplementary data are available at Bioinformatics online.

]]>
<![CDATA[LiPLike: towards gene regulatory network predictions of high certainty]]> https://www.researchpad.co/article/Nf5f1f81e-ae82-4972-aed7-7e6d56102d72

Abstract

Motivation

High correlation in expression between regulatory elements is a persistent obstacle for the reverse-engineering of gene regulatory networks. If two potential regulators have matching expression patterns, it becomes challenging to differentiate between them, thus increasing the risk of false positive identifications.

Results

To allow for gene regulation predictions of high confidence, we propose a novel method, the Linear Profile Likelihood (LiPLike), that assumes a regression model and iteratively searches for interactions that cannot be replaced by a linear combination of other predictors. To compare the performance of LiPLike with other available inference methods, we benchmarked LiPLike using three independent datasets from the Dialogue on Reverse Engineering Assessment and Methods 5 (DREAM5) network inference challenge. We found that LiPLike could be used to stratify predictions of other inference tools, and when applied to the predictions of DREAM5 participants, we observed an average improvement in accuracy of >140% compared to individual methods. Furthermore, LiPLike was able to independently predict networks better than all DREAM5 participants when applied to biological data. When predicting the Escherichia coli network, LiPLike had an accuracy of 0.38 for the top-ranked 100 interactions, whereas the corresponding DREAM5 consensus model yielded an accuracy of 0.11.
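
The notion of an interaction that "cannot be replaced by a linear combination of other predictors" can be illustrated with a rough sketch. This is not the LiPLike algorithm or its Python toolbox API: it simply compares the residual sum of squares of a least-squares fit with and without each candidate regulator, as a simplified stand-in for a profile-likelihood test. The names `rss` and `irreplaceability` are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def irreplaceability(X, y):
    """For each regulator (column j of X), return RSS(without j) / RSS(all regulators).

    A ratio close to 1 means the other regulators can stand in for regulator j;
    a large ratio means the fit degrades sharply when j is removed.
    Assumes the full model does not fit perfectly (non-zero residual).
    """
    full = rss(X, y)
    scores = []
    for j in range(X.shape[1]):
        reduced = np.delete(X, j, axis=1)  # drop regulator j
        scores.append(rss(reduced, y) / full)
    return np.array(scores)
```

With two nearly identical regulators, removing either one barely changes the fit, so neither receives a high score. This mirrors the motivation above: when expression patterns match, the data cannot distinguish between the candidates, and a high-certainty method should decline to call either.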

Availability and implementation

We made LiPLike available to the community as a Python toolbox, available at https://gitlab.com/Gustafsson-lab/liplike. We believe that LiPLike will be used for high confidence predictions in studies where individual model interactions are of high importance, and to remove false positive predictions made by other state-of-the-art gene–gene regulation prediction tools.

Supplementary information

Supplementary data are available at Bioinformatics online.

]]>
<![CDATA[A molecular view on the escape of lipoplexed DNA from the endosome]]> https://www.researchpad.co/article/N1c729fed-af73-4608-980a-0b4b1332ff9e

The use of non-viral vectors for in vivo gene therapy could drastically increase safety, whilst reducing the cost of preparing the vectors. A promising approach to non-viral vectors makes use of DNA/cationic liposome complexes (lipoplexes) to deliver the genetic material. Here we use coarse-grained molecular dynamics simulations to investigate the molecular mechanism underlying efficient DNA transfer from lipoplexes. Our computational fusion experiments of lipoplexes with endosomal membrane models show two distinct modes of transfection: parallel and perpendicular. In the parallel fusion pathway, DNA aligns with the membrane surface, showing very quick release of genetic material shortly after the initial fusion pore is formed. The perpendicular pathway also leads to transfection, but release is slower. We further show that the composition and size of the lipoplex, as well as the lipid composition of the endosomal membrane, have a significant impact on fusion efficiency in our models.

]]>
<![CDATA[Model-based clustering of multi-tissue gene expression data]]> https://www.researchpad.co/article/N6e606405-bb04-46fa-8df5-7fe0062ed86c

Abstract

Motivation

Recently, it has become feasible to generate large-scale, multi-tissue gene expression data, where expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. When traditional clustering methods are applied to this type of data, important information is lost, because they either require all tissues to be analyzed independently, ignoring dependencies and similarities between tissues, or to merge tissues in a single, monolithic dataset, ignoring individual characteristics of tissues.

Results

We developed a Bayesian model-based multi-tissue clustering algorithm, revamp, which can incorporate prior information on physiological tissue similarity, and which results in a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from over 100 individuals in the STockholm Atherosclerosis Gene Expression (STAGE) study, we demonstrate that multi-tissue clusters inferred by revamp are more enriched for tissue-dependent protein-protein interactions compared to alternative approaches. We further demonstrate that revamp results in easily interpretable multi-tissue gene expression associations to key coronary artery disease processes and clinical phenotypes in the STAGE individuals.

Availability and implementation

Revamp is implemented in the Lemon-Tree software, available at https://github.com/eb00/lemon-tree

Supplementary information

Supplementary data are available at Bioinformatics online.

]]>
<![CDATA[Theoretical relation between axon initial segment geometry and excitability]]> https://www.researchpad.co/article/Ndc077c28-2afe-4636-8075-98a814239d9b

In most vertebrate neurons, action potentials are triggered at the distal end of the axon initial segment (AIS). Both position and length of the AIS vary across and within neuron types, with activity, development and pathology. What is the impact of AIS geometry on excitability? Direct empirical assessment has proven difficult because of the many potential confounding factors. Here, we carried out a principled theoretical analysis to answer this question. We provide a simple formula relating AIS geometry and sodium conductance density to the somatic voltage threshold. A distal shift of the AIS normally produces a (modest) increase in excitability, but we explain how this pattern can reverse if a hyperpolarizing current is present at the AIS, due to resistive coupling with the soma. This work provides a theoretical tool to assess the significance of structural AIS plasticity for electrical function.

]]>
<![CDATA[Repurposing Didanosine as a Potential Treatment for COVID-19 Using Single-Cell RNA Sequencing Data]]> https://www.researchpad.co/article/N0e73f668-a186-40b6-bda0-b7802cb37728

As of today (7 April 2020), more than 81,000 people around the world have died from the coronavirus disease 2019 (COVID-19) pandemic. There is no approved drug or vaccine for COVID-19, although more than 10 clinical trials have been launched to test potential drugs. In an urgent response to this pandemic, I developed a bioinformatics pipeline to identify compounds and drug candidates to potentially treat COVID-19. This pipeline is based on publicly available single-cell RNA sequencing (scRNA-seq) data and the drug perturbation database “Library of Integrated Network-Based Cellular Signatures” (LINCS).

]]>
<![CDATA[MicroRNAs organize intrinsic variation into stem cell states]]> https://www.researchpad.co/article/N63ad2236-4f60-4b98-920b-852e0fe00c09

Significance

Understanding how mammalian organisms achieve the full diversity of cell types in the adult organism is a central goal of developmental cell biology. Recent work has shown that some embryonic precursor cells can self-organize into developmental structures but the mechanisms of gene regulation that contribute to this process remain unknown. Here we show embryonic stem cells self-organize into distinct gene expression states that resemble developmental gene programs. We find that microRNAs, small noncoding regulators of gene expression, play a critical role in organizing fluctuations across gene networks to help achieve this organization into distinct expression states.

]]>
<![CDATA[Measuring the distribution of fitness effects in somatic evolution by combining clonal dynamics with dN/dS ratios]]> https://www.researchpad.co/article/Nba8fc600-3667-4a12-bfcb-cbf3642af18f

The distribution of fitness effects (DFE) defines how new mutations spread through an evolving population. The ratio of non-synonymous to synonymous mutations (dN/dS) has become a popular method to detect selection in somatic cells. However, the link, in somatic evolution, between dN/dS values and fitness coefficients is missing. Here we present a quantitative model of somatic evolutionary dynamics that determines the selective coefficients of individual driver mutations from dN/dS estimates. We then measure the DFE for somatic mutant clones in ostensibly normal oesophagus and skin. We reveal a broad distribution of fitness effects, with the largest fitness increases found for TP53 and NOTCH1 mutants (proliferative bias 1–5%). This study provides the theoretical link between dN/dS values and selective coefficients in somatic evolution, and measures the DFE of mutations in human tissues.
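
For readers unfamiliar with the statistic, the basic dN/dS ratio can be sketched as follows. This shows only the textbook definition (non-synonymous mutations per non-synonymous site divided by synonymous mutations per synonymous site); it does not reproduce the model in this article that maps dN/dS to selective coefficients. The function name `dnds` and its parameters are hypothetical.

```python
def dnds(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Textbook dN/dS: per-site non-synonymous rate over per-site synonymous rate.

    n_nonsyn, n_syn:         observed mutation counts of each class
    nonsyn_sites, syn_sites: number of sites at which each class can occur
    """
    if n_syn <= 0 or nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("need positive synonymous count and site numbers")
    dn = n_nonsyn / nonsyn_sites  # non-synonymous mutations per site
    ds = n_syn / syn_sites        # synonymous mutations per site
    return dn / ds
```

Under this definition, a value near 1 is the neutral expectation, values above 1 suggest positive selection, and values below 1 suggest negative selection.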

]]>
<![CDATA[A model for focal seizure onset, propagation, evolution, and progression]]> https://www.researchpad.co/article/N0fbb3522-dcfc-4670-a56e-5e00e4dada66

We developed a neural network model that can account for major elements common to human focal seizures. These include the tonic-clonic transition, slow advance of clinical semiology and corresponding seizure territory expansion, widespread EEG synchronization, and slowing of the ictal rhythm as the seizure approaches termination. These were reproduced by incorporating usage-dependent exhaustion of inhibition in an adaptive neural network that receives global feedback inhibition in addition to local recurrent projections. Our model proposes mechanisms that may underlie common EEG seizure onset patterns and status epilepticus, and postulates a role for synaptic plasticity in the emergence of epileptic foci. Complex patterns of seizure activity and bi-stable seizure end-points arise when stochastic noise is included. With the rapid advancement of clinical and experimental tools, we believe that this model can provide a roadmap and potentially an in silico testbed for future explorations of seizure mechanisms and clinical therapies.

]]>
<![CDATA[Synthetic and genomic regulatory elements reveal aspects of cis-regulatory grammar in mouse embryonic stem cells]]> https://www.researchpad.co/article/Nae338634-3264-49a9-9c75-ac0d7eeb8497

In embryonic stem cells (ESCs), a core transcription factor (TF) network establishes the gene expression program necessary for pluripotency. To address how interactions between four key TFs contribute to cis-regulation in mouse ESCs, we assayed two massively parallel reporter assay (MPRA) libraries composed of binding sites for SOX2, POU5F1 (OCT4), KLF4, and ESRRB. Comparisons between synthetic cis-regulatory elements and genomic sequences with comparable binding site configurations revealed some aspects of a regulatory grammar. The expression of synthetic elements is influenced by both the number and arrangement of binding sites. This grammar plays only a small role for genomic sequences, as the relative activities of genomic sequences are best explained by the predicted occupancy of binding sites, regardless of binding site identity and positioning. Our results suggest that the effects of transcription factor binding sites (TFBS) are influenced by the order and orientation of sites, but that in the genome the overall occupancy of TFs is the primary determinant of activity.

]]>
<![CDATA[Wikidata as a knowledge graph for the life sciences]]> https://www.researchpad.co/article/N6bee1a31-fd7c-4a3b-b046-0aa356b217b3

Wikidata is a community-maintained knowledge base that has been assembled from repositories in the fields of genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases, and that adheres to the FAIR principles of findability, accessibility, interoperability and reusability. Here we describe the breadth and depth of the biomedical knowledge contained within Wikidata, and discuss the open-source tools we have built to add information to Wikidata and to synchronize it with source databases. We also demonstrate several use cases for Wikidata, including the crowdsourced curation of biomedical ontologies, phenotype-based diagnosis of disease, and drug repurposing.

]]>
<![CDATA[The naive T-cell receptor repertoire has an extremely broad distribution of clone sizes]]> https://www.researchpad.co/article/Nc3e6a524-1bf9-48f0-a366-d2fea621ffea

The clone size distribution of the human naive T-cell receptor (TCR) repertoire is an important determinant of adaptive immunity. We estimated the abundance of TCR sequences in samples of naive T cells from blood using an accurate quantitative sequencing protocol. We observed most TCR sequences only once, consistent with the enormous diversity of the repertoire. However, a substantial number of sequences were observed multiple times. We detected abundant TCR sequences even after exclusion of methodological confounders such as sort contamination and multiple mRNA sampling from the same cell. By combining experimental data with predictions from models, we describe two mechanisms contributing to TCR sequence abundance. Abundant TCRα sequences can be primarily attributed to many identical recombination events in different cells, while abundant TCRβ sequences are primarily derived from large clones, which make up a small percentage of the naive repertoire, and could be established early in the development of the T-cell repertoire.

]]>
<![CDATA[Selectivity to approaching motion in retinal inputs to the dorsal visual pathway]]> https://www.researchpad.co/article/N214cc77d-78f5-4533-bd5b-820928411f30

To efficiently navigate through the environment and avoid potential threats, an animal must quickly detect the motion of approaching objects. Current models of primate vision place the origins of this complex computation in the visual cortex. Here, we report that detection of approaching motion begins in the retina. Several ganglion cell types, the retinal output neurons, show selectivity to approaching motion. Synaptic current recordings from these cells further reveal that this preference for approaching motion arises in the interplay between presynaptic excitatory and inhibitory circuit elements. These findings demonstrate how excitatory and inhibitory circuits interact to mediate an ethologically relevant neural function. Moreover, the elementary computations that detect approaching motion begin early in the visual stream of primates.

]]>
<![CDATA[Structure-based discovery of potent and selective melatonin receptor agonists]]> https://www.researchpad.co/article/N7d8f55b0-fe17-4881-99f8-4635a1c63635

Melatonin receptors MT1 and MT2 are involved in synchronizing circadian rhythms and are important targets for treating sleep and mood disorders, type-2 diabetes and cancer. Here, we performed large-scale structure-based virtual screening for new ligand chemotypes using recently solved high-resolution 3D crystal structures of agonist-bound MT receptors. Experimental testing of 62 screening candidates yielded the discovery of 10 new agonist chemotypes with sub-micromolar potency at MT receptors, with compound 21 reaching EC50 of 0.36 nM. Six of these molecules displayed selectivity for MT2 over MT1. Moreover, the two most potent agonists, including 21 and a close derivative of melatonin, 28, had dramatically reduced arrestin recruitment at MT2, while compound 37 was devoid of Gi signaling at MT1, implying biased signaling. This study validates the suitability of the agonist-bound orthosteric pocket in the MT receptor structures for the structure-based discovery of selective agonists.

]]>
<![CDATA[MDiNE: a model to estimate differential co-occurrence networks in microbiome studies]]> https://www.researchpad.co/article/Nb13dcc37-910c-4969-ae3f-4a4b8cae54d8

Abstract

Motivation

The human microbiota is the collection of microorganisms colonizing the human body, and plays an integral part in human health. A growing trend in microbiome analysis is to construct a network to estimate the co-occurrence patterns among taxa through precision matrices. Existing methods do not facilitate investigation into how these networks change with respect to covariates.

Results

We propose a new model called Microbiome Differential Network Estimation (MDiNE) to estimate network changes with respect to a binary covariate. The counts of individual taxa in the samples are modeled through a multinomial distribution whose probabilities depend on a latent Gaussian random variable. A sparse precision matrix over all the latent terms determines the co-occurrence network among taxa. The model fit is obtained and evaluated using Hamiltonian Monte Carlo methods. The performance of our model is evaluated through an extensive simulation study and is shown to outperform existing methods in terms of estimation of network parameters. We also demonstrate an application of the model to estimate changes in the intestinal microbial network topology with respect to Crohn’s disease.
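
The generative structure described above (latent Gaussian variables governed by a precision matrix, mapped to multinomial probabilities) can be sketched as a small simulation. This is an illustration only and not the MDiNE R package: it uses a plain softmax link, a fixed sequencing depth, and no covariate, all of which are simplifying assumptions. The function name `simulate_counts` is hypothetical.

```python
import numpy as np

def simulate_counts(precision, mean, n_samples, depth, seed=0):
    """Draw taxa counts from a multinomial whose probabilities depend on a latent Gaussian.

    precision: (k, k) positive-definite precision matrix over the latent terms;
               its non-zero off-diagonal entries define the co-occurrence network
    mean:      length-k mean of the latent variable
    depth:     total count per sample (assumed constant in this sketch)
    """
    rng = np.random.default_rng(seed)
    cov = np.linalg.inv(precision)
    latent = rng.multivariate_normal(mean, cov, size=n_samples)
    # Softmax link (an assumption of this sketch), shifted for numerical stability.
    shifted = latent - latent.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    counts = np.array([rng.multinomial(depth, p) for p in probs])
    return counts
```

To mimic a differential network with respect to a binary covariate, one could call the function twice with two different precision matrices, one per group, and compare the estimated networks.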

Availability and implementation

MDiNE is implemented in a freely available R package: https://github.com/kevinmcgregor/mdine.

Supplementary information

Supplementary data are available at Bioinformatics online.

]]>
<![CDATA[A model of collective behavior based purely on vision]]> https://www.researchpad.co/article/N4f858719-d740-4794-b3e8-886894e18547

From minimal visual information, organized collective behavior can emerge without spatial representation or collisions.

]]>
<![CDATA[Pan-mammalian analysis of molecular constraints underlying extended lifespan]]> https://www.researchpad.co/article/N2530030d-d510-46c7-b029-27d72017e239

Although lifespan in mammals varies over 100-fold, the precise evolutionary mechanisms underlying variation in longevity remain unknown. Species-specific genetic changes have been observed in long-lived species including the naked mole-rat, bats, and the bowhead whale, but these adaptations do not generalize to other mammals. We present a novel method to identify associations between rates of protein evolution and continuous phenotypes across the entire mammalian phylogeny. Unlike previous analyses that focused on individual species, we treat absolute and relative longevity as quantitative traits and demonstrate that these lifespan traits affect the evolutionary constraint on hundreds of genes. Specifically, we find that genes related to cell cycle, DNA repair, cell death, the IGF1 pathway, and immunity are under increased evolutionary constraint in large and long-lived mammals. For mammals exceptionally long-lived for their body size, we find increased constraint in inflammation, DNA repair, and NFKB-related pathways. Strikingly, these pathways have considerable overlap with those that have been previously reported to have potentially adaptive changes in single-species studies, and thus would be expected to show decreased constraint in our analysis. This unexpected finding of increased constraint in many longevity-associated pathways underscores the power of our quantitative approach to detect patterns that generalize across the mammalian phylogeny.

]]>
<![CDATA[Deep learning models predict regulatory variants in pancreatic islets and refine type 2 diabetes association signals]]> https://www.researchpad.co/article/N58478868-7449-4201-be66-1c52cfa528a5

Genome-wide association analyses have uncovered multiple genomic regions associated with type 2 diabetes (T2D), but identification of the causal variants at these regions remains a challenge. There is growing interest in the potential of deep learning models, which predict epigenome features from DNA sequence, to support inference concerning the regulatory effects of disease-associated variants. Here, we evaluate the advantages of training convolutional neural network (CNN) models on a broad set of epigenomic features collected in a single disease-relevant tissue (pancreatic islets in the case of T2D), as opposed to models trained on multiple human tissues. We report convergence of CNN-based metrics of regulatory function with conventional approaches to variant prioritization: genetic fine-mapping and regulatory annotation enrichment. We demonstrate that CNN-based analyses can refine association signals at T2D-associated loci and provide experimental validation for one such signal. We anticipate that these approaches will become routine in downstream analyses of GWAS.

]]>
<![CDATA[Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments]]> https://www.researchpad.co/article/Ndf737c90-6364-4eac-8435-672a41ec85d6

Understanding how gene expression programs are controlled requires identifying regulatory relationships between transcription factors and target genes. Gene regulatory networks are typically constructed from gene expression data acquired following genetic perturbation or environmental stimulus. Single-cell RNA sequencing (scRNAseq) captures the gene expression state of thousands of individual cells in a single experiment, offering advantages in combinatorial experimental design, large numbers of independent measurements, and accessing the interaction between the cell cycle and environmental responses that is hidden by population-level analysis of gene expression. To leverage these advantages, we developed a method for scRNAseq in budding yeast (Saccharomyces cerevisiae). We pooled diverse transcriptionally barcoded gene deletion mutants in 11 different environmental conditions and determined their expression state by sequencing 38,285 individual cells. We benchmarked a framework for learning gene regulatory networks from scRNAseq data that incorporates multitask learning and constructed a global gene regulatory network comprising 12,228 interactions.

]]>