ResearchPad - computational-biology https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Chloroplast genomes of Rubiaceae: Comparative genomics and molecular phylogeny in subfamily Ixoroideae]]> https://www.researchpad.co/article/elastic_article_11231 In Rubiaceae phylogenetics, the number of markers has often proved a limitation, with authors failing to provide well-supported trees at tribal and generic levels. A robust phylogeny is a prerequisite to study the evolutionary patterns of traits at different taxonomic levels. Advances in next-generation sequencing technologies have revolutionized biology by providing, at reduced cost, huge amounts of data for an increased number of species. Due to their highly conserved structure, general lack of recombination, and mostly uniparental inheritance, chloroplast DNA sequences have long been used as choice markers for plant phylogeny reconstruction. The main objectives of this study are: 1) to gain insight into chloroplast genome evolution in the Rubiaceae (Ixoroideae) through efficient methodology for de novo assembly of plastid genomes; and, 2) to test the efficiency of mining SNPs in the nuclear genome of Ixoroideae based on the use of a coffee reference genome to produce well-supported nuclear trees. We assembled whole chloroplast genome sequences for 27 species of the Rubiaceae subfamily Ixoroideae using next-generation sequences. Analysis of the plastid genome structure reveals a relatively good conservation of gene content and order. Generally, low variation was observed between taxa in the boundary regions with the exception of the inverted repeat at both the large and short single copy junctions for some taxa. An average of 79% of the SNPs determined in the Coffea genus are transferable to Ixoroideae, with variation ranging from 35% to 96%. In general, the plastid and the nuclear genome phylogenies are congruent with each other. They are well-resolved with well-supported branches. 
Generally, the tribes form well-identified clades but the tribe Sherbournieae is shown to be polyphyletic. The results are discussed relative to the methodology used and the chloroplast genome features in Rubiaceae and compared to previous Rubiaceae phylogenies.

]]>
<![CDATA[Active Notch signaling is required for arm regeneration in a brittle star]]> https://www.researchpad.co/article/elastic_article_7845 Cell signaling pathways play key roles in coordinating cellular events in development. The Notch signaling pathway is highly conserved across all multicellular animals and is known to coordinate a multitude of diverse cellular events, including proliferation, differentiation, fate specification, and cell death. Specific functions of the pathway are, however, highly context-dependent and are not well characterized in post-traumatic regeneration. Here, we use a small-molecule inhibitor of the pathway (DAPT) to demonstrate that Notch signaling is required for proper arm regeneration in the brittle star Ophioderma brevispina, a highly regenerative member of the phylum Echinodermata. We also employ a transcriptome-wide gene expression analysis (RNA-seq) to characterize the downstream genes controlled by the Notch pathway in brittle star regeneration. We demonstrate that arm regeneration involves an extensive cross-talk between the Notch pathway and other cell signaling pathways. In the regrowing arm, Notch regulates the composition of the extracellular matrix, cell migration, proliferation, and apoptosis, as well as components of the innate immune response. We also show for the first time that Notch signaling regulates the activity of several transposable elements. Our data also suggest that one of the possible mechanisms through which Notch sustains its activity in the regenerating tissues is via suppression of Neuralized1.

]]>
<![CDATA[Medusa: Software to build and analyze ensembles of genome-scale metabolic network reconstructions]]> https://www.researchpad.co/article/elastic_article_7734 Uncertainty in the structure and parameters of networks is ubiquitous across computational biology. In constraint-based reconstruction and analysis of metabolic networks, this uncertainty is present both during the reconstruction of networks and in simulations performed with them. Here, we present Medusa, a Python package for the generation and analysis of ensembles of genome-scale metabolic network reconstructions. Medusa builds on the COBRApy package for constraint-based reconstruction and analysis by compressing a set of models into a compact ensemble object, providing functions for the generation of ensembles using experimental data, and extending constraint-based analyses to ensemble scale. We demonstrate how Medusa can be used to generate ensembles and perform ensemble simulations, and how machine learning can be used in conjunction with Medusa to guide the curation of genome-scale metabolic network reconstructions. Medusa is available under the permissive MIT license from the Python Packaging Index (https://pypi.org) and from github (https://github.com/opencobra/Medusa), and comprehensive documentation is available at https://medusa.readthedocs.io/en/latest.

]]>
<![CDATA[fRNAkenseq: a fully powered-by-CyVerse cloud integrated RNA-sequencing analysis tool]]> https://www.researchpad.co/article/elastic_article_8314 Decreasing costs make RNA sequencing technologies increasingly affordable for biologists. However, many researchers who can now afford sequencing lack access to resources necessary for downstream analysis. This means that even as algorithms to process RNA-Seq data improve, many biologists still struggle to manage the sheer volume of data produced by next generation sequencing (NGS) technologies. Scalable bioinformatics tools that exploit multiple platforms are needed to democratize bioinformatics resources in the sequencing era. This is essential for equipping many research groups in the life sciences with the tools to process the increasingly unwieldy datasets they produce. Methods: One strategy to address this challenge is to develop a modern generation of sequence analysis tools capable of seamless data sharing and communication. Such tools will provide interoperability through offerings of interlinked resources. Systems of interlinked, scalable resources, which often incorporate cloud data storage, are broadly referred to as cyberinfrastructure. Cyberinfrastructure integrated tools will help researchers to robustly analyze large scale datasets by efficiently sharing data burdens across a distributed architecture. Additionally, interoperability will allow emerging tools to cross-adapt features of existing tools. It is important that these tools are designed to be easy to use for biologists. Results: We introduce fRNAkenseq, a powered-by-CyVerse RNA sequencing analysis tool that exhibits interoperability with other resources and meets the needs of biologists for comprehensive, easy to use RNA sequencing analysis. 
fRNAkenseq leverages a complex set of Application Programming Interfaces (APIs) associated with the NSF-funded cyberinfrastructure project, CyVerse, to execute FASTQ-to-differential expression RNA-Seq analyses. Integrating across bioinformatics platforms, fRNAkenseq also exploits cloud integration and cross-talk with another CyVerse associated tool, CoGe. fRNAkenseq offers novel features for the biologist such as more robust and comprehensive pipelines for enrichment than those currently available by default in a single tool, whether cloud-based or locally installed. Importantly, cross-talk with CoGe allows fRNAkenseq users to execute RNA-Seq pipelines on an inventory of 47,000 archived genomes stored in CoGe or upload their own draft genome. ]]> <![CDATA[Understanding the computation of time using neural network models]]> https://www.researchpad.co/article/elastic_article_8305 To maximize future rewards in this ever-changing world, animals must be able to discover the temporal structure of stimuli and then anticipate or act correctly at the right time. How do animals perceive, maintain, and use time intervals ranging from hundreds of milliseconds to multiseconds in working memory? How is temporal information processed concurrently with spatial information and decision making? Why are there strong neuronal temporal signals in tasks in which temporal information is not required? A systematic understanding of the underlying neural mechanisms is still lacking. Here, we addressed these problems using supervised training of recurrent neural network models. We revealed that neural networks perceive elapsed time through state evolution along a stereotypical trajectory, maintain time intervals in working memory in the monotonic increase or decrease of the firing rates of interval-tuned neurons, and compare or produce time intervals by scaling state evolution speed. 
Temporal and nontemporal information is coded in subspaces orthogonal with each other, and the state trajectories with time at different nontemporal information are quasiparallel and isomorphic. Such coding geometry facilitates the decoding generalizability of temporal and nontemporal information across each other. The network structure exhibits multiple feedforward sequences that mutually excite or inhibit depending on whether their preferences of nontemporal information are similar or not. We identified four factors that facilitate strong temporal signals in nontiming tasks, including the anticipation of coming events. Our work discloses fundamental computational principles of temporal processing, and it is supported by and gives predictions to a number of experimental phenomena.

]]>
<![CDATA[Molecular dysregulation of ciliary polycystin-2 channels caused by variants in the TOP domain]]> https://www.researchpad.co/article/elastic_article_8269 Genetic variants in PKD2, which encodes the polycystin-2 ion channel, are responsible for many clinical cases of autosomal dominant polycystic kidney disease (ADPKD). Despite our strong understanding of the genetic basis of ADPKD, we do not know how most variants impact channel function. Polycystin-2 is found in organelle membranes, including the primary cilium, an antenna-like structure on the luminal side of the collecting duct. In this study, we focus on the structural and mechanistic regulation of polycystin-2 by its TOP domain, a site with unknown function that is commonly altered by missense variants. We use direct cilia electrophysiology, cryogenic electron microscopy, and superresolution imaging to determine that variants of the TOP domain finger 1 motif destabilize the channel structure and impair channel opening without altering cilia localization and channel assembly. Our findings support the channelopathy classification of PKD2 variants associated with ADPKD, where polycystin-2 channel dysregulation in the primary cilia may contribute to cystogenesis.

]]>
<![CDATA[Quantitative analysis of amino acid metabolism in liver cancer links glutamate excretion to nucleotide synthesis]]> https://www.researchpad.co/article/elastic_article_8264 Many cancer cells consume glutamine at high rates; counterintuitively, they simultaneously excrete glutamate, the first intermediate in glutamine metabolism. Glutamine consumption has been linked to replenishment of tricarboxylic acid cycle (TCA) intermediates and synthesis of adenosine triphosphate (ATP), but the reason for glutamate excretion is unclear. Here, we dynamically profile the uptake and excretion fluxes of a liver cancer cell line (HepG2) and use genome-scale metabolic modeling for in-depth analysis. We find that up to 30% of the glutamine is metabolized in the cytosol, primarily for nucleotide synthesis, producing cytosolic glutamate. We hypothesize that excreting glutamate helps the cell to increase the nucleotide synthesis rate to sustain growth. Indeed, we show experimentally that partial inhibition of glutamate excretion reduces cell growth. Our integrative approach thus links glutamine addiction to glutamate excretion in cancer and points toward potential drug targets.

]]>
<![CDATA[The microcircuits of striatum in silico]]> https://www.researchpad.co/article/Nb554bc96-b428-4c19-ba40-2736d903683b The basal ganglia play an important role in decision making and selection of action primarily based on input from cortex, thalamus, and the dopamine system. Their main input structure, striatum, is central to this process. It consists of two types of projection neurons, together representing 95% of the neurons, and 5% of interneurons, among which are the cholinergic, fast-spiking, and low threshold-spiking subtypes. The membrane properties, soma–dendritic shape, and intrastriatal and extrastriatal synaptic interactions of these neurons are quite well described in the mouse, and therefore they can be simulated in sufficient detail to capture their intrinsic properties, as well as the connectivity. We focus on simulation at the striatal cellular/microcircuit level, in which the molecular/subcellular and systems levels meet. We present a nearly full-scale model of the mouse striatum using available data on synaptic connectivity, cellular morphology, and electrophysiological properties to create a microcircuit mimicking the real network. A striatal volume is populated with reconstructed neuronal morphologies with appropriate cell densities, and then we connect neurons together based on appositions between neurites as possible synapses and constrain them further with available connectivity data. Moreover, we simulate a subset of the striatum involving 10,000 neurons, with input from cortex, thalamus, and the dopamine system, as a proof of principle. Simulation at this biological scale should serve as an invaluable tool to understand the mode of operation of this complex structure. This platform will be updated with new data and expanded to simulate the entire striatum.

]]>
<![CDATA[Cavitation in soft matter]]> https://www.researchpad.co/article/Nd0a93384-098b-4855-abf9-29f74edc2c6d Cavitation is the sudden, unstable expansion of a void or bubble within a liquid or solid subjected to a negative hydrostatic stress. Cavitation rheology is a field emerging from the development of a suite of materials characterization, damage quantification, and therapeutic techniques that exploit the physical principles of cavitation. Cavitation rheology is inherently complex and broad in scope with wide-ranging applications in the biology, chemistry, materials, and mechanics communities. This perspective aims to drive collaboration among these communities and guide discussion by defining a common core of high-priority goals while highlighting emerging opportunities in the field of cavitation rheology. A brief overview of the mechanics and dynamics of cavitation in soft matter is presented. This overview is followed by a discussion of the overarching goals of cavitation rheology and an overview of common experimental techniques. The larger unmet needs and challenges of cavitation in soft matter are then presented alongside specific opportunities for researchers from different disciplines to contribute to the field.

]]>
<![CDATA[Redefining the heterogeneity of peripheral nerve cells in health and autoimmunity]]> https://www.researchpad.co/article/Nf11306cd-b1fa-4dea-b8f8-c518a6d7fffd Peripheral nerves contain axons and their enwrapping glia cells named Schwann cells (SCs) that are either myelinating (mySCs) or nonmyelinating (nmSCs). Our understanding of other cells in the peripheral nervous system (PNS) remains limited. Here, we provide an unbiased single cell transcriptomic characterization of the nondiseased rodent PNS. We identified and independently confirmed markers of previously underappreciated nmSCs and nerve-associated fibroblasts. We also found and characterized two distinct populations of nerve-resident homeostatic myeloid cells that transcriptionally differed from central nervous system microglia. In a model of chronic autoimmune neuritis, homeostatic myeloid cells were outnumbered by infiltrating lymphocytes which modulated the local cell–cell interactome and induced a specific transcriptional response in glia cells. This response was partially shared between the peripheral and central nervous system glia, indicating common immunological features across different parts of the nervous system. Our study thus identifies subtypes and cell-type markers of PNS cells and a partially conserved autoimmunity module induced in glia cells.

]]>
<![CDATA[New candidates for regulated gene integrity revealed through precise mapping of integrative genetic elements]]> https://www.researchpad.co/article/N249fead6-5cee-4399-8f76-356c51ff87d2 Integrative genetic elements (IGEs) are mobile multigene DNA units that integrate into and excise from host bacterial genomes. Each IGE usually targets a specific site within a conserved host gene, integrating in a manner that preserves target gene function. However, a small number of bacterial genes are known to be inactivated upon IGE integration and reactivated upon excision, regulating phenotypes of virulence, mutation rate, and terminal differentiation in multicellular bacteria. The list of regulated gene integrity (RGI) cases has been slow-growing because IGEs have been challenging to precisely and comprehensively locate in genomes. We present software (TIGER) that maps IGEs with unprecedented precision and without attB site bias. TIGER uses a comparative genomic, ping-pong BLAST approach, based on the principle that the IGE integration module (i.e. its int-attP region) is cohesive. The resultant IGEs from 2168 genomes, along with integrase phylogenetic analysis and gene inactivation tests, revealed 19 new cases of genes whose integrity is regulated by IGEs (including dut, eccCa1, gntT, hrpB, merA, ompN, prkA, tqsA, traG, yifB, yfaT and ynfE), as well as recovering previously known cases (in sigK, spsM, comK, mlrA and hlb genes). It also recovered known clades of site-promiscuous integrases and identified possible new ones.

]]>
<![CDATA[Epigenetic engineering of yeast reveals dynamic molecular adaptation to methylation stress and genetic modulators of specific DNMT3 family members]]> https://www.researchpad.co/article/N2343e3d1-81c8-485b-9565-e1b4c8638393 Cytosine methylation is a ubiquitous modification in mammalian DNA generated and maintained by several DNA methyltransferases (DNMTs) with partially overlapping functions and genomic targets. To systematically dissect the factors specifying each DNMT’s activity, we engineered combinatorial knock-in of human DNMT genes in Komagataella phaffii, a yeast species lacking endogenous DNA methylation. Time-course expression measurements captured dynamic network-level adaptation of cells to DNMT3B1-induced DNA methylation stress and showed that coordinately modulating the availability of S-adenosyl methionine (SAM), the essential metabolite for DNMT-catalyzed methylation, is an evolutionarily conserved epigenetic stress response, also implicated in several human diseases. Convolutional neural networks trained on genome-wide CpG-methylation data learned distinct sequence preferences of DNMT3 family members. A simulated annealing interpretation method resolved these preferences into individual flanking nucleotides and periodic poly(A) tracts that rotationally position highly methylated cytosines relative to phased nucleosomes. Furthermore, the nucleosome repeat length defined the spatial unit of methylation spreading. Gene methylation patterns were similar to those in mammals, and hypo- and hypermethylation were predictive of increased and decreased transcription relative to control, respectively, in the absence of mammalian readers of DNA methylation. Introducing controlled epigenetic perturbations in yeast thus enabled characterization of fundamental genomic features directing specific DNMT3 proteins.

]]>
<![CDATA[Using GARDEN-NET and <tt>ChAseR</tt> to explore human haematopoietic 3D chromatin interaction networks]]> https://www.researchpad.co/article/N9a775c2f-3d25-44aa-8025-43c3a9786708 We introduce an R package and a web-based visualization tool for the representation, analysis and integration of epigenomic data in the context of 3D chromatin interaction networks. GARDEN-NET allows for the projection of user-submitted genomic features on pre-loaded chromatin interaction networks, exploiting the functionalities of the ChAseR package to explore the features in combination with chromatin network topology properties. We demonstrate the approach using published epigenomic and chromatin structure datasets in haematopoietic cells, including a collection of gene expression, DNA methylation and histone modifications data in primary healthy myeloid cells from hundreds of individuals. These datasets allow us to test the robustness of chromatin assortativity, which highlights which epigenomic features, alone or in combination, are more strongly associated with 3D genome architecture. We find evidence for genomic regions with specific histone modifications, DNA methylation, and gene expression levels to be forming preferential contacts in 3D nuclear space, to a different extent depending on the cell type and lineage. Finally, we examine replication timing data and find it to be the genomic feature most strongly associated with overall 3D chromatin organization at multiple scales, consistent with previous results from the literature.

]]>
<![CDATA[Dynamics of genetic variation in transcription factors and its implications for the evolution of regulatory networks in Bacteria]]> https://www.researchpad.co/article/N185191d8-1add-4201-8a08-2a4575aa641a The evolution of regulatory networks in Bacteria has largely been explained at macroevolutionary scales through lateral gene transfer and gene duplication. Transcription factors (TFs) have been found to be less conserved across species than their target genes (TGs). This would be expected if TFs accumulate mutations faster than TGs. This hypothesis is supported by several lab evolution studies which found TFs, especially global regulators, to be frequently mutated. Despite these studies, the contribution of point mutations in TFs to the evolution of regulatory networks is poorly understood. We tested if TFs show greater genetic variation than their TGs using whole-genome sequencing data from a large collection of Escherichia coli isolates. TFs were less diverse than their TGs across natural isolates, with TFs of large regulons being more conserved. In contrast, TFs showed higher mutation frequency in adaptive laboratory evolution experiments. However, over long-term laboratory evolution spanning 60 000 generations, mutation frequency in TFs gradually declined after a rapid initial burst. Extrapolating the dynamics of genetic variation from long-term laboratory evolution to natural populations, we propose that point mutations, conferring large-scale gene expression changes, may drive the early stages of adaptation but gene regulation is subjected to stronger purifying selection post adaptation.

]]>
<![CDATA[Dental characters used in phylogenetic analyses of mammals show higher rates of evolution, but not reduced independence]]> https://www.researchpad.co/article/Nf925bd92-5f9c-4397-b631-1d827df41b5c

Accurate reconstructions of phylogeny are essential for studying the evolution of a clade, and morphological characters are necessarily used for the reconstruction of the relationships of fossil organisms. However, variation in their evolutionary modes (for example rate variation and character non-independence) not accounted for in analyses may be leading to unreliable phylogenies. A recent study suggested that phylogenetic analyses of mammals may be suffering from a dominance of dental characters, which were shown to have lower phylogenetic signal than osteological characters and produced phylogenies less congruent with molecularly-derived benchmarks. Here we build on this previous work by testing five additional morphological partitions for phylogenetic signal and examining what aspects of dental and other character evolution may be affecting this, by fitting models of discrete character evolution to phylogenies inferred and time calibrated using molecular data. Results indicate that the phylogenetic signal of discrete characters correlates most strongly with rates of evolution, with increased rates driving increased homoplasy. In a dataset covering all Mammalia, dental characters have higher rates of evolution than other partitions. They do not, however, fit a model of independent character evolution any worse than other regions. Primates and marsupials show different patterns to other mammal clades, with dental characters evolving at slower rates and being more heavily integrated (less independent). While the dominance of dental characters in analyses of mammals could be leading to inaccurate phylogenies, the issue is not unique to dental characters and the results are not consistent across datasets. Molecular benchmarks (being entirely independent of the character data) provide a framework for examining each dataset individually to assess the evolution of the characters used.

]]>
<![CDATA[Discovery of potential targets of Triptolide through inverse docking in ovarian cancer cells]]> https://www.researchpad.co/article/N8c70d1b6-1722-4791-949d-b68bd90aa593

Triptolide (TPL) is proposed as an effective anticancer agent known for its anti-proliferative effect on a variety of cancer cells, including ovarian cancer cells. Although some studies have been conducted, the mechanism by which TPL acts on ovarian cancer remains to be clearly described. Herein, systematic work based on bioinformatics was carried out to discover the potential targets of TPL in SKOV-3 cells. TPL induces the early apoptosis of SKOV-3 cells in a dose- and time-dependent manner with an IC50 = 40 ± 0.89 nM when cells are incubated for 48 h. Moreover, 20 nM TPL significantly promotes early apoptosis at a rate of 40.73%. Using a self-designed inverse molecular docking protocol, we fish the top 19 probable targets of TPL from a target library built on 2,250 proteins extracted from the Protein Data Bank. The 2D-DIGE assay reveals that the expression of eight genes is affected by TPL. The results of western blotting and qRT-PCR assay suggest that 40 nM of TPL up-regulates the level of Annexin A5 (6.34 ± 0.07 fold) and ATP synthase (4.08 ± 0.08 fold) and down-regulates the level of β-Tubulin (0.11 ± 0.12 fold) and HSP90 (0.21 ± 0.09 fold). Further details of how TPL affects the Annexin A5 signaling pathway remain to be determined in future work. Our results define some potential targets of TPL, with the hope that this agent could be used as therapy for the preclinical treatment of ovarian cancer.

]]>
<![CDATA[Mitochondrial genomes of Columbicola feather lice are highly fragmented, indicating repeated evolution of minicircle-type genomes in parasitic lice]]> https://www.researchpad.co/article/N1859479f-adf6-472d-b1b5-f88f77b9db88

Most animals have a conserved mitochondrial genome structure composed of a single chromosome. However, some organisms have their mitochondrial genes separated on several smaller circular or linear chromosomes. Highly fragmented circular chromosomes (“minicircles”) are especially prevalent in parasitic lice (Insecta: Phthiraptera), with 16 species known to have between nine and 20 mitochondrial minicircles per genome. All of these species belong to the same clade (mammalian lice), suggesting a single origin of drastic fragmentation. Nevertheless, other work indicates a lesser degree of fragmentation (2–3 chromosomes/genome) is present in some avian feather lice (Ischnocera: Philopteridae). In this study, we tested for minicircles in four species of the feather louse genus Columbicola (Philopteridae). Using whole genome shotgun sequence data, we applied three different bioinformatic approaches for assembling the Columbicola mitochondrial genome. We further confirmed these approaches by assembling the mitochondrial genome of Pediculus humanus from shotgun sequencing reads, a species known to have minicircles. Columbicola spp. genomes are highly fragmented into 15–17 minicircles between ∼1,100 and ∼3,100 bp in length, with 1–4 genes per minicircle. Subsequent annotation of the minicircles indicated that tRNA arrangements of minicircles varied substantially between species. These mitochondrial minicircles for species of Columbicola represent the first feather lice (Philopteridae) for which minicircles have been found in a full mitochondrial genome assembly. Combined with recent phylogenetic studies of parasitic lice, our results provide strong evidence that highly fragmented mitochondrial genomes, which are otherwise rare across the Tree of Life, evolved multiple times within parasitic lice.

]]>
<![CDATA[PigLeg: prediction of swine phenotype using machine learning]]> https://www.researchpad.co/article/N823fa3cb-5286-4b44-9d39-27d7bb6cdb07

Industrial pig farming is associated with negative technological pressure on the bodies of pigs. Leg weakness and lameness are the sources of significant economic loss in raising pigs. Therefore, it is important to identify the predictors of limb condition. This work presents assessments of the state of limbs using indicators of growth and meat characteristics of pigs based on machine learning algorithms. We have evaluated and compared the accuracy of prediction for nine ML classification algorithms (Random Forest, K-Nearest Neighbors, Artificial Neural Networks, C50Tree, Support Vector Machines, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) and have identified the Random Forest and K-Nearest Neighbors as the best-performing algorithms for predicting pig leg weakness using a small set of simple measurements that can be taken at an early stage of animal development. Measurements of Muscle Thickness, Back Fat amount, and Average Daily Gain were found to be significant predictors of the conformation of pig limbs. Our work demonstrates the utility and relative ease of using machine learning algorithms to assess the state of limbs in pigs based on growth rate and meat characteristics.

]]>
<![CDATA[Insight into the Structure of Amyloid Fibrils from the Analysis of Globular Proteins]]> https://www.researchpad.co/article/5b7bf959463d7e029d0112f3

The conversion from soluble states into cross-β fibrillar aggregates is a property shared by many different proteins and peptides and was hence conjectured to be a generic feature of polypeptide chains. Increasing evidence is now accumulating that such fibrillar assemblies are generally characterized by a parallel in-register alignment of β-strands contributed by distinct protein molecules. Here we assume a universal mechanism is responsible for β-structure formation and deduce sequence-specific interaction energies between pairs of protein fragments from a statistical analysis of the native folds of globular proteins. The derived fragment–fragment interaction was implemented within a novel algorithm, prediction of amyloid structure aggregation (PASTA), to investigate the role of sequence heterogeneity in driving specific aggregation into ordered self-propagating cross-β structures. The algorithm predicts that the parallel in-register arrangement of sequence portions that participate in the fibril cross-β core is favoured in most cases. However, the antiparallel arrangement is correctly discriminated when present in fibrils formed by short peptides. The predictions of the most aggregation-prone portions of initially unfolded polypeptide chains are also in excellent agreement with available experimental observations. These results corroborate the recent hypothesis that the amyloid structure is stabilised by the same physicochemical determinants as those operating in folded proteins. They also suggest that side chain–side chain interaction across neighbouring β-strands is a key determinant of amyloid fibril formation and of their self-propagating ability.

]]>
<![CDATA[FLORA: A Novel Method to Predict Protein Function from Structure in Diverse Superfamilies]]> https://www.researchpad.co/article/5989da9aab0ee8fa60ba35f1

Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues.

]]>