ResearchPad - genomic-databases https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Isolation of a novel species in the genus <i>Cupriavidus</i> from a patient with sepsis using whole genome sequencing]]> https://www.researchpad.co/article/elastic_article_14469 Whole genome sequencing (WGS) has become an accessible tool in clinical microbiology, and it allowed us to identify a novel Cupriavidus species. We isolated a Gram-negative bacillus from the blood of an immunocompromised patient and performed phenotypic and molecular identification. Discrepancies in phenotypic identification were noted between the Vitek 2 (bioMérieux, Marcy-l’Étoile, France) and Vitek MS systems (bioMérieux). Using 16S rRNA gene sequencing, the pathogen could not be identified to the species level. WGS was performed using the Illumina MiSeq platform (Illumina, San Diego, CA), and genomic sequence database searching with a TrueBac™ ID-Genome system (ChunLab, Inc., Seoul, Republic of Korea) showed no strains with average nucleotide identity values higher than 95.0%, which is the cut-off for species-level identification. Phylogenetic analysis indicated that the bacterium was a new Cupriavidus species that formed a subcluster with Cupriavidus gilardii. WGS holds great promise for accurate molecular identification beyond 16S rRNA gene sequencing in clinical microbiology.
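The species-level decision rule described above (no reference genome reaching the 95.0% average nucleotide identity cut-off) can be sketched as follows. This is a minimal illustration, not the TrueBac ID-Genome logic: the function name, the inclusive treatment of the boundary, and the ANI values are all assumptions made for the example.

```python
SPECIES_ANI_CUTOFF = 95.0  # percent; the cut-off quoted in the abstract


def best_species_match(ani_by_reference, cutoff=SPECIES_ANI_CUTOFF):
    """Return (name, ani) for the closest reference if it meets the cutoff, else None.

    Treating the boundary as inclusive (>=) is an assumption of this sketch.
    """
    if not ani_by_reference:
        return None
    name, ani = max(ani_by_reference.items(), key=lambda item: item[1])
    return (name, ani) if ani >= cutoff else None


# Hypothetical ANI values (percent); they are NOT taken from the study.
hits = {"Cupriavidus gilardii": 92.4, "Cupriavidus pauculus": 86.1}
print(best_species_match(hits))  # None, i.e. no species-level match
```

A result of `None` corresponds to the situation reported here: the closest relative is known, but the isolate cannot be assigned to an existing species.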

]]>
<![CDATA[Identification of NUDT15 gene variants in Amazonian Amerindians and admixed individuals from northern Brazil]]> https://www.researchpad.co/article/N0a09703b-e69a-40d3-8ae4-dfe23e56b45d

Introduction

The nudix hydrolase 15 (NUDT15) gene acts in the metabolism of thiopurines by catabolizing the active metabolite thioguanosine triphosphate into its inactive form, thioguanosine monophosphate. The frequency of alternative NUDT15 alleles, in particular those that cause a drastic loss of gene function, varies widely among geographically distinct populations. In the general population of northern Brazil, high toxicity rates (65%) have been recorded in patients treated with the standard protocol for acute lymphoblastic leukemia, which involves thiopurine-based drugs. The present study characterized the molecular profile of the coding region of the NUDT15 gene in two groups, non-admixed Amerindians and admixed individuals from the Amazon region of northern Brazil.

Methods

The entire NUDT15 gene was sequenced in 64 Amerindians from 12 Amazonian groups and 82 admixed individuals from northern Brazil. The DNA was extracted using phenol-chloroform. The exome libraries were prepared using the Nextera Rapid Capture Exome (Illumina) and SureSelect Human All Exon V6 (Agilent) kits. The allelic variants were annotated in the ViVa® (Viewer of Variants) software.

Results

Four NUDT15 variants were identified: rs374594155, rs1272632214, rs147390019, and rs116855232. The variants rs1272632214 and rs116855232 were in complete linkage disequilibrium, and were assigned to the NUDT15*2 genotype. These variants had high frequencies in both our study populations in comparison with other populations catalogued in the 1000 Genomes database. We also identified the NUDT15*4 haplotype in our study populations, at frequencies similar to those reported in other populations from around the world.

Conclusion

Our findings indicate that Amerindian and admixed populations from northern Brazil have high frequencies of the NUDT15 haplotypes that alter the metabolism profile of thiopurines.
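The Results report two variants in complete linkage disequilibrium. As a rough illustration of what that means numerically, the sketch below computes the r² statistic from phased haplotype counts at two biallelic sites. The function name and the counts are hypothetical, and the abstract does not say which LD measure or software the authors used.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic sites from phased haplotype counts.

    n_AB counts haplotypes carrying allele A at site 1 and B at site 2, etc.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at site 1
    p_B = (n_AB + n_aB) / n          # frequency of allele B at site 2
    d = n_AB / n - p_A * p_B         # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("a site is monomorphic; r^2 is undefined")
    return d * d / denom


# Hypothetical counts: the two minor alleles always travel together.
print(ld_r_squared(30, 0, 0, 170))   # approximately 1.0 (complete LD)
print(ld_r_squared(25, 25, 25, 25))  # 0.0 (sites are independent)
```

When only the AB and ab haplotypes are observed, r² reaches 1, which is the pattern that lets two variants be treated as a single haplotype.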

]]>
<![CDATA[All of gene expression (AOE): An integrated index for public gene expression databases]]> https://www.researchpad.co/article/N65b3f432-723a-4d59-a70d-2c0d696b62b7

Gene expression data have been archived as microarray and RNA-seq datasets in two public databases, Gene Expression Omnibus (GEO) and ArrayExpress (AE). In 2018, the DNA DataBank of Japan started a similar repository called the Genomic Expression Archive (GEA). These databases are useful resources for the functional interpretation of genes, but have been separately maintained and may lack RNA-seq data, while the original sequence data are available in the Sequence Read Archive (SRA). We constructed an index for those gene expression data repositories, called All Of gene Expression (AOE), to integrate publicly available gene expression data. The web interface of AOE can graphically query data in addition to the application programming interface. By collecting gene expression data from RNA-seq in the SRA, AOE also includes data not included in GEO and AE. AOE is accessible as a search tool from the GEA website and is freely available at https://aoe.dbcls.jp/.

]]>
<![CDATA[A novel nonsense variant in SUPT20H gene associated with Rheumatoid Arthritis identified by Whole Exome Sequencing of multiplex families]]> https://www.researchpad.co/article/5c8acceed5eed0c48499036b

The triggering and development of Rheumatoid Arthritis (RA) is conditioned by environmental and genetic factors. Despite the identification of more than one hundred genetic variants associated with the disease, not all cases can be explained. Here, we performed Whole Exome Sequencing in 9 multiplex families (N = 30) to identify rare variants likely to play a role in the disease pathogenesis. We pre-selected 77 genes which carried rare variants with a complete segregation with RA in the studied families. Follow-up linkage and association analyses with pVAAST highlighted significant RA association of 43 genes (p-value < 0.05 after 10⁶ permutations) and pinpointed their most likely causal variant. We re-sequenced the 10 most significant likely causal variants (p-value ≤ 3.78 × 10⁻³ after 10⁶ permutations) in the extended pedigrees and 9 additional multiplex families (N = 110). Only one SNV in SUPT20H, c.73A>T (p.Lys25*), presented a complete segregation with RA in an extended pedigree with early-onset cases. In summary, we identified in this study a new variant associated with RA in the SUPT20H gene. This gene belongs to several biological pathways, such as macro-autophagy and monocyte/macrophage differentiation, which contribute to RA pathogenesis. In addition, these results showed that analyzing rare variants using a family-based approach is a strategy that allows the identification of RA risk loci, even with a small dataset.

]]>
<![CDATA[Analysis of genetic control and QTL mapping of essential wheat grain quality traits in a recombinant inbred population]]> https://www.researchpad.co/article/5c897730d5eed0c4847d2663

Wheat cultivars are genetically crossed to improve end-use quality traits according to the demands of the baking industry and broad consumer preferences. The processing and baking qualities of bread wheat are influenced by a variety of genetic make-ups, environmental factors and their interactions. A population of 206 recombinant inbred lines (RILs), derived from the two wheat cultivars WL711 and C306, was used for phenotyping of quality-related traits. The genetic analysis showed considerable variation for measurable quality traits, with normal distribution and transgressive segregation across the years. Of the 206 RILs, a few were found to be superior to the parental cultivars for key quality traits, indicating their potential use for the improvement of end-use quality and suggesting the probability of finding new alleles and allelic combinations in the RIL population. Mapping analysis identified 38 putative QTLs for 13 quality-related traits, with QTLs explaining 7.9–16.8% of phenotypic variation and spanning 14 chromosomes, i.e., 1A, 1B, 1D, 2A, 2D, 3B, 3D, 4A, 4B, 4D, 5D, 6A, 7A and 7B. In-silico analysis based on homology to the annotated wheat genes present in the database identified six putative candidate genes within the qGPC.1B.1 region, a QTL for total grain protein content. Major QTL regions for other quality traits such as TKW have been identified on chromosomes 1B, 2A, and 7A in the studied RIL population. This study revealed the importance of the combination of stable QTLs with region-specific QTLs for better phenotyping, and the QTLs presented in our study will be useful for the improvement of wheat grain and bread-making quality.

]]>
<![CDATA[Apollo: Democratizing genome annotation]]> https://www.researchpad.co/article/5c648d41d5eed0c484c823a0

Genome annotation is the process of identifying the location and function of a genome's encoded features. Improving the biological accuracy of annotation is a complex and iterative process requiring researchers to review and incorporate multiple sources of information such as transcriptome alignments, predictive models based on sequence profiles, and comparisons to features found in related organisms. Because rapidly decreasing costs are enabling an ever-growing number of scientists to incorporate sequencing as a routine laboratory technique, there is widespread demand for tools that can assist in the deliberative analytical review of genomic information. To this end, we present Apollo, an open source software package that enables researchers to efficiently inspect and refine the precise structure and role of genomic features in a graphical browser-based platform. Some of Apollo’s newer user interface features include support for real-time collaboration, allowing distributed users to simultaneously edit the same encoded features while also instantly seeing the updates made by other researchers on the same region in a manner similar to Google Docs. Its technical architecture enables Apollo to be integrated into multiple existing genomic analysis pipelines and heterogeneous laboratory workflow platforms. Finally, we consider the implications that Apollo and related applications may have on how the results of genome research are published and made accessible.

]]>
<![CDATA[HuVarBase: A human variant database with comprehensive information at gene and protein levels]]> https://www.researchpad.co/article/5c5ca311d5eed0c48441f0af

Human variant databases could be better exploited if the variant data available in multiple resources is integrated in a single comprehensive resource along with sequence and structural features. Such integration would improve the analyses of variants for disease prediction, prevention or treatment. The HuVarBase (HUmanVARiantdataBASE) assimilates publicly available human variant data at protein level and gene level into a comprehensive resource. Protein level data such as amino acid sequence, secondary structure of the mutant residue, domain, function, subcellular location and post-translational modification are integrated with gene level data such as gene name, chromosome number & genome position, DNA mutation, mutation type origin and rs ID number. Disease class has been added for the disease causing variants. The database is publicly available at https://www.iitm.ac.in/bioinfo/huvarbase. A total of 774,863 variant records, integrated in the HuVarBase, can be searched with options to display, visualize and download the results.

]]>
<![CDATA[Aberrant FAM64A mRNA expression is an independent predictor of poor survival in pancreatic cancer]]> https://www.researchpad.co/article/5c59ff04d5eed0c484135981

FAM64A, a marker of cell proliferation, has been investigated as a potential biomarker in several cancers. In the present study, we examined the value of FAM64A expression in the diagnosis and prognosis of pancreatic cancer through bioinformatics analysis of data obtained from The Cancer Genome Atlas (TCGA) database. The diagnostic value of FAM64A expression in pancreatic cancer tissue was determined through receiver operating characteristic (ROC) curve analysis, and based on the obtained cut-off value, patients were divided into two groups (high FAM64A expression and low FAM64A expression). Chi-square and Fisher exact tests were applied to identify associations between FAM64A expression and clinical features. Moreover, the effect of FAM64A expression on the survival of pancreatic cancer patients was assessed by Kaplan-Meier and Cox analyses. Gene set enrichment analysis (GSEA) was performed using the TCGA dataset. Our results showed that high FAM64A expression in pancreatic cancer was associated with survival status, overall survival (OS), and recurrence. The area under the ROC curve was 0.736, which indicated modest diagnostic value. Patients with higher FAM64A expression had significantly shorter OS and recurrence-free survival (RFS) times. Multivariate survival analysis demonstrated that high FAM64A expression was an independent risk factor for OS and RFS. GSEA identified mitotic spindles, myc targets, MTORC1 signaling, G2M checkpoint, E2F targets, DNA repair, glycolysis and unfolded protein response as differentially enriched with the high FAM64A expression phenotype. In conclusion, high FAM64A mRNA expression is an independent risk factor for poor prognosis in pancreatic cancer.
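The ROC step can be illustrated with a small, dependency-free sketch. The area under the curve is computed as the probability that a tumor sample scores higher than a non-tumor sample, and a cut-off is picked by maximizing Youden's J. The abstract does not state how its cut-off was chosen, so Youden's J is an assumption here, and the expression values are invented.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score); ties count one half."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def youden_cutoff(scores_pos, scores_neg):
    """Threshold t maximizing sensitivity + specificity - 1 ('high' means score >= t)."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores_pos) | set(scores_neg)):
        sensitivity = sum(s >= t for s in scores_pos) / len(scores_pos)
        specificity = sum(s < t for s in scores_neg) / len(scores_neg)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Hypothetical expression values, not TCGA data.
tumor = [5.1, 6.3, 4.8, 7.0]
normal = [3.2, 4.9, 4.0, 5.0]
print(roc_auc(tumor, normal))        # 0.875
print(youden_cutoff(tumor, normal))  # 5.1
```

Samples at or above the returned threshold would form the "high expression" group used in the downstream survival comparisons.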

]]>
<![CDATA[A census-based estimate of Earth's bacterial and archaeal diversity]]> https://www.researchpad.co/article/5c61e8f2d5eed0c48496f4a9

The global diversity of Bacteria and Archaea, the most ancient and most widespread forms of life on Earth, is a subject of intense controversy. This controversy stems largely from the fact that existing estimates are entirely based on theoretical models or extrapolations from small and biased data sets. Here, in an attempt to census the bulk of Earth's bacterial and archaeal ("prokaryotic") clades and to estimate their overall global richness, we analyzed over 1.7 billion 16S ribosomal RNA amplicon sequences in the V4 hypervariable region obtained from 492 studies worldwide, covering a multitude of environments and using multiple alternative primers. From this data set, we recovered 739,880 prokaryotic operational taxonomic units (OTUs, 16S-V4 gene clusters at 97% similarity), a commonly used measure of microbial richness. Using several statistical approaches, we estimate that there exist globally about 0.8–1.6 million prokaryotic OTUs, of which we recovered somewhere between 47%–96%, representing >99.98% of prokaryotic cells. Consistent with this conclusion, our data set independently "recaptured" 91%–93% of 16S sequences from multiple previous global surveys, including PCR-independent metagenomic surveys. The distribution of relative OTU abundances is consistent with a log-normal model commonly observed in larger organisms; the total number of OTUs predicted by this model is also consistent with our global richness estimates. By combining our estimates with the ratio of full-length versus partial-length (V4) sequence diversity in the SILVA sequence database, we further estimate that there exist about 2.2–4.3 million full-length OTUs worldwide. When restricting our analysis to the Americas, while controlling for the number of studies, we obtain similar richness estimates as for the global data set, suggesting that most OTUs are globally distributed. Qualitatively similar results are also obtained for other 16S similarity thresholds (90%, 95%, and 99%). 
Our estimates constrain the extent of a poorly quantified rare microbial biosphere and refute recent predictions that there exist trillions of prokaryotic OTUs.
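To make the idea of estimating total richness from an incomplete census concrete, here is a sketch of the bias-corrected Chao1 estimator, which extrapolates from the number of OTUs seen exactly once or twice. The abstract says only that "several statistical approaches" were used, so Chao1 is shown as one well-known example rather than as the authors' method; the counts are invented.

```python
from collections import Counter


def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    observed = [c for c in otu_counts if c > 0]
    s_obs = len(observed)                 # OTUs actually detected
    frequency = Counter(observed)
    f1 = frequency[1]                     # singletons
    f2 = frequency[2]                     # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical read counts for eight OTUs (one of them undetected).
counts = [10, 5, 2, 2, 1, 1, 1, 0]
estimate = chao1(counts)
detected = sum(1 for c in counts if c > 0)
print(estimate)             # 8.0
print(detected / estimate)  # 0.875, the fraction of estimated richness recovered
```

The ratio of detected to estimated OTUs plays the same role as the recovered fraction quoted in the abstract.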

]]>
<![CDATA[Predictive genomic markers of response to VEGF targeted therapy in metastatic renal cell carcinoma]]> https://www.researchpad.co/article/5c644872d5eed0c484c2e6fb

Background

First-line treatment for metastatic renal cell carcinoma (mRCC) is rapidly changing. It currently includes VEGF targeted therapies (TT), multi-target tyrosine kinase inhibitors (TKIs), mTOR inhibitors, and immunotherapy. To optimize outcomes for individual patients, genomic markers of response to therapy are needed. Here, we aim to identify tumor-based genomic markers of response to VEGF TT to optimize treatment selection.

Methods

From an institutional database, primary tumor tissue was obtained from 79 patients with clear cell mRCC, and targeted sequencing was performed. Clinical outcomes were obtained retrospectively. Progression-free survival (PFS) on first-line VEGF TT was correlated to genomic alterations (GAs) using Kaplan-Meier methodology and Cox proportional hazard models. A composite model of significant GAs predicting PFS in the first-line setting was developed.

Results

Absence of VHL mutation was associated with inferior PFS on first-line VEGF TT. A trend for inferior PFS was observed with GAs in TP53 and FLT1 C/C variant. A composite model of these 3 GAs was associated with inferior PFS in a dose-dependent manner.

Conclusion

In mRCC, a composite model of TP53 mutation, wild type VHL, and FLT1 C/C variant strongly predicted PFS on first-line VEGF TT in a dose-dependent manner. These findings require external validation.
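The Kaplan-Meier methodology named in the Methods can be sketched in a few lines. The function below is a generic product-limit estimator written for illustration; the follow-up times and event flags are hypothetical and do not come from the study's cohort.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time per patient
    events -- 1 if progression was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        progressed = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            progressed += data[i][1]
            leaving += 1
            i += 1
        if progressed:
            survival *= 1 - progressed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical PFS data in months; 0 marks a censored patient.
pfs_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in pfs_curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Censored patients leave the risk set without lowering the curve, which is why the estimate changes only at observed progression times.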

]]>
<![CDATA[Machine learning framework for assessment of microbial factory performance]]> https://www.researchpad.co/article/5c478c5fd5eed0c484bd1ec8

Metabolic models can estimate intrinsic product yields for microbial factories, but such frameworks struggle to predict cell performance (including product titer or rate) under suboptimal metabolism and complex bioprocess conditions. On the other hand, machine learning, which is complementary to metabolic modeling, necessitates large amounts of data. Building such a database for metabolic engineering designs requires significant manpower and is prone to human errors and bias. We propose an approach to integrate data-driven methods with a genome-scale metabolic model for assessment of microbial bio-production (yield, titer and rate). Using engineered E. coli as an example, we manually extracted and curated a data set comprising about 1200 experimentally realized cell factories from ~100 papers. We furthermore augmented the key design features (e.g., genetic modifications and bioprocess variables) extracted from literature with additional features derived from running the genome-scale metabolic model iML1515 simulations with constraints that match the experimental data. Then, data augmentation and ensemble learning (e.g., support vector machines, gradient boosted trees, and neural networks in a stacked regressor model) are employed to alleviate the challenges of sparse, non-standardized, and incomplete data sets, while multiple correspondence analysis/principal component analysis are used to rank influential factors on bio-production. The hybrid framework demonstrates a reasonably high cross-validation accuracy for prediction of E. coli factory performance metrics under presumed bioprocess and pathway conditions (Pearson correlation coefficients between 0.8 and 0.93 on new data not seen by the model).
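The accuracy figure quoted above is a Pearson correlation between predicted and measured performance on held-out data. A minimal sketch of that metric follows; the titer values are made up for illustration and the function is a plain textbook implementation, not code from the framework.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical measured vs. predicted titers (g/L) on held-out designs.
measured = [1.2, 3.4, 0.8, 5.1, 2.7]
predicted = [1.0, 3.0, 1.1, 4.6, 3.1]
print(round(pearson_r(measured, predicted), 3))
```

A coefficient near 1 means the model ranks and scales designs much as the experiments do, although it says nothing about systematic offset between predictions and measurements.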

]]>
<![CDATA[Enabling precision medicine via standard communication of HTS provenance, analysis, and results]]> https://www.researchpad.co/article/5c605afed5eed0c4847cd976

A personalized approach based on a patient's or pathogen’s unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://w3id.org/biocompute/1.3.0) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the “Open-Stand.org principles for collaborative open standards development.” With high-throughput sequencing (HTS) studies communicated using a BCO, regulatory agencies (e.g., Food and Drug Administration [FDA]), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.

]]>
<![CDATA[SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions]]> https://www.researchpad.co/article/5c196699d5eed0c484b52590

LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identification of lncRNA-protein interactions helps to understand lncRNA-related activities. Existing computational methods utilize multiple lncRNA features or multiple protein features to predict lncRNA-protein interactions, but features are not available for all lncRNAs or proteins; most existing methods cannot predict interacting proteins (or lncRNAs) for new lncRNAs (or proteins) that have no known interactions. In this paper, we propose the sequence-based feature projection ensemble learning method, “SFPEL-LPI”, to predict lncRNA-protein interactions. First, SFPEL-LPI extracts lncRNA sequence-based features and protein sequence-based features. Second, SFPEL-LPI calculates multiple lncRNA-lncRNA similarities and protein-protein similarities by using lncRNA sequences, protein sequences and known lncRNA-protein interactions. Then, SFPEL-LPI combines multiple similarities and multiple features with a feature projection ensemble learning frame. In computational experiments, SFPEL-LPI accurately predicts lncRNA-protein associations and outperforms other state-of-the-art methods. More importantly, SFPEL-LPI can be applied to new lncRNAs (or proteins). The case studies demonstrate that our method can find novel lncRNA-protein interactions, which are confirmed by the literature. Finally, we construct a user-friendly web server, available at http://www.bioinfotech.cn/SFPEL-LPI/.

]]>
<![CDATA[BRCA Challenge: BRCA Exchange as a global resource for variants in BRCA1 and BRCA2]]> https://www.researchpad.co/article/5c2d2eb3d5eed0c484d9b2c0

The BRCA Challenge is a long-term data-sharing project initiated within the Global Alliance for Genomics and Health (GA4GH) to aggregate BRCA1 and BRCA2 data to support highly collaborative research activities. Its goal is to generate an informed and current understanding of the impact of genetic variation on cancer risk across the iconic cancer predisposition genes, BRCA1 and BRCA2. Initially, reported variants in BRCA1 and BRCA2 available from public databases were integrated into a single, newly created site, www.brcaexchange.org. The purpose of the BRCA Exchange is to provide the community with a reliable and easily accessible record of variants interpreted for a high-penetrance phenotype. More than 20,000 variants have been aggregated, three times the number found in the next-largest public database at the project’s outset, of which approximately 7,250 have expert classifications. The data set is based on shared information from existing clinical databases—Breast Cancer Information Core (BIC), ClinVar, and the Leiden Open Variation Database (LOVD)—as well as population databases, all linked to a single point of access. The BRCA Challenge has brought together the existing international Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium expert panel, along with expert clinicians, diagnosticians, researchers, and database providers, all with a common goal of advancing our understanding of BRCA1 and BRCA2 variation. Ongoing work includes direct contact with national centers with access to BRCA1 and BRCA2 diagnostic data to encourage data sharing, development of methods suitable for extraction of genetic variation at the level of individual laboratory reports, and engagement with participant communities to enable a more comprehensive understanding of the clinical significance of genetic variation in BRCA1 and BRCA2.

]]>
<![CDATA[Genetic exchanges are more frequent in bacteria encoding capsules]]> https://www.researchpad.co/article/5c269763d5eed0c48470f652

Capsules allow bacteria to colonize novel environments, to withstand numerous stresses, and to resist antibiotics. Yet, even though genetic exchanges with other cells should be adaptive under such circumstances, it has been suggested that capsules lower the rates of homologous recombination and horizontal gene transfer. We analysed over one hundred pan-genomes and thousands of bacterial genomes for the evidence of an association between genetic exchanges (or lack thereof) and the presence of a capsule system. We found that bacteria encoding capsules have larger pan-genomes, higher rates of horizontal gene transfer, and higher rates of homologous recombination in their core genomes. Accordingly, genomes encoding capsules have more plasmids, conjugative elements, transposases, prophages, and integrons. Furthermore, capsular loci are frequent in plasmids, and can be found in prophages. These results are valid for Bacteria, independently of their ability to be naturally transformable. Since we have shown previously that capsules are commonly present in nosocomial pathogens, we analysed their co-occurrence with antibiotic resistance genes. Genomes encoding capsules have more antibiotic resistance genes, especially those encoding efflux pumps, and they constitute the majority of the most worrisome nosocomial bacteria. We conclude that bacteria with capsule systems are more genetically diverse and have fast-evolving gene repertoires, which may further contribute to their success in colonizing novel niches such as humans under antibiotic therapy.

]]>
<![CDATA[Proteotyping bacteria: Characterization, differentiation and identification of pneumococcus and other species within the Mitis Group of the genus Streptococcus by tandem mass spectrometry proteomics]]> https://www.researchpad.co/article/5c181340d5eed0c484774934

A range of methodologies may be used for analyzing bacteria, depending on the purpose and the level of resolution needed. The capability for recognition of species distinctions within the complex spectrum of bacterial diversity is necessary for progress in microbiological research. In clinical settings, accurate, rapid and cost-effective methods are essential for early and efficient treatment of infections. Characterization and identification of microorganisms using bottom-up proteomics, or “proteotyping”, relies on recognition of species-unique or species-associated peptides by tandem mass spectrometry analyses and depends upon an accurate and comprehensive foundation of genome sequence data, allowing for differentiation of species at amino acid-level resolution. In this study, the high resolution and accuracy of MS/MS-based proteotyping was demonstrated through analyses of the three phylogenetically and taxonomically most closely related species of the Mitis Group of the genus Streptococcus: the pathogenic species Streptococcus pneumoniae (pneumococcus) and the commensal species Streptococcus pseudopneumoniae and Streptococcus mitis. To achieve high accuracy, a genome sequence database used for matching peptides was created and carefully curated. Here, MS-based, bottom-up proteotyping was observed and confirmed to attain the level of resolution necessary for differentiating and identifying the most closely related bacterial species, as demonstrated by analyses of species of the Streptococcus Mitis Group, even when S. pneumoniae was mixed with S. pseudopneumoniae and S. mitis, by matching and identifying more than 200 unique peptides for each species.

]]>
<![CDATA[An improved DNA array-based classification method for the identification of Salmonella serotypes shows high concordance between traditional and genotypic testing]]> https://www.researchpad.co/article/5c1028fed5eed0c484248bb4

Previously we developed and tested the Salmonella GenoSerotyping Array (SGSA), which utilized oligonucleotide probes for O- and H-antigen biomarkers to perform accurate molecular serotyping of 57 Salmonella serotypes. Here we describe the development and validation of the ISO 17025 accredited second version of the SGSA (SGSA v. 2) with reliable and unambiguous molecular serotyping results for 112 serotypes of Salmonella, which were verified both in silico and in vitro. Improvements included an expansion of the probe sets along with a new classifier tool for prediction of individual antigens and overall serotype from the array probe intensity results. The array classifier and probe sequences were validated in silico to high concordance using 36,153 draft genomes of diverse Salmonella serotypes assembled from public repositories. We obtained correct and unambiguous serotype assignments for 31,924 (88.30%) of the tested samples, and a further 3,916 (10.83%) had fully concordant antigen predictions but could not be assigned to a single serotype. The SGSA v. 2 can directly use bacterial colonies with a limit of detection of 860 CFU/mL or purified DNA template at a concentration of 1.0 × 10⁻¹ ng/μl. The SGSA v. 2 was also validated in the wet laboratory and certified using a panel of 406 samples representing 185 different serotypes, with correct antigen and serotype determinations for 60.89% of the panel and a further 18.31% with correctly identified antigens but an ambiguous overall serotype determination.
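The in silico concordance figures can be rechecked directly from the counts given in the abstract. The short sketch below does that arithmetic; only the three counts come from the text, while the variable names and the derived "remaining" category are added for illustration.

```python
def percent(part, whole):
    """Share of `part` in `whole`, in percent."""
    return 100 * part / whole


TOTAL_GENOMES = 36153           # draft genomes tested in silico
UNAMBIGUOUS = 31924             # correct, single-serotype assignments
CONCORDANT_AMBIGUOUS = 3916     # antigens concordant, serotype not unique

remaining = TOTAL_GENOMES - UNAMBIGUOUS - CONCORDANT_AMBIGUOUS

print(f"{percent(UNAMBIGUOUS, TOTAL_GENOMES):.2f}%")            # 88.30%
print(f"{percent(CONCORDANT_AMBIGUOUS, TOTAL_GENOMES):.2f}%")   # 10.83%
print(remaining, f"{percent(remaining, TOTAL_GENOMES):.2f}%")   # 313 0.87%
```

Together the two concordant categories cover 35,840 of the 36,153 genomes, leaving 313 genomes outside both categories.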

]]>
<![CDATA[Metabolic models and gene essentiality data reveal essential and conserved metabolism in prokaryotes]]> https://www.researchpad.co/article/5bf86f5ed5eed0c48405a937

Essential metabolic reactions are shaping constituents of metabolic networks, enabling viable and distinct phenotypes across diverse life forms. Here we analyse and compare modelling predictions of essential metabolic functions with experimental data and thereby identify core metabolic pathways in prokaryotes. Simulations of 15 manually curated genome-scale metabolic models were integrated with 36 large-scale gene essentiality datasets encompassing a wide variety of species of bacteria and archaea. Conservation of metabolic genes was estimated by analysing 79 representative genomes from all the branches of the prokaryotic tree of life. We find that essentiality patterns reflect phylogenetic relations both for modelling and experimental data, which correlate highly at the pathway level. Genes that are essential for several species tend to be highly conserved as opposed to non-essential genes which may be conserved or not. The tRNA-charging module is highlighted as ancestral and with high centrality in the networks, followed closely by cofactor metabolism, pointing to an early information processing system supplied by organic cofactors. The results, which point to model improvements and also indicate faults in the experimental data, should be relevant to the study of centrality in metabolic networks and ancient metabolism but also to metabolic engineering with prokaryotes.

]]>
<![CDATA[Copy number variation in the susceptibility to systemic lupus erythematosus]]> https://www.researchpad.co/article/5c0841d2d5eed0c484fcad6f

Systemic lupus erythematosus (SLE) is an autoimmune disease with a strong genetic component and etiology characterized by chronic inflammation and autoantibody production. The purpose of this study was to ascertain copy number variation (CNV) in SLE using a case-control design in an admixed Brazilian population. The whole-genome detection of CNV was performed using the Cytoscan HD array in SLE patients and healthy controls. The best CNV candidates were then evaluated by quantitative real-time PCR in a larger cohort or validated using droplet digital PCR. Logistic regression models adjusted for sex and ancestry covariates were applied to evaluate the association of CNV with SLE susceptibility. The data showed a synergistic effect between the FCGR3B and ADAM3A loci, with the presence of deletions in both loci significantly increasing the risk of SLE (5.9-fold) compared to a deletion in the FCGR3B locus alone (3.6-fold). In addition, duplications in these genes were more frequent in healthy subjects, suggesting that high FCGR3B/ADAM3A gene copy numbers are protective factors against disease development. Overall, 21 rare CNVs were identified in SLE patients using a four-step pipeline created for identification of rare variants. Furthermore, heterozygous deletions overlapping the CFHR4, CFHR5 and HLA-DPB2 genes were described for the first time in SLE patients. Here we present the first genome-wide CNV study of SLE patients in a tri-hybrid population. The results show that novel susceptibility loci for SLE can be found once the distribution of structural variants is analyzed throughout the whole genome.

]]>
<![CDATA[Genome-wide analysis of ATP binding cassette (ABC) transporters in tomato]]> https://www.researchpad.co/article/5b694664463d7e3867f4ad09

ATP binding cassette (ABC) transporters are proteins that actively mediate the transport of a wide range of molecules, such as organic acids, metal ions, phytohormones and secondary metabolites. Therefore, ABC transporters must play indispensable roles in growth and development of tomato, including fruit development. Most ABC transporters have transmembrane domains (TMDs) and belong to the ABC protein family, which includes not only ABC transporters but also soluble ABC proteins lacking TMDs. In this study, we performed a genome-wide identification and expression analysis of genes encoding ABC proteins in tomato (Solanum lycopersicum), which is a valuable horticultural crop and a model plant for studying fleshy fruits. In the tomato genome, a total of 154 genes putatively encoding ABC transporters, including 9 ABCAs, 29 ABCBs, 26 ABCCs, 2 ABCDs, 2 ABCEs, 6 ABCFs, 70 ABCGs and 10 ABCIs, were identified. Gene expression data from the eFP Browser and reverse transcription-semi-quantitative PCR analysis revealed their tissue-specific and development-specific expression profiles. This work suggests physiological roles of ABC transporters in tomato and provides fundamental information for future studies of ABC transporters not only in tomato but also in other Solanaceae species.
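The subfamily breakdown reported above can be tallied to confirm that it adds up to the stated total of 154 genes. The counts are taken from the abstract; the dictionary layout and variable names are simply one convenient way to hold them.

```python
# Subfamily counts as listed in the abstract.
abc_subfamilies = {
    "ABCA": 9, "ABCB": 29, "ABCC": 26, "ABCD": 2,
    "ABCE": 2, "ABCF": 6, "ABCG": 70, "ABCI": 10,
}

total = sum(abc_subfamilies.values())
largest = max(abc_subfamilies, key=abc_subfamilies.get)
share = 100 * abc_subfamilies[largest] / total

print(total)                      # 154
print(largest, f"{share:.1f}%")   # ABCG 45.5%
```

The ABCG subfamily alone accounts for a little under half of the identified genes.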

]]>