ResearchPad - database-tool https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[CellExpress: a comprehensive microarray-based cancer cell line and clinical sample gene expression analysis online system]]> https://www.researchpad.co/article/elastic_article_7258

With the advancement of high-throughput technologies, gene expression profiles in cell lines and clinical samples are widely available in the public domain for research. However, a challenge arises when trying to perform a systematic and comprehensive analysis across independent datasets. To address this issue, we developed a web-based system, CellExpress, for analyzing the gene expression levels in more than 4000 cancer cell lines and clinical samples obtained from public datasets and user-submitted data. First, a normalization algorithm can be utilized to reduce the systematic biases across independent datasets. Next, a similarity assessment of gene expression profiles can be achieved through a dynamic dot plot, along with a distance matrix obtained from principal component analysis. Subsequently, differentially expressed genes can be visualized using hierarchical clustering. Several statistical tests and analytical algorithms are implemented in the system for dissecting gene expression changes based on the groupings defined by users. Lastly, users are able to upload their own microarray and/or next-generation sequencing data to perform a comparison of their gene expression patterns, which can help classify user data, such as stem cells, into different tissue types. In conclusion, CellExpress is a user-friendly tool that provides a comprehensive analysis of gene expression levels in both cell lines and clinical samples. The website is freely available at http://cellexpress.cgm.ntu.edu.tw/. Source code is available at https://github.com/LeeYiFang/Carkinos under the MIT License.

Database URL: http://cellexpress.cgm.ntu.edu.tw/

]]>
<![CDATA[Circad: a comprehensive manually curated resource of circular RNA associated with diseases]]> https://www.researchpad.co/article/Nb44c0e71-3ce4-41d9-b5fb-ea93c381452c

Abstract

Circular RNAs (circRNAs) are unique transcript isoforms characterized by back splicing of exon ends to form a covalently closed loop or circular conformation. These transcript isoforms are now known to be expressed in a variety of organisms across the kingdoms of life. Recent studies have shown the role of circRNAs in a number of diseases, and increasing evidence points to their potential application as biomarkers in these diseases. We have created a comprehensive, manually curated database of circular RNAs associated with diseases. This database is available at http://clingen.igib.res.in/circad/. The database lists more than 1300 circRNAs associated with 150 diseases and mapping to 113 International Statistical Classification of Diseases (ICD) codes, with evidence of association linked to published literature. The database is unique in many ways. Firstly, it provides ready-to-use primers for using circRNAs as biomarkers or for performing functional studies. It additionally lists the assay and PCR primer details, including experimentally validated ones, as a ready reference for researchers, along with fold change and statistical significance. It also provides standard disease nomenclature as per the ICD codes. To the best of our knowledge, Circad is the most comprehensive and updated database of disease-associated circular RNAs.

Availability: http://clingen.igib.res.in/circad/

]]>
<![CDATA[AcetoBase: a functional gene repository and database for formyltetrahydrofolate synthetase sequences]]> https://www.researchpad.co/article/Nc22c6e73-0465-44b3-b576-0197490edf54

Abstract

Acetogenic bacteria are imperative to environmental carbon cycling and diverse biotechnological applications, but their extensive physiological and taxonomical diversity is an impediment to systematic taxonomic studies. Acetogens are chemolithoautotrophic bacteria that perform reductive carbon fixation under anaerobic conditions through the Wood–Ljungdahl pathway (WLP)/acetyl-coenzyme A pathway. The gene encoding formyltetrahydrofolate synthetase (FTHFS), a key enzyme of this pathway, is highly conserved and can be used as a molecular marker to probe acetogenic communities. However, there is a lack of systematic collection of FTHFS sequence data at nucleotide and protein levels. In an attempt to streamline investigations on acetogens, we developed AcetoBase - a repository and database for systematically collecting and organizing information related to FTHFS sequences. AcetoBase also provides an opportunity to submit data and obtain accession numbers, perform homology searches for sequence identification and access a customized BLAST database of submitted sequences. AcetoBase makes it possible to identify potential acetogenic bacteria, based on metadata information related to genome content and the WLP, supplemented with FTHFS sequence accessions, and can be an important tool in the study of acetogenic communities. AcetoBase can be publicly accessed at https://acetobase.molbio.slu.se.

]]>
<![CDATA[UPObase: an online database of unspecific peroxygenases]]> https://www.researchpad.co/article/N995456d9-4831-4a2b-b002-c2336d3cb42f

Abstract

There are many unspecific peroxygenases (UPOs) or UPO-like extracellular enzymes secreted by fungal species. These enzymes are considered special in the way they catalyze a wide variety of reactions such as epoxidation, peroxygenation and electron oxidations. This enzyme family exhibits diverse functions, with thousands of UPO and UPO-like sequences. These sequences are difficult to analyze without a proper management tool and therefore call for a unified platform that can aid with annotation, classification, navigation and easy sequence retrieval. This prompted us to create an online database called Unspecific Peroxygenase Database (UPObase) (upobase.bioinformaticsreview.com), which currently includes 1948 peroxygenase-encoding protein sequences mined from more than 800 available fungal genomes. It provides information such as classification and motifs about each sequence and has functions such as homology search against UPObase and sequence analyses such as multiple sequence alignments and phylogenetic trees. It also provides a new sequence submission facility. The database has been made user-friendly, facilitating systematic search and filters. UPObase allows users to search for sequences by organism name, cluster ID and accession number. Notably, in our previous study, 113 UPOs were classified into five subfamilies (I, II, III, IV and V) and an undetermined group (Pog), which remain established. In this study, using 1948 UPOs in our database, we were able to further identify six novel sub-superfamilies (Pog-a, Pog-b, Pog-c, Pog-d, Pog-e and Pog-f) with signature motifs and two distinct groups each in Subfamilies I and III (Ia and Ib, IIIa and IIIb, respectively). With the novel UPO-like sequences and classification, UPObase may serve researchers working in the area of enzyme engineering and related fields.

]]>
<![CDATA[SELER: a database of super-enhancer-associated lncRNA-directed transcriptional regulation in human cancers]]> https://www.researchpad.co/article/5c929c46d5eed0c484385abc

Abstract

Super-enhancers (SEs) are enriched with a cluster of mediator binding sites, which are major contributors to cell-type-specific gene expression. Currently, a large number of long non-coding RNAs have been found to be transcribed from or to interact with SEs; these constitute super-enhancer-associated long non-coding RNAs (SE-lncRNAs). These SE-lncRNAs play essential roles in transcriptional regulation by controlling SE activity to regulate a broad range of physiological and pathological processes, especially tumorigenesis. However, the pathological functions of SE-lncRNAs in tumorigenesis are still obscure. In this paper, we characterized 5056 SE-lncRNAs and their associated genes by analysing 102 SE data sets. Then, we analysed their expression profiles and prognostic information derived from 19 cancer types to identify cancer-related SE-lncRNAs and to explore their potential functions. In total, 436 significantly differentially expressed SE-lncRNAs and 2035 SE-lncRNAs with high prognostic values were identified. Additionally, 3935 significant correlations between SE-lncRNAs and their regulatory genes were further validated by calculating their correlation coefficients in each cancer type. Finally, the SELER database incorporating the aforementioned data was provided for users to explore their physiological and pathological functions to comprehensively understand the blocks of living systems.
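
The correlation step mentioned above can be illustrated with a minimal sketch. The abstract does not say which coefficient SELER uses, so the Pearson coefficient here is an assumption, and the function name `pearson` and the expression values are hypothetical, chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values of one SE-lncRNA and one candidate
# regulatory gene across five samples of a single cancer type.
lncrna_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_expr = [2.1, 3.9, 6.2, 8.0, 9.8]
print(f"r = {pearson(lncrna_expr, gene_expr):.3f}")
```

In practice such a coefficient would be computed per lncRNA-gene pair within each cancer type and then filtered by a significance threshold; the threshold is not given in the abstract.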

]]>
<![CDATA[PKAD: a database of experimentally measured pKa values of ionizable groups in proteins]]> https://www.researchpad.co/article/5c929beed5eed0c48438434d

Abstract

Ionizable residues play key roles in many biological phenomena including protein folding, enzyme catalysis and binding. We present PKAD, a database of experimentally measured pKas of protein residues reported in the literature or taken from existing databases. The database contains pKa data for 1350 residues in 157 wild-type proteins and for 232 residues in 45 mutant proteins. Most of these values are for Asp, Glu, His and Lys amino acids. The database is available as a downloadable file as well as a web server (http://compbio.clemson.edu/pkad). The PKAD database can be used as a benchmarking source for the development and improvement of pKa prediction methods. The web server provides additional information taken from the corresponding structures and amino acid sequences, which allows for easy search and grouping of the experimental pKas according to various biophysical characteristics, amino acid type and others.

]]>
<![CDATA[LIVE: a manually curated encyclopedia of experimentally validated interactions of lncRNAs]]> https://www.researchpad.co/article/5c82b74bd5eed0c484e611b6

Abstract

Advances in studies of long noncoding RNAs (lncRNAs) have provided data regarding the regulatory roles of lncRNAs, which perform functional roles through interactions with other functional elements. To track the underlying relationships among lncRNAs, various databases have been developed as repositories for lncRNA data. However, the ability to comprehensively explore the diverse interactions between lncRNAs and other functional elements is limited. To this end, we developed LIVE (LncRNA Interaction Validated Encyclopaedia), an interactive resource to integrate the diverse interactions of functional elements with lncRNAs. LIVE is a manually curated database of experimentally validated interactions of lncRNAs with genes, proteins and other various functional elements. By mining publications, we constructed LIVE with the following three interaction networks: a binding interaction network, a regulation network and a disease network; then, we combined them to form a comprehensive lncRNA interaction network. The current release of LIVE contains the validated interactions of 572 lncRNAs in humans and mice with 103 proteins, 209 genes, 56 transcription factors and 194 diseases. LIVE provides an interactive interface with charts and figures to aid users in searching and browsing interactions with lncRNAs. LIVE will greatly facilitate further investigation into the regulatory roles of lncRNAs and is freely available.

]]>
<![CDATA[ImmunoSPdb: an archive of immunosuppressive peptides]]> https://www.researchpad.co/article/5c801157d5eed0c484a9900a

Abstract

Immunosuppression has proved to be a captivating therapy in several autoimmune disorders and asthma, as well as in organ transplantation. Immunosuppressive peptides specifically reduce the efficacy of the immune system and have a wide range of therapeutic applications. 'ImmunoSPdb' is a comprehensive, manually curated database of around 500 experimentally verified immunosuppressive peptides compiled from 79 research articles and 32 patents. The current version comprises 553 entries providing extensive information including peptide name, sequence, chirality, chemical modification, origin, nature of peptide, its target as well as mechanism of action, amino acid frequency and composition, etc. Data analysis revealed that most of the immunosuppressive peptides are linear (91%), are short, i.e. up to 20 amino acids in length (62%), and have the L form of amino acids (81%). About 30% of the peptides are either chemically modified or have terminal modifications. Most of the peptides are either derived from proteins (41%) or exist naturally (27%). Blockage of the potassium ion channel (24%) is a major target for immunosuppressive peptides. In addition, we have annotated tertiary structure using PEPstrMOD and I-TASSER. Many user-friendly, web-based tools have been integrated to facilitate searching, browsing and analyzing the data. We have developed a user-friendly responsive website to assist a wide range of users.
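
As an illustration of the "amino acid frequency and composition" annotation mentioned above, the following is a minimal sketch that computes the percentage composition of a peptide. It is not ImmunoSPdb's own code; the function name `amino_acid_composition` and the example sequence are hypothetical.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Return the percentage of each residue in a one-letter peptide sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # Percentage = 100 * (occurrences of residue) / (peptide length).
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Hypothetical peptide, used only to show the call.
peptide = "GLKKLLGK"
for residue, percent in sorted(amino_acid_composition(peptide).items()):
    print(f"{residue}: {percent:.1f}%")
```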

]]>
<![CDATA[IsopiRBank: a research resource for tracking piRNA isoforms]]> https://www.researchpad.co/article/5c08b660d5eed0c48414d27c

Abstract

PIWI-interacting RNAs (piRNAs) are essential for transcriptional and post-transcriptional regulation of transposons and coding genes in the germline. With the development of sequencing technologies, length variations of piRNAs have been identified in several species. However, the extent to which piRNA isoforms exist, and whether these isoforms are functionally distinct from canonical piRNAs, remain uncharacterized. Through data mining from 2154 datasets of small RNA sequencing data from four species (Homo sapiens, Mus musculus, Danio rerio and Drosophila melanogaster), we have identified 8 749 139 piRNA isoforms from 175 454 canonical piRNAs, and classified them on the basis of variations on the 5′ or 3′ end via the alignment of isoforms with the canonical sequence. We thus established a database named IsopiRBank. Each isoform has detailed annotation as follows: normalized expression data, classification, spatiotemporal expression data and genome origin. Users can also select isoforms of interest for further analysis, including target prediction and enrichment analysis. Taken together, IsopiRBank is an interactive database that aims to present the first integrated resource of piRNA isoforms, and broaden the research of piRNA biology. IsopiRBank can be accessed at http://mcg.ustc.edu.cn/bsc/isopir/index.html without any registration or log in requirement.

Database URL: http://mcg.ustc.edu.cn/bsc/isopir/index.html

]]>
<![CDATA[SDADB: a functional annotation database of protein structural domains]]> https://www.researchpad.co/article/5c08b65ed5eed0c48414d26d

Abstract

Annotating functional terms with individual domains is essential for understanding the functions of full-length proteins. We describe SDADB, a functional annotation database for structural domains. SDADB provides associations between gene ontology (GO) terms and SCOP domains calculated with an integrated framework. GO annotations are assigned probabilities of being correct, which are estimated with a Bayesian network by taking advantage of structural neighborhood mappings, SCOP-InterPro domain mapping information, position-specific scoring matrices (PSSMs) and sequence homolog features, with the most substantial contribution coming from high-coverage structure-based domain-protein mappings. The domain-protein mappings are computed using large-scale structure alignment. SDADB contains ontological terms with probabilistic scores for more than 214 000 distinct SCOP domains. It also provides additional features including 3D structure alignment visualization, GO hierarchical tree view, search, browse and download options.

Database URL: http://sda.denglab.org

]]>
<![CDATA[LnChrom: a resource of experimentally validated lncRNA–chromatin interactions in human and mouse]]> https://www.researchpad.co/article/5c024085d5eed0c4843a6fe4

Abstract

Long non-coding RNAs (lncRNAs) constitute an important layer of chromatin regulation that contributes to various biological processes and diseases. By interacting with chromatin, many lncRNAs can regulate the state of chromatin by recruiting chromatin-modifying complexes and thus control large-scale gene expression programs. However, the available information on interactions between lncRNAs and chromatin is hidden in a large amount of dispersed literature and has not been extensively collected. We established the LnChrom database, a manually curated resource of experimentally validated lncRNA–chromatin interactions. The current release of LnChrom includes 382 743 interactions in human and mouse. We also manually collected detailed metadata for each interaction pair, including those of chromatin-modifying factors, epigenetic marks and disease associations. LnChrom provides a user-friendly interface to facilitate browsing, searching and retrieving of lncRNA–chromatin interaction data. Additionally, a large amount of multi-omics data was integrated into LnChrom to aid in characterizing the effects of lncRNA–chromatin interactions on epigenetic modifications and transcriptional expression. We believe that LnChrom is a timely and valuable resource that can greatly motivate mechanistic research into lncRNAs.

Database URL: http://biocc.hrbmu.edu.cn/LnChrom/

]]>
<![CDATA[Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text]]> https://www.researchpad.co/article/5afec618463d7e1bbc374519

Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. In this context, text-mining methods for the identification of drug side effects from free text are key for the development of up-to-date knowledge sources on drug adverse reactions. We present a new system for identification of drug side effects from the literature that combines three approaches: machine learning, rule-based and knowledge-based approaches. This system has been developed to address Task 3.B of the BioCreative V challenge (BC5), dealing with Chemical-induced Disease (CID) relations. The first two approaches focus on identifying relations at the sentence level, while the knowledge-based approach is applied at both sentence and abstract levels. The machine learning method is based on the BeFree system using two corpora as training data: the annotated data provided by the CID task organizers and a new CID corpus developed by crowdsourcing. Different combinations of results from the three strategies were selected for each run of the challenge. In the final evaluation setting, the system achieved the highest Recall of the challenge (63%). By performing an error analysis, we identified the main causes of misclassifications and areas for improvement of our system, and highlighted the need for consistent gold standard data sets for advancing the state of the art in text mining of drug side effects.

Database URL: https://zenodo.org/record/29887?ln=en#.VsL3yDLWR_V

]]>
<![CDATA[Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data]]> https://www.researchpad.co/article/5af0f5a5463d7e2c5d7cd3cf

Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community.
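
The enrichment analysis mentioned above can be sketched as follows. The abstract does not name the statistical test, so the one-sided hypergeometric test used here is an assumption (it is a common choice for GO/KEGG over-representation). The function name `enrichment_p_value` and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value for over-representation.

    population: total number of background genes
    annotated:  background genes annotated to the GO term / KEGG pathway
    selected:   co-expressed protein-coding genes being tested
    overlap:    selected genes that carry the annotation
    Returns P(X >= overlap) when `selected` genes are drawn without replacement.
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20000 background genes, 150 in the pathway,
# 300 genes co-expressed with the chosen lncRNAs, 12 of them in the pathway.
p = enrichment_p_value(20000, 150, 300, 12)
print(f"p = {p:.3g}")
```

When many terms or pathways are tested at once, the resulting p-values would normally be corrected for multiple testing; the correction method, if any, is not stated in the abstract.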

Database URL: http://www.bio-bigdata.com/Co-LncRNA/

]]>