ResearchPad - data-and-text-mining https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks

<![CDATA[The bioinformatics wealth of nations]]> https://www.researchpad.co/article/N5da40a0e-3f32-497d-bfe5-58f85b07ab10

<![CDATA[Discovery of disease- and drug-specific pathways through community structures of a literature network]]> https://www.researchpad.co/article/N736cb11d-1b4c-4a17-a746-c10702cd5baa

Abstract

Motivation

In light of the massive growth of the scientific literature, text mining is increasingly used to extract biological pathways. Though multiple tools explore individual connections between genes, diseases and drugs, few extensively synthesize pathways for specific diseases and drugs.

Results

Through community detection of a literature network, we extracted 3444 functional gene groups that represented biological pathways for specific diseases and drugs. The network linked Medical Subject Headings (MeSH) terms of genes, diseases and drugs that co-occurred in publications. The resulting communities detected highly associated genes, diseases and drugs. These significantly matched current knowledge of biological pathways and predicted future ones in time-stamped experiments. Likewise, disease- and drug-specific communities also recapitulated known pathways for those given diseases and drugs. Moreover, diseases sharing communities had high comorbidity with each other and drugs sharing communities had many common side effects, consistent with related mechanisms. Indeed, the communities robustly recovered mutual targets for drugs [area under the Receiver Operating Characteristic curve (AUROC) = 0.75] and shared pathogenic genes for diseases (AUROC = 0.82). These data show that literature communities not only inform known biological processes but also suggest novel disease- and drug-specific mechanisms that may guide disease gene discovery and drug repurposing.
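
The idea of linking co-occurring MeSH terms and then grouping them into communities can be illustrated with a minimal sketch. The abstract does not say which community detection algorithm was used, so the weighted label propagation below is only an assumed stand-in, and the function names and the tiny publication list are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_edges(publications):
    """Count how many publications mention each unordered pair of terms."""
    edges = Counter()
    for terms in publications:
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges


def label_propagation(edges, max_rounds=100):
    """Weighted label propagation (an assumed stand-in, not the paper's method)."""
    neighbours = defaultdict(dict)
    for (a, b), weight in edges.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight
    labels = {node: node for node in neighbours}  # every term starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):  # fixed order keeps runs reproducible
            votes = Counter()
            for other, weight in neighbours[node].items():
                votes[labels[other]] += weight
            best = max(votes.values())
            winners = sorted(l for l, v in votes.items() if v == best)
            # Keep the current label on ties, otherwise take the first winner.
            new = labels[node] if labels[node] in winners else winners[0]
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical MeSH term sets, one list per publication.
publications = [
    ["TP53", "MDM2", "Neoplasms"],
    ["TP53", "MDM2", "Neoplasms"],
    ["INS", "Diabetes Mellitus", "Metformin"],
    ["INS", "Diabetes Mellitus", "Metformin"],
    ["Metformin", "TP53"],  # a single weak link between the two groups
]
edges = cooccurrence_edges(publications)
communities = label_propagation(edges)
```

In this toy input the single bridging publication is outweighed by the repeated within-group co-occurrences, so the terms settle into two communities. A real literature network would be far larger and would likely call for a dedicated graph library.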

Availability and implementation

Application tools are available at http://meteor.lichtargelab.org.

Supplementary information

Supplementary data are available at Bioinformatics online.

]]>
<![CDATA[Cell membrane proteins with high N-glycosylation, high expression and multiple interaction partners are preferred by mammalian viruses as receptors]]> https://www.researchpad.co/article/N79d675d7-8a6b-4e0f-bdef-685f1e1ca9a0

Abstract

Motivation

Receptor-mediated entry is the first step of viral infection. However, the question of how viruses select receptors remains unanswered.

Results

Here, by manually curating a high-quality database of 268 pairs of mammalian virus–host receptor interactions, which included 128 unique viral species or sub-species and 119 virus receptors, we found that the viral receptors are structurally and functionally diverse, yet share several common features when compared to other cell membrane proteins: more protein domains, higher levels of N-glycosylation, a higher ratio of self-interaction, more interaction partners, and higher expression in most tissues of the host. This study could deepen our understanding of virus–receptor interaction.
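
A comparison such as "more interaction partners than other membrane proteins" is typically backed by a two-group statistical test. The abstract does not name the test used, so the exact permutation test below is only one assumed way to make such a comparison, and the counts are made-up toy numbers, not values from the database.

```python
from itertools import combinations


def exact_permutation_p(group_a, group_b):
    """One-sided exact permutation p-value that group_a tends to be larger.

    With fixed group sizes, comparing the sum of group A across all
    relabelings is equivalent to comparing the difference of group means.
    """
    pooled = list(group_a) + list(group_b)
    observed = sum(group_a)
    size_a = len(group_a)
    at_least = total = 0
    for indices in combinations(range(len(pooled)), size_a):
        total += 1
        if sum(pooled[i] for i in indices) >= observed:
            at_least += 1
    return at_least / total


# Made-up interaction-partner counts, for illustration only.
receptor_partners = [5, 6, 7, 8, 9]
other_partners = [1, 2, 3, 4, 5]
p_value = exact_permutation_p(receptor_partners, other_partners)
```

Exhaustive enumeration is only practical for very small groups; for data on the scale of the curated database, a sampled permutation test or a rank-based test would be the more realistic choice.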

Availability and implementation

The database of mammalian virus–host receptor interaction is available at http://www.computationalbiology.cn:5000/viralReceptor.

Supplementary information

Supplementary data are available at Bioinformatics online.

]]>
<![CDATA[Detection of sputum by interpreting the time-frequency distribution of respiratory sound signal using image processing techniques]]> https://www.researchpad.co/article/5c8ef0bcd5eed0c484f03eea

Abstract

Motivation

Sputum in the trachea is hard for unconscious patients to expectorate and hard to detect directly, especially for patients in the Intensive Care Unit. Medical staff must therefore check the condition of sputum in the trachea continually. This is time-consuming, and the necessary skills are difficult to acquire. Currently, there are few automatic approaches that can serve as alternatives to this manual approach.

Results

We develop an automatic approach to diagnose the condition of the sputum. Our approach utilizes a system involving a medical device and quantitative analytic methods. In this approach, the time-frequency distribution of respiratory sound signals, determined from the spectrum, is treated as an image. Sputum detection is performed by interpreting the patterns in the image through preprocessing and feature extraction. In this study, 272 respiratory sound samples (145 sputum sound and 127 non-sputum sound samples) are collected from 12 patients. We apply leave-one-out cross-validation to the 12 patients to assess the performance of our approach. That is, out of the 12 patients, 11 are randomly selected and their sound samples are used to predict the sound samples of the remaining patient. The results show that our automatic approach can classify the sputum condition with an accuracy of 83.5%.
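
The patient-level validation described above can be sketched as follows. This assumes the usual reading of leave-one-out at the patient level, in which every patient is held out exactly once; the sample records, field names and the majority-vote placeholder classifier are hypothetical and stand in for the authors' image-based features and model.

```python
from collections import defaultdict


def leave_one_patient_out(samples):
    """Yield (held_out_patient, train, test), holding out each patient once."""
    by_patient = defaultdict(list)
    for sample in samples:
        by_patient[sample["patient"]].append(sample)
    for held_out in sorted(by_patient):
        test = by_patient[held_out]
        train = [
            s
            for patient, group in by_patient.items()
            if patient != held_out
            for s in group
        ]
        yield held_out, train, test


def majority_label(train):
    """Placeholder 'classifier': predict the most common training label."""
    counts = defaultdict(int)
    for sample in train:
        counts[sample["label"]] += 1
    return max(sorted(counts), key=counts.get)


def pooled_accuracy(samples):
    """Fraction of held-out samples predicted correctly across all folds."""
    correct = total = 0
    for _, train, test in leave_one_patient_out(samples):
        predicted = majority_label(train)
        correct += sum(s["label"] == predicted for s in test)
        total += len(test)
    return correct / total


# Hypothetical records; a real sample would also carry image features.
samples = [
    {"patient": "P1", "label": "sputum"},
    {"patient": "P1", "label": "sputum"},
    {"patient": "P2", "label": "non-sputum"},
    {"patient": "P2", "label": "sputum"},
    {"patient": "P3", "label": "non-sputum"},
]
folds = list(leave_one_patient_out(samples))
accuracy = pooled_accuracy(samples)
```

Splitting by patient rather than by sample keeps recordings from the same person out of both the training and the test set, which would otherwise tend to inflate the reported accuracy.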

Availability and implementation

The MATLAB code and example datasets explored in this work are available at Bioinformatics online.

Supplementary information

Supplementary data are available at Bioinformatics online.

]]>
<![CDATA[dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering]]> https://www.researchpad.co/article/5afad25f463d7e2f2a373cc4

Summary: dendextend is an R package for creating and comparing visually appealing tree diagrams. dendextend provides utility functions for manipulating dendrogram objects (their color, shape and content) as well as several advanced methods for comparing trees to one another (both statistically and visually). As such, dendextend offers a flexible framework for enhancing R's rich ecosystem of packages for performing hierarchical clustering of items.

Availability and implementation: The dendextend R package (including detailed introductory vignettes) is released under the GPL-2 Open Source license and can be downloaded freely from CRAN at http://cran.r-project.org/package=dendextend.

Contact: Tal.Galili@math.tau.ac.il

]]>