ResearchPad - Software https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[ggroups: an R package for pedigree and genetic groups data]]> https://www.researchpad.co/article/elastic_article_12071

R is a multi-platform statistical software environment and an object-oriented programming language. The Comprehensive R Archive Network (CRAN) repository featured over 15,000 free open source packages at the time of writing (https://cran.r-project.org/web/packages, accessed in October 2019). This article introduces the package ggroups. The package provides functions for checking and processing pedigrees and for calculating the additive genetic relationship matrix and its inverse, which are used to study population structure and to predict the genetic merit of animals. Calculation of the dominance relationship matrix and its inverse is also covered. Genetic groups are a concept in animal breeding that accounts for unequal average genetic merit among groups of unknown parents. The package provides functions for calculating the matrix of genetic group contributions (Q). Calculating Q is computationally demanding and, depending on the size of the pedigree and the number of genetic groups, may not be feasible on a personal computer. Therefore, a computationally optimised function and a parallel processing alternative are provided in the package.

Results

Using sample data, outputs from different functions of the package are presented to illustrate the experience of working with the package.

Conclusions

The presented R package is a free and open source tool, mainly for quantitative geneticists and ecologists who deal with pedigree data. It provides numerous functions for handling pedigree data and calculating various pedigree-based matrices. Some of the functions are computationally optimised for large-scale data.
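
The additive genetic relationship matrix mentioned in the ggroups abstract can be illustrated with the classical tabular method. The Python sketch below is not the ggroups implementation (ggroups is an R package, and its internals are not described here); the function name `additive_relationship` and the toy pedigree are hypothetical, and the sketch assumes that individuals are listed with parents before offspring and that `None` marks an unknown parent.

```python
def additive_relationship(pedigree):
    """Tabular method for the additive genetic relationship matrix A.

    pedigree: list of (sire, dam) tuples, one per individual, where sire and
    dam are 0-based indices of earlier rows, or None if the parent is unknown.
    Assumes parents always appear before their offspring.
    """
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        for parent in (sire, dam):
            if parent is not None and parent >= i:
                raise ValueError("parents must precede offspring")
        # Off-diagonal: average of the relationships of j with i's two parents.
        for j in range(i):
            s = a[j][sire] if sire is not None else 0.0
            d = a[j][dam] if dam is not None else 0.0
            a[i][j] = a[j][i] = 0.5 * (s + d)
        # Diagonal: 1 plus the inbreeding coefficient of i, which is half the
        # relationship between its parents (0 if either parent is unknown).
        both_known = sire is not None and dam is not None
        a[i][i] = 1.0 + (0.5 * a[sire][dam] if both_known else 0.0)
    return a


# Hypothetical pedigree: 0 and 1 are founders, 2 = (0 x 1), 3 = (0 x 2).
A = additive_relationship([(None, None), (None, None), (0, 1), (0, 2)])
for row in A:
    print(row)
```

In this toy pedigree individual 3 is the offspring of a parent and that parent's own offspring, so its diagonal element exceeds 1. The double loop is quadratic in the number of individuals, which gives a feel for why optimised and parallel routines matter for large pedigrees.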
]]> <![CDATA[Visualize omics data on networks with Omics Visualizer, a Cytoscape App]]> https://www.researchpad.co/article/elastic_article_10953 Cytoscape is an open-source software used to analyze and visualize biological networks. In addition to being able to import networks from a variety of sources, Cytoscape allows users to import tabular node data and visualize it onto networks. Unfortunately, such data tables can only contain one row of data per node, whereas omics data often have multiple rows for the same gene or protein, representing different post-translational modification sites, peptides, splice isoforms, or conditions. Here, we present a new app, Omics Visualizer, that allows users to import data tables with several rows referring to the same node, connect them to one or more networks, and visualize the connected data onto networks. Omics Visualizer uses the Cytoscape enhancedGraphics app to show the data either in the nodes (pie visualization) or around the nodes (donut visualization), where the colors of the slices represent the imported values. If the user does not provide a network, the app can retrieve one from the STRING database using the Cytoscape stringApp. The Omics Visualizer app is freely available at https://apps.cytoscape.org/apps/omicsvisualizer.

]]>
<![CDATA[NetConfer: a web application for comparative analysis of multiple biological networks]]> https://www.researchpad.co/article/elastic_article_9738

Most biological experiments are inherently designed to compare changes or transitions of state between conditions of interest. Advances in data-intensive research have particularly increased the need for resources and tools that enable comparative analysis of biological data. The complexity of biological systems and the interactions of their various components, such as genes, proteins, taxa, and metabolites, have been inferred, represented, and visualized via graph theory-based networks. Comparisons of multiple networks can help in identifying variations across different biological systems, thereby providing additional insights. However, while a number of online and stand-alone tools exist for generating, analyzing, and visualizing individual biological networks, tools that batch process and comprehensively compare multiple networks are limited.

Results

Here, we present a graphical user interface (GUI)-based web application which implements multiple network comparison methodologies and presents them in the form of organized analysis workflows. Dedicated comparative visualization modules are provided to end-users for obtaining easy to comprehend, insightful, and meaningful comparisons of various biological networks. We demonstrate the utility and power of our tool using publicly available microbial and gene expression data.

Conclusion

The NetConfer tool was developed keeping in mind the requirements of researchers working in the field of biological data analysis with limited programming expertise. It is also expected to be useful for advanced users from biological as well as other domains (working with association networks), who benefit from the provided ready-made workflows, as these allow them to focus directly on the results without worrying about the implementation.
While the web version allows the application to be used without installation or dependency requirements, a stand-alone version is also provided to accommodate offline processing of large networks.

]]>
<![CDATA[MRI perfusion analysis using freeware, standard imaging software]]> https://www.researchpad.co/article/elastic_article_9559

Perfusion-weighted imaging is rarely used in veterinary medicine. The exact reasons are unclear. One reason might be the typically high cost of the software packages for image analysis. In addition, the great variability among available programs makes it hard to compare results between different studies. Moreover, these algorithms are tuned for use in human medicine and are often difficult to adapt to veterinary studies.

In order to address these issues, our aim is to deliver a free open source package for calculating quantitative perfusion parameters. We developed an R package that calculates mean transit time, cerebral blood flow and cerebral blood volume from data obtained with freely available imaging software (OsiriX Light®). We hope that the free availability, in combination with the fact that the underlying algorithm is open and adaptable, makes it easier for scientists in veterinary medicine to use, compare and adapt perfusion-weighted imaging analysis.

In order to demonstrate the usage of our software package, we reviewed previously acquired perfusion-weighted images from a group of eight purpose-bred healthy beagle dogs and twelve client-owned dogs with idiopathic epilepsy. In order to obtain the data needed for our algorithm, the following steps were performed: First, regions of interest (ROI) were drawn around different, previously reported, brain regions and the middle cerebral artery. Second, a ROI enhancement curve was generated for each ROI using a freely available plug-in. Third, the signal intensity curves were exported as a comma-separated-value file.
These files constitute the input to our software package, which then calculates the perfusion-weighted imaging (PWI) parameters.

Results

We used our software package to re-assess perfusion-weighted images from two previous studies. The clinical results were similar, showing a significant increase in the mean transit time and a significant decrease in cerebral blood flow for diseased dogs.

Conclusion

We provide an R package for computing the main perfusion parameters from measurements taken with standard imaging software and describe in detail how to obtain these measurements. We hope that our contribution enables users in veterinary medicine to easily obtain perfusion parameters using standard open source software in a standard, adaptable and comparable way.

]]>
<![CDATA[Quantitative phenotype scan statistic (QPSS) reveals rare variant associations with Alzheimer’s disease endophenotypes]]> https://www.researchpad.co/article/elastic_article_9219

Current sequencing technologies have provided for a more comprehensive genome-wide assessment and have increased genotyping accuracy of rare variants. Scan statistic approaches have previously been adapted to genetic sequencing data. Unlike currently employed association tests, scan-statistic-based approaches can both localize clusters of disease-related variants and, subsequently, examine the phenotype association within the resulting cluster. In this study, we present a novel Quantitative Phenotype Scan Statistic (QPSS) that extends an approach for dichotomous phenotypes to continuous outcomes in order to identify genomic regions where rare quantitative-phenotype-associated variants cluster.

Results

We demonstrate the performance and practicality of QPSS with extensive simulations and an application to a whole-genome sequencing (WGS) study of cerebrospinal fluid (CSF) biomarkers from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).
Using QPSS, we identify regions of rare variant enrichment associated with levels of AD-related proteins, CSF Aβ1–42 and p-tau181P.

Conclusions

QPSS is implemented under the assumption that causal variants within a window have the same direction of effect. Typical self-contained tests employ a null hypothesis of no association between the target variant set and the phenotype. Therefore, an advantage of the proposed competitive test is that it is possible to refine a known region of interest to localize disease-associated clusters. The definition of clusters can be easily adapted based on variant function or annotation.

]]>
<![CDATA[An integrated software for virus community sequencing data analysis]]> https://www.researchpad.co/article/elastic_article_9159

A virus community is the spectrum of viral strains populating an infected host, which plays a key role in pathogenesis and therapy response in viral infectious diseases. However, no automatic, dedicated pipeline for interpreting virus community sequencing data has yet been developed.

Results

We developed Quasispecies Analysis Package (QAP), an integrated software platform to address the problems associated with making biological interpretations from massive viral population sequencing data. QAP provides quantitative insight into virus ecology by first introducing the definition “virus OTU” and supports a wide range of viral community analyses and result visualizations. To serve a broad range of users, QAP is offered in several forms: a command-line tool, a graphical user interface and a web server.
QAP was thoroughly evaluated with high-throughput sequencing data from hepatitis B virus, hepatitis C virus, influenza virus and human immunodeficiency virus, and the results showed highly accurate viral quasispecies characteristics related to biological phenotypes.

Conclusions

QAP provides a complete solution for the analysis of virus community high-throughput sequencing data, and it should facilitate the analysis of virus quasispecies in clinical applications.

]]>
<![CDATA[CIPR: a web-based R/shiny app and R package to annotate cell clusters in single cell RNA sequencing experiments]]> https://www.researchpad.co/article/elastic_article_8965

Single cell RNA sequencing (scRNAseq) has provided invaluable insights into cellular heterogeneity and functional states in health and disease. During the analysis of scRNAseq data, annotating the biological identity of cell clusters is an important step before downstream analyses, and it remains technically challenging. The current solutions for annotating single cell clusters generally lack a graphical user interface, can be computationally intensive or have a limited scope. On the other hand, manually annotating single cell clusters by examining the expression of marker genes can be subjective and labor-intensive. To improve the quality and efficiency of annotating cell clusters in scRNAseq data, we present a web-based R/Shiny app and R package, Cluster Identity PRedictor (CIPR), which provides a graphical user interface to quickly score gene expression profiles of unknown cell clusters against mouse or human references, or a custom dataset provided by the user. CIPR can be easily integrated into current pipelines to facilitate scRNAseq data analysis.

Results

CIPR employs multiple approaches for calculating the identity score at the cluster level and can accept inputs generated by popular scRNAseq analysis software.
CIPR provides 2 mouse and 5 human reference datasets, and its pipeline allows inter-species comparisons and the upload of a custom reference dataset for specialized studies. The options to filter out lowly variable genes and to exclude irrelevant reference cell subsets from the analysis can improve the discriminatory power of CIPR, suggesting that it can be tailored to different experimental contexts. Benchmarking CIPR against existing functionally similar software revealed that our algorithm is less computationally demanding, performs significantly faster, and provides accurate predictions for multiple cell clusters in a scRNAseq experiment involving tumor-infiltrating immune cells.

Conclusions

CIPR facilitates scRNAseq data analysis by annotating unknown cell clusters in an objective and efficient manner. Platform independence, owing to the Shiny framework, and the minimal programming experience required allow this software to be used by researchers from different backgrounds. CIPR can accurately predict the identity of a variety of cell clusters and can be used in various experimental contexts across a broad spectrum of research areas.

]]>
<![CDATA[Broad-coverage biomedical relation extraction with SemRep]]> https://www.researchpad.co/article/elastic_article_8505

In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis.
In another evaluation, we assess SemRep’s performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships.

Results

A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F1 score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F1 score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F1 score. The recall and the F1 score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level.

Conclusions

SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes. In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.

]]>
<![CDATA[Mechanism to prevent the abuse of IPv6 fragmentation in OpenFlow networks]]> https://www.researchpad.co/article/elastic_article_7717

OpenFlow makes a network highly flexible and fast-evolving by separating control and data planes. The control plane thus becomes responsive to changes in topology and load balancing requirements.
OpenFlow also offers a new approach to handling security threats accurately and responsively. It is therefore used as an innovative firewall that provides first-hop security to protect networks against malicious users. However, the firewall provided by OpenFlow is vulnerable to Internet Protocol version 6 (IPv6) fragmentation, which can be used to bypass it. The OpenFlow firewall cannot identify the message payload unless the switch implements IPv6 fragment reassembly. This study tests the IPv6 fragmented packets that can evade the OpenFlow firewall, and proposes a new mechanism to guard against attacks by malicious users who exploit the IPv6 fragmentation loophole in OpenFlow networks. The proposed mechanism is evaluated in a simulated environment using six scenarios, and the results show that it effectively closes the loophole and prevents the abuse of IPv6 fragmentation in OpenFlow networks.
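
To make the fragmentation issue concrete, the following Python sketch shows how a packet filter might walk the IPv6 extension header chain to notice a Fragment header. It is only an illustration of the packet layout, not the mechanism proposed in the article (which the abstract does not detail); the function names `build_ipv6_header` and `find_fragment_header` are hypothetical, and only the hop-by-hop, routing and destination options headers are skipped.

```python
import struct

FRAGMENT_HEADER = 44             # IPv6 Next Header value for the Fragment header
SKIPPABLE_HEADERS = {0, 43, 60}  # hop-by-hop, routing, destination options


def build_ipv6_header(next_header, payload_len):
    """Build a 40-byte IPv6 fixed header with all-zero placeholder addresses."""
    version_class_flow = 6 << 28  # version 6, traffic class 0, flow label 0
    src = dst = bytes(16)
    return struct.pack("!IHBB16s16s", version_class_flow, payload_len,
                       next_header, 64, src, dst)


def find_fragment_header(packet):
    """Return (offset in 8-byte units, more-fragments flag, next header after
    the Fragment header), or None if the packet carries no Fragment header."""
    if len(packet) < 40 or packet[0] >> 4 != 6:
        raise ValueError("not an IPv6 packet")
    next_header, pos = packet[6], 40
    while next_header in SKIPPABLE_HEADERS or next_header == FRAGMENT_HEADER:
        if len(packet) < pos + 8:
            raise ValueError("truncated extension header")
        if next_header == FRAGMENT_HEADER:
            # Fragment header: next header, reserved, offset+flags, identification.
            after, _reserved, offset_flags, _ident = struct.unpack_from(
                "!BBHI", packet, pos)
            return offset_flags >> 3, bool(offset_flags & 1), after
        # Other extension headers store their length in 8-byte units,
        # not counting the first 8 bytes.
        next_header, ext_len = packet[pos], packet[pos + 1]
        pos += (ext_len + 1) * 8
    return None


# Hypothetical packets: a first fragment carrying TCP (6), and plain TCP.
first_fragment = (build_ipv6_header(FRAGMENT_HEADER, 8)
                  + struct.pack("!BBHI", 6, 0, 1, 1234))
plain_tcp = build_ipv6_header(6, 0)
print(find_fragment_header(first_fragment))
print(find_fragment_header(plain_tcp))
```

Only the first fragment (offset 0) carries the upper-layer header, so a filter that matches on transport-layer fields without reassembly has nothing to match against in the later fragments.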

]]>
<![CDATA[Blast2Fish: a reference-based annotation web tool for transcriptome analysis of non-model teleost fish]]> https://www.researchpad.co/article/Nd0ff4215-3994-4c2f-b465-292d15729fd5

Transcriptome analysis by next-generation sequencing has become a popular technique in recent years. This approach is well suited to the study of non-model organisms, as de novo assembly is independent of prior genomic sequences of the organism. De novo sequencing has benefited many studies on commercially important fish species. However, to understand the functions of these assembled sequences, they still need to be annotated with existing sequence databases. By combining the Basic Local Alignment Search Tool (BLAST) and Gene Ontology analysis, we were able to identify homologous sequences of assembled sequences and describe their characteristics using pre-defined tags for each gene. However, these conventional annotation results for non-model assembled sequences were still affected by a lack of pre-defined tags and by poorly documented records in the database.

Results

We introduced Blast2Fish, a novel approach for performing functional enrichment analysis on non-model teleost fish transcriptome data. The Blast2Fish pipeline was designed to be a reference-based enrichment method. Instead of annotating the single top BLAST hit using a pre-defined gene-to-tag database, we included 500 hits to search related PubMed articles and parse biological terms. These descriptive terms were then sorted and recorded as annotations for the query. The results showed that Blast2Fish was capable of providing meaningful annotations on immunology topics for non-model fish transcriptome analysis.

Conclusion

Blast2Fish provides a novel approach for annotating sequences of non-model fish. The reference-based strategy allows annotation to be performed without pre-defined tags for each gene. This method strongly benefits gene functional enrichment analysis in non-model teleost fish studies.
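
The term-sorting step described in the Blast2Fish abstract can be pictured with a small Python sketch. It is a loose illustration rather than the Blast2Fish pipeline: it assumes the abstracts linked to the BLAST hits have already been retrieved as plain strings, it ranks terms by the number of abstracts that mention them (an assumed scoring rule), and the function name `rank_terms`, the vocabulary and the example texts are all hypothetical.

```python
import re
from collections import Counter


def rank_terms(abstracts, vocabulary, top_n=5):
    """Rank vocabulary terms by the number of abstracts that mention them."""
    vocab = {term.lower() for term in vocabulary}
    counts = Counter()
    for text in abstracts:
        # Count each term at most once per abstract (document frequency).
        words = set(re.findall(r"[a-z0-9\-]+", text.lower()))
        counts.update(words & vocab)
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical abstracts standing in for articles linked to BLAST hits.
example_abstracts = [
    "Interferon response in teleost immune cells",
    "Immune gene expression after infection",
    "Growth and immune signalling",
]
example_vocabulary = ["immune", "interferon", "growth", "infection"]
annotations = rank_terms(example_abstracts, example_vocabulary, top_n=2)
print(annotations)
```

Pooling evidence over many hits rather than relying on the single top hit is what lets a query pick up descriptive terms even when the best match is poorly documented.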
]]> <![CDATA[The UCSF Mouse Inventory Database Application, an Open Source Web App for Sharing Mutant Mice Within a Research Community]]> https://www.researchpad.co/article/Nbb3b2ed7-43fd-4a80-9469-797d6b2ba821 The UCSF Mouse Inventory Database Application is an open-source Web App that provides information about the mutant alleles, transgenes, and inbred strains maintained by investigators at the university and facilitates sharing of these resources within the university community. The Application is designed to promote collaboration, decrease the costs associated with obtaining genetically-modified mice, and increase access to mouse lines that are difficult to obtain. An inventory of the genetically-modified mice on campus and the investigators who maintain them is compiled from records of purchases from external sources, transfers from researchers within and outside the university, and from data provided by users. These data are verified and augmented with relevant information harvested from public databases, and stored in a succinct, searchable database secured on the university network. Here we describe this resource and provide information about how to implement and maintain such a mouse inventory database application at other institutions.

]]>
<![CDATA[P finder: genomic and metagenomic annotation of RNase P RNA gene (<i>rnpB</i>)]]> https://www.researchpad.co/article/N629d8b19-00de-4bad-b771-26b5eafb968c

The rnpB gene encodes an essential catalytic RNA (RNase P). Like other essential RNAs, RNase P’s sequence is highly variable. However, unlike other essential RNAs (e.g. tRNA, 16S, 6S, ...), its structure is also variable, with at least 5 distinct structure types observed in prokaryotes. This structural variability makes it labor-intensive and challenging to create and maintain covariance models for the detection of RNase P RNA in genomic and metagenomic sequences. The lack of a facile and rapid annotation algorithm has led to the rnpB gene being the most grossly under-annotated essential gene in completed prokaryotic genomes, with only a 24% annotation rate. Here we describe the coupling of the largest RNase P RNA database with a local alignment scoring algorithm to create the most sensitive and rapid prokaryote rnpB gene identification and annotation algorithm to date.

Results

Of the 2772 completed microbial genomes downloaded from GenBank, only 665 had an annotated rnpB gene. We applied P Finder to these genomes and identified an rnpB gene in 2733, or nearly 99%, of the 2772 microbial genomes examined. From these results, four new rnpB genes that encode the minimal T-type RNase P RNAs were identified computationally for the first time. In addition, only the second known C-type RNase P RNA was identified, in Sphaerobacter thermophilus. Of special note, no RNase P RNAs were detected in several obligate endosymbionts of sap-sucking insects, suggesting a novel evolutionary adaptation.

Conclusions

The coupling of the largest RNase P RNA database and associated structure class identification with the P Finder algorithm is both sensitive and rapid, yielding high quality results to aid researchers annotating either genomic or metagenomic data.
It is the only algorithm to date that can identify challenging RNase P classes such as the C-type and the minimal T-type RNase P RNAs. P Finder is written in C# and has a user-friendly GUI that runs on multiple 64-bit Windows platforms (Windows Vista/7/8/10). P Finder, together with a small sample RNase P RNA file for testing, is freely available for download at https://github.com/JChristopherEllis/P-Finder.

]]>
<![CDATA[Guiseppe Paglia, Guiseppe Astarita (Eds.): Ion mobility-mass spectrometry—methods and protocols]]> https://www.researchpad.co/article/N34040093-70a5-4702-ab8c-7cd77508fcb3
<![CDATA[Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit]]> https://www.researchpad.co/article/5989daabab0ee8fa60ba94f6

Background

Scripting languages such as Python are ideally suited to common programming tasks in cheminformatics such as data analysis and parsing information from files. However, for reasons of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in compiled languages such as C++. We describe Pybel, a Python module that provides access to the OpenBabel toolkit.

Results

Pybel wraps the direct toolkit bindings to simplify common tasks such as reading and writing molecular files and calculating fingerprints. Extensive use is made of Python iterators to simplify loops such as that over all the molecules in a file. A Pybel Molecule can be easily interconverted to an OpenBabel OBMol to access those methods or attributes not wrapped by Pybel.

Conclusion

Pybel allows cheminformaticians to rapidly develop Python scripts that manipulate chemical information. It is open source, available cross-platform, and offers the power of the OpenBabel toolkit to Python programmers.

]]>
<![CDATA[miTarget: microRNA target gene prediction using a support vector machine]]> https://www.researchpad.co/article/5b7bc070463d7e74d0b4e90e

Background

MicroRNAs (miRNAs) are small noncoding RNAs, which play significant roles as posttranscriptional regulators. The functions of animal miRNAs are generally based on complementarity for their 5' components. Although several computational miRNA target-gene prediction methods have been proposed, they still have limitations in revealing actual target genes.

Results

We implemented miTarget, a support vector machine (SVM) classifier for miRNA target gene prediction. It uses a radial basis function kernel as a similarity measure for SVM features, categorized by structural, thermodynamic, and position-based features. The latter features are introduced in this study for the first time and reflect the mechanism of miRNA binding. The SVM classifier produces high performance with a biologically relevant data set obtained from the literature, compared with previous tools. We predicted significant functions for human miR-1, miR-124a, and miR-373 using Gene Ontology (GO) analysis and revealed the importance of pairing at positions 4, 5, and 6 in the 5' region of a miRNA from a feature selection experiment. We also provide a web interface for the program.
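
For readers unfamiliar with the radial basis function (RBF) kernel mentioned above, the following Python sketch shows the standard definition, K(x, y) = exp(-gamma * ||x - y||^2). It is not code from miTarget; the function name `rbf_kernel`, the `gamma` default and the example feature vectors are hypothetical, and the vectors merely stand in for the structural, thermodynamic and position-based features described in the abstract.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical feature vectors for two candidate miRNA-target pairs.
pair_a = [1.0, 0.0, -12.5]
pair_b = [1.0, 1.0, -11.5]
print(rbf_kernel(pair_a, pair_a))           # identical vectors give 1.0
print(rbf_kernel(pair_a, pair_b, gamma=0.5))
```

The kernel value approaches 1 for similar feature vectors and 0 for dissimilar ones, which is why it can serve as the similarity measure inside an SVM.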

Conclusion

miTarget is a reliable miRNA target gene prediction tool and is a successful application of an SVM classifier. Compared with previous tools, its predictions are meaningful by GO analysis and its performance can be improved given more training examples.

]]>
<![CDATA[wg-blimp: an end-to-end analysis pipeline for whole genome bisulfite sequencing data]]> https://www.researchpad.co/article/N11c6685f-5d17-4d8e-85f6-86040f6dbd34

Background

Analysing whole genome bisulfite sequencing datasets is a data-intensive task that requires comprehensive and reproducible workflows to generate valid results. While many algorithms have been developed for tasks such as alignment, comprehensive end-to-end pipelines are still sparse. Furthermore, previous pipelines lack features or show technical deficiencies, thus impeding analyses.

Results

We developed wg-blimp (whole genome bisulfite sequencing methylation analysis pipeline) as an end-to-end pipeline to ease whole genome bisulfite sequencing data analysis. It integrates established algorithms for alignment, quality control, methylation calling, detection of differentially methylated regions, and methylome segmentation, requiring only a reference genome and raw sequencing data as input. Comparing wg-blimp to previous end-to-end pipelines reveals similar setups for common sequence processing tasks, but shows differences for post-alignment analyses. We improve on previous pipelines by providing a more comprehensive analysis workflow as well as an interactive user interface. To demonstrate wg-blimp’s ability to produce correct results we used it to call differentially methylated regions for two publicly available datasets. We were able to replicate 112 of 114 previously published regions, and found results to be consistent with previous findings. We further applied wg-blimp to a publicly available sample of embryonic stem cells to showcase methylome segmentation. As expected, unmethylated regions were in close proximity of transcription start sites. Segmentation results were consistent with previous analyses, despite different reference genomes and sequencing techniques.

Conclusions

wg-blimp provides a comprehensive analysis pipeline for whole genome bisulfite sequencing data as well as a user interface for simplified result inspection. We demonstrated its applicability by analysing multiple publicly available datasets. Thus, wg-blimp is a relevant alternative to previous analysis pipelines and may facilitate future epigenetic research.

]]>
<![CDATA[DiSCount: computer vision for automated quantification of Striga seed germination]]> https://www.researchpad.co/article/N945aecc8-eb16-4482-9157-7d58c8ea30c3

Background

Plant parasitic weeds belonging to the genus Striga are a major threat for food production in Sub-Saharan Africa and Southeast Asia. The parasite’s life cycle starts with the induction of seed germination by host plant-derived signals, followed by parasite attachment, infection, outgrowth, flowering, reproduction, seed set and dispersal. Given the small seed size of the parasite (< 200 μm), quantification of the impact of new control measures that interfere with seed germination relies on manual, labour-intensive counting of seed batches under the microscope. Hence, there is a need for high-throughput assays that allow for large-scale screening of compounds or microorganisms that adversely affect Striga seed germination.

Results

Here, we introduce DiSCount (Digital Striga Counter): a computer vision tool for automated quantification of total and germinated Striga seed numbers in standard glass fibre filter assays. We developed the software using a machine learning approach trained with a dataset of 98 manually annotated images. Then, we validated and tested the model against a total dataset of 188 manually counted images. The results showed that DiSCount has an average error of 3.38 percentage points per image compared to the manually counted dataset. Most importantly, DiSCount achieves a 100 to 3000-fold speed increase in image analysis when compared to manual analysis, with an inference time of approximately 3 s per image on a single CPU and 0.1 s on a GPU.
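
An error expressed in percentage points can be computed as in the Python sketch below. The abstract does not spell out how the reported figure was calculated, so this assumes the error is the mean absolute difference between predicted and manually counted germination percentages per image; the function names and the example counts are hypothetical.

```python
def germination_percentage(germinated, total):
    """Percentage of seeds in one image that have germinated."""
    if total <= 0:
        raise ValueError("total seed count must be positive")
    return 100.0 * germinated / total


def mean_error_percentage_points(predicted, manual):
    """Mean absolute difference in germination percentage across images.

    predicted, manual: equal-length lists of (germinated, total) counts,
    one pair per image.
    """
    if not predicted or len(predicted) != len(manual):
        raise ValueError("need the same non-zero number of images in both lists")
    differences = [
        abs(germination_percentage(*p) - germination_percentage(*m))
        for p, m in zip(predicted, manual)
    ]
    return sum(differences) / len(differences)


# Hypothetical counts for two images: (germinated seeds, total seeds).
predicted_counts = [(48, 100), (30, 60)]
manual_counts = [(50, 100), (33, 60)]
print(mean_error_percentage_points(predicted_counts, manual_counts))
```

Reporting the difference in percentage points rather than as a relative percentage keeps the figure comparable across images with very different germination rates.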

Conclusions

DiSCount is accurate and efficient in quantifying total and germinated Striga seeds in a standardized germination assay. This automated computer vision tool enables high-throughput, large-scale screening of chemical compound libraries and biological control agents of this devastating parasitic weed. The complete software and manual are hosted at https://gitlab.com/lodewijk-track32/discount_paper and the archived version is available at Zenodo with the DOI 10.5281/zenodo.3627138. The dataset used for testing is available at Zenodo with the DOI 10.5281/zenodo.3403956.

]]>
<![CDATA[Negative binomial additive model for RNA-Seq data analysis]]> https://www.researchpad.co/article/N14ca9b37-a8fc-4b5f-86a4-5069844e13da

Background

High-throughput sequencing experiments followed by differential expression analysis is a widely used approach for detecting genomic biomarkers. A fundamental step in differential expression analysis is to model the association between gene counts and covariates of interest. Existing models assume linear effect of covariates, which is restrictive and may not be sufficient for certain phenotypes.

Results

We introduce NBAMSeq, a flexible statistical model that is based on the generalized additive model and allows for information sharing across genes in variance estimation. Specifically, we model the logarithm of mean gene counts as sums of smooth functions, with the smoothing parameters and coefficients estimated simultaneously within a nested iterative method. The variance is estimated by a Bayesian shrinkage approach to fully exploit the information across all genes.
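
Written out, the model described in this paragraph takes roughly the following form. This is a sketch based only on the wording of the abstract: the symbols ($K_{ij}$ for the count of gene $i$ in sample $j$, $\mu_{ij}$ for its mean, $\alpha_i$ for a gene-wise dispersion, and $f_{ik}$ for the smooth function of covariate $x_{jk}$) are notation assumed here, the variance expression is the common negative binomial parameterisation rather than one quoted from the paper, and details such as normalisation factors are omitted.

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\log \mu_{ij} = \beta_{i0} + \sum_{k=1}^{p} f_{ik}(x_{jk}), \qquad
\operatorname{Var}(K_{ij}) = \mu_{ij} + \alpha_i \, \mu_{ij}^{2}
```

A model with linear covariate effects corresponds to the special case $f_{ik}(x) = \beta_{ik} x$, which is the restriction the abstract describes existing models as making.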

Conclusions

Based on extensive simulations and case studies of RNA-Seq data, we show that NBAMSeq offers improved performance in detecting nonlinear effect and maintains equivalent performance in detecting linear effect compared to existing methods. The vignette and source code of NBAMSeq are available at http://bioconductor.org/packages/release/bioc/html/NBAMSeq.html.

]]>
<![CDATA[MFsim—an open Java all-in-one rich-client simulation environment for mesoscopic simulation]]> https://www.researchpad.co/article/Nb8d84e57-cca0-4d45-804e-454f1ce9aabb

MFsim is an open Java all-in-one rich-client computing environment for mesoscopic simulation with Jdpd as its default simulation kernel for Molecular Fragment (Dissipative Particle) Dynamics. The new environment comprises the complete preparation–simulation–evaluation triad of a mesoscopic simulation task and, in particular, enables biomolecular simulation tasks with peptides and proteins. Productive highlights are a SPICES molecular structure editor, a PDB-to-SPICES parser for particle-based peptide/protein representations, support for polymer definitions, a compartment editor for complex simulation box start configurations, and interactive and flexible simulation box views including analytics, simulation movie generation and animated diagrams. As an open project, MFsim allows for customized extensions for different fields of research.

]]>