ResearchPad - Software https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[Visualize omics data on networks with Omics Visualizer, a Cytoscape App]]> https://www.researchpad.co/product?articleinfo=N9ec7b981-1580-4341-97c9-91419916279f

Cytoscape is an open-source software used to analyze and visualize biological networks. In addition to being able to import networks from a variety of sources, Cytoscape allows users to import tabular node data and visualize it onto networks. Unfortunately, such data tables can only contain one row of data per node, whereas omics data often have multiple rows for the same gene or protein, representing different post-translational modification sites, peptides, splice isoforms, or conditions. Here, we present a new app, Omics Visualizer, that allows users to import data tables with several rows referring to the same node, connect them to one or more networks, and visualize the connected data onto networks. Omics Visualizer uses the Cytoscape enhancedGraphics app to show the data either in the nodes (pie visualization) or around the nodes (donut visualization), where the colors of the slices represent the imported values. If the user does not provide a network, the app can retrieve one from the STRING database using the Cytoscape stringApp. The Omics Visualizer app is freely available at https://apps.cytoscape.org/apps/omicsvisualizer.
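
The one-row-per-node limitation described above can be illustrated with a minimal sketch. It is not the Omics Visualizer implementation: the table, the column names (`node`, `site`, `value`) and the helper `group_rows_by_node` are hypothetical and only show how several rows for the same node could be gathered into the list of values that a pie or donut chart would display as slices.

```python
from collections import defaultdict

# Hypothetical omics table: several rows may refer to the same node (protein).
rows = [
    {"node": "TP53", "site": "S15", "value": 1.8},
    {"node": "TP53", "site": "S20", "value": -0.4},
    {"node": "AKT1", "site": "T308", "value": 2.1},
]


def group_rows_by_node(table, key="node"):
    """Collect all rows that refer to the same node, preserving row order."""
    grouped = defaultdict(list)
    for row in table:
        grouped[row[key]].append(row)
    return dict(grouped)


# One list of values per node; each value would become one slice of the chart.
slices_per_node = {
    node: [r["value"] for r in node_rows]
    for node, node_rows in group_rows_by_node(rows).items()
}
print(slices_per_node)
```

A node with two rows (here the hypothetical TP53 entries) ends up with two slices, which a one-row-per-node table cannot represent.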

]]>
<![CDATA[The UCSF Mouse Inventory Database Application, an Open Source Web App for Sharing Mutant Mice Within a Research Community]]> https://www.researchpad.co/product?articleinfo=Nbb3b2ed7-43fd-4a80-9469-797d6b2ba821

The UCSF Mouse Inventory Database Application is an open-source Web App that provides information about the mutant alleles, transgenes, and inbred strains maintained by investigators at the university and facilitates sharing of these resources within the university community. The Application is designed to promote collaboration, decrease the costs associated with obtaining genetically-modified mice, and increase access to mouse lines that are difficult to obtain. An inventory of the genetically-modified mice on campus and the investigators who maintain them is compiled from records of purchases from external sources, transfers from researchers within and outside the university, and from data provided by users. These data are verified and augmented with relevant information harvested from public databases, and stored in a succinct, searchable database secured on the university network. Here we describe this resource and provide information about how to implement and maintain such a mouse inventory database application at other institutions.

]]>
<![CDATA[ggroups: an R package for pedigree and genetic groups data]]> https://www.researchpad.co/product?articleinfo=N375fc0f1-ece6-4d23-9070-43e96f04a13e

Background

R is a multi-platform statistical software environment and an object-oriented programming language. The Comprehensive R Archive Network (CRAN) repository features over 15,000 free, open-source packages at the time of writing (https://cran.r-project.org/web/packages, accessed in October 2019). This article introduces the package ggroups. The package provides functions for checking and processing pedigrees and for calculating the additive genetic relationship matrix and its inverse, which are used to study population structure and to predict the genetic merit of animals. Calculation of the dominance relationship matrix and its inverse is also covered. Genetic groups are a concept in animal breeding that accounts for unequal average genetic merit among groups of unknown parents. The package provides functions for calculating the matrix of genetic group contributions (Q). Calculating Q is computationally demanding and, depending on the size of the pedigree and the number of genetic groups, might not be feasible on personal computers. Therefore, a computationally optimised function and its parallel-processing alternative are provided in the package.
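
As an illustration of what an additive genetic relationship matrix contains, the following is a minimal Python sketch of the classical tabular method. It is not the ggroups code (which is written in R) and does not reproduce its API; the function name, the pedigree layout (animals numbered 1..n with parents listed before offspring, 0 for an unknown parent) and the example pedigree are assumptions made for this sketch.

```python
def additive_relationship_matrix(pedigree):
    """Tabular method for the additive relationship matrix A.

    pedigree: list of (animal, sire, dam) tuples, animals numbered 1..n in
    order, parents appearing before their offspring, 0 = unknown parent.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for animal, sire, dam in pedigree:
        i = animal - 1
        # Off-diagonal: average of the relationships with the two parents.
        for j in range(i):
            a = 0.0
            if sire:
                a += 0.5 * A[j][sire - 1]
            if dam:
                a += 0.5 * A[j][dam - 1]
            A[i][j] = A[j][i] = a
        # Diagonal: 1 plus the inbreeding coefficient (half the parents' relationship).
        A[i][i] = 1.0 + (0.5 * A[sire - 1][dam - 1] if sire and dam else 0.0)
    return A


# Hypothetical pedigree: 1 and 2 are founders, 3 = 1 x 2, 4 = 1 x 3.
example_pedigree = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 3)]
A = additive_relationship_matrix(example_pedigree)
for row in A:
    print(row)
```

In this example animal 4 comes from a parent-offspring mating, so its diagonal element exceeds 1, reflecting inbreeding.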

Results

Using sample data, we present outputs from several functions of the package to illustrate how it is used in practice.

Conclusions

The presented R package is a free and open source tool mainly for quantitative geneticists and ecologists, who deal with pedigree data. It provides numerous functions for handling pedigree data, and calculating various pedigree-based matrices. Some of the functions are computationally optimised for large-scale data.

]]>
<![CDATA[Blast2Fish: a reference-based annotation web tool for transcriptome analysis of non-model teleost fish]]> https://www.researchpad.co/product?articleinfo=Nd0ff4215-3994-4c2f-b465-292d15729fd5

Background

Transcriptome analysis by next-generation sequencing has become a popular technique in recent years. This approach is well suited to the study of non-model organisms, as de novo assembly does not depend on prior genomic sequences. De novo sequencing has benefited many studies on commercially important fish species. However, to understand the functions of these assembled sequences, they still need to be annotated with existing sequence databases. By combining Basic Local Alignment Search Tool (BLAST) and Gene Ontology analysis, we were able to identify homologous sequences of assembled sequences and describe their characteristics using pre-defined tags for each gene. However, such conventional annotation results for non-model assembled sequences were still limited by a lack of pre-defined tags and by poorly documented records in the database.

Results

We introduced Blast2Fish, a novel approach for performing functional enrichment analysis on non-model teleost fish transcriptome data. The Blast2Fish pipeline was designed to be a reference-based enrichment method. Instead of annotating the BLAST single top hit by a pre-defined gene-to-tag database, we included 500 hits to search related PubMed articles and parse biological terms. These descriptive terms were then sorted and recorded as annotations for the query. The results showed that Blast2Fish was capable of providing meaningful annotations on immunology topics for non-model fish transcriptome analysis.

Conclusion

Blast2Fish provides a novel approach for annotating sequences of non-model fish. The reference-based strategy allows annotation to be performed without pre-defined tags for each gene. This method strongly benefits non-model teleost fish studies for gene functional enrichment analysis.

]]>
<![CDATA[MFsim—an open Java all-in-one rich-client simulation environment for mesoscopic simulation]]> https://www.researchpad.co/product?articleinfo=Nb8d84e57-cca0-4d45-804e-454f1ce9aabb

MFsim is an open Java all-in-one rich-client computing environment for mesoscopic simulation with Jdpd as its default simulation kernel for Molecular Fragment (Dissipative Particle) Dynamics. The new environment comprises the complete preparation-simulation-evaluation triad of a mesoscopic simulation task and especially enables biomolecular simulation tasks with peptides and proteins. Highlights include a SPICES molecular structure editor, a PDB-to-SPICES parser for particle-based peptide/protein representations, support for polymer definitions, a compartment editor for complex simulation box start configurations, and interactive and flexible simulation box views including analytics, simulation movie generation or animated diagrams. As an open project, MFsim allows for customized extensions for different fields of research.

]]>
<![CDATA[Negative binomial additive model for RNA-Seq data analysis]]> https://www.researchpad.co/product?articleinfo=N14ca9b37-a8fc-4b5f-86a4-5069844e13da

Background

High-throughput sequencing followed by differential expression analysis is a widely used approach for detecting genomic biomarkers. A fundamental step in differential expression analysis is to model the association between gene counts and covariates of interest. Existing models assume a linear effect of covariates, which is restrictive and may not be sufficient for certain phenotypes.

Results

We introduce NBAMSeq, a flexible statistical model that is based on the generalized additive model and allows for information sharing across genes in variance estimation. Specifically, we model the logarithm of mean gene counts as sums of smooth functions, with the smoothing parameters and coefficients estimated simultaneously within a nested iterative method. The variance is estimated by a Bayesian shrinkage approach to fully exploit the information across all genes.

Conclusions

Based on extensive simulations and case studies of RNA-Seq data, we show that NBAMSeq offers improved performance in detecting nonlinear effects and maintains equivalent performance in detecting linear effects compared to existing methods. The vignette and source code of NBAMSeq are available at http://bioconductor.org/packages/release/bioc/html/NBAMSeq.html.

]]>
<![CDATA[wg-blimp: an end-to-end analysis pipeline for whole genome bisulfite sequencing data]]> https://www.researchpad.co/product?articleinfo=N11c6685f-5d17-4d8e-85f6-86040f6dbd34

Background

Analysing whole genome bisulfite sequencing datasets is a data-intensive task that requires comprehensive and reproducible workflows to generate valid results. While many algorithms have been developed for tasks such as alignment, comprehensive end-to-end pipelines are still sparse. Furthermore, previous pipelines lack features or show technical deficiencies, thus impeding analyses.

Results

We developed wg-blimp (whole genome bisulfite sequencing methylation analysis pipeline) as an end-to-end pipeline to ease whole genome bisulfite sequencing data analysis. It integrates established algorithms for alignment, quality control, methylation calling, detection of differentially methylated regions, and methylome segmentation, requiring only a reference genome and raw sequencing data as input. Comparing wg-blimp to previous end-to-end pipelines reveals similar setups for common sequence processing tasks, but shows differences for post-alignment analyses. We improve on previous pipelines by providing a more comprehensive analysis workflow as well as an interactive user interface. To demonstrate wg-blimp’s ability to produce correct results we used it to call differentially methylated regions for two publicly available datasets. We were able to replicate 112 of 114 previously published regions, and found results to be consistent with previous findings. We further applied wg-blimp to a publicly available sample of embryonic stem cells to showcase methylome segmentation. As expected, unmethylated regions were in close proximity of transcription start sites. Segmentation results were consistent with previous analyses, despite different reference genomes and sequencing techniques.

Conclusions

wg-blimp provides a comprehensive analysis pipeline for whole genome bisulfite sequencing data as well as a user interface for simplified result inspection. We demonstrated its applicability by analysing multiple publicly available datasets. Thus, wg-blimp is a relevant alternative to previous analysis pipelines and may facilitate future epigenetic research.

]]>
<![CDATA[DiSCount: computer vision for automated quantification of Striga seed germination]]> https://www.researchpad.co/product?articleinfo=N945aecc8-eb16-4482-9157-7d58c8ea30c3

Background

Plant parasitic weeds belonging to the genus Striga are a major threat for food production in Sub-Saharan Africa and Southeast Asia. The parasite’s life cycle starts with the induction of seed germination by host plant-derived signals, followed by parasite attachment, infection, outgrowth, flowering, reproduction, seed set and dispersal. Given the small seed size of the parasite (< 200 μm), quantification of the impact of new control measures that interfere with seed germination relies on manual, labour-intensive counting of seed batches under the microscope. Hence, there is a need for high-throughput assays that allow for large-scale screening of compounds or microorganisms that adversely affect Striga seed germination.

Results

Here, we introduce DiSCount (Digital Striga Counter): a computer vision tool for automated quantification of total and germinated Striga seed numbers in standard glass fibre filter assays. We developed the software using a machine learning approach trained with a dataset of 98 manually annotated images. Then, we validated and tested the model against a total dataset of 188 manually counted images. The results showed that DiSCount has an average error of 3.38 percentage points per image compared to the manually counted dataset. Most importantly, DiSCount achieves a 100 to 3000-fold speed increase in image analysis when compared to manual analysis, with an inference time of approximately 3 s per image on a single CPU and 0.1 s on a GPU.
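
To make the reported error metric concrete, here is a minimal sketch of how an average error in percentage points per image could be computed. The abstract does not define the calculation, so the definition used here (absolute difference between predicted and manually counted germination percentages, averaged over images) is an assumption, as are the function names and the counts.

```python
def germination_percentage(germinated, total):
    """Germinated seeds as a percentage of all seeds in one image."""
    return 100.0 * germinated / total if total else 0.0


def mean_abs_error_pp(predicted, manual):
    """Mean absolute difference, in percentage points, between predicted and
    manually counted germination percentages. Inputs are lists of
    (germinated, total) pairs, one pair per image."""
    errors = [
        abs(germination_percentage(*p) - germination_percentage(*m))
        for p, m in zip(predicted, manual)
    ]
    return sum(errors) / len(errors)


# Hypothetical counts for two images: (germinated, total).
predicted_counts = [(45, 100), (30, 60)]
manual_counts = [(50, 100), (33, 60)]
error_pp = mean_abs_error_pp(predicted_counts, manual_counts)
print(round(error_pp, 2))
```

Percentage points are used rather than relative percent so that an image with 50% manual and 45% predicted germination contributes an error of 5, regardless of the germination level.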

Conclusions

DiSCount is accurate and efficient in quantifying total and germinated Striga seeds in a standardized germination assay. This automated computer vision tool enables high-throughput, large-scale screening of chemical compound libraries and biological control agents of this devastating parasitic weed. The complete software and manual are hosted at https://gitlab.com/lodewijk-track32/discount_paper and the archived version is available at Zenodo with the DOI 10.5281/zenodo.3627138. The dataset used for testing is available at Zenodo with the DOI 10.5281/zenodo.3403956.

]]>
<![CDATA[Phylotastic: Improving Access to Tree-of-Life Knowledge With Flexible, on-the-Fly Delivery of Trees]]> https://www.researchpad.co/product?articleinfo=Ndaa8e48c-1a7e-41ec-8396-3b490df47aa2

A comprehensive phylogeny of species, i.e., a tree of life, has potential uses in a variety of contexts, including research, education, and public policy. Yet, accessing the tree of life typically requires special knowledge, complex software, or long periods of training. The Phylotastic project aims to make it as easy to get a phylogeny of species as it is to get driving directions from mapping software. In prior work, we presented a design for an open system to validate and manage taxon names, find phylogeny resources, extract subtrees matching a user’s taxon list, scale trees to time, and integrate related resources such as species images. Here, we report the implementation of a set of tools that together represent a robust, accessible system for on-the-fly delivery of phylogenetic knowledge. This set of tools includes a web portal to execute several customizable workflows to obtain species phylogenies (scaled by geologic time and decorated with thumbnail images); more than 30 underlying web services (accessible via a common registry); and code toolkits in R and Python (allowing others to develop custom applications using Phylotastic services). The Phylotastic system, accessible via http://www.phylotastic.org, provides a unique resource to access the current state of phylogenetic knowledge, useful for a variety of cases in which a tree extracted quickly from online resources (as distinct from a tree custom-made from character data) is sufficient, as it is for many casual uses of trees identified here.

]]>
<![CDATA[YSMR: a video tracking and analysis program for bacterial motility]]> https://www.researchpad.co/product?articleinfo=N8dbcc9ee-e766-4508-9f30-d4ab0ad8b07f

Background

Motility in bacteria forms the basis for taxis and, in some pathogenic bacteria, is important for virulence. Video tracking of motile bacteria allows the monitoring of bacterial swimming behaviour and taxis at the level of individual cells, which is a prerequisite for studying the underlying molecular mechanisms.

Results

The open-source Python program YSMR (Your Software for Motility Recognition) was designed to simultaneously track a large number of bacterial cells on standard computers from video files in various formats. In order to cope with the high number of tracked objects, we use a simple detection and tracking approach based on grey-value and position, followed by stringent selection against suspicious data points. The generated data can be used for statistical analyses either directly with YSMR or with external programs.
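
The general idea of position-based tracking followed by selection can be sketched as below. This is not YSMR's algorithm or API: it assumes that detections (for example from a grey-value threshold) are already available as (x, y) positions per frame, links them greedily to the nearest previous position, and then discards short tracks as one possible selection criterion. The function names, the distance limit and the minimum length are hypothetical.

```python
import math


def link_detections(frames, max_dist=5.0):
    """Greedy nearest-neighbour linking of per-frame (x, y) detections.

    Returns a list of tracks; each track is a list of (frame_index, x, y).
    """
    tracks = []
    active = []  # tracks that were extended in the previous frame
    for t, detections in enumerate(frames):
        unmatched = list(detections)
        next_active = []
        for track in active:
            if not unmatched:
                break
            _, px, py = track[-1]
            best = min(unmatched, key=lambda p: math.hypot(p[0] - px, p[1] - py))
            if math.hypot(best[0] - px, best[1] - py) <= max_dist:
                track.append((t, best[0], best[1]))
                unmatched.remove(best)
                next_active.append(track)
        # Detections that matched no existing track start a new one.
        for x, y in unmatched:
            track = [(t, x, y)]
            tracks.append(track)
            next_active.append(track)
        active = next_active
    return tracks


def select_tracks(tracks, min_length=3):
    """Discard tracks that are too short to be trusted."""
    return [track for track in tracks if len(track) >= min_length]


# Hypothetical detections for three frames.
example_frames = [[(0, 0), (10, 10)], [(1, 0), (10, 11)], [(2, 1), (30, 30)]]
all_tracks = link_detections(example_frames)
kept_tracks = select_tracks(all_tracks)
print(len(all_tracks), len(kept_tracks))
```

Greedy linking is cheap, which matters when hundreds of objects are tracked, but it can mislink objects that pass close to each other; a strict selection step afterwards is one way to limit the impact of such errors.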

Conclusion

In contrast to existing video tracking software, which either requires expensive computer hardware or only tracks a limited number of bacteria for a few seconds, YSMR is an open-source program which allows the 2-D tracking of several hundred objects over at least 5 minutes on standard computer hardware.

The code is freely available at https://github.com/schwanbeck/YSMR

]]>
<![CDATA[SpatialCPie: an R/Bioconductor package for spatial transcriptomics cluster evaluation]]> https://www.researchpad.co/product?articleinfo=N6199a066-fee1-46e5-84a8-e7e889a53459

Background

Technological developments in the emerging field of spatial transcriptomics have opened up an unexplored landscape where transcript information is put in a spatial context. Clustering commonly constitutes a central component in analyzing this type of data. However, deciding on the number of clusters to use and interpreting their relationships can be difficult.

Results

We introduce SpatialCPie, an R package designed to facilitate cluster evaluation for spatial transcriptomics data. SpatialCPie clusters the data at multiple resolutions. The results are visualized with pie charts that indicate the similarity between spatial regions and clusters and a cluster graph that shows the relationships between clusters at different resolutions. We demonstrate SpatialCPie on several publicly available datasets.

Conclusions

SpatialCPie provides intuitive visualizations of cluster relationships when dealing with Spatial Transcriptomics data.

]]>
<![CDATA[vivaGen – a survival data set generator for software testing]]> https://www.researchpad.co/product?articleinfo=N4e500d53-badd-43a1-9009-2e783f63359e

An amendment to this paper has been published and can be accessed via the original article.

]]>
<![CDATA[CYPminer: an automated cytochrome P450 identification, classification, and data analysis tool for genome data sets across kingdoms]]> https://www.researchpad.co/product?articleinfo=Nd1ec506c-7f4c-4b27-93d9-5ddfd63a1b3a

Background

Cytochrome P450 monooxygenases (termed CYPs or P450s) are hemoproteins ubiquitously found across all kingdoms, playing a central role in intracellular metabolism, especially in metabolism of drugs and xenobiotics. The explosive growth of genome sequencing brings a new set of challenges and issues for researchers, such as a systematic investigation of CYPs across all kingdoms in terms of identification, classification, and pan-CYPome analyses. Such investigation requires an automated tool that can handle an enormous amount of sequencing data in a timely manner.

Results

CYPminer was developed in the Python language to facilitate rapid, comprehensive analysis of CYPs from genomes of all kingdoms. CYPminer consists of two procedures: i) generating the Genome-CYP Matrix (GCM), which lists all occurrences of CYPs across the genomes, and ii) performing analyses and visualization of the GCM, including pan-CYPomes (pan- and core-CYPome), CYP co-occurrence networks, CYP clouds, and genome clustering data. The performance of CYPminer was evaluated with three datasets from fungal and bacterial genome sequences.
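
The two procedures can be illustrated with a minimal sketch. It is not CYPminer's code or interface: the input dictionary, the function names and the definitions used (pan-CYPome as CYP families present in at least one genome, core-CYPome as families present in every genome) are assumptions for illustration, and the genome contents are invented.

```python
# Hypothetical input: CYP family assignments found in each genome.
genome_cyps = {
    "genome_A": ["CYP51", "CYP61", "CYP505"],
    "genome_B": ["CYP51", "CYP61"],
    "genome_C": ["CYP51", "CYP505", "CYP505"],
}


def build_gcm(genome_cyps):
    """Return (families, matrix): matrix[genome][k] counts family k in that genome."""
    families = sorted({fam for hits in genome_cyps.values() for fam in hits})
    matrix = {
        genome: [hits.count(fam) for fam in families]
        for genome, hits in genome_cyps.items()
    }
    return families, matrix


def pan_and_core(families, matrix):
    """Pan-CYPome: families seen in any genome. Core-CYPome: families seen in all."""
    rows = list(matrix.values())
    pan = [f for k, f in enumerate(families) if any(row[k] > 0 for row in rows)]
    core = [f for k, f in enumerate(families) if all(row[k] > 0 for row in rows)]
    return pan, core


families, gcm = build_gcm(genome_cyps)
pan_cypome, core_cypome = pan_and_core(families, gcm)
print(families)
print(gcm)
print(pan_cypome, core_cypome)
```

Keeping counts rather than presence/absence in the matrix leaves room for downstream analyses such as co-occurrence networks or genome clustering.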

Conclusions

CYPminer completes CYP analyses for large-scale genomes from all kingdoms, which allows systematic genome annotation and comparative insights for CYPs. CYPminer also can be extended and adapted easily for broader usage.

]]>
<![CDATA[circRNAprofiler: an R-based computational framework for the downstream analysis of circular RNAs]]> https://www.researchpad.co/product?articleinfo=N58b2d531-09f0-4952-97d6-804af407d86c

Background

Circular RNAs (circRNAs) are a newly appreciated class of non-coding RNA molecules. Numerous tools have been developed for the detection of circRNAs; however, computational tools to perform downstream functional analysis of circRNAs are scarce.

Results

We present circRNAprofiler, an R-based computational framework that runs after circRNAs have been identified. It allows users to combine circRNAs detected by multiple publicly available annotation-based circRNA detection tools and to analyze their expression, genomic context, evolutionary conservation, biogenesis and putative functions.

Conclusions

Overall, the circRNA analysis workflow implemented by circRNAprofiler is highly automated and customizable, and the results of the analyses can be used as a starting point for further investigation into the role of specific circRNAs in any physiological or pathological condition.

]]>
<![CDATA[CCMetagen: comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data]]> https://www.researchpad.co/product?articleinfo=N97139ba0-7407-4a67-a331-a715341e4d7d

There is an increasing demand for accurate and fast metagenome classifiers that can identify not only bacteria but all members of a microbial community. We used a recently developed concept in read mapping to develop a highly accurate metagenomic classification pipeline named CCMetagen. The pipeline substantially outperforms other commonly used software in identifying bacteria and fungi and can efficiently use the entire NCBI nucleotide collection as a reference to detect species with incomplete genome data from all biological kingdoms. CCMetagen is user-friendly, and the results can be easily integrated into microbial community analysis software for streamlined and automated microbiome studies.

]]>
<![CDATA[GiniClust3: a fast and memory-efficient tool for rare cell type identification]]> https://www.researchpad.co/product?articleinfo=N0c4178e7-db4c-4b33-a5d8-2a83261d51f1

Background

With the rapid development of single-cell RNA sequencing technology, it is possible to dissect cell-type composition at high resolution. A number of methods have been developed to identify rare cell types. However, existing methods are still not scalable to large datasets, limiting their utility. To overcome this limitation, we present a new software package, called GiniClust3, which is an extension of GiniClust2 and is significantly faster and more memory-efficient than previous versions.
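
The GiniClust family of methods uses the Gini index of a gene's expression across cells to find genes that mark rare cell types. The sketch below computes a standard Gini index for one gene; it is a generic formula, not code taken from GiniClust3, and the function name and the example expression vectors are hypothetical.

```python
def gini_index(values):
    """Gini index of non-negative values: 0 for perfectly even values,
    approaching 1 when a few entries hold almost all of the total."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical expression of two genes across four cells.
evenly_expressed = [1, 1, 1, 1]
rare_cell_marker = [0, 0, 0, 10]
print(gini_index(evenly_expressed), gini_index(rare_cell_marker))
```

A gene expressed in only a small subset of cells receives a high Gini index, which is why such genes are informative for rare cell populations.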

Results

Using GiniClust3, it takes only about 7 h to identify both common and rare cell clusters from a dataset that contains more than one million cells. Cell type mapping and perturbation analyses show that GiniClust3 can robustly identify cell clusters.

Conclusions

Taken together, these results suggest that GiniClust3 is a powerful tool to identify both common and rare cell populations and can handle large datasets. GiniClust3 is implemented as an open-source Python package and is available at https://github.com/rdong08/GiniClust3.

]]>
<![CDATA[schema: an open-source, distributed mobile platform for deploying mHealth research tools and interventions]]> https://www.researchpad.co/product?articleinfo=Nbc27ccd6-b41e-46b4-b9dc-496a213408e2

Background

Mobile applications for health, also known as ‘mHealth apps’, have experienced increasing popularity over the past ten years. However, most publicly available mHealth apps are not clinically validated, and many do not utilise evidence-based strategies. Health researchers wishing to develop and evaluate mHealth apps may be impeded by cost and technical skillset barriers. As traditionally lab-based methods are translated onto mobile platforms, robust and accessible tools are needed to enable the development of quality, evidence-based programs by clinical experts.

Results

This paper introduces schema, an open-source, distributed, app-based platform for researchers to deploy behavior monitoring and health interventions onto mobile devices. The architecture and design features of the platform are discussed, including flexible scheduling, randomisation, a wide variety of survey and media elements, and distributed storage of data. The platform supports a range of research designs, including cross-sectional surveys, ecological momentary assessment, randomised controlled trials, and micro-randomised just-in-time adaptive interventions. Use cases for both researchers and participants are considered to demonstrate the flexibility and usefulness of the platform for mHealth research.

Conclusions

The paper concludes by considering the strengths and limitations of the platform, and with a call for support from the research community in areas of technical development and evaluation. To get started with schema, please visit the GitHub repository: https://github.com/schema-app/schema.

]]>
<![CDATA[CReM: chemically reasonable mutations framework for structure generation]]> https://www.researchpad.co/product?articleinfo=N1a647cf1-00ee-41a8-9b19-2fd1ad2009ee

Structure generators are widely used in de novo design studies and their performance substantially influences the outcome. Approaches based on deep learning models and conventional atom-based approaches may produce invalid structures and fail to address synthetic feasibility. On the other hand, conventional reaction-based approaches produce synthetically feasible compounds, but the novelty and diversity of the generated compounds may be limited. Fragment-based approaches can provide better novelty and diversity of generated compounds, but the synthetic complexity of generated structures has not been explicitly addressed before. Here we developed a new framework for fragment-based structure generation that, by design, produces chemically valid structures and provides flexible control over the diversity, novelty, synthetic complexity and chemotypes of generated compounds. The framework was implemented as an open-source Python module and can be used to create custom workflows for the exploration of chemical space.

]]>
<![CDATA[AutoGrow4: an open-source genetic algorithm for de novo drug design and lead optimization]]> https://www.researchpad.co/product?articleinfo=N14cd2782-f5e9-4db3-a5ae-69f0a790ca95

We here present AutoGrow4, an open-source program for semi-automated computer-aided drug discovery. AutoGrow4 uses a genetic algorithm to evolve predicted ligands on demand and so is not limited to a virtual library of pre-enumerated compounds. It is a useful tool for generating entirely novel drug-like molecules and for optimizing preexisting ligands. By leveraging recent computational and cheminformatics advancements, AutoGrow4 is faster, more stable, and more modular than previous versions. It implements new docking-program compatibility, chemical filters, multithreading options, and selection methods to support a wide range of user needs. To illustrate both de novo design and lead optimization, we here apply AutoGrow4 to the catalytic domain of poly(ADP-ribose) polymerase 1 (PARP-1), a well characterized DNA-damage-recognition protein. AutoGrow4 produces drug-like compounds with better predicted binding affinities than FDA-approved PARP-1 inhibitors (positive controls). The predicted binding modes of the AutoGrow4 compounds mimic those of the known inhibitors, even when AutoGrow4 is seeded with random small molecules. AutoGrow4 is available under the terms of the Apache License, Version 2.0. A copy can be downloaded free of charge from http://durrantlab.com/autogrow4.

]]>
<![CDATA[MI-MAAP: marker informativeness for multi-ancestry admixed populations]]> https://www.researchpad.co/product?articleinfo=N3397d53a-1344-439a-8519-47b239db6708

Background

Admixed populations arise when two or more previously isolated populations interbreed. A powerful approach to addressing the genetic complexity in admixed populations is to infer ancestry. Ancestry inference in an admixed population, including estimating the proportion of an individual’s genome coming from each population and its ancestral origin along the chromosome, requires the use of ancestry informative markers (AIMs) from reference ancestral populations. AIMs exhibit substantial differences in allele frequency between ancestral populations. Given the huge amount of human genetic variation data available from diverse populations, a computationally feasible and cost-effective approach is becoming increasingly important to extract or filter AIMs with the maximum information content for ancestry inference, admixture mapping, forensic applications, and detecting genomic regions that have been under recent selection.

Results

To address this gap, we present MI-MAAP, an easy-to-use web-based bioinformatics tool designed to prioritize informative markers for multi-ancestry admixed populations by utilizing feature selection methods and multiple genomics resources, including the 1000 Genomes Project and the Human Genome Diversity Project. Specifically, this tool implements a novel allele frequency-based feature selection algorithm, Lancaster Estimator of Independence (LEI), as well as other genotype-based methods such as Principal Component Analysis (PCA), Support Vector Machine (SVM), and Random Forest (RF). We demonstrated that MI-MAAP is a useful tool for prioritizing informative markers and accurately classifying ancestral populations. LEI is an efficient feature selection strategy to retrieve ancestry informative variants with different allele frequency/selection pressure among (or between) ancestries without requiring computationally expensive individual-level genotype data.
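
To illustrate what allele frequency-based marker prioritization means, here is a minimal sketch that ranks markers by the largest allele-frequency difference between any two reference populations (the classical delta measure). This is a deliberately simple stand-in and is not the LEI algorithm, whose details are not given in this abstract; the marker names, population labels and frequencies are invented.

```python
def allele_frequency_delta(pop_freqs):
    """Largest difference in allele frequency between any two populations."""
    values = list(pop_freqs.values())
    return max(values) - min(values)


def rank_markers_by_delta(freqs, top_n):
    """Return the top_n marker names, most informative (largest delta) first.

    freqs: {marker: {population: allele frequency}}
    """
    ranked = sorted(freqs, key=lambda m: allele_frequency_delta(freqs[m]), reverse=True)
    return ranked[:top_n]


# Hypothetical allele frequencies in three reference populations.
marker_freqs = {
    "marker_1": {"pop_A": 0.90, "pop_B": 0.10, "pop_C": 0.50},
    "marker_2": {"pop_A": 0.30, "pop_B": 0.35, "pop_C": 0.32},
    "marker_3": {"pop_A": 0.20, "pop_B": 0.60, "pop_C": 0.40},
}
top_markers = rank_markers_by_delta(marker_freqs, top_n=2)
print(top_markers)
```

Like any frequency-based approach, this needs only per-population summary frequencies rather than individual-level genotypes, which is the property the abstract highlights for LEI.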

Conclusions

MI-MAAP has a user-friendly interface which provides researchers an easy and fast way to filter and identify AIMs. MI-MAAP can be accessed at https://research.cchmc.org/mershalab/MI-MAAP/login/.

]]>