ResearchPad - software-tool-article https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[Visualize omics data on networks with Omics Visualizer, a Cytoscape App]]> https://www.researchpad.co/article/elastic_article_10953

Cytoscape is open-source software for analyzing and visualizing biological networks. In addition to importing networks from a variety of sources, Cytoscape lets users import tabular node data and visualize it on networks. Unfortunately, such data tables can contain only one row of data per node, whereas omics data often have multiple rows for the same gene or protein, representing different post-translational modification sites, peptides, splice isoforms, or conditions. Here, we present a new app, Omics Visualizer, that allows users to import data tables with several rows referring to the same node, connect them to one or more networks, and visualize the connected data on networks. Omics Visualizer uses the Cytoscape enhancedGraphics app to show the data either inside the nodes (pie visualization) or around the nodes (donut visualization), where the colors of the slices represent the imported values. If the user does not provide a network, the app can retrieve one from the STRING database using the Cytoscape stringApp. The Omics Visualizer app is freely available at https://apps.cytoscape.org/apps/omicsvisualizer.
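
The central idea, a long table with several rows per node whose values become colored slices, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Omics Visualizer's implementation: the toy rows, the function names, and the blue-white-red scale clipped at `vmax` are all hypothetical, and the actual drawing is left to enhancedGraphics inside Cytoscape.

```python
from collections import defaultdict

# Toy long-format table: several rows share a node key (e.g. one row per
# modification site or condition). Names and values are hypothetical.
ROWS = [
    ("GENE_A", "site1", 1.8),
    ("GENE_A", "site2", -0.6),
    ("GENE_B", "site1", 0.4),
]

def value_to_hex(value, vmax=2.0):
    """Map a value onto a blue-white-red scale, clipping at +/- vmax."""
    t = max(-1.0, min(1.0, value / vmax))  # clip to [-1, 1]
    if t >= 0:  # white (t = 0) to red (t = 1)
        r, g, b = 255, round(255 * (1 - t)), round(255 * (1 - t))
    else:       # white (t = 0) to blue (t = -1)
        r, g, b = round(255 * (1 + t)), round(255 * (1 + t)), 255
    return f"#{r:02X}{g:02X}{b:02X}"

def slices_per_node(rows):
    """Group rows by node; each row becomes one (label, color) slice."""
    slices = defaultdict(list)
    for node, label, value in rows:
        slices[node].append((label, value_to_hex(value)))
    return dict(slices)

print(slices_per_node(ROWS))
```

Slices keep the input row order, so GENE_A gets two slices, whereas a one-row-per-node table would have had to drop or aggregate one of them.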

]]>
<![CDATA[microbiomeDASim: Simulating longitudinal differential abundance for microbiome data]]> https://www.researchpad.co/article/Nba83061e-83d0-43ef-851f-d9a304b6ef17

An increasing emphasis on understanding the dynamics of microbial communities in various settings has led to the proliferation of longitudinal metagenomic sampling studies. Data from whole metagenomic shotgun sequencing and marker-gene survey studies have characteristics that drive novel statistical methodological development for estimating time intervals of differential abundance. When designing a study and deciding beforehand how often to collect samples, one may wish to model the ability to detect an effect, for example because cost or ease of access limits collection. Additionally, while every study is unique, in certain scenarios one statistical framework may be more appropriate than another. Here, we present a simulation paradigm implemented in the R Bioconductor software package microbiomeDASim, available at http://bioconductor.org/packages/microbiomeDASim. microbiomeDASim allows investigators to simulate longitudinal differentially abundant microbiome features with a variety of known functional forms and flexible parameters that control the desired signal-to-noise ratio. We present success metrics for one particular method, metaSplines.
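
To make "a known functional form with a tunable signal-to-noise ratio" concrete, here is a minimal Python sketch. It is not microbiomeDASim's R API: the function name, the three shapes, the subject-specific random intercept, and the independent Gaussian noise on a log-abundance scale are simplifying assumptions, and the package's own correlation structure and parameterization may differ.

```python
import math
import random

def simulate(n_per_group=10, times=(0, 1, 2, 3, 4, 5), form="quadratic",
             amplitude=2.0, noise_sd=1.0, seed=1):
    """Simulate log-abundance trajectories for a control and a treatment group.

    Only the treatment group receives a mean shift that follows the chosen
    functional form over time; amplitude / noise_sd is a rough SNR knob.
    Returns (group, subject, time, value) tuples.
    """
    rng = random.Random(seed)
    t_max = max(times)
    shapes = {
        "linear": lambda t: t / t_max,
        "quadratic": lambda t: 1 - ((2 * t / t_max) - 1) ** 2,  # 0 at ends, 1 mid-study
        "oscillating": lambda t: math.sin(2 * math.pi * t / t_max),
    }
    shape = shapes[form]
    records = []
    for group in ("control", "treatment"):
        for subject in range(n_per_group):
            baseline = rng.gauss(5.0, 0.5)  # subject-specific intercept
            for t in times:
                signal = amplitude * shape(t) if group == "treatment" else 0.0
                records.append((group, subject, t, baseline + signal + rng.gauss(0.0, noise_sd)))
    return records
```

Lowering `amplitude` relative to `noise_sd` produces harder detection scenarios, which is the kind of comparison one would run before fixing a sampling schedule.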

]]>
<![CDATA[ClinEpiDB: an open-access clinical epidemiology database resource encouraging online exploration of complex studies]]> https://www.researchpad.co/article/N3256acd6-005a-4f23-a9bb-a55c8eb6dcfa

The concept of open data has been gaining traction as a mechanism to increase data use, ensure that data are preserved over time, and accelerate discovery. While epidemiology data sets are increasingly deposited in databases and repositories, barriers to access still remain. ClinEpiDB was constructed as an open-access online resource for clinical and epidemiologic studies by leveraging the extensive web toolkit and infrastructure of the Eukaryotic Pathogen Database Resources (EuPathDB; a collection of databases covering 170+ eukaryotic pathogens, relevant related species, and select hosts) combined with a unified semantic web framework. Here we present an intuitive point-and-click website that allows users to visualize and subset data directly in the ClinEpiDB browser and immediately explore potential associations. Supporting study documentation aids contextualization, and data can be downloaded for advanced analyses. By facilitating access and interrogation of high-quality, large-scale data sets, ClinEpiDB aims to spur collaboration and discovery that improves global health.
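
Subsetting records and then checking a potential association can be illustrated with a 2×2 odds ratio. The Python sketch below only illustrates that workflow on toy records; the field names, the `subset` and `odds_ratio` helpers, and the choice of an odds ratio are assumptions, not a description of how ClinEpiDB computes its statistics.

```python
def subset(records, **criteria):
    """Keep only the records whose fields equal every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

def odds_ratio(records, exposure, outcome):
    """Odds ratio from the 2x2 table of two boolean fields.

    No zero-cell correction: raises ZeroDivisionError if the
    exposed/no-outcome or unexposed/outcome cell is empty.
    """
    a = sum(1 for r in records if r[exposure] and r[outcome])
    b = sum(1 for r in records if r[exposure] and not r[outcome])
    c = sum(1 for r in records if not r[exposure] and r[outcome])
    d = sum(1 for r in records if not r[exposure] and not r[outcome])
    return (a * d) / (b * c)
```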

]]>
<![CDATA[Pathfinder: open source software for analyzing spatial navigation search strategies]]> https://www.researchpad.co/article/N78f32f79-e71c-471f-a25a-f63a5005ea33

Spatial navigation is a universal behavior that varies depending on goals, experience and available sensory stimuli. Spatial navigational tasks are routinely used to study learning, memory and goal-directed behavior, in both animals and humans. One popular paradigm for testing spatial memory is the Morris water maze, where subjects learn the location of a hidden platform that offers escape from a pool of water. Researchers typically express learning as a function of the latency to escape, though this reveals little about the underlying navigational strategies. Recently, a number of studies have begun to classify water maze search strategies in order to clarify the precise spatial and mnemonic functions of different brain regions, and to identify which aspects of spatial memory are disrupted in disease models. However, despite their usefulness, strategy analyses have not been widely adopted due to the lack of software to automate analyses. To address this need we developed Pathfinder, an open source application for analyzing spatial navigation behaviors. In a representative dataset, we show that Pathfinder effectively characterizes the development of highly-specific spatial search strategies as male and female mice learn a standard spatial water maze. Pathfinder can read data files from commercially- and freely-available software packages, is optimized for classifying search strategies in water maze paradigms, and can also be used to analyze 2D navigation by other species, and in other tasks, as long as timestamped xy coordinates are available. Pathfinder is simple to use, can automatically determine pool and platform geometry, generates heat maps, analyzes navigation with respect to multiple goal locations, and can be updated to accommodate future developments in spatial behavioral analyses. 
Given these features, Pathfinder may be a useful tool for studying how navigational strategies are regulated by the environment, depend on specific neural circuits, and are altered by pathology.
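
Because Pathfinder works from timestamped xy coordinates, a few of the underlying quantities can be sketched directly. The Python below is a hypothetical illustration, not Pathfinder's code: it computes latency, path length, mean distance to the goal, and a path-efficiency ratio, applies a toy one-feature "direct vs non-direct" rule with an arbitrary 0.8 cutoff, and bins positions into an occupancy grid of the kind a heat map is drawn from. Pathfinder's actual strategy categories and thresholds are not reproduced here.

```python
import math

def path_metrics(track, platform, platform_radius):
    """Summarize one trial from (t, x, y) samples and the platform center (x, y)."""
    points = [(x, y) for _, x, y in track]
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    # Shortest possible swim: straight line from the start to the platform edge.
    direct = max(math.dist(points[0], platform) - platform_radius, 0.0)
    return {
        "latency": track[-1][0] - track[0][0],
        "path_length": length,
        "mean_distance_to_goal": sum(math.dist(p, platform) for p in points) / len(points),
        "efficiency": direct / length if length > 0 else 1.0,
    }

def classify(metrics, efficiency_cutoff=0.8):
    """Toy two-way split on path efficiency; real classifiers use many features."""
    return "direct" if metrics["efficiency"] >= efficiency_cutoff else "non-direct"

def occupancy_grid(track, x_range, y_range, bins=10):
    """Count samples per grid cell, the raw material for a heat map."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * bins for _ in range(bins)]
    for _, x, y in track:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # ignore samples outside the arena bounds
        row = min(math.floor((y - y0) / (y1 - y0) * bins), bins - 1)
        col = min(math.floor((x - x0) / (x1 - x0) * bins), bins - 1)
        grid[row][col] += 1
    return grid
```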

]]>
<![CDATA[An interactive application for malaria elimination transmission and costing in the Asia-Pacific]]> https://www.researchpad.co/article/N0a86aaa0-309f-4217-baff-a528269072e0

Leaders in the Asia-Pacific have endorsed an ambitious target to eliminate malaria in the region by 2030. The emergence and spread of artemisinin drug resistance in the Greater Mekong Subregion makes elimination urgent and strategic for the global goal of malaria eradication. Mathematical modelling is a useful tool for assessing and comparing different elimination strategies and scenarios to inform policymakers. Mathematical models are especially relevant in this context because of the wide heterogeneity of regional, country and local settings, which means that different strategies are needed to eliminate malaria. However, models and their predictions can be seen as highly technical, limiting their use for decision making. Simplified applications of models are needed to allow policymakers to benefit from these valuable tools. This paper describes a method for communicating complex model results within a user-friendly and intuitive framework. Using open-source technologies, we designed and developed an interactive application to disseminate the modelling results for malaria elimination. The design was iteratively improved while the application was being piloted and extensively tested by a diverse range of researchers and decision makers. This application allows several target audiences to explore, navigate and visualise complex datasets and models generated in the context of malaria elimination. It allows widespread access to, use of, and interpretation of models that were generated at great effort and expense, and it helps them remain relevant for longer. It has long been acknowledged that scientific results need to be repackaged for larger audiences. We demonstrate that modellers can include applications as part of the dissemination strategy for their findings. We highlight the need for additional research to provide guidelines and direction for designing and developing effective applications for disseminating models.

]]>
<![CDATA[pdb-tools: a swiss army knife for molecular structures]]> https://www.researchpad.co/article/5c605bc9d5eed0c4847ce962

The pdb-tools are a collection of Python scripts for working with molecular structure data in the Protein Data Bank (PDB) format. They allow users to edit, convert, and validate PDB files from the command line in a simple but efficient manner. The pdb-tools are implemented in Python without any external dependencies and are freely available under the open-source Apache License at https://github.com/haddocking/pdb-tools/ and on PyPI.
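
The PDB format is fixed-width, which is what makes small single-purpose filters practical. As a hedged sketch in the same spirit (not taken from the pdb-tools source), the hypothetical `select_chain` generator below keeps coordinate records whose chain identifier, column 22 of the record, is in a requested set, and passes all other records through unchanged.

```python
def select_chain(lines, chains):
    """Yield PDB lines, keeping coordinate records only for the requested chains.

    The chain identifier sits in column 22 (index 21) of ATOM, HETATM,
    ANISOU and TER records; any other record is passed through unchanged.
    """
    coordinate_records = ("ATOM", "HETATM", "ANISOU", "TER")
    for line in lines:
        if line.startswith(coordinate_records) and len(line) > 21:
            if line[21] in chains:
                yield line
        else:
            yield line
```

Because it is a generator over lines, the same function works on a file object or a list, mirroring the stream-in, stream-out style of command-line filters.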

]]>
<![CDATA[Gene Annotation Easy Viewer (GAEV): Integrating KEGG’s Gene Function Annotations and Associated Molecular Pathways]]> https://www.researchpad.co/article/5c605bced5eed0c4847ce9fe

We developed the Gene Annotation Easy Viewer (GAEV), which integrates gene annotation data from the KEGG (Kyoto Encyclopedia of Genes and Genomes) Automatic Annotation Server. GAEV generates an easy-to-read table that summarizes the query gene name, the KO (KEGG Orthology) number, the names of gene orthologs, the functional definition of the ortholog, and the functional pathways to which the query gene has been mapped. Via links to KEGG pathway maps, users can directly examine the interactions between gene products involved in the same molecular pathway. We provide a usage example by annotating the newly published genome of the freshwater microcrustacean Daphnia pulex. This gene-centered view of gene function and pathways will greatly facilitate the genome annotation of non-model species and of metagenomics data. GAEV runs on Windows or Linux systems equipped with Python 3 and is accessible to users with no prior Unix command-line experience.
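
The table GAEV produces is essentially a join of per-gene KO assignments with KO descriptions and KO-to-pathway links. A minimal Python sketch of that join follows. It assumes a two-column `gene<TAB>KO` assignment list (genes without a KO have only one column) and uses small in-memory dictionaries as stand-ins for KEGG lookups; the function name and data layout are hypothetical, not GAEV's code.

```python
def build_table(query_ko, ko_info, ko_pathways):
    """Combine per-gene KO hits with ortholog names, definitions and pathways.

    query_ko:    iterable of "gene<TAB>KO" lines (KO absent if unassigned)
    ko_info:     {KO: (ortholog name, functional definition)}
    ko_pathways: {KO: [pathway labels]}
    """
    rows = []
    for line in query_ko:
        fields = line.rstrip("\n").split("\t")
        gene = fields[0]
        ko = fields[1] if len(fields) > 1 and fields[1] else None
        if ko is None:
            continue  # the gene received no KO assignment
        name, definition = ko_info.get(ko, ("", ""))
        pathways = "; ".join(ko_pathways.get(ko, []))
        rows.append((gene, ko, name, definition, pathways))
    return rows
```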

]]>
<![CDATA[Increasing workflow development speed and reproducibility with Vectools]]> https://www.researchpad.co/article/5c3fc23dd5eed0c484df0372

Despite advances in bioinformatics, custom scripts remain a source of difficulty, slowing workflow development and hampering reproducibility. Here, we introduce Vectools, a command-line tool suite that reduces reliance on custom scripts and improves reproducibility by offering a wide range of common, easy-to-use functions for table and vector manipulation. Vectools also offers a number of vector-related functions to speed up workflow development, such as simple machine learning and common statistics functions.
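
As an example of the column-wise table operation that otherwise tends to end up in a throwaway script, the Python sketch below parses a tab-separated numeric table, reports the per-column mean and sample standard deviation, and z-scores each column. It does not reproduce Vectools' commands or options; the function names are hypothetical.

```python
import statistics

def read_table(text, sep="\t"):
    """Parse a delimiter-separated numeric table into a list of float rows."""
    return [[float(v) for v in line.split(sep)] for line in text.strip().splitlines()]

def column_stats(rows):
    """Mean and sample standard deviation of every column."""
    return [(statistics.mean(col), statistics.stdev(col)) for col in zip(*rows)]

def zscore_columns(rows):
    """Standardize each column to mean 0 and unit sample standard deviation."""
    stats = column_stats(rows)
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]
```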

]]>
<![CDATA[ranacapa: An R package and Shiny web app to explore environmental DNA data with exploratory statistics and interactive visualizations]]> https://www.researchpad.co/article/5c37bc74d5eed0c48449cdab

Environmental DNA (eDNA) metabarcoding is becoming a core tool in ecology and conservation biology, and is being used in a growing number of education, biodiversity monitoring, and public outreach programs in which professional research scientists engage community partners in primary research. Results from eDNA analyses can engage and educate natural resource managers, students, community scientists, and naturalists, but without significant training in bioinformatics, it can be difficult for this diverse audience to interact with eDNA results. Here we present the R package ranacapa, at the core of which is a Shiny web app that helps perform exploratory biodiversity analyses and visualizations of eDNA results. The app requires a taxonomy-by-sample matrix and a simple metadata file with descriptive information about each sample. The app enables users to explore the data with interactive figures and presents results from simple community ecology analyses. We demonstrate the value of ranacapa to two groups of community partners engaging with eDNA metabarcoding results.
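
Alpha diversity is one of the simple community ecology summaries that can be computed from a taxonomy-by-sample matrix. The Python sketch below computes observed richness and the Shannon index per sample. It is illustrative only (ranacapa itself is an R package), and the nested-dictionary layout and taxon names are assumptions.

```python
import math

def alpha_diversity(matrix):
    """Observed richness and Shannon index (natural log) for every sample.

    matrix: {taxon: {sample: read count}}, i.e. a taxonomy-by-sample table;
    samples missing from a taxon's row are treated as zero counts.
    """
    samples = sorted({s for row in matrix.values() for s in row})
    result = {}
    for s in samples:
        counts = [row.get(s, 0) for row in matrix.values()]
        total = sum(counts)
        present = [c for c in counts if c > 0]
        shannon = -sum((c / total) * math.log(c / total) for c in present)
        result[s] = {"richness": len(present), "shannon": shannon}
    return result
```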

]]>
<![CDATA[gganatogram: An R package for modular visualisation of anatograms and tissues based on ggplot2]]> https://www.researchpad.co/article/5c150baed5eed0c4840ada5d

Displaying data on anatomical structures is a convenient technique for quickly observing tissue-related information. However, drawing tissues is a complex task that requires expertise in both anatomy and art. While web-based applications exist for displaying gene expression on anatograms, other non-genetic disciplines lack similar tools. Moreover, web-based tools often lack the modularity associated with packages in programming languages such as R. Here I present gganatogram, an R package for plotting modular species anatograms based on a combination of the graphical grammar of ggplot2 and the publicly available anatograms from the Expression Atlas. This combination allows quick, easy, modular, and reproducible generation of anatograms. Using only one command and a data frame with tissue name, group, colour, and value, this tool enables the user to visualise specific human and mouse tissues with desired colours, grouped by a variable, or displaying a desired value, such as gene expression, pharmacokinetics, or bacterial load across selected tissues. I hope that this tool will be useful to the wider community in the biological sciences. Community members are welcome to submit additional anatograms, which can be incorporated into the package.

A stable version of gganatogram has been deposited in Neuroconductor, and a development version can be found at github/jespermaag/gganatogram.

]]>
<![CDATA[sPop: Age-structured discrete-time population dynamics model in C, Python, and R]]> https://www.researchpad.co/article/5c129aa9d5eed0c4848c86ba

This article describes the sPop packages, which implement the deterministic and stochastic versions of an age-structured discrete-time population dynamics model. The packages enable mechanistic modelling of a population by tracking the age and development stage of each individual. Survival and development are the main processes, and each progresses at a user-defined pace: at a fixed rate, after a fixed delay, or in an age-dependent manner. The model is implemented in C, Python, and R with a uniform design to ease usage and facilitate adoption. Early versions of the model were previously employed to investigate the climate-driven population dynamics of the tiger mosquito and the spread of chikungunya disease by this vector. The sPop packages presented in this article enable the use of the model in a range of applications, extending from vector-borne diseases to any age-structured population, including plant and animal populations, microbial dynamics, host-pathogen interactions, infectious diseases, and other time-delayed epidemiological processes.
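
The three development modes can be made concrete with a deterministic, single-stage sketch in Python. This is not the sPop API: the function names, the cohort dictionary keyed by age, and the order of operations (survival first, then development) are assumptions chosen for clarity, and only the deterministic case is shown.

```python
def fixed_rate(rate):
    """Constant per-step probability of completing development."""
    def prob(age):
        return rate
    return prob

def fixed_delay(steps):
    """Development completes once an individual has spent `steps` steps in the stage."""
    def prob(age):
        return 1.0 if age + 1 >= steps else 0.0
    return prob

def age_dependent(scale):
    """Hypothetical example: probability rising linearly with age, capped at 1."""
    def prob(age):
        return min(1.0, (age + 1) / scale)
    return prob

def step(cohorts, survival, develop_prob):
    """Advance one stage by one time step.

    cohorts: {age: number of individuals still in the stage}
    Returns (new cohorts, number of individuals that completed development).
    """
    new, developed = {}, 0.0
    for age, n in cohorts.items():
        alive = n * survival                 # survival is applied first
        leaving = alive * develop_prob(age)  # then development out of the stage
        developed += leaving
        if alive - leaving > 0:
            new[age + 1] = alive - leaving   # the remainder ages by one step
    return new, developed
```

A stochastic variant would replace the expected values `n * survival` and `alive * develop_prob(age)` with binomial draws.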

]]>
<![CDATA[Norwegian e-Infrastructure for Life Sciences (NeLS)]]> https://www.researchpad.co/article/5c0dccbad5eed0c484cfebab

The Norwegian e-Infrastructure for Life Sciences (NeLS) has been developed by ELIXIR Norway to provide its users with a system enabling data storage, sharing, and analysis in a project-oriented fashion. The system is available through easy-to-use web interfaces, including the Galaxy workbench for data analysis and workflow execution. Users comfortable with a command-line interface and programming may also access it through Secure Shell (SSH) and application programming interfaces (APIs).

NeLS has been in production since 2015, with training and support provided by the help desk of ELIXIR Norway. Through collaboration with NorSeq, the national consortium for high-throughput sequencing, an integrated service is offered so that sequencing data generated in a research project are delivered to the involved researchers through NeLS. Sensitive data, such as individual genomic sequencing data, are handled using the TSD (Services for Sensitive Data) platform provided by Sigma2 and the University of Oslo. NeLS integrates national e-infrastructure storage and computing resources, and is also integrated with the SEEK platform in order to store large data files produced by experiments described in SEEK.

In this article, we outline the architecture of NeLS and discuss possible directions for further development.

]]>
<![CDATA[WordCommentsAnalyzer: A Windows software tool for qualitative research]]> https://www.researchpad.co/article/5c0dccb8d5eed0c484cfeb63

There is a lack of free software that provides a professional and smooth experience in text editing and markup for qualitative data analysis. Word processing software such as Microsoft Word provides a good editing experience, allowing the researcher to effortlessly add comments to portions of text. However, organizing the keywords and categories in the comments becomes more difficult as the amount of data grows. We present WordCommentsAnalyzer, a software tool written in C# using the .NET Framework and OpenXml, which helps qualitative researchers organize codes when using Microsoft Word as their primary text markup software. WordCommentsAnalyzer provides an effective user interface to count codes, organize codes in a code hierarchy, and view the data extracts belonging to each code. It also offers basic visualization tools. We illustrate how to use this software by conducting a preliminary content analysis of tweets with the #successfulaging hashtag. We also demonstrate that the software performs satisfactorily on a large dataset of abstracts from Iranian journals. We hope this open-source software will facilitate qualitative data analysis by researchers who are interested in using Word for this purpose.
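
A .docx file is a ZIP package whose comments are stored in the `word/comments.xml` part, which is what makes comment-based coding tractable outside Word. The following Python sketch, which is not WordCommentsAnalyzer's C# code, reads that part with the standard library and counts codes. The coding convention it assumes (comma-separated codes per comment, with `/` separating parent and child codes in a hierarchy) is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def read_comments(docx_bytes):
    """Return the plain text of every comment stored in word/comments.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        if "word/comments.xml" not in zf.namelist():
            return []  # the document has no comments
        root = ET.fromstring(zf.read("word/comments.xml"))
    return ["".join(t.text or "" for t in comment.iter(W + "t"))
            for comment in root.iter(W + "comment")]

def count_codes(comments, sep=","):
    """Count codes; "parent/child" also adds one count to every ancestor prefix."""
    counts = Counter()
    for comment in comments:
        for code in filter(None, (c.strip() for c in comment.split(sep))):
            counts[code] += 1
            parts = code.split("/")
            for i in range(1, len(parts)):
                counts["/".join(parts[:i])] += 1
    return counts
```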

]]>
<![CDATA[The Mega2R package: R tools for accessing and processing genetic data in common formats]]> https://www.researchpad.co/article/5c0dccb6d5eed0c484cfeabf

The standalone C++ Mega2 program has been facilitating data reformatting for linkage and association analysis programs since 2000. Support for more analysis programs has been added over time. Currently, Mega2 converts data from several different genetic data formats (including PLINK, VCF, BCF, and IMPUTE2) into the specific data requirements of over 40 commonly used linkage and association analysis programs (including Mendel, Merlin, Morgan, SHAPEIT, ROADTRIPS, MaCH/minimac3). Recently, Mega2 has been enhanced to use a SQLite database as an intermediate data representation. Additionally, Mega2 now stores biallelic genotype data in a highly compressed form, like that of the GenABEL R package and the PLINK binary format. Our new Mega2R package makes it easy to load Mega2 SQLite databases directly into R as data frames. In addition, Mega2R is memory efficient: it keeps its genotype data in a compressed format and expands only the portions that are needed. Mega2R has functions that ease the application of gene-based tests by looping over genes and efficiently pulling out genotypes for variants within the desired boundaries. We have also created several more functions that illustrate how to use the data frames: these permit one to run the pedgene package to carry out gene-based association tests on family data, to run the SKAT package to carry out gene-based association tests, to output the Mega2R data as a VCF file and related files (for phenotype and family data), and to convert the data frames into GenABEL format. The Mega2R package enhances GenABEL since it supports additional input data formats (such as PLINK, VCF, and IMPUTE2) not currently supported by GenABEL. The Mega2 program and the Mega2R R package are both open source and are freely available, along with extensive documentation, from https://watson.hgen.pitt.edu/register for Mega2 and https://CRAN.R-project.org/package=Mega2R for Mega2R.
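
Two ideas in this paragraph, compact genotype storage that is expanded on demand and a loop over genes that pulls out only the variants inside each gene, can be sketched in Python. The 2-bit encoding used here (0, 1, 2 = alternate-allele count, 3 = missing) is an illustrative assumption and is not claimed to match the bit codes used by Mega2, GenABEL or PLINK; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def pack(genotypes):
    """Pack one variant's genotype codes (0-3) four per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        packed[i // 4] |= (g & 0b11) << (2 * (i % 4))
    return bytes(packed)

def unpack(packed, n_individuals):
    """Expand a packed variant back into a list of genotype codes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n_individuals)]

def genotypes_by_gene(packed_variants, positions, n_individuals, genes):
    """Yield (gene, genotype rows), expanding only the variants inside each gene.

    positions must be sorted and aligned with packed_variants;
    genes is an iterable of (name, start, end) with inclusive bounds.
    """
    for name, start, end in genes:
        lo, hi = bisect_left(positions, start), bisect_right(positions, end)
        yield name, [unpack(packed_variants[v], n_individuals) for v in range(lo, hi)]
```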

]]>
<![CDATA[SPEKcheck — fluorescence microscopy spectral visualisation and optimisation: a web application, javascript library, and data resource]]> https://www.researchpad.co/article/5c0d5e41d5eed0c484ba026e

Advanced fluorescence imaging methods require careful matching of excitation sources, dichroics, emission filters, detectors, and dyes to operate at their best. This complex task is often left to guesswork, preventing optimal dye:filter combinations, particularly for multicolour applications. To overcome this challenge, we developed SPEKcheck, a web application that visualises the efficiency of the light path in a fluorescence microscope. The software reports values for the excitation efficiency of a dye, the collection efficiency of the emitted fluorescence, and a "brightness" score, allowing easy comparison between different fluorescent labels. It also displays a spectral plot of the various elements in the configuration, enabling users to readily spot potential problems such as low excitation or emission efficiency, or high bleedthrough. It serves as an aid to exploring the performance of different dyes and filter sets.
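
The reported quantities reduce to simple operations on sampled spectra. The Python sketch below reads the normalized excitation spectrum at the laser line, integrates the emission that survives each filter or dichroic in the path, and combines the two into a score. It assumes all spectra share one wavelength grid, and the brightness formula (both efficiencies multiplied by extinction coefficient and quantum yield) is a hypothetical stand-in, not SPEKcheck's documented definition.

```python
def trapz(xs, ys):
    """Trapezoidal integral of ys sampled at increasing xs."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

def excitation_efficiency(wavelengths, excitation, laser_nm):
    """Peak-normalized excitation at the laser line (linear interpolation)."""
    peak = max(excitation)
    for i in range(len(wavelengths) - 1):
        x0, x1 = wavelengths[i], wavelengths[i + 1]
        if x0 <= laser_nm <= x1:
            y0, y1 = excitation[i], excitation[i + 1]
            return (y0 + (y1 - y0) * (laser_nm - x0) / (x1 - x0)) / peak
    return 0.0  # laser line outside the sampled spectrum

def collection_efficiency(wavelengths, emission, transmissions):
    """Fraction of the emission spectrum passing every element in the path."""
    passed = list(emission)
    for t in transmissions:
        passed = [e * ti for e, ti in zip(passed, t)]
    return trapz(wavelengths, passed) / trapz(wavelengths, emission)

def brightness_score(exc_eff, coll_eff, extinction, quantum_yield):
    """Hypothetical score: dye brightness scaled by both efficiencies."""
    return exc_eff * coll_eff * extinction * quantum_yield
```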

]]>
<![CDATA[A web resource for nutrient use efficiency-related genes, quantitative trait loci and microRNAs in important cereals and model plants]]> https://www.researchpad.co/article/5c0c3915d5eed0c4848a280c

Cereals are key contributors to global food security. Genes involved in the uptake (transport), assimilation and utilization of macro- and micronutrients are responsible for the presence of these nutrients in grain and straw. Although many genomic databases for cereals are available, there is currently no cohesive web resource of manually curated nutrient use efficiency (NtUE)-related genes and quantitative trait loci (QTLs). In this study, we present a web resource containing information on NtUE-related genes/QTLs and the corresponding available microRNAs for some of these genes in four major cereal crops (wheat (Triticum aestivum), rice (Oryza sativa), maize (Zea mays), and barley (Hordeum vulgare)), two alien species related to wheat (Triticum urartu and Aegilops tauschii), and two model species (Brachypodium distachyon and Arabidopsis thaliana). Gene annotations integrated in the current web resource were manually curated from existing databases and the available literature. The primary goal of developing this web resource is to provide descriptions of the NtUE-related genes and their functional annotation. MicroRNAs targeting some of the NtUE-related genes and the QTLs for NtUE-related traits are also included. The genomic information embedded in the web resource should help users find the information they need.

]]>
<![CDATA[EvolQG - An R package for evolutionary quantitative genetics]]> https://www.researchpad.co/article/5bd078f240307c787e454f0c

We present an open-source package for performing evolutionary quantitative genetics analyses in the R environment for statistical computing. Evolutionary theory shows that evolution depends critically on the available variation in a given population. When dealing with many quantitative traits, this variation is expressed in the form of a covariance matrix, particularly the additive genetic covariance matrix, or sometimes the phenotypic matrix when the genetic matrix is unavailable and there is evidence that the phenotypic matrix is sufficiently similar to the genetic matrix. Given this mathematical representation of available variation, the EvolQG package provides functions for the calculation of relevant evolutionary statistics; estimation of sampling error; corrections for this error; matrix comparison via correlations, distances and matrix decomposition; analysis of modularity patterns; and functions for testing evolutionary hypotheses on taxa diversification.
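
One widely used correlation-based comparison of covariance matrices is the random skewers method: random selection gradients β are applied to each matrix through the multivariate breeder's equation Δz = Gβ, and the two response vectors are compared by their vector correlation. The Python sketch below is a minimal, hedged version of that idea; it is not EvolQG's implementation, omits significance testing, and its function names are hypothetical.

```python
import math
import random

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def cosine(u, v):
    """Vector correlation (cosine of the angle) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def random_skewers(g1, g2, n_skewers=1000, seed=42):
    """Mean vector correlation of the responses of two covariance matrices."""
    rng = random.Random(seed)
    k = len(g1)
    total = 0.0
    for _ in range(n_skewers):
        beta = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(sum(b * b for b in beta))
        beta = [b / norm for b in beta]  # random unit selection gradient
        total += cosine(mat_vec(g1, beta), mat_vec(g2, beta))
    return total / n_skewers
```

Matrices that differ only by a scalar factor give a value of 1, because the method compares response directions rather than magnitudes.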

]]>
<![CDATA[Abstract Sifter: a comprehensive front-end system to PubMed]]> https://www.researchpad.co/article/5bf7a3dad5eed0c484eab6eb

The Abstract Sifter is a Microsoft Excel-based application that enhances the existing search capabilities of PubMed. The Abstract Sifter helps researchers search effectively, triage results, and keep track of articles of interest. The tool implements an innovative “sifter” functionality for relevance ranking, giving the researcher a way to find articles of interest quickly. The tool also gives researchers a view of the literature landscape for a set of entities such as chemicals or genes. The Abstract Sifter is available as a Microsoft Excel macro-enabled workbook application.
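
The abstract does not spell out how the sifter ranks articles. As a purely hypothetical illustration of term-based relevance ranking, the Python sketch below scores each record by how often user-chosen terms occur in its title and abstract. The record fields, the `sift` name, and the scoring rule are assumptions, not the Abstract Sifter's algorithm.

```python
import re

def sift(articles, sifter_terms):
    """Rank articles by whole-word, case-insensitive hits of the sifter terms.

    articles: list of dicts with "pmid", "title" and "abstract" keys.
    Returns (score, pmid) pairs, highest score first, ties broken by pmid.
    """
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)
                for t in sifter_terms]
    scored = []
    for art in articles:
        text = f"{art['title']} {art['abstract']}"
        score = sum(len(p.findall(text)) for p in patterns)
        scored.append((score, art["pmid"]))
    return sorted(scored, key=lambda s: (-s[0], s[1]))
```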

]]>
<![CDATA[lakemorpho: Calculating lake morphometry metrics in R]]> https://www.researchpad.co/article/5b45bce2463d7e537600e3c9

Metrics describing the shape and size of lakes, known as lake morphometry metrics, are important for any limnological study. For lakes that have long been the subject of study, these data have often already been collected and are openly available. Many other lakes have these data collected, but access is challenging because the data are often stored on individual computers (or worse, in filing cabinets) and are available only to the primary investigators. The vast majority of lakes fall into a third category, in which the data are not available. This makes broad-scale modelling of lake ecology a challenge, as some of the key information about in-lake processes is unavailable. While this valuable in situ information may be difficult to obtain, several national datasets exist that may be used to model and estimate lake morphometry. In particular, digital elevation models and hydrography have been shown to be predictive of several lake morphometry metrics. The R package lakemorpho has been developed to use these data to estimate the following morphometry metrics: surface area, shoreline length, major axis length, minor axis length, major and minor axis length ratio, shoreline development, maximum depth, mean depth, volume, maximum lake length, mean lake width, maximum lake width, and fetch. In this software tool article, we describe the motivation behind developing lakemorpho, discuss its implementation in R, and illustrate the use of lakemorpho with a typical use case.
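
Several of the listed metrics follow directly from the lake outline. The Python sketch below computes surface area (shoelace formula), shoreline length, and shoreline development, the ratio of shoreline length L to the circumference of a circle with the same area A, that is L / (2√(πA)). It assumes a simple polygon in projected (metre) coordinates and is not lakemorpho's R code, which works on spatial objects and also estimates depth-related metrics.

```python
import math

def polygon_area(coords):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    n = len(coords)
    s = sum(coords[i][0] * coords[(i + 1) % n][1] - coords[(i + 1) % n][0] * coords[i][1]
            for i in range(n))
    return abs(s) / 2

def shoreline_length(coords):
    """Perimeter of the closed polygon."""
    n = len(coords)
    return sum(math.dist(coords[i], coords[(i + 1) % n]) for i in range(n))

def shoreline_development(coords):
    """Shoreline length divided by the circumference of an equal-area circle."""
    return shoreline_length(coords) / (2 * math.sqrt(math.pi * polygon_area(coords)))
```

A circular lake scores 1, and more convoluted shorelines score higher; a square, for example, scores 2/√π ≈ 1.13.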

]]>