ResearchPad - Information Systems https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Phylogeographic investigation of 2014 porcine epidemic diarrhea virus (PEDV) transmission in Taiwan]]> https://www.researchpad.co/product?articleinfo=5c89779dd5eed0c4847d319c

The porcine epidemic diarrhea virus (PEDV) that emerged and spread throughout Taiwan in 2014 triggered significant concern in the country’s swine industry. Acknowledging the absence of a thorough investigation at the geographic level, we used 2014 outbreak sequence information from the Taiwan government’s open access databases plus GenBank records to analyze PEDV dissemination among Taiwanese pig farms. Genetic sequences, locations, and dates of identified PEDV-positive cases were used to assess spatial, temporal, clustering, GIS, and phylogeographic factors affecting PEDV dissemination. Our conclusion is that S gene sequences from 2014 PEDV-positive clinical samples collected in Taiwan were part of the same Genogroup 2 identified in the US in 2013. According to phylogenetic and phylogeographic data, viral strains collected in different areas were generally independent of each other, with certain clusters identified across different communities. Data from GIS and multiple potential infection factors were used to pinpoint cluster dissemination in areas with large numbers of swine farms in southern Taiwan. The data indicate that the 2014 Taiwan PEDV epidemic resulted from the spread of multiple strains, with strong correlations identified with pig farm numbers and sizes (measured as animal concentrations), feed mill numbers, and the number of slaughterhouses in a specifically defined geographic area.

]]>
<![CDATA[Optimizing community screening for tuberculosis: Spatial analysis of localized case finding from door-to-door screening for TB in an urban district of Ho Chi Minh City, Viet Nam]]> https://www.researchpad.co/product?articleinfo=5c22a0c3d5eed0c4849ec15b

Background

Tuberculosis (TB) is the deadliest infectious disease globally. Current case finding approaches may miss many people with TB or detect them too late.

Data and methods

This study was a retrospective spatial analysis of routine TB surveillance and cadastral data in Go Vap district, Ho Chi Minh City. We geocoded TB notifications from 2011 to 2015 and calculated theoretical yields of simulated door-to-door screening in three concentric catchment areas (50m, 100m, 200m) and three notification window scenarios (one, two and four quarters) for each index case. We calculated average yields, compared them to published reference values and fitted a GEE (Generalized Estimating Equation) linear regression model to the data.
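As a rough illustration of the yield calculation described above, the following Python sketch counts notified cases around one index case for each catchment radius. The abstract does not give the exact formula, so the definition used here (cases notified within the window after the index case, divided by an assumed number of people screened) is an assumption, as are the `Case` type, the coordinates and the `screened` figure.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Case:
    x: float      # easting in metres (hypothetical projected coordinates)
    y: float      # northing in metres
    quarter: int  # notification quarter, counted from an arbitrary origin


def theoretical_yield(index, cases, radius_m, window_quarters, screened):
    """Share of screened people later notified with TB near the index case.

    Assumed definition: cases other than the index that lie within radius_m
    and were notified in the window_quarters quarters after the index case,
    divided by the number of people assumed to be screened.
    """
    found = sum(
        1
        for c in cases
        if c is not index
        and hypot(c.x - index.x, c.y - index.y) <= radius_m
        and 0 < c.quarter - index.quarter <= window_quarters
    )
    return found / screened


# Made-up example data: one index case and four later notifications.
index = Case(0.0, 0.0, 0)
cases = [
    index,
    Case(30.0, 40.0, 1),   # 50 m away, inside every catchment area
    Case(90.0, 0.0, 2),    # inside the 100 m and 200 m areas
    Case(150.0, 0.0, 1),   # inside the 200 m area only
    Case(10.0, 0.0, 6),    # close, but outside a four-quarter window
]
for radius in (50, 100, 200):
    print(radius, theoretical_yield(index, cases, radius, 4, screened=1000))
```

In practice the denominator would differ for each catchment area, since a wider radius covers more households. A fixed `screened` value is used here only to keep the sketch short.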

Results

The sample included 3,046 TB patients. Adjusted theoretical yields in 50m, 100m and 200m catchment areas were 0.32% (95%CI: 0.27,0.37), 0.21% (95%CI: 0.14,0.29) and 0.17% (95%CI: 0.09,0.25), respectively, in the baseline notification window scenario. Theoretical yields in the 50m-catchment area for all notification window scenarios were significantly higher than a reference yield from literature. Yield was positively associated with treatment failure index cases (beta = 0.12, p = 0.001) and short-term inter-province migrants (beta = 0.06, p = 0.022), while greater distance to the DTU (beta = -0.02, p<0.001) was associated with lower yield.

Conclusions

This study is an example of inter-departmental collaboration and application of repurposed cadastral data to progress towards the end TB objectives. The results from Go Vap showed that the use of spatial analysis may be able to identify areas where targeted active case finding in Vietnam can help improve TB case detection.

]]>
<![CDATA[CircR2Disease: a manually curated database for experimentally supported circular RNAs associated with various diseases]]> https://www.researchpad.co/product?articleinfo=5b59986c463d7e77ce8a4a20

Abstract

CircR2Disease is a manually curated database that provides a comprehensive resource for circRNA deregulation in various diseases. Increasing evidence has shown that circRNAs play critical roles in transcriptional, post-transcriptional and translational regulation. Accordingly, the aberrant expression of circRNAs has been associated with a range of diseases. A high-quality database in which to deposit the circRNAs deregulated in diseases is therefore needed. The current version of CircR2Disease contains 725 associations between 661 circRNAs and 100 diseases, compiled by reviewing the existing literature. Each entry in CircR2Disease contains detailed information on the circRNA–disease relationship, including circRNA name, coordinates and gene symbol, disease name, expression patterns of the circRNA, experimental techniques, a brief description of the circRNA–disease relationship, year of publication and the PubMed ID. CircR2Disease provides a user-friendly interface to browse, search and download the data, as well as to submit novel disease-related circRNAs. CircR2Disease could help researchers investigate the mechanisms of disease-related circRNAs and explore appropriate algorithms for predicting novel associations.

Database URL: http://bioinfo.snnu.edu.cn/CircR2Disease/

]]>
<![CDATA[High-resolution reconstruction of the United States human population distribution, 1790 to 2010]]> https://www.researchpad.co/product?articleinfo=5bff4203d5eed0c484aa23ca

Where do people live, and how has this changed over timescales of centuries? High-resolution spatial information on historical human population distribution is of great significance to understand human-environment interactions and their temporal dynamics. However, the complex relationship between population distribution and various influencing factors coupled with limited data availability make it a challenge to reconstruct human population distribution over timescales of centuries. This study generated 1-km decadal population maps for the conterminous US from 1790 to 2010 using parsimonious models based on natural suitability, socioeconomic desirability, and inhabitability. Five models of increasing complexity were evaluated. The models were validated with census tract and county subdivision population data in 2000 and were applied to generate five sets of 22 historical population maps from 1790–2010. Separating urban and rural areas and excluding non-inhabitable areas were the most important factors for improving the overall accuracy. The generated gridded population datasets and the production and validation methods are described here.

]]>
<![CDATA[Wide-field corneal subbasal nerve plexus mosaics in age-controlled healthy and type 2 diabetes populations]]> https://www.researchpad.co/product?articleinfo=5bff4207d5eed0c484aa2455

A dense nerve plexus in the clear outer window of the eye, the cornea, can be imaged in vivo to enable non-invasive monitoring of peripheral nerve degeneration in diabetes. However, a limited field of view of corneal nerves, operator-dependent image quality, and subjective image sampling methods have led to difficulty in establishing robust diagnostic measures relating to the progression of diabetes and its complications. Here, we use machine-based algorithms to provide wide-area mosaics of the cornea’s subbasal nerve plexus (SBP) also accounting for depth (axial) fluctuation of the plexus. Degradation of the SBP with age has been mitigated as a confounding factor by providing a dataset comprising healthy and type 2 diabetes subjects of the same age. To maximize reuse, the dataset includes bilateral eye data, associated clinical parameters, and machine-generated SBP nerve density values obtained through automatic segmentation and nerve tracing algorithms. The dataset can be used to examine nerve degradation patterns to develop tools to non-invasively monitor diabetes progression while avoiding narrow-field imaging and image selection biases.

]]>
<![CDATA[A mobile brain-body imaging dataset recorded during treadmill walking with a brain-computer interface]]> https://www.researchpad.co/product?articleinfo=5bff4205d5eed0c484aa2410

We present a mobile brain-body imaging (MoBI) dataset acquired during treadmill walking in a brain-computer interface (BCI) task. The data were collected from eight healthy subjects, each completing three identical trials. Each trial consisted of three conditions: standing, treadmill walking, and treadmill walking with a closed-loop BCI. During the BCI condition, subjects used their brain activity to control the walking of a virtual avatar on a screen in real time. Robust procedures were designed to record lower limb joint angles (bilateral hip, knee, and ankle) using goniometers synchronized with 60-channel scalp electroencephalography (EEG). Additionally, electrooculogram (EOG), EEG electrode impedance, and digitized EEG channel locations were acquired to aid artifact removal and EEG dipole-source localization. This dataset is unique in that it is the first published MoBI dataset recorded during walking. It is useful in addressing several important open research questions, such as how EEG is coupled with the gait cycle during closed-loop BCI, how BCI influences neural activity during walking, and how a BCI decoder may be optimized.

]]>
<![CDATA[High-throughput density-functional perturbation theory phonons for inorganic materials]]> https://www.researchpad.co/product?articleinfo=5bff420dd5eed0c484aa2540

The knowledge of the vibrational properties of a material is of key importance to understand physical phenomena such as thermal conductivity, superconductivity, and ferroelectricity among others. However, detailed experimental phonon spectra are available only for a limited number of materials, which hinders the large-scale analysis of vibrational properties and their derived quantities. In this work, we perform ab initio calculations of the full phonon dispersion and vibrational density of states for 1521 semiconductor compounds in the harmonic approximation based on density functional perturbation theory. The data is collected along with derived dielectric and thermodynamic properties. We present the procedure used to obtain the results, the details of the provided database and a validation based on the comparison with experimental data.

]]>
<![CDATA[Novel sequences, structural variations and gene presence variations of Asian cultivated rice]]> https://www.researchpad.co/product?articleinfo=5bff420ed5eed0c484aa25aa

Genomic diversity within a species is the genetic basis of the phenotypic diversity essential for its adaptation to different environments. The big picture of the total genetic diversity within Asian cultivated rice has been uncovered since the sequencing of 3,000 rice genomes, including the SNP data publicly available in the SNP-Seek database. Here we report other aspects of the genetic diversity, including rice sequences assembled from over 3,000 accessions but absent in the Nipponbare reference genome, structural variations (SVs) and gene presence/absence variations (PAVs) in 453 accessions with sequencing depth over 20x. Using either SVs or gene PAVs, we were able to reconstruct the population structure of O. sativa, which was consistent with previous results based on SNPs. Moreover, we demonstrated the usefulness of the new data sets by successfully detecting the strong association of the “Green Revolution gene”, sd1, with plant height. Our data provide a more comprehensive view of the genetic diversity within rice, as well as additional genomic resources for research in rice breeding and plant biology.

]]>
<![CDATA[DataTri, a database of American triatomine species occurrence]]> https://www.researchpad.co/product?articleinfo=5bff41fed5eed0c484aa22af

Trypanosoma cruzi, the causative agent of Chagas disease, is transmitted to mammals - including humans - by insect vectors of the subfamily Triatominae. We present the results of a compilation of triatomine occurrence and complementary ecological data that represents the most complete, integrated and updated database (DataTri) available on triatomine species at a continental scale. This database was assembled by collecting the records of triatomine species published from 1904 to 2017, spanning all American countries with triatomine presence. A total of 21815 georeferenced records were obtained from published literature, personal fieldwork and data provided by colleagues. The data compiled includes 24 American countries, 14 genera and 135 species. From a taxonomic perspective, 67.33% of the records correspond to the genus Triatoma, 20.81% to Panstrongylus, 9.01% to Rhodnius and the remaining 2.85% are distributed among the other 11 triatomine genera. We encourage using DataTri information in various areas, especially to improve knowledge of the geographical distribution of triatomine species and its variations in time.

]]>
<![CDATA[PVCbase: an integrated web resource for the PVC bacterial proteomes]]> https://www.researchpad.co/product?articleinfo=5b58f14c463d7e541411604f

Abstract

Interest in the Planctomycetes-Verrucomicrobia-Chlamydiae (PVC) bacterial superphylum is growing within the microbiology community. These organisms do not have a specialized web resource that gathers in silico predictions in an integrated fashion. Hence, we are providing the PVC community with PVCbase, a web resource that fills this gap. PVCbase integrates protein function annotations obtained through sequence analysis and tertiary structure prediction for 39 representative PVC proteomes (PVCdb), a protein feature visualizer (Foundation) and a custom BLAST webserver (PVCBlast) that allows users to retrieve the annotation of a hit directly from the DataTables. We display results from various predictors, encompassing most functional aspects, allowing users to have a more comprehensive overview of protein identities. Additionally, we illustrate how PVCdb can be used to address biological questions from raw data.

Database URL: PVCbase is freely accessible at www.pvcbacteria.org/pvcbase

]]>
<![CDATA[Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval]]> https://www.researchpad.co/product?articleinfo=5b586b1a463d7e489147c580

Abstract

The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system.
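To make the idea of boosting important keywords in a verbose query concrete, here is a minimal Python sketch. It is not the authors' system: the scoring function is a plain weighted term-frequency sum, the keyword set is picked by hand, and the `boost` and `score` names, the boost factor and the two toy documents are all assumptions made for illustration.

```python
from collections import Counter


def boost(query, keywords, factor=3.0):
    """Give every query term weight 1.0 and listed keywords a higher weight."""
    return {t: (factor if t in keywords else 1.0) for t in query.lower().split()}


def score(query_weights, document):
    """Sum of (term frequency in the document) x (query term weight)."""
    tf = Counter(document.lower().split())
    return sum(weight * tf[term] for term, weight in query_weights.items())


# A verbose query: only some of its words carry the information need.
query = "find data on gene expression in mouse liver"
docs = {
    "d1": "data on data sharing and data policy in research",
    "d2": "gene expression profiles of mouse liver tissue",
}

plain = boost(query, keywords=set())
boosted = boost(query, keywords={"gene", "expression", "mouse", "liver"})

for name, weights in (("plain", plain), ("boosted", boosted)):
    ranking = sorted(docs, key=lambda d: score(weights, docs[d]), reverse=True)
    print(name, ranking)
```

With uniform weights, the generic words "data", "on" and "in" let the off-topic document d1 outscore d2. Once the four content keywords are boosted, d2 ranks first.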

Database URL: https://github.com/emory-irlab/biocaddie

]]>
<![CDATA[A Mediterranean coastal database for assessing the impacts of sea-level rise and associated hazards]]> https://www.researchpad.co/product?articleinfo=5b4cf84b463d7e12d26b018a

We have developed a new coastal database for the Mediterranean basin that is intended for coastal impact and adaptation assessment to sea-level rise and associated hazards on a regional scale. The data structure of the database relies on a linear representation of the coast with associated spatial assessment units. Using information on coastal morphology, human settlements and administrative boundaries, we have divided the Mediterranean coast into 13 900 coastal assessment units. To these units we have spatially attributed 160 parameters on the characteristics of the natural and socio-economic subsystems, such as extreme sea levels, vertical land movement and number of people exposed to sea-level rise and extreme sea levels. The database contains information on current conditions and on plausible future changes that are essential drivers for future impacts, such as sea-level rise rates and socio-economic development. Besides its intended use in risk and impact assessment, we anticipate that the Mediterranean Coastal Database (MCD) constitutes a useful source of information for a wide range of coastal applications.

]]>
<![CDATA[AntiTbPdb: a knowledgebase of anti-tubercular peptides]]> https://www.researchpad.co/product?articleinfo=5bf986ded5eed0c4843490b1

Abstract

Tuberculosis, caused by Mycobacterium tuberculosis, is a global menace responsible for millions of premature deaths every year. In the era of drug-resistant tuberculosis, peptide-based therapeutics may provide an alternative to small-molecule drugs. To create the knowledgebase AntiTbPdb (http://webs.iiitd.edu.in/raghava/antitbpdb/), experimentally validated anti-tubercular and anti-mycobacterial peptides were compiled from the literature. We curated 10 652 research articles and 35 patents to extract anti-tubercular peptides and annotated these peptides manually. The knowledgebase has 1010 entries, each providing extensive information about an anti-tubercular peptide, such as sequence, chemical modification, chirality, nature and source of origin. The tertiary structures of these anti-tubercular peptides, containing natural as well as chemically modified residues, were predicted using PEPstrMOD and I-TASSER. In addition to structural information, the database maintains other properties of the peptides, such as physicochemical properties. Numerous web-based tools have been integrated for data retrieval, browsing, sequence similarity search and peptide mapping. To assist a wide range of users, we developed a responsive website suitable for smartphones, tablets and desktops.

Database URL: http://webs.iiitd.edu.in/raghava/antitbpdb/

]]>
<![CDATA[miRwayDB: a database for experimentally validated microRNA-pathway associations in pathophysiological conditions]]> https://www.researchpad.co/product?articleinfo=5bf986dbd5eed0c484348f40

Abstract

MicroRNAs (miRNAs) are well known as key regulators of diverse biological pathways. A body of experimental evidence has shown that abnormal miRNA expression profiles are responsible for various pathophysiological conditions by modulating genes in disease-associated pathways. In spite of the rapid increase in research data confirming such associations, scientists still do not have access to a consolidated database offering these miRNA-pathway association details for critical diseases. We have developed miRwayDB, a database providing comprehensive information on experimentally validated miRNA-pathway associations in various pathophysiological conditions, utilizing data collected from published literature. To the best of our knowledge, it is the first database that provides information about experimentally validated miRNA-mediated pathway dysregulation as seen specifically in critical human diseases and hence indicative of a cause-and-effect relationship in most cases. The current version of miRwayDB collects an exhaustive list of miRNA-pathway association entries for 76 critical disease conditions by reviewing 663 published articles. Each database entry contains complete information on the name of the pathophysiological condition, associated miRNA(s), experimental sample type(s), regulation pattern (up/down) of miRNA, pathway association(s), targeted member of dysregulated pathway(s) and a brief description. In addition, miRwayDB provides miRNA, gene and pathway scores to evaluate the role of miRNA-regulated pathways in various pathophysiological conditions. The database can also be used for other biomedical approaches such as validation of computational analyses, integrated analysis and prediction of computational models. It also offers a submission page for novel data from recently published studies. We believe that miRwayDB will be a useful tool for the miRNA research community.

Database URL: http://www.mirway.iitkgp.ac.in

]]>
<![CDATA[Updated regulation curation model at the Saccharomyces Genome Database]]> https://www.researchpad.co/product?articleinfo=5bf986dcd5eed0c48434900b

Abstract

The Saccharomyces Genome Database (SGD) provides comprehensive, integrated biological information for the budding yeast Saccharomyces cerevisiae, along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms. We have recently expanded our data model for regulation curation to address regulation at the protein level in addition to transcription, and are presenting the expanded data on the ‘Regulation’ pages at SGD. These pages include a summary describing the context under which the regulator acts, manually curated and high-throughput annotations showing the regulatory relationships for that gene and a graphical visualization of its regulatory network and connected networks. For genes whose products regulate other genes or proteins, the Regulation page includes Gene Ontology enrichment analysis of the biological processes in which those targets participate. For DNA-binding transcription factors, we also provide other information relevant to their regulatory function, such as DNA binding site motifs and protein domains. As with other data types at SGD, all regulatory relationships and accompanying data are available through YeastMine, SGD’s data warehouse based on InterMine.

Database URL: http://www.yeastgenome.org

]]>
<![CDATA[Prevention of data duplication for high throughput sequencing repositories]]> https://www.researchpad.co/product?articleinfo=5bf986d9d5eed0c484348e8e

Abstract

Prevention of unintended duplication is one of the ongoing challenges many databases have to address. For high-throughput sequencing data, the complexity of that challenge increases with the complexity of the definition of a duplicate. In a computational data model, a data object represents a real entity such as a reagent or a biosample. This representation is similar to how a card represents a book in a paper library catalog. Duplicated data objects not only waste storage, they can also mislead users into assuming the model represents more than a single entity. Even if it is clear that two objects represent a single entity, data duplication opens the door to potential inconsistencies between the objects, since the content of the duplicated objects can be updated independently, allowing the metadata associated with the objects to diverge. This is analogous to a paper library catalog that mistakenly contains two cards for a single copy of a book: if the cards simultaneously list two different individuals as the current borrower, it would be difficult to determine which of the two actually has the book. Unfortunately, in a large database with multiple submitters, unintended duplication is to be expected. In this article, we present three principal guidelines the Encyclopedia of DNA Elements (ENCODE) Portal follows in order to prevent unintended duplication of both actual files and data objects: (I) definition of identifiable data objects, (II) object uniqueness validation and (III) a de-duplication mechanism. In addition to explaining our modus operandi, we elaborate on the methods used for identification of sequencing data files. A comparison of the approach taken by the ENCODE Portal with those of other widely used biological data repositories is provided.
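One common way to implement a uniqueness check of the kind listed under guideline (II) is to identify each file by a checksum of its content and reject a submission whose checksum is already registered. The Python sketch below illustrates that general technique only. The abstract does not say which checksum or data structures the ENCODE Portal uses, so the choice of SHA-256, the `FileRegistry` class and the accession strings are all assumptions.

```python
import hashlib


class FileRegistry:
    """Toy registry that refuses to store the same file content twice."""

    def __init__(self):
        # Maps content digest -> accession of the object that owns it.
        self._by_digest = {}

    def register(self, accession, content):
        """Register content under accession, or raise if it is a duplicate."""
        digest = hashlib.sha256(content).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            raise ValueError(f"{accession} duplicates existing object {existing}")
        self._by_digest[digest] = accession
        return digest


registry = FileRegistry()
registry.register("FILE001", b"@read1\nACGT\n+\nIIII\n")

try:
    # Same bytes submitted again under a new (hypothetical) accession.
    registry.register("FILE002", b"@read1\nACGT\n+\nIIII\n")
except ValueError as err:
    print("rejected:", err)
```

A checksum over the raw bytes only catches byte-identical files. Two files holding the same reads in a different order or compression would pass this check, which is one reason the definition of a duplicate becomes more complex for sequencing data.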

Database URL: https://www.encodeproject.org/

]]>
<![CDATA[eGenPub, a text mining system for extending computationally mapped bibliography for UniProt Knowledgebase by capturing centrality]]> https://www.researchpad.co/product?articleinfo=5bf033c5d5eed0c4849045e3
<![CDATA[Enforcement of entailment constraints in distributed service-based business processes]]> https://www.researchpad.co/product?articleinfo=5bcf6adb40307c74ebb862ad
<![CDATA[An uncertainty and sensitivity analysis approach for GIS-based multicriteria landslide susceptibility mapping]]> https://www.researchpad.co/product?articleinfo=5bc85de640307c1d3bc5c5b3
<![CDATA[tmBioC: improving interoperability of text-mining tools with BioC]]> https://www.researchpad.co/product?articleinfo=5ba69b8640307c1e9641ba3a

The lack of interoperability among biomedical text-mining tools is a major bottleneck in creating more complex applications. Despite the availability of numerous methods and techniques for various text-mining tasks, combining different tools requires substantial effort and time owing to the heterogeneity and variety of data formats. In response, BioC is a recent proposal that offers a minimalistic approach to tool interoperability by stipulating minimal changes to existing tools and applications. BioC is a family of XML formats that define how to present text documents and annotations, together with easy-to-use functions to read/write documents in the BioC format. In this study, we introduce our text-mining toolkit, which is designed to perform several challenging and significant tasks in the biomedical domain, and repackage the toolkit into BioC to enhance its interoperability.
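For readers unfamiliar with the format, the Python sketch below parses a small BioC-style XML document and lists its annotations. The element names (`collection`, `document`, `passage`, `annotation`, `infon`, `location`) follow the general BioC layout as commonly described, but the sample is simplified and should be treated as an assumption rather than a validated BioC file. The sketch uses Python's standard `xml.etree.ElementTree` module, not the read/write functions distributed with BioC, and the `read_annotations` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample in a BioC-like layout (an assumption).
SAMPLE = """<collection>
  <source>example</source>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>BRCA1 mutations raise breast cancer risk.</text>
      <annotation id="T1">
        <infon key="type">Gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return (document id, type, offset, length, text) for each annotation."""
    root = ET.fromstring(xml_text)
    found = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for passage in doc.iter("passage"):
            for ann in passage.iter("annotation"):
                loc = ann.find("location")
                found.append((
                    doc_id,
                    ann.findtext("infon"),      # first infon, here the type
                    int(loc.get("offset")),
                    int(loc.get("length")),
                    ann.findtext("text"),
                ))
    return found


for row in read_annotations(SAMPLE):
    print(row)
```

Because every tool reads and writes the same document, passage and annotation structure, the output of one tool can be passed to the next without a format-specific converter in between.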

Our toolkit consists of six state-of-the-art tools for named-entity recognition, normalization and annotation (PubTator) of genes (GenNorm), diseases (DNorm), mutations (tmVar), species (SR4GN) and chemicals (tmChem). Although developed within the same group, each tool is designed to process input articles and output annotations in a different format. We modified these tools to enable them to read/write data in the proposed BioC format. We found that, using the BioC family of formats and functions, only minimal changes were required to build the newer versions of the tools. The resulting BioC-wrapped toolkit, which we have named tmBioC, consists of our tools in BioC, an annotated full-text corpus in BioC, and a format detection and conversion tool.

Furthermore, through participation in the 2013 BioCreative IV Interoperability Track, we empirically demonstrate that the tools in tmBioC can be more efficiently integrated with each other as well as with external tools: our experimental results show that using BioC reduces the lines of code needed for text-mining tool integration by more than 60%. The tmBioC toolkit is publicly available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/.

Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/

]]>