ResearchPad - Library and Information Sciences Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[High-resolution reconstruction of the United States human population distribution, 1790 to 2010]]>

Where do people live, and how has this changed over timescales of centuries? High-resolution spatial information on historical human population distribution is of great significance for understanding human-environment interactions and their temporal dynamics. However, the complex relationship between population distribution and various influencing factors, coupled with limited data availability, makes it a challenge to reconstruct human population distribution over timescales of centuries. This study generated 1-km decadal population maps for the conterminous US from 1790 to 2010 using parsimonious models based on natural suitability, socioeconomic desirability, and inhabitability. Five models of increasing complexity were evaluated. The models were validated with census tract and county subdivision population data in 2000 and were applied to generate five sets of 22 historical population maps from 1790–2010. Separating urban and rural areas and excluding non-inhabitable areas were the most important factors for improving the overall accuracy. The generated gridded population datasets and the production and validation methods are described here.

<![CDATA[Wide-field corneal subbasal nerve plexus mosaics in age-controlled healthy and type 2 diabetes populations]]>

A dense nerve plexus in the clear outer window of the eye, the cornea, can be imaged in vivo to enable non-invasive monitoring of peripheral nerve degeneration in diabetes. However, a limited field of view of corneal nerves, operator-dependent image quality, and subjective image sampling methods have led to difficulty in establishing robust diagnostic measures relating to the progression of diabetes and its complications. Here, we use machine-based algorithms to provide wide-area mosaics of the cornea’s subbasal nerve plexus (SBP) also accounting for depth (axial) fluctuation of the plexus. Degradation of the SBP with age has been mitigated as a confounding factor by providing a dataset comprising healthy and type 2 diabetes subjects of the same age. To maximize reuse, the dataset includes bilateral eye data, associated clinical parameters, and machine-generated SBP nerve density values obtained through automatic segmentation and nerve tracing algorithms. The dataset can be used to examine nerve degradation patterns to develop tools to non-invasively monitor diabetes progression while avoiding narrow-field imaging and image selection biases.

<![CDATA[A mobile brain-body imaging dataset recorded during treadmill walking with a brain-computer interface]]>

We present a mobile brain-body imaging (MoBI) dataset acquired during treadmill walking in a brain-computer interface (BCI) task. The data were collected from eight healthy subjects, each having three identical trials. Each trial consisted of three conditions: standing, treadmill walking, and treadmill walking with a closed-loop BCI. During the BCI condition, subjects used their brain activity to control a virtual avatar on a screen to walk in real time. Robust procedures were designed to record lower limb joint angles (bilateral hip, knee, and ankle) using goniometers synchronized with 60-channel scalp electroencephalography (EEG). Additionally, electrooculogram (EOG), EEG electrode impedance, and digitized EEG channel locations were acquired to aid artifact removal and EEG dipole-source localization. This dataset is unique in that it is the first published MoBI dataset recorded during walking. It is useful in addressing several important open research questions, such as how EEG is coupled with the gait cycle during closed-loop BCI, how BCI influences neural activity during walking, and how a BCI decoder may be optimized.

<![CDATA[High-throughput density-functional perturbation theory phonons for inorganic materials]]>

The knowledge of the vibrational properties of a material is of key importance to understand physical phenomena such as thermal conductivity, superconductivity, and ferroelectricity among others. However, detailed experimental phonon spectra are available only for a limited number of materials, which hinders the large-scale analysis of vibrational properties and their derived quantities. In this work, we perform ab initio calculations of the full phonon dispersion and vibrational density of states for 1521 semiconductor compounds in the harmonic approximation based on density functional perturbation theory. The data is collected along with derived dielectric and thermodynamic properties. We present the procedure used to obtain the results, the details of the provided database and a validation based on the comparison with experimental data.

<![CDATA[Novel sequences, structural variations and gene presence variations of Asian cultivated rice]]>

Genomic diversity within a species is the genetic basis of the phenotypic diversity essential for its adaptation to environments. The big picture of the total genetic diversity within Asian cultivated rice has been uncovered since the sequencing of 3,000 rice genomes, including the SNP data publicly available in the SNP-Seek database. Here we report other aspects of the genetic diversity, including rice sequences assembled from over 3,000 accessions but absent in the Nipponbare reference genome, structural variations (SVs) and gene presence/absence variations (PAVs) in 453 accessions with sequencing depth over 20x. Using either SVs or gene PAVs, we were able to reconstruct the population structure of O. sativa, which was consistent with previous results based on SNPs. Moreover, we demonstrated the usefulness of the new data sets by successfully detecting the strong association of the “Green Revolution gene”, sd1, with plant height. Our data provide a more comprehensive view of the genetic diversity within rice, as well as additional genomic resources for research in rice breeding and plant biology.

<![CDATA[DataTri, a database of American triatomine species occurrence]]>

Trypanosoma cruzi, the causative agent of Chagas disease, is transmitted to mammals - including humans - by insect vectors of the subfamily Triatominae. We present the results of a compilation of triatomine occurrence and complementary ecological data that represents the most complete, integrated and updated database (DataTri) available on triatomine species at a continental scale. This database was assembled by collecting the records of triatomine species published from 1904 to 2017, spanning all American countries with triatomine presence. A total of 21815 georeferenced records were obtained from published literature, personal fieldwork and data provided by colleagues. The data compiled includes 24 American countries, 14 genera and 135 species. From a taxonomic perspective, 67.33% of the records correspond to the genus Triatoma, 20.81% to Panstrongylus, 9.01% to Rhodnius and the remaining 2.85% are distributed among the other 11 triatomine genera. We encourage using DataTri information in various areas, especially to improve knowledge of the geographical distribution of triatomine species and its variations in time.

<![CDATA[Is there a place for undergraduate and graduate students in the systematic review process?]]>

Systematic reviews are a well-established and well-honed research methodology in the medical and health sciences fields. As the popularity of systematic reviews has increased, disciplines outside the sciences have started publishing them. This increase in familiarity has begun to trickle down from practitioners and faculty to graduate students and recently undergraduates. The amount of work and rigor that goes into producing a quality systematic review may make these types of research projects seem unattainable for undergraduate or graduate students, but is this an accurate assumption? This commentary discusses whether there is a place for undergraduate and graduate students in the systematic review process. It explains the possible benefits of having undergraduate and graduate students engage in systematic reviews and concludes with ideas for creating basic education or training opportunities for researchers and students who are new to the systematic review process.

<![CDATA[A Mediterranean coastal database for assessing the impacts of sea-level rise and associated hazards]]>

We have developed a new coastal database for the Mediterranean basin that is intended for coastal impact and adaptation assessment to sea-level rise and associated hazards on a regional scale. The data structure of the database relies on a linear representation of the coast with associated spatial assessment units. Using information on coastal morphology, human settlements and administrative boundaries, we have divided the Mediterranean coast into 13 900 coastal assessment units. To these units we have spatially attributed 160 parameters on the characteristics of the natural and socio-economic subsystems, such as extreme sea levels, vertical land movement and number of people exposed to sea-level rise and extreme sea levels. The database contains information on current conditions and on plausible future changes that are essential drivers for future impacts, such as sea-level rise rates and socio-economic development. Besides its intended use in risk and impact assessment, we anticipate that the Mediterranean Coastal Database (MCD) constitutes a useful source of information for a wide range of coastal applications.

<![CDATA[Break Down in Order To Build Up: Decomposing Small Molecules for Fragment-Based Drug Design with eMolFrag]]>


Constructing high-quality libraries of molecular building blocks is essential for successful fragment-based drug discovery. In this communication, we describe eMolFrag, a new open-source software package for decomposing organic compounds into nonredundant fragments that retain molecular connectivity information. Given a collection of molecules, eMolFrag generates a set of unique fragments comprising larger moieties, bricks, and smaller linkers connecting bricks. These building blocks can subsequently be used to construct virtual screening libraries for targeted drug discovery. The robustness and computational performance of eMolFrag are assessed against the Directory of Useful Decoys, Enhanced database in serial and parallel modes with up to 16 computing cores. Further, the application of eMolFrag in de novo drug design is illustrated using the adenosine receptor. eMolFrag is implemented in Python, and it is available as stand-alone software and a web server at and

<![CDATA[Tracking the follow-up of work in progress papers]]>

Academic conferences offer numerous submission tracks to support the inclusion of a variety of researchers and topics. Work in progress papers are one such submission type where authors present preliminary results in a poster session. They have recently gained popularity in the area of Human Computer Interaction (HCI) as a relatively easier pathway to attending the conference due to their higher acceptance rate as compared to the main tracks. However, it is not clear if these work in progress papers are further extended or transitioned into more complete and thorough full papers or are simply one-off pieces of research. In order to answer this we explore self-citation patterns of four work in progress editions in two popular HCI conferences (CHI2010, CHI2011, HRI2010 and HRI2011). Our results show that almost 50% of the work in progress papers do not have any self-citations and approximately only half of the self-citations can be considered as true extensions of the original work in progress paper. Specific conferences dominate as the preferred venue where extensions of these work in progress papers are published. Furthermore, the rate of self-citations peaks in the immediate year after publication and gradually tails off. By tracing author publication records, we also delve into possible reasons of work in progress papers not being cited in follow up publications. In conclusion, we speculate on the main trends observed and what they may mean looking ahead for the work in progress track of premier HCI conferences.

<![CDATA[Country-specific determinants of world university rankings]]>

This paper examines country-specific factors that affect the three most influential world university rankings (the Academic Ranking of World Universities, the QS World University Ranking, and the Times Higher Education World University Ranking). We run a cross sectional regression that covers 42–71 countries (depending on the ranking and data availability). We show that the position of universities from a country in the ranking is determined by the following country-specific variables: economic potential of the country, research and development expenditure, long-term political stability (freedom from war, occupation, coups and major changes in the political system), and institutional variables, including government effectiveness.

<![CDATA[Discrepancies among Scopus, Web of Science, and PubMed coverage of funding information in medical journal articles]]>


The overall aim of the present study was to compare the coverage of existing research funding information for articles indexed in Scopus, Web of Science, and PubMed databases.


The numbers of articles with funding information published in 2015 were identified in the three selected databases and compared using bibliometric analysis of a sample of twenty-eight prestigious medical journals.


Frequency analysis of the number of articles with funding information showed statistically significant differences between Scopus, Web of Science, and PubMed databases. The largest proportion of articles with funding information was found in Web of Science (29.0%), followed by PubMed (14.6%) and Scopus (7.7%).


The results show that coverage of funding information differs significantly among Scopus, Web of Science, and PubMed databases in a sample of the same medical journals. Moreover, we found that, currently, funding data in PubMed is more difficult to obtain and analyze compared with that in the other two databases.

<![CDATA[Local Protein Structure Refinement via Molecular Dynamics Simulations with locPREFMD]]>

<![CDATA[An uncertainty and sensitivity analysis approach for GIS-based multicriteria landslide susceptibility mapping]]>

<![CDATA[Prediction of Substrates for Glutathione Transferases by Covalent Docking]]>


Enzymes in the glutathione transferase (GST) superfamily catalyze the conjugation of glutathione (GSH) to electrophilic substrates. As a consequence they are involved in a number of key biological processes, including protection of cells against chemical damage, steroid and prostaglandin biosynthesis, tyrosine catabolism, and cell apoptosis. Although virtual screening has been used widely to discover substrates by docking potential noncovalent ligands into active site clefts of enzymes, docking has rarely been constrained by a covalent bond between the enzyme and ligand. In this study, we investigate the accuracy of docking poses and substrate discovery in the GST superfamily, by docking 6738 potential ligands from the KEGG and MetaCyc compound libraries into 14 representative GST enzymes with known structures and substrates using the PLOP program [Jacobson et al., Proteins 2004, 55, 351; PMID 15048827]. For X-ray structures as receptors, one of the top 3 ranked models is within 3 Å all-atom root mean square deviation (RMSD) of the native complex in 11 of the 14 cases; the enrichment LogAUC value is better than random in all cases, and better than 25 in 7 of 11 cases. For comparative models as receptors, near-native ligand–enzyme configurations are often sampled but difficult to rank highly. For models based on templates with the highest sequence identity, the enrichment LogAUC is better than 25 in 5 of 11 cases, not significantly different from the crystal structures. In conclusion, we show that covalent docking can be a useful tool for substrate discovery and point out specific challenges for future method improvement.
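The 3 Å all-atom RMSD criterion mentioned above can be illustrated with a minimal sketch. It assumes the docked pose and the native ligand are already in the same receptor frame with atoms listed in the same order, so no superposition is performed. The function names, the inclusive cutoff, and the toy coordinates are illustrative assumptions and are not taken from the article or from PLOP.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered lists of (x, y, z) atoms.

    Assumes both structures share one coordinate frame (no fitting is done).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_near_native(pose, native, cutoff=3.0):
    """True when the pose lies within `cutoff` (same units as the coordinates) of the native.

    The inclusive comparison is an assumption; the abstract only says "within 3 Å".
    """
    return rmsd(pose, native) <= cutoff


# Toy coordinates in Å (made up for illustration): every atom is shifted by 2 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(pose, native))            # 2.0
print(is_near_native(pose, native))  # True
```

Skipping the superposition step is a deliberate choice for docking poses, where a ligand that is internally correct but sits in the wrong place should still count as far from native.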

<![CDATA[An Unbiased Method To Build Benchmarking Sets for Ligand-Based Virtual Screening and its Application To GPCRs]]>


Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus has been placed on structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date these ready-to-apply data sets for LBVS are fairly limited, and the direct usage of benchmarking sets designed for SBVS could introduce biases into the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCR targets. To be more specific, our methods can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the “artificial enrichment” and “analogue bias” of a published GPCR benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue about the ratio of decoys per ligand and found that for a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD.
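The "average AUC of the ROC curves" metric can be sketched as follows. The area under a ROC curve equals the probability that a randomly chosen ligand is scored above a randomly chosen decoy, with ties counted as one half, and that identity is what the code computes. How the article forms its leave-one-out folds is not stated in the abstract, so the `folds` structure, the function names, and the toy scores below are assumptions for illustration only.

```python
def roc_auc(ligand_scores, decoy_scores):
    """ROC AUC as the fraction of (ligand, decoy) pairs ranked correctly.

    Higher scores are assumed to mean "more ligand-like"; ties count as 0.5.
    """
    if not ligand_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for lig in ligand_scores:
        for dec in decoy_scores:
            if lig > dec:
                wins += 1.0
            elif lig == dec:
                wins += 0.5
    return wins / (len(ligand_scores) * len(decoy_scores))


def mean_auc(folds):
    """Average AUC over an iterable of (ligand_scores, decoy_scores) pairs, one per fold."""
    aucs = [roc_auc(ligands, decoys) for ligands, decoys in folds]
    return sum(aucs) / len(aucs)


# Toy scores (made up): one perfectly separated fold and one fold with a misranked ligand.
folds = [
    ([0.9, 0.8], [0.1, 0.2]),  # AUC 1.0
    ([0.9, 0.1], [0.5]),       # AUC 0.5
]
print(mean_auc(folds))  # 0.75
```

An AUC near 0.5 corresponds to random ranking, which is the behaviour an unbiased decoy set should produce for a method with no real discriminating power.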

<![CDATA[On the bibliometric coordinates of four different research fields in Geography]]>

This study is a bibliometric analysis of the highly complex research discipline Geography. In order to identify the most popular and most cited publication channels, to reveal publication strategies, and to analyse the discipline’s coverage within publications, the three main data sources for citation analyses, namely Web of Science, Scopus and Google Scholar, have been utilized. This study is based on publication data collected for four individual evaluation exercises performed at the University of Vienna and related to four different subfields: Geoecology, Social and Economic Geography, Demography and Population Geography, and Economic Geography. The results show very heterogeneous and individual publication strategies, even in the same research fields. Monographs, journal articles and book chapters are the most cited document types. Differences between research fields more related to the natural sciences than to the social sciences are clearly visible, but less considerable when taking into account the higher number of co-authors. General publication strategies seem to be established for both natural science and social sciences, however, with significant differences. While in natural science mainly publications in international peer-reviewed scientific journals matter, the focus in social sciences is rather on book chapters, reports and monographs. Although an “iceberg citation model” is suggested, citation analyses for monographs, book chapters and reports should be conducted separately and should include complementary data sources, such as Google Scholar, in order to enhance the coverage and to improve the quality of the visibility and impact analyses. This is particularly important for social sciences related research within Geography.

<![CDATA[Identification of Novel Potential Antibiotics against Staphylococcus Using Structure-Based Drug Screening Targeting Dihydrofolate Reductase]]>


The emergence of multidrug-resistant Staphylococcus aureus (S. aureus) makes the treatment of infectious diseases in hospitals more difficult and increases the mortality of the patients. In this study, we attempted to identify novel potent antibiotic candidate compounds against S. aureus dihydrofolate reductase (saDHFR). We performed three-step in silico structure-based drug screening (SBDS) based on the crystal structure of saDHFR using a 154,118 chemical compound library. We subsequently evaluated whether candidate chemical compounds exhibited inhibitory effects on the growth of the model bacterium: Staphylococcus epidermidis (S. epidermidis). The compound KB1 showed a strong inhibitory effect on the growth of S. epidermidis. Moreover, we rescreened chemical structures similar to KB1 from a 461,397 chemical compound library. Three of the four KB1 analogs (KBS1, KBS3, and KBS4) showed inhibitory effects on the growth of S. epidermidis and enzyme inhibitory effects on saDHFR. We performed structure–activity relationship (SAR) analysis of active chemical compounds and observed a correlative relationship among the IC50 values, interaction residues, and structure scaffolds. In addition, the active chemical compounds (KB1, KBS3, and KBS4) had no inhibitory effects on the growth of model enterobacteria (E. coli BL21 and JM109 strains) and no toxic effects on cultured mammalian cells (MDCK cells). Results obtained from Protein Ligand Interaction Fingerprint (PLIF) and Ligand Interaction (LI) analyses suggested that all of the active compounds exhibited potential inhibitory effects on mutated saDHFR of the drug-resistant strains. The structural and experimental information concerning these novel chemical compounds will likely contribute to the development of new antibiotics for both wild-type and drug-resistant S. aureus.

<![CDATA[vSDC: a method to improve early recognition in virtual screening when limited experimental resources are available]]>


In drug design, one may be confronted with the problem of finding hits for targets for which no small inhibiting molecules are known and only low-throughput experiments are available (like ITC or NMR studies), two common difficulties encountered in a typical academic setting. Using a virtual screening strategy like docking can alleviate some of the problems and save a considerable amount of time by selecting only top-ranking molecules, but only if the method is very efficient, i.e. when a good proportion of actives are found in the 1–10 % best ranked molecules.


The use of several programs (in our study, Gold, Surflex, FlexX and Glide were considered) shows a divergence of the results, which presents a difficulty in guiding the experiments. To overcome this divergence and increase the yield of the virtual screening, we created the standard deviation consensus (SDC) and variable SDC (vSDC) methods, consisting of the intersection of molecule sets from several virtual screening programs, based on the standard deviations of their ranking distributions.
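One plausible reading of the consensus idea described above is sketched here. For each program, keep the molecules whose score lies at least `k` standard deviations above that program's mean score, then intersect the per-program sets. This is only an illustration and not the published SDC or vSDC procedure: the threshold rule, the parameter `k`, the convention that higher scores are better, and the toy scores are all assumptions.

```python
from statistics import mean, pstdev


def top_set(scores, k):
    """Molecules scoring at least k standard deviations above one program's mean.

    `scores` maps molecule id -> score, with higher assumed to be better.
    """
    values = list(scores.values())
    threshold = mean(values) + k * pstdev(values)
    return {mol for mol, score in scores.items() if score >= threshold}


def sd_consensus(scores_by_program, k=1.0):
    """Intersection of the per-program top sets (hypothetical consensus rule)."""
    sets = [top_set(scores, k) for scores in scores_by_program.values()]
    return set.intersection(*sets) if sets else set()


# Toy scores (made up) for two hypothetical programs over five molecules.
scores_by_program = {
    "program_a": {"m1": 10.0, "m2": 1.0, "m3": 1.0, "m4": 1.0, "m5": 9.0},
    "program_b": {"m1": 8.0, "m2": 0.0, "m3": 7.5, "m4": 0.0, "m5": 0.0},
}
print(sd_consensus(scores_by_program, k=1.0))  # {'m1'}
```

Expressing the cutoff in units of each program's own standard deviation puts scoring functions with different scales on a comparable footing before the sets are intersected.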


SDC allowed us to find hits for two new protein targets by testing only 9 and 11 small molecules from a chemical library of circa 15,000 compounds. Furthermore, vSDC, when applied to the 102 proteins of the DUD-E benchmarking database, succeeded in finding more hits than any of the four isolated programs for 13–60 % of the targets. In addition, when only 10 molecules of each of the 102 chemical libraries were considered, vSDC performed better in the number of hits found, with an improvement of 6–24 % over the 10 best-ranked molecules given by the individual docking programs.

Graphical abstract

In drug design, for a given target and a given chemical library, the results obtained with different virtual screening programs diverge. How, then, can the experimental tests be guided rationally, especially when only a small number of experiments can be made? The variable Standard Deviation Consensus (vSDC) method was developed to address this issue. Left panel: the vSDC principle consists of intersecting molecule sets, chosen on the basis of the standard deviations of their ranking distributions, obtained from various virtual screening programs. In this study Glide, Gold, FlexX and Surflex were used and tested on the 102 targets of the DUD-E database. Right panel: comparison of the average percentage of hits found with vSDC and each of the four programs, when only 10 molecules from each of the 102 chemical libraries of the DUD-E database were considered. On average, vSDC was capable of finding 38 % of the findable hits, against 34 % for Glide, 32 % for Gold, 16 % for FlexX and 14 % for Surflex, showing that with vSDC it was possible to overcome the unpredictability of the virtual screening results and to improve them.

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-016-0112-z) contains supplementary material, which is available to authorized users.

<![CDATA[Asymmetric transfer hydrogenation of imines and ketones using chiral Ru(II)Cl(η6-p-cymene)[(S,S)-N-TsDPEN] catalyst: a computational study]]>