ResearchPad - Statistics, Probability and Uncertainty https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks

<![CDATA[High-resolution reconstruction of the United States human population distribution, 1790 to 2010]]> https://www.researchpad.co/product?articleinfo=5bff4203d5eed0c484aa23ca

Where do people live, and how has this changed over timescales of centuries? High-resolution spatial information on historical human population distribution is essential for understanding human-environment interactions and their temporal dynamics. However, the complex relationship between population distribution and its many influencing factors, coupled with limited data availability, makes it challenging to reconstruct human population distribution over timescales of centuries. This study generated 1-km decadal population maps for the conterminous US from 1790 to 2010 using parsimonious models based on natural suitability, socioeconomic desirability, and inhabitability. Five models of increasing complexity were evaluated. The models were validated with census tract and county subdivision population data in 2000 and were applied to generate five sets of 22 historical population maps from 1790 to 2010. Separating urban and rural areas and excluding non-inhabitable areas were the most important factors for improving the overall accuracy. The generated gridded population datasets and the production and validation methods are described here.

]]>
<![CDATA[Wide-field corneal subbasal nerve plexus mosaics in age-controlled healthy and type 2 diabetes populations]]> https://www.researchpad.co/product?articleinfo=5bff4207d5eed0c484aa2455

A dense nerve plexus in the clear outer window of the eye, the cornea, can be imaged in vivo to enable non-invasive monitoring of peripheral nerve degeneration in diabetes. However, a limited field of view of corneal nerves, operator-dependent image quality, and subjective image sampling methods have made it difficult to establish robust diagnostic measures relating to the progression of diabetes and its complications. Here, we use machine-based algorithms to provide wide-area mosaics of the cornea’s subbasal nerve plexus (SBP) that also account for depth (axial) fluctuation of the plexus. Degradation of the SBP with age has been mitigated as a confounding factor by providing a dataset comprising healthy and type 2 diabetes subjects of the same age. To maximize reuse, the dataset includes bilateral eye data, associated clinical parameters, and machine-generated SBP nerve density values obtained through automatic segmentation and nerve tracing algorithms. The dataset can be used to examine nerve degradation patterns and to develop tools that non-invasively monitor diabetes progression while avoiding narrow-field imaging and image selection biases.

]]>
<![CDATA[A mobile brain-body imaging dataset recorded during treadmill walking with a brain-computer interface]]> https://www.researchpad.co/product?articleinfo=5bff4205d5eed0c484aa2410

We present a mobile brain-body imaging (MoBI) dataset acquired during treadmill walking in a brain-computer interface (BCI) task. The data were collected from eight healthy subjects, each of whom completed three identical trials. Each trial consisted of three conditions: standing, treadmill walking, and treadmill walking with a closed-loop BCI. During the BCI condition, subjects used their brain activity to control a virtual avatar on a screen to walk in real time. Robust procedures were designed to record lower limb joint angles (bilateral hip, knee, and ankle) using goniometers synchronized with 60-channel scalp electroencephalography (EEG). Additionally, electrooculogram (EOG), EEG electrode impedance, and digitized EEG channel locations were acquired to aid artifact removal and EEG dipole-source localization. This dataset is unique in that it is the first published MoBI dataset recorded during walking. It is useful in addressing several important open research questions, such as how EEG is coupled with the gait cycle during closed-loop BCI, how BCI influences neural activity during walking, and how a BCI decoder may be optimized.

]]>
<![CDATA[High-throughput density-functional perturbation theory phonons for inorganic materials]]> https://www.researchpad.co/product?articleinfo=5bff420dd5eed0c484aa2540

Knowledge of the vibrational properties of a material is key to understanding physical phenomena such as thermal conductivity, superconductivity, and ferroelectricity, among others. However, detailed experimental phonon spectra are available only for a limited number of materials, which hinders the large-scale analysis of vibrational properties and their derived quantities. In this work, we perform ab initio calculations of the full phonon dispersion and vibrational density of states for 1521 semiconductor compounds in the harmonic approximation based on density functional perturbation theory. The data are collected along with derived dielectric and thermodynamic properties. We present the procedure used to obtain the results, the details of the provided database, and a validation based on comparison with experimental data.

]]>
<![CDATA[Novel sequences, structural variations and gene presence variations of Asian cultivated rice]]> https://www.researchpad.co/product?articleinfo=5bff420ed5eed0c484aa25aa

Genomic diversity within a species is the genetic basis of the phenotypic diversity essential for its adaptation to environments. The big picture of the total genetic diversity within Asian cultivated rice has been uncovered since the sequencing of 3,000 rice genomes, including the SNP data publicly available in the SNP-Seek database. Here we report other aspects of the genetic diversity, including rice sequences assembled from over 3,000 accessions but absent in the Nipponbare reference genome, structural variations (SVs) and gene presence/absence variations (PAVs) in 453 accessions with sequencing depth over 20x. Using either SVs or gene PAVs, we were able to reconstruct the population structure of O. sativa, which was consistent with previous results based on SNPs. Moreover, we demonstrated the usefulness of the new data sets by successfully detecting the strong association of the “Green Revolution gene”, sd1, with plant height. Our data provide a more comprehensive view of the genetic diversity within rice, as well as additional genomic resources for research in rice breeding and plant biology.

]]>
<![CDATA[DataTri, a database of American triatomine species occurrence]]> https://www.researchpad.co/product?articleinfo=5bff41fed5eed0c484aa22af

Trypanosoma cruzi, the causative agent of Chagas disease, is transmitted to mammals, including humans, by insect vectors of the subfamily Triatominae. We present the results of a compilation of triatomine occurrence and complementary ecological data that represents the most complete, integrated and updated database (DataTri) available on triatomine species at a continental scale. This database was assembled by collecting the records of triatomine species published from 1904 to 2017, spanning all American countries with triatomine presence. A total of 21,815 georeferenced records were obtained from published literature, personal fieldwork and data provided by colleagues. The compiled data cover 24 American countries, 14 genera and 135 species. From a taxonomic perspective, 67.33% of the records correspond to the genus Triatoma, 20.81% to Panstrongylus, 9.01% to Rhodnius and the remaining 2.85% are distributed among the other 11 triatomine genera. We encourage using DataTri information in various areas, especially to improve knowledge of the geographical distribution of triatomine species and its variations in time.

]]>
<![CDATA[A Mediterranean coastal database for assessing the impacts of sea-level rise and associated hazards]]> https://www.researchpad.co/product?articleinfo=5b4cf84b463d7e12d26b018a

We have developed a new coastal database for the Mediterranean basin that is intended for regional-scale assessment of coastal impacts of, and adaptation to, sea-level rise and associated hazards. The data structure of the database relies on a linear representation of the coast with associated spatial assessment units. Using information on coastal morphology, human settlements and administrative boundaries, we have divided the Mediterranean coast into 13 900 coastal assessment units. To these units we have spatially attributed 160 parameters on the characteristics of the natural and socio-economic subsystems, such as extreme sea levels, vertical land movement and number of people exposed to sea-level rise and extreme sea levels. The database contains information on current conditions and on plausible future changes that are essential drivers for future impacts, such as sea-level rise rates and socio-economic development. Besides its intended use in risk and impact assessment, we anticipate that the Mediterranean Coastal Database (MCD) will be a useful source of information for a wide range of coastal applications.

]]>
<![CDATA[The Hamming Ball Sampler]]> https://www.researchpad.co/product?articleinfo=5bf6be03d5eed0c484d2b813

We introduce the Hamming ball sampler, a novel Markov chain Monte Carlo algorithm for efficient inference in statistical models involving high-dimensional discrete state spaces. The sampling scheme uses an auxiliary variable construction that adaptively truncates the model space, allowing iterative exploration of the full model space. The approach generalizes conventional Gibbs sampling schemes for discrete spaces and provides an intuitive means for user-controlled balance between statistical efficiency and computational tractability. We illustrate the generic utility of our sampling algorithm through application to a range of statistical models. Supplementary materials for this article are available online.
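
The auxiliary-variable idea can be illustrated with a minimal sketch for binary vectors. It assumes one common formulation: draw an auxiliary point uniformly from the Hamming ball around the current state, then resample the state from the target restricted to the ball around that auxiliary point. The function names (`hamming_ball`, `hamming_ball_step`), the toy target, and the brute-force enumeration are illustrative assumptions, not code from the article.

```python
import itertools
import math
import random


def hamming_ball(center, radius):
    """List every binary tuple within Hamming distance `radius` of `center`."""
    n = len(center)
    ball = []
    for d in range(radius + 1):
        for flipped in itertools.combinations(range(n), d):
            point = list(center)
            for i in flipped:
                point[i] = 1 - point[i]  # flip the selected bits
            ball.append(tuple(point))
    return ball


def hamming_ball_step(x, log_target, radius, rng):
    """One update: auxiliary draw, then exact sampling inside the truncated space."""
    # Step 1: auxiliary variable u, uniform on the ball around the current state.
    u = rng.choice(hamming_ball(x, radius))
    # Step 2: new state drawn from the target restricted to the ball around u.
    candidates = hamming_ball(u, radius)
    log_w = [log_target(c) for c in candidates]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]  # stabilised, unnormalised
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy target that favours vectors with many ones.
    log_target = lambda s: 1.5 * sum(s)
    state = (0,) * 8
    for _ in range(200):
        state = hamming_ball_step(state, log_target, radius=2, rng=rng)
    print(state)
```

In this sketch the radius sets the trade-off the abstract mentions: a larger radius lets each update move further but enlarges the set that has to be enumerated.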

]]>
<![CDATA[Variable Selection in Kernel Regression Using Measurement Error Selection Likelihoods]]> https://www.researchpad.co/product?articleinfo=5b5c02a3463d7e28a3e55d69

]]>
<![CDATA[Maximum type 1 error rate inflation in multiarmed clinical trials with adaptive interim sample size modifications]]> https://www.researchpad.co/product?articleinfo=5add677f463d7e355c484536

Sample size modifications in the interim analyses of an adaptive design can inflate the type 1 error rate if test statistics and critical boundaries are used in the final analysis as if no modification had been made. While this is already true for designs with an overall change of the sample size in a balanced treatment-control comparison, the inflation can be much larger if modification of allocation ratios is also allowed. In this paper, we investigate adaptive designs with several treatment arms compared to a single common control group. Regarding modifications, we consider treatment arm selection as well as modifications of overall sample size and allocation ratios. The inflation is quantified for two approaches: a naive procedure that ignores not only all modifications, but also the multiplicity issue arising from the many-to-one comparison, and a Dunnett procedure that ignores modifications, but adjusts for the initially started multiple treatments. The maximum inflation of the type 1 error rate for such types of design can be calculated by searching for the “worst case” scenarios, that is, sample size adaptation rules in the interim analysis that lead to the largest conditional type 1 error rate at any point of the sample space. To show the most extreme inflation, we initially assume unconstrained second stage sample size modifications leading to a large inflation of the type 1 error rate. Furthermore, we investigate the inflation when putting constraints on the second stage sample sizes. It turns out that, for example, fixing the sample size of the control group leads to designs controlling the type 1 error rate.
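
The worst-case search can be sketched for the simplest setting the abstract mentions, a single balanced comparison with a one-sided z-test, rather than the multi-arm designs the paper studies. The sketch assumes a naive final statistic that pools both stages as if the second-stage size had been fixed in advance, and an unconstrained ratio r = n2/n1. For an interim statistic z1 between 0 and the critical value c, maximising the conditional error over r gives r = c²/z1² − 1 and a conditional error of 1 − Φ(√(c² − z1²)). All function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def worst_case_conditional_error(z1, c):
    """Largest conditional type 1 error over all second-stage size ratios r > 0."""
    if z1 >= c:
        return 1.0  # a vanishing second stage makes rejection certain
    if z1 <= 0.0:
        return 1.0 - STD_NORMAL.cdf(c)  # best case is an arbitrarily large second stage
    # Interior maximum at r = c^2 / z1^2 - 1.
    return 1.0 - STD_NORMAL.cdf(math.sqrt(c * c - z1 * z1))


def max_type1_error(alpha, steps=20000):
    """Integrate the worst-case conditional error over the null law of z1."""
    c = STD_NORMAL.inv_cdf(1.0 - alpha)
    # Closed-form pieces: z1 <= 0 contributes alpha / 2, z1 >= c contributes alpha.
    total = 0.5 * alpha + alpha
    h = c / steps
    for i in range(steps):
        z = (i + 0.5) * h  # midpoint rule on (0, c)
        total += worst_case_conditional_error(z, c) * STD_NORMAL.pdf(z) * h
    return total


if __name__ == "__main__":
    alpha = 0.025
    c = STD_NORMAL.inv_cdf(1.0 - alpha)
    print(max_type1_error(alpha))                # numerical worst case
    print(alpha + math.exp(-c * c / 2.0) / 4.0)  # closed form for this simple case
```

Under these assumptions the nominal one-sided level of 0.025 inflates to roughly 0.062, which shows why ignoring the adaptation in the final analysis is not harmless even before allocation ratios or multiple arms enter.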

]]>
<![CDATA[Multiple-membership multiple-classification models for social network and group dependences]]> https://www.researchpad.co/product?articleinfo=5add67b9463d7e355c484537

The social network literature on network dependences has largely ignored other sources of dependence, such as the school that a student attends, or the area in which an individual lives. The multilevel modelling literature on school and area dependences has, in turn, largely ignored social networks. To bridge this divide, a multiple-membership multiple-classification modelling approach for jointly investigating social network and group dependences is presented. This allows social network and group dependences on individual responses to be investigated and compared. The approach is used to analyse a subsample of the Adolescent Health Study data set from the USA, where the response variable of interest is individual-level educational attainment, and the three individual-level covariates are sex, ethnic group and age. Individual, network, school and area dependences are accounted for in the analysis. The network dependences can be accounted for by including the network as a classification in the model, using various network configurations, such as ego-nets and cliques. The results suggest that ignoring the network affects the estimates of variation for the classifications that are included in the random part of the model (school, area and individual), and has some influence on the point estimates and standard errors of the regression coefficients for covariates in the fixed part of the model. From a substantive perspective, this approach provides a flexible and practical way of investigating variation in an individual-level response due to social network dependences, and of estimating the share of variation of an individual response for network, school and area classifications.
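
For orientation, a generic multiple-membership multiple-classification model of this kind can be written as below. This is a sketch in standard multilevel notation, not the paper's exact specification: the index sets net(i), s(i) and a(i), the weights w_ij and the variance symbols are assumed for illustration.

```latex
y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}
      + \sum_{j \in \mathrm{net}(i)} w_{ij}\, u^{(\mathrm{net})}_{j}
      + u^{(\mathrm{school})}_{s(i)}
      + u^{(\mathrm{area})}_{a(i)}
      + e_i,
\qquad \sum_{j \in \mathrm{net}(i)} w_{ij} = 1,

u^{(\mathrm{net})}_{j} \sim N(0, \sigma^2_{\mathrm{net}}), \quad
u^{(\mathrm{school})}_{s} \sim N(0, \sigma^2_{\mathrm{school}}), \quad
u^{(\mathrm{area})}_{a} \sim N(0, \sigma^2_{\mathrm{area}}), \quad
e_i \sim N(0, \sigma^2_{e}).
```

Here the fixed part holds the covariates (sex, ethnic group and age), while the random part splits the remaining variation across the network, school, area and individual classifications, so each variance term gives that classification's share of variation in the response.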

]]>