ResearchPad - data-descriptor https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[The draft genome sequence of an upland wild rice species, <i>Oryza granulata</i>]]> https://www.researchpad.co/article/N9d6d7468-356d-46ef-9f3b-7ef1e31d501a
Measurement(s): DNA • RNA • transcriptome • genome coverage • sequence_assembly • sequence feature annotation
Technology Type(s): DNA sequencing • RNA sequencing • flow cytometry method • computational modeling technique • sequence assembly process • sequence annotation
Sample Characteristic - Organism: Oryza granulata

Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12063198

]]>
<![CDATA[A draft genome assembly of spotted hyena, <i>Crocuta crocuta</i>]]> https://www.researchpad.co/article/N253f0403-b95a-4444-9930-85761f0a7b08
Measurement(s): DNA • genome • sequence_assembly • sequence feature annotation
Technology Type(s): DNA sequencing • sequence assembly process • sequence annotation
Sample Characteristic - Organism: Crocuta crocuta

Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12091458

]]>
<![CDATA[Reference exome data for Australian Aboriginal populations to support health-based research]]> https://www.researchpad.co/article/N7abf3bc0-ae37-46cf-8ce6-d7890cc1aa8f
Measurement(s): Aboriginal Australian • DNA • sequence feature annotation
Technology Type(s): Whole Exome Sequencing • DNA sequencing • sequence annotation
Factor Type(s): ancestry • sex • age
Sample Characteristic - Organism: Homo sapiens
Sample Characteristic - Location: Northern Territory

Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12040638

]]>
<![CDATA[Goteo.org civic crowdfunding and match-funding data connecting Sustainable Development Goals]]> https://www.researchpad.co/article/N690985e4-78a7-4e21-9c12-55b0b3ad4ddc
Measurement(s): crowdfunding campaigns • Donation
Technology Type(s): Goteo digital platform • digital curation

Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12053847

]]>
<![CDATA[A multidecadal assessment of climate indices over Europe]]> https://www.researchpad.co/article/Nba5f089a-0607-454a-a75e-026d124ccb19
Measurement(s): climate
Technology Type(s): computational modeling technique
Factor Type(s): air temperature • precipitation • sea level pressure • wind at 10 m • wind gust at 10 m • TOA radiation • radiation • insolation • snow density • snow depth • snowfall • total cloud cover • low cloud cover
Sample Characteristic - Environment: climate
Sample Characteristic - Location: Europe

Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12029427

]]>
<![CDATA[Electron microscopy dataset for the recognition of nanoscale ordering effects and location of nanoparticles]]> https://www.researchpad.co/article/N5d93fd94-d558-4eda-bbdd-a6a95a921e13

A unique ordering effect has been observed in functional catalytic nanoscale materials. Instead of binding randomly to the catalyst surface, metal nanoparticles show spatially ordered behavior that results in the formation of geometrical patterns. Understanding such nanoscale materials and analyzing the corresponding microscopy images cannot be comprehensive without appropriate reference datasets. Here we describe the first dataset of electron microscopy images comprising individual nanoparticles that undergo ordering on a surface towards the formation of geometrical patterns. The dataset developed in this study spans three levels of nanoscale organization: (i) individual nanoparticles (1–5 nm) and arrays of nanoparticles (5–20 nm), (ii) ordering effects (20–200 nm) and (iii) complex patterns (from nm to μm scales). The described dataset makes it possible, for the first time, to develop machine learning algorithms for studying the unique phenomena of nanoparticle ordering and hierarchical organization.

]]>
<![CDATA[Rapid flood and damage mapping using synthetic aperture radar in response to Typhoon Hagibis, Japan]]> https://www.researchpad.co/article/Nbbbc6c99-415e-42e1-9dd1-b494cecf557a

During the aftermath of Typhoon Hagibis, we made flood and damage proxy maps, rapidly derived from synthetic aperture radar (SAR) data using change detection approaches. The maps have large spatial coverage over the Tokyo, Fukushima, Ibaraki, Iwate, and Nagano prefectures of Japan. The maps are also largely in agreement with various validation sources including aerial imagery, optical imagery and news sources. Apart from visual maps, we provide flood and damage extents in various formats compatible with geographic information system (GIS) applications. The data may potentially be used for applications such as typhoon risk modelling, investigating spatial correlations of typhoon impacts, and comparing alternative flood or damage mapping techniques.
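The abstract mentions change detection on SAR data without naming a specific algorithm. As a rough illustration only, the following sketch uses a generic log-ratio test between a pre-event and a post-event backscatter image. The function name, the 3 dB threshold, and the toy pixel values are all assumptions made for this example; this is not the authors' processing chain.

```python
import math

def log_ratio_change_mask(pre, post, threshold_db=3.0, eps=1e-6):
    """Flag pixels whose backscatter changed by more than threshold_db.

    pre, post: 2D lists of non-negative backscatter intensities with the
    same shape (hypothetical inputs). eps avoids taking the log of zero.
    """
    mask = []
    for pre_row, post_row in zip(pre, post):
        row = []
        for p0, p1 in zip(pre_row, post_row):
            # Change expressed in decibels: negative = darker, positive = brighter.
            ratio_db = 10.0 * math.log10((p1 + eps) / (p0 + eps))
            row.append(abs(ratio_db) > threshold_db)
        mask.append(row)
    return mask

# Toy 2x2 scene (assumed values): one pixel darkens strongly, as open
# water often does in SAR, and one brightens strongly.
pre_event = [[1.0, 1.0], [1.0, 1.0]]
post_event = [[1.0, 0.1], [1.2, 4.0]]
change_mask = log_ratio_change_mask(pre_event, post_event)
```

A ratio in decibels is used rather than a plain difference because SAR speckle is multiplicative, so a fixed threshold behaves more evenly across bright and dark areas. In practice the threshold would need tuning against validation data such as the aerial imagery mentioned above.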

]]>
<![CDATA[A database of freshwater fish species of the Amazon Basin]]> https://www.researchpad.co/article/N7dfc0a2c-b3b6-40b0-a56d-486886578576

The Amazon Basin is an unquestionable biodiversity hotspot, containing the highest freshwater biodiversity on earth and facing a recent increase in anthropogenic threats. Current knowledge of the spatial distribution of freshwater fish species in this basin is greatly deficient, preventing a comprehensive understanding of this hyper-diverse ecosystem as a whole. Filling this gap was the priority of a transnational collaborative project, the AmazonFish project (https://www.amazon-fish.com/). Relying on the outputs of this project, we provide the most complete fish species distribution records covering the whole Amazon drainage. The database, which includes 2,406 validated native freshwater fish species and 232,936 georeferenced records, results from an extensive survey of species distributions drawing on 590 different sources (e.g. published articles, grey literature, online biodiversity databases and scientific collections from museums and universities worldwide) and on field expeditions conducted during the project. This database, delivered at two spatial grains, georeferenced localities (21,500 localities) and sub-drainages (144 units), represents a highly valuable source of information for further studies on freshwater fish biodiversity, biogeography and conservation.

]]>
<![CDATA[Multidisciplinary database of permeability of fault zones and surrounding protolith rocks at world-wide sites]]> https://www.researchpad.co/article/N8be42098-bcd6-4224-afe0-70dc1efb443e

Brittle faults and fault zones are important fluid flow conduits through the upper part of Earth’s crust and are involved in many well-known phenomena (e.g. earthquakes, thermal water and gas transport, or water leakage into underground tunnels). Permeability, the ability of porous materials to conduct water and gas, is one of the key parameters required for understanding and predicting fluid flow. Although close to a thousand studies have been carried out and permeability has been tested in parts of fault zones, a systematic summary and database are lacking. This data descriptor presents a multi-disciplinary world-wide compilation and review of bulk and matrix permeability of fault zones: 410 datasets, 521 reviewed sites, 379 locations, >10000 publications searched. The review covers studies of faulting processes, geothermal engineering, radioactive waste repositories, groundwater resources, petroleum reservoirs, and underground engineering projects. The objectives are to stimulate cross-disciplinary data sharing and communication about fault zone hydrogeology, document the biases and strategies for testing of fault zones, and provide basic statistics of permeability values for models that require these parameters.

]]>
<![CDATA[Draft genomes of two Atlantic bay scallop subspecies Argopecten irradians irradians and A. i. concentricus]]> https://www.researchpad.co/article/N8b5be863-0eca-42b7-9b70-729c94ead8fe

The two subspecies of Atlantic bay scallop (Argopecten irradians), A. i. irradians and A. i. concentricus, are economically important aquaculture species in northern and southern China. Here, we performed whole-genome sequencing, assembly, and gene annotation and produced draft genomes for both subspecies. In total, 253.17 and 272.97 gigabases (Gb) of raw reads were generated from Illumina HiSeq and PacBio platforms for A. i. irradians and A. i. concentricus, respectively. Draft genomes of 835.7 Mb and 874.82 Mb were assembled for the two subspecies, accounting for 83.9% and 89.79% of the estimated sizes of their corresponding genomes, respectively. The contig N50 and scaffold N50 were 78.54 kb and 1.53 Mb for the A. i. irradians genome, and 63.73 kb and 1.25 Mb for the A. i. concentricus genome. Moreover, 26,777 and 25,979 protein-coding genes were predicted for A. i. irradians and A. i. concentricus, respectively. These genome assemblies lay a solid foundation for future theoretical studies and provide guidance for practical scallop breeding.
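For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch of how it is commonly computed, together with the estimated genome sizes implied by the reported assembly sizes and coverage percentages. The toy contig lengths are hypothetical and the function is a generic illustration, not the authors' assembly pipeline.

```python
def n50(lengths):
    """Return N50: the largest length L such that sequences of length >= L
    together cover at least half of the total assembled length."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length

# Hypothetical contig lengths in kb, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Implied estimated genome sizes in Mb, derived from the figures in the
# abstract: assembled size divided by the fraction of the genome covered.
est_irradians_mb = 835.7 / 0.839
est_concentricus_mb = 874.82 / 0.8979
```

The same function applies to scaffold lengths to obtain the scaffold N50. The division suggests estimated genome sizes a little under 1 Gb for both subspecies.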

]]>
<![CDATA[Epidemiological data from the COVID-19 outbreak, real-time case information]]> https://www.researchpad.co/article/N01996ac9-0141-4457-8e90-529431044524

Cases of a novel coronavirus were first reported in Wuhan, Hubei province, China, in December 2019 and have since spread across the world. Epidemiological studies have indicated human-to-human transmission in China and elsewhere. To aid the analysis and tracking of the COVID-19 epidemic we collected and curated individual-level data from national, provincial, and municipal health reports, as well as additional information from online reports. All data are geo-coded and, where available, include symptoms, key dates (date of onset, admission, and confirmation), and travel history. The generation of detailed, real-time, and robust data for emerging disease outbreaks is important and can help to generate robust evidence that will support and inform public health decision making.

]]>
<![CDATA[A 12-Lead ECG database to identify origins of idiopathic ventricular arrhythmia containing 334 patients]]> https://www.researchpad.co/article/N5eaa7fa5-6291-4733-b453-e8001ab0d389

Cardiac catheter ablation has proven effective in treating idiopathic premature ventricular complexes and ventricular tachycardia. As the most important prerequisite for successful therapy, criteria based on analysis of 12-lead ECGs are employed to reliably predict the locations of idiopathic ventricular arrhythmia before a subsequent catheter ablation procedure. Among these possible locations, the right ventricular outflow tract and the left ventricular outflow tract are the major ones. We created a new 12-lead ECG database under the auspices of Chapman University and Ningbo First Hospital of Zhejiang University that aims to provide high-quality data for distinguishing idiopathic ventricular arrhythmia originating in the right ventricular outflow tract from that originating in the left ventricular outflow tract. The dataset contains 334 subjects who successfully underwent a catheter ablation procedure that confirmed the origins of their idiopathic ventricular arrhythmia.

]]>
<![CDATA[DNA methylation of chronic lymphocytic leukemia with differential response to chemotherapy]]> https://www.researchpad.co/article/N0717afb8-b42e-4524-8f0d-5d022329a4c8

Acquired resistance to chemotherapy is an important clinical problem and can also occur without detectable cytogenetic aberrations or gene mutations. Chronic lymphocytic leukemia (CLL) is molecularly well characterized and has been elemental for establishing central paradigms in oncology. This prompted us to check whether specific epigenetic changes at the level of DNA methylation might underlie the development of treatment resistance. We used Illumina Infinium HumanMethylation450 BeadChips to obtain DNA methylation profiles of 71 CLL patients with differential responses. Thirty-six patients were categorized as relapsed/refractory after treatment with fludarabine or bendamustine, and 21 of them had genetic aberrations of TP53. The other 35 patients were untreated at the time of sampling, and 15 of them had genetic aberrations of TP53. Although we could not correlate chemoresistance with epigenetic changes, the patients were comprehensively characterized regarding relevant prognostic and molecular markers (e.g. IGHV mutation status, chromosome aberrations, TP53 mutation status, clinical parameters), which makes our dataset a unique and valuable resource that researchers can use to test alternative hypotheses.

]]>
<![CDATA[The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules]]> https://www.researchpad.co/article/Nc44803b8-cc4c-4b18-a21d-b3b959afb4f1

Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme and contrast it against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.

]]>
<![CDATA[An annual time series of weekly size-resolved aerosol properties in the megacity of Metro Manila, Philippines]]> https://www.researchpad.co/article/Ncd4cf6b0-3cd6-4238-a3fd-342b62c024b8

Size-resolved aerosol samples were collected in Metro Manila between July 2018 and October 2019. Two Micro-Orifice Uniform Deposit Impactors (MOUDI) were deployed at Manila Observatory in Quezon City, Metro Manila with samples collected on a weekly basis for water-soluble speciation and mass quantification. Additional sets were collected for gravimetric and black carbon analysis, including during special events such as holidays. The unique aspect of the presented data is a year-long record with weekly frequency of size-resolved aerosol composition in a highly populated megacity where there is a lack of measurements. The data are suitable for research to understand the sources, evolution, and fate of atmospheric aerosols, as well as studies focusing on phenomena such as aerosol-cloud-precipitation-meteorology interactions, regional climate, boundary layer processes, and health effects. The dataset can be used to initialize, validate, and/or improve models and remote sensing algorithms.

]]>
<![CDATA[Harmonised global datasets of wind and solar farm locations and power]]> https://www.researchpad.co/article/Nf7886d0a-3b26-4c0b-84e9-a3fac4a96128

Energy systems need decarbonisation in order to keep global warming within safe limits. While global land planners are promising more of the planet’s limited space to wind and solar photovoltaic, there is little information on where current infrastructure is located. The majority of recent studies use land suitability for wind and solar, coupled with technical and socioeconomic constraints, as a proxy for actual location data. Here, we address this shortcoming. Using readily accessible OpenStreetMap data we present, to our knowledge, the first global, open-access, harmonised spatial datasets of wind and solar installations. We also include user-friendly code that lets users create newer versions of the dataset. Finally, we include first-order estimates of the power capacities of installations. We anticipate these data will be of widespread interest within global studies of the future potential and trade-offs associated with the global decarbonisation of energy systems.

]]>
<![CDATA[Simultaneous human intracerebral stimulation and HD-EEG, ground-truth for source localization methods]]> https://www.researchpad.co/article/N4a4b47ae-1185-495b-9d1b-4b619ec19847

Precisely localizing the sources of brain activity as recorded by EEG is a fundamental procedure and a major challenge for both research and clinical practice. Even though many methods and algorithms have been proposed, their relative advantages and limitations are still not well established. Moreover, these methods involve tuning multiple parameters, for which no principled way of selection exists yet. These uncertainties are compounded by the lack of ground truth for validation and testing. Here we present the Localize-MI dataset, the first open dataset comprising EEG-recorded electrical activity that originates from precisely known locations inside the brain of living humans. High-density EEG was recorded as single-pulse biphasic currents were delivered at intensities ranging from 0.1 to 5 mA through stereotactically implanted electrodes in diverse brain regions during pre-surgical evaluation of patients with drug-resistant epilepsy. The uses of this dataset range from the estimation of in vivo tissue conductivity to the development, validation and testing of forward and inverse solution methods.

]]>
<![CDATA[Generation of a murine SWATH-MS spectral library to quantify more than 11,000 proteins]]> https://www.researchpad.co/article/Nbbaa22a9-ce6c-4f4f-b9b1-f9263d4133ea

Targeted SWATH-MS data analysis is critically dependent on the spectral library. Comprehensive spectral libraries for human and several other organisms have been published, but an extensive spectral library for mouse, a widely used model organism, is not available. Here, we present a large murine spectral library covering more than 11,000 proteins and 240,000 proteotypic peptides, which includes proteins derived from 9 murine tissue samples and one murine L929 cell line. This resource supports the quantification of 67% of all murine proteins annotated by UniProtKB/Swiss-Prot. Furthermore, we applied the spectral library to SWATH-MS data from murine tissue samples. Data are available via SWATHAtlas (PASS01441).

]]>
<![CDATA[Database of open-framework aluminophosphate structures]]> https://www.researchpad.co/article/N48957fe6-402d-48b0-8dba-7721a0ed0217

Open-framework aluminophosphates are an important class of inorganic crystalline compounds because of their rich structural chemistry and diverse properties. We have collected 312 open-framework aluminophosphate crystal structures from published literature and established a database for these structures. For each aluminophosphate structure, we have assigned a unique index code and extracted its key chemical and crystallographic information from the original literature and the associated CIF file, such as the name, chemical formula, extra-framework species, Al/P ratio, space group, and unit cell parameters of the compound. More importantly, we have calculated the topological features for each aluminophosphate framework, including local connectivity, framework dimension, coordination sequences, vertex symbols, topology density, and the largest ring. To help experimental chemists identify their products, we have also calculated theoretical XRD peaks for all aluminophosphate structures. This database will provide important insight into understanding the structural chemistry of open-framework aluminophosphate compounds.

]]>
<![CDATA[The global dataset of historical yields for major crops 1981–2016]]> https://www.researchpad.co/article/Nb3784a23-5ba8-483f-a374-7c17c212f4e6

Knowing the historical yield patterns of major commodity crops, including the trends and interannual variability, is crucial for understanding the current status, potential and risks in food production in the face of the growing demand for food and climate change. We updated the global dataset of historical yields for major crops (GDHY), which is a hybrid of agricultural census statistics and satellite remote sensing, to cover the 36-year period from 1981 to 2016, with a spatial resolution of 0.5°. Four major crops were considered: maize, rice, wheat and soybean. The updated version 1.3 was developed and then aligned with the earlier version 1.2 to ensure the continuity of the yield time series. Comparisons with different global yield datasets and published results demonstrate that the GDHY-aligned version v1.2 + v1.3 dataset is a valuable source of information on global yields. The aligned version dataset enables users to employ an increased number of yield samples for their analyses, which ultimately increases the confidence in their findings.
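To make the stated dimensions concrete, the sketch below works out the size of a global 0.5° grid over the 36-year period and maps a coordinate to a grid cell. The grid origin (row 0 at 90°S, column 0 at 0°E) and the function name are assumptions made for illustration; the actual file layout of the dataset is not described here and may differ.

```python
RES_DEG = 0.5                      # spatial resolution stated in the abstract
N_LON = int(360 / RES_DEG)         # number of columns on a global grid
N_LAT = int(180 / RES_DEG)         # number of rows on a global grid
YEARS = list(range(1981, 2017))    # 1981 to 2016 inclusive

def cell_index(lat, lon):
    """Map a latitude/longitude in degrees to (row, col) on the assumed grid.

    Assumption: row 0 starts at 90 S and column 0 starts at 0 E, with
    longitudes wrapped into the range [0, 360).
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    row = min(int((lat + 90.0) / RES_DEG), N_LAT - 1)  # keep lat = 90 in the last row
    col = int((lon % 360.0) / RES_DEG) % N_LON
    return row, col

cells_per_year = N_LON * N_LAT
values_per_crop = cells_per_year * len(YEARS)
```

Most of these cells fall over ocean or non-cropland, so the number of cells with actual yield values for any of the four crops is far smaller than the full grid size.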

]]>