ResearchPad - biochemistry-genetics-and-molecular-biology Default RSS Feed en-us © 2020 Newgen KnowledgeWorks

<![CDATA[Optimization of Genotype by Sequencing data for phylogenetic purposes]]>

Graphical abstract

<![CDATA[Data on the link between genomic integration of IS1548 and lineage of the strain obtained by bioinformatic analyses of sequenced genomes of Streptococcus agalactiae available at the National Center for Biotechnology Information database]]>

IS1548, a 1316-bp element of the ISAs1 family, affects the expression of several genes of the opportunistic pathogen Streptococcus agalactiae. Furthermore, certain lineages of S. agalactiae are more frequently associated with particular diseases than others [1, 2]. We took advantage of the release of the genome sequences of a large number of epidemiologically unrelated S. agalactiae strains of various origins to analyze the prevalence of IS1548 among S. agalactiae strains. To this end, the S. agalactiae genomes available in the National Center for Biotechnology Information (NCBI) database were searched by BLAST with the IS1548 DNA sequence. A sequence type (ST), based on the allelic profile of seven housekeeping genes, was assigned to each strain possessing IS1548. These strains were then grouped into clonal complexes (CCs). The data make it possible to compare the sequenced genomes of S. agalactiae based on their lineage and/or possession of IS1548, and to select the corresponding strains for comparative experimental studies. The data are related to the research article “Dual and divergent transcriptional impact of IS1548 insertion upstream of the peptidoglycan biosynthesis murB gene of Streptococcus agalactiae” [2].
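A minimal Python sketch of the typing step, assuming ST assignment is a lookup of the seven-locus allelic profile and that clonal complexes are formed by linking STs that share at least six of seven alleles (a simplified single-locus-variant rule in the spirit of eBURST, not necessarily the procedure used for this dataset). The locus names follow the S. agalactiae MLST scheme; the profile table, allele numbers and ST labels are hypothetical placeholders.

```python
from itertools import combinations

# Seven housekeeping loci of the S. agalactiae MLST scheme.
LOCI = ("adhP", "pheS", "atr", "glnA", "sdhA", "glcK", "tkt")

# HYPOTHETICAL profile table: allele numbers and ST labels are placeholders,
# not real database entries.
PROFILES = {
    (1, 1, 2, 1, 1, 2, 2): "ST-A",
    (1, 1, 2, 1, 1, 2, 3): "ST-B",
    (4, 5, 6, 7, 8, 9, 10): "ST-C",
}

def assign_st(alleles):
    """Map {locus: allele number} to an ST label, or None for a novel profile."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(profile)

def shared_loci(p, q):
    """Number of loci at which two profiles carry the same allele."""
    return sum(a == b for a, b in zip(p, q))

def group_into_ccs(profiles, min_shared=6):
    """Transitively link profiles sharing >= min_shared alleles (union-find)."""
    parent = {p: p for p in profiles}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for p, q in combinations(profiles, 2):
        if shared_loci(p, q) >= min_shared:
            parent[find(p)] = find(q)

    groups = {}
    for p in profiles:
        groups.setdefault(find(p), []).append(p)
    return list(groups.values())

strain = dict(zip(LOCI, (1, 1, 2, 1, 1, 2, 3)))
print(assign_st(strain))                     # ST-B
print(len(group_into_ccs(list(PROFILES))))   # 2
```

Linking through shared alleles means two STs can end up in the same complex via an intermediate ST even if they differ at two loci, which is the usual behaviour of single-locus-variant grouping.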

<![CDATA[Analysis of the data on titration of native and peroxynitrite modified αA- and αB-crystallins by Cu2+-ions]]>

The interaction of αA- and αB-crystallins with Cu2+ ions modulates their structure and chaperone-like activity, which is important for lens transparency. A theoretical analysis has been carried out of the dependence of the fluorescence intensity of native and peroxynitrite-modified αA- and αB-crystallins on Cu2+ concentration. One subunit of native αA-crystallin was shown to contain two equivalent Cu2+-binding sites, with a microscopic dissociation constant (Kdiss) for the Cu2+–αA-crystallin complex of 9.7 µM. For peroxynitrite-modified αA-crystallin, Kdiss is 17 µM. One subunit of native αB-crystallin contains two non-equivalent Cu2+-binding sites, with microscopic dissociation constants (K1 and K2) of 0.94 and 36 µM, respectively. For peroxynitrite-modified αB-crystallin, K1 and K2 are 4.3 and 70 µM, respectively.
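To make the reported constants concrete, the sketch below computes the mean number of Cu2+ ions bound per subunit for the two site models described (two equivalent sites for αA, two non-equivalent sites for αB). It assumes independent sites and that free Cu2+ is approximately equal to total Cu2+; relating occupancy to fluorescence quenching would require an additional, assumed proportionality, and this is not the authors' fitting procedure.

```python
# Minimal sketch: mean Cu2+ ions bound per crystallin subunit, assuming
# independent sites and free Cu2+ ~ total Cu2+ (concentrations in µM).

def bound_equivalent(cu_free_um, kd_um, n_sites=2):
    """n identical, independent sites with microscopic constant kd."""
    return n_sites * cu_free_um / (kd_um + cu_free_um)

def bound_nonequivalent(cu_free_um, k1_um, k2_um):
    """Two independent, non-equivalent sites with constants k1 and k2."""
    return cu_free_um / (k1_um + cu_free_um) + cu_free_um / (k2_um + cu_free_um)

# Constants reported in the text (µM).
for cu in (1.0, 10.0, 100.0):
    print(
        f"{cu:6.1f} uM Cu2+ | "
        f"aA native {bound_equivalent(cu, 9.7):.2f}, modified {bound_equivalent(cu, 17.0):.2f} | "
        f"aB native {bound_nonequivalent(cu, 0.94, 36.0):.2f}, "
        f"modified {bound_nonequivalent(cu, 4.3, 70.0):.2f}"
    )
```

At a free Cu2+ concentration equal to Kdiss, each equivalent site is half occupied, so one ion is bound per αA subunit; the larger constants after peroxynitrite modification translate into lower occupancy at any given Cu2+ concentration.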

<![CDATA[Whole genome sequencing data of Escherichia coli isolated from bloodstream infection patients in Cipto Mangunkusumo National Hospital, Jakarta, Indonesia]]>

Bloodstream infections (BSIs) are some of the most devastating preventable complications in critical care units. Among the Enterobacteriaceae that cause BSIs, Escherichia coli is the most common. Bacteria resistant to therapeutic antibiotics represent a significant global health challenge. In this study, we present whole genome sequence data of 22 E. coli isolates obtained from bloodstream infection patients admitted to Cipto Mangunkusumo National Hospital, Jakarta, Indonesia. These data will be useful for analysing the serotypes, virulence genes, and antimicrobial resistance genes of E. coli. DNA sequences of E. coli were obtained using the Illumina MiSeq platform. The raw FASTQ files are available under BioProject accession number PRJNA596854 and Sequence Read Archive accession numbers SRR10761126–SRR10761147.
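The accession range SRR10761126–SRR10761147 covers 22 runs, matching the 22 isolates; this is consistent with one run per isolate, although the text does not state the mapping. A small sketch that expands the range and builds (but does not execute) one download command per run with the SRA Toolkit's `fasterq-dump`:

```python
# Expand the inclusive SRA run range given in the text into individual accessions.
def expand_run_range(prefix, first, last):
    return [f"{prefix}{number}" for number in range(first, last + 1)]

runs = expand_run_range("SRR", 10761126, 10761147)
print(len(runs))  # 22

# One download command per run; commands are only built here, not executed.
commands = [["fasterq-dump", run] for run in runs]
print(" ".join(commands[0]))  # fasterq-dump SRR10761126
```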

<![CDATA[Data on protein changes of chick vitreous during normal eye growth using data-independent acquisition (SWATH-MS)]]>

Myopia, the most common refractive error, is estimated to affect half of the world's population by 2050. Multiple environmental and genetic factors have been suggested to contribute to it, but the mechanism underlying myopia has yet to be identified. Vitreous humor (VH) is a transparent, gel-like substance that occupies up to 80% of the volume of the eye, making it the largest component of the eye. Although VH is the main contributor to axial elongation of the eye, in both normal eye growth (emmetropization) and myopia, its dilute nature (99% water) makes less abundant molecules difficult to identify, so they are often overlooked. Using a more sensitive label-free mass spectrometry approach with data-independent acquisition (SWATH-MS), we established a comprehensive VH proteome library in the chick animal model and quantified possible protein biomarkers responsible for axial elongation during emmetropization (7, 14, 21 and 28 days after hatching, n = 48 eyes). Raw data files for both information-dependent acquisition (IDA) and data-independent acquisition (SWATH-MS) were uploaded to PeptideAtlas for public access (

<![CDATA[Dataset of foraminiferal sedimentary DNA (sedDNA) sequences from Svalbard]]>

Environmental DNA (eDNA) is usually defined as genetic material obtained directly from environmental samples, such as soil, water, or ice. Coupled with DNA metabarcoding, eDNA is a powerful tool for biodiversity assessments. The eDNA approach has provided valuable insights into past and contemporary biodiversity in terrestrial and aquatic environments. However, the state and fate of eDNA are still under investigation, and knowledge about the form of eDNA (i.e., extracellular vs. intracellular) and about DNA degradation under different environmental conditions is limited. Here, we tackle this issue by analyzing foraminiferal sedimentary DNA (sedDNA) from different size fractions of marine sediments: >500 µm, 500–100 µm, 100–63 µm, and <63 µm. Surface sediment samples were collected at 15 sampling stations in the Svalbard archipelago. Sequences of the foraminifera-specific 37f region were generated using Illumina technology. The data may be used as a reference for a wide range of eDNA-based studies, including biomonitoring and biodiversity assessments across time and space.

<![CDATA[Bambara groundnut soil metagenomics data]]>

Metagenomic analysis was carried out on DNA extracted from rhizospheric soil samples of Bambara groundnut. This dataset reports on the bacterial communities at different growth stages of Bambara groundnut and in the bulk soil. Paired-end Illumina MiSeq sequencing of 16S rRNA genes was carried out on the soil samples; across all samples, the bacterial community was dominated by the phyla Actinobacteria (30.1%), Proteobacteria (22%), Acidobacteria (20.9%), Bacteroidetes (8.4%), Chloroflexi (4.5%) and Firmicutes (4.4%). Samples from the bulk soil had the lowest average phylum percentages, while samples at the seed maturity stage had the highest. Alpha diversity (p = 0.05) was highest at this stage compared with the other stages and the control. Rubrobacter was the predominant genus, followed by Acidobacterium and Skermanella. The biodiversity profile generated from the metagenomic analysis is useful for improving knowledge of the drought-tolerance ability of Bambara groundnut. The data can be used to compare bacterial diversity at different growth stages of plants.
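The text does not name the alpha-diversity index used. As an illustration only, the sketch below computes two common choices, the Shannon index and the Gini–Simpson index, from per-sample taxon counts; the counts are hypothetical and not taken from the dataset.

```python
import math

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i), skipping zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical genus-level read counts for two samples (not from the dataset).
samples = {
    "bulk_soil": [120, 30, 10, 5],
    "seed_maturity": [60, 50, 40, 35, 15],
}
for name, counts in samples.items():
    print(name, round(shannon(counts), 3), round(gini_simpson(counts), 3))
```

Both indices increase with the number of taxa and with the evenness of their abundances; Shannon weights rare taxa more heavily than Gini–Simpson does.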

<![CDATA[Data on the cancer risk and mortalities induced by annual background radiations at various ages in Kohgiluyeh and Boyer-Ahmad province, Iran]]>

Measuring background radiation (BR), as a source of cancer risk, is important. The aim of this study was to measure BRs and to estimate the associated cancer risks and mortalities in Kohgiluyeh and Boyer-Ahmad province (KBAp). Indoor and outdoor BRs were measured in eight cities using a Geiger–Müller detector. Five main locations (north, east, west, south, and center) were chosen for measuring outdoor and indoor BRs in each city of KBAp. The BEIR VII Phase 2 model was used to calculate the BR-induced cancer risks and mortalities for various cancer types at different ages. The average outdoor and indoor dose rates were 136.9 ± 12.5 and 149.3 ± 19.8 nSv h−1, respectively. The average annual effective doses (AEDs) for adults, children, and infants were 0.17, 0.19, and 0.22 mSv y−1 from outdoor exposure, and 0.73, 0.84, and 0.94 mSv y−1 from indoor exposure, respectively. The average lifetime risk of cancer induced by one year of BR exposure was 164.8 ± 15.7 and 307.1 ± 32.3 per 100,000 people for male and female newborns, respectively. This risk decreased with age, reaching 11.2 ± 1.6 and 13.8 ± 1.6 per 100,000 people for men and women at the age of 80, respectively. The average lifetime risk of mortality from cancers induced by annual BRs was 70.7 ± 8.3 and 113.8 ± 10.6 per 100,000 people for male and female newborns, respectively. This risk decreased with age, reaching 9.8 ± 1.3 and 12.2 ± 1.3 per 100,000 people for men and women at the age of 80 years, respectively.
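The reported AEDs can be reproduced with the conventional formula AED = dose rate × 8760 h y−1 × occupancy factor × conversion coefficient × 10−6. The factors below are assumptions (commonly used UNSCEAR-style values: occupancy 0.2 outdoors and 0.8 indoors; 0.7, 0.8 and 0.9 Sv Gy−1 for adults, children and infants); the text does not state them, but with the measured mean rates they give the reported values to two decimals.

```python
HOURS_PER_YEAR = 8760

# ASSUMED factors (not stated in the text); they reproduce the reported AEDs.
OCCUPANCY = {"outdoor": 0.2, "indoor": 0.8}
CONVERSION = {"adult": 0.7, "child": 0.8, "infant": 0.9}  # Sv/Gy

def annual_effective_dose(dose_rate_nsv_h, setting, group):
    """Annual effective dose in mSv/y from a mean dose rate in nSv/h."""
    return (dose_rate_nsv_h * HOURS_PER_YEAR * OCCUPANCY[setting]
            * CONVERSION[group] * 1e-6)

# Mean dose rates reported in the text (nSv/h).
for setting, rate in (("outdoor", 136.9), ("indoor", 149.3)):
    for group in ("adult", "child", "infant"):
        print(setting, group, round(annual_effective_dose(rate, setting, group), 2))
# Expected (rounded): outdoor 0.17 / 0.19 / 0.22, indoor 0.73 / 0.84 / 0.94 mSv/y
```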

<![CDATA[Data analysis of PD-1 antibody in the treatment of melanoma patients]]>

Data presented in this article are supplementary materials to the research article entitled “IGFBP2 regulates PD-L1 expression by activating the EGFR-STAT3 signaling pathway in malignant melanoma”. Data for melanoma patients who did not receive anti-PD-1 treatment were obtained from Tianjin Medical University Cancer Institute & Hospital for the period February 1981 to May 2013. The Kaplan–Meier method was used for survival analysis. RNA sequencing data from 28 melanoma patients receiving anti-PD-1 therapy were downloaded from the GEO database (GSE78220). Cluster analysis of RNA expression was performed in R (package pheatmap). Differences in PD-L1 expression were visualized as boxplots (R package ggplot2). Differences between groups were analyzed with Fisher's exact test. Information was collected on 13 melanoma patients who had failed prior chemotherapy and were treated at Tianjin Medical University Cancer Institute & Hospital between July 2015 and December 2018. Response was evaluated according to the Response Evaluation Criteria in Solid Tumors 1.1 (RECIST 1.1).
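For readers unfamiliar with the Kaplan–Meier method, a minimal product-limit estimator is sketched below. It is an illustration only (the analysis itself was presumably run in a statistics package), and the follow-up times are hypothetical, not patient data from the article.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all patients sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored patients leave the risk set after time t
    return curve

# Hypothetical follow-up in months (not data from the article).
print(kaplan_meier([3, 5, 5, 8, 12, 12, 20], [1, 1, 0, 1, 0, 1, 0]))
```

Patients censored at the same time as a death are kept in the risk set for that time, which is the usual convention.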

<![CDATA[High-speed single molecule imaging datasets of membrane proteins in rat basophilic leukemia cells]]>

A high-speed fluorescence microscope operating at a 490 Hz frame rate was used to image two different membrane proteins: the high-affinity IgE receptor FcεRI, a transmembrane protein, and an outer-leaflet GPI-anchored protein. The IgE receptor was imaged via IgE labeled with Janelia Fluor 646, and the GPI-anchored protein was imaged using a GPI-GFP fusion protein and an ATTO 647N-labeled anti-GFP nanobody. Data were collected for both proteins in untreated cells and in cells whose actin had been stabilized with phalloidin. This dataset can be used to develop and test single-particle tracking methods on experimental data and to explore the hypothesis that the actin cytoskeleton may affect the movement of membrane proteins.
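A typical first analysis of such tracks is the mean squared displacement (MSD). The sketch below uses the stated 490 Hz frame rate (frame interval of about 2.04 ms) and a least-squares fit of MSD = 4Dt through the origin. It assumes free two-dimensional Brownian motion and ignores localization error and motion blur, which real single-molecule data usually require correcting for; the simulated walk and its diffusion coefficient are hypothetical.

```python
import math
import random

FRAME_RATE_HZ = 490.0          # camera frame rate stated in the text
DT = 1.0 / FRAME_RATE_HZ       # seconds between frames

def msd(track, max_lag):
    """Time-averaged MSD of a 2D track [(x, y), ...] for lags 1..max_lag frames."""
    result = []
    for lag in range(1, max_lag + 1):
        sq = [
            (track[i + lag][0] - track[i][0]) ** 2 + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        result.append(sum(sq) / len(sq))
    return result

def diffusion_coefficient(msd_values, dt=DT):
    """Slope of a least-squares fit of MSD = 4 * D * t through the origin, divided by 4.
    Assumes free 2D Brownian motion; localization error and motion blur are ignored."""
    times = [dt * (k + 1) for k in range(len(msd_values))]
    slope = sum(t * m for t, m in zip(times, msd_values)) / sum(t * t for t in times)
    return slope / 4.0

# Simulated 2D random walk in µm with a hypothetical D of 0.1 µm^2/s.
random.seed(1)
true_d = 0.1
step_sd = math.sqrt(2 * true_d * DT)  # per-axis step SD for 2D Brownian motion
x = y = 0.0
walk = [(x, y)]
for _ in range(2000):
    x += random.gauss(0.0, step_sd)
    y += random.gauss(0.0, step_sd)
    walk.append((x, y))
estimate = diffusion_coefficient(msd(walk, 4))
print(estimate)  # estimate of true_d
```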

<![CDATA[Proteomic data of seminal plasma and spermatozoa of four purebred dogs]]>

Semen contains several proteins that are important for fertilization and for identifying reproductive failures. Some proteins are expressed in a species-specific manner, although their expression differs among breeds. This article provides experimental data describing the protein profile of seminal plasma and spermatozoa of healthy dogs of four pure breeds: Golden Retriever (n = 3), Bernese Mountain Dog (n = 4), Great Dane (n = 3), and Maremmano-Abruzzese Sheepdog (n = 3), housed in São Paulo state, Brazil. Semen samples were collected by manual stimulation of the penis, in the presence of a teaser bitch when possible. The seminal plasma and sperm cells were separated by centrifugation and prepared for mass spectrometry. The gene ontology annotation of the proteins identified is described. This is the first description of the proteomic profile of semen from purebred dogs. These data are a valuable resource for improving reproductive biotechnologies applied to canid species.

<![CDATA[High spatiotemporal resolution data from a custom magnetic tweezers instrument]]>

Gene expression is achieved by enzymes such as RNA polymerases that translocate along nucleic acids in steps as small as a single base pair, i.e., 0.34 nm for DNA. Deciphering the complex biochemical pathway that describes the activity of such enzymes requires exquisite spatiotemporal resolution. Magnetic tweezers are a powerful single-molecule force spectroscopy technique that uses camera-based detection to enable the simultaneous observation of hundreds of nucleic acid-tethered magnetic beads at constant force with subnanometer resolution [1,2]. High spatiotemporal resolution magnetic tweezers have recently been reported [3–5]. We present data acquired using a bespoke magnetic tweezers instrument that can operate either in high-throughput mode or at high resolution. The data report on the best achievable resolution for surface-attached polystyrene beads and DNA-tethered magnetic beads, and highlight the influence of mechanical stability on such assays. We also present data in which 0.3 nm steps along the z-axis are detected using DNA-tethered magnetic beads. Because the data presented here agree with the best resolution obtained with magnetic tweezers, they provide a useful benchmark for setup adjustment and optimization.
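The text does not say which resolution metric was used. One common way to quantify the positional resolution of a tweezers trace as a function of averaging time is the Allan deviation; the non-overlapping version is sketched below on a hypothetical white-noise trace (not data from the instrument).

```python
import math
import random

def allan_deviation(z, m):
    """Non-overlapping Allan deviation of a bead-position trace z for bins of m frames:
    sqrt(0.5 * mean((zbar_{k+1} - zbar_k) ** 2))."""
    n_bins = len(z) // m
    if n_bins < 2:
        raise ValueError("trace too short for this bin size")
    means = [sum(z[k * m:(k + 1) * m]) / m for k in range(n_bins)]
    sq = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return math.sqrt(0.5 * sum(sq) / len(sq))

# Hypothetical white-noise z trace in nm (not data from the instrument).
random.seed(0)
trace = [random.gauss(0.0, 1.0) for _ in range(4096)]
for m in (1, 4, 16, 64, 256):
    print(m, round(allan_deviation(trace, m), 3))
```

For pure white noise the Allan deviation falls roughly as 1/sqrt(m); drift from limited mechanical stability makes it level off or rise again at long averaging windows, which bounds the smallest step (for example 0.34 nm per base pair) that can be resolved.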

<![CDATA[Data set of intrinsically disordered proteins analysed at a local protein conformation level]]>

Intrinsically Disordered Proteins (IDPs) have become a hot topic since their characterisation in the 1990s. The data presented in this article are related to our research article entitled “A structural entropy index to analyse local conformations in Intrinsically Disordered Proteins” published in Journal of Structural Biology [1]. In that study, we quantified, for the first time, the continuum from rigidity to flexibility and finally to disorder. Non-disordered regions were also highlighted within the ensembles of disordered proteins. This work used the Protein Ensemble Database (PED), a useful database that collects ensembles of protein structures considered to be IDPs. The data set consists of a collection of cleaned protein files in classical PDB format that can be readily used as input to most automatic analysis software. The accompanying data include the coding of all structural information in terms of a structural alphabet, namely Protein Blocks (PBs). An entropy index derived from PBs, which captures the continuum from protein rigidity to flexibility to disorder, is included, together with information from secondary structure assignment, protein accessibility and prediction of disorder from the sequences. The data may be used for further structural bioinformatics studies of IDPs. They can also be used as a benchmark for evaluating disorder prediction methods.
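The text does not give the formula of the entropy index. The sketch below assumes it behaves like Neq, a standard PB-derived measure equal to the exponential of the Shannon entropy of the PB frequencies at each residue position across the conformers of an ensemble. Neq ranges from 1 (a single PB, rigid) to 16 (all PBs equally likely). The PB strings are hypothetical.

```python
import math
from collections import Counter

PB_ALPHABET = "abcdefghijklmnop"  # the 16 Protein Blocks

def neq(pb_column):
    """Equivalent number of PBs, exp(Shannon entropy), at one residue position
    across the conformers of an ensemble."""
    counts = Counter(pb_column)
    n = len(pb_column)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return math.exp(h)

def neq_profile(pb_sequences):
    """Per-position Neq for equal-length PB strings, one string per conformer."""
    return [neq(column) for column in zip(*pb_sequences)]

# Hypothetical PB strings for four conformers of a short fragment.
conformers = ["mmmmnop", "mmmmabc", "mmmmkld", "mmmmghi"]
print([round(v, 2) for v in neq_profile(conformers)])
```

Positions where every conformer adopts the same PB (here the leading "m" run) score 1, while positions that vary across the ensemble score higher.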

<![CDATA[Methods for multiple outcome meta-analysis of gene-expression data]]>

Meta-analysis is a valuable tool for synthesizing evidence across a wide range of study types, including high-throughput experiments such as genome-wide association studies (GWAS) and gene expression studies. In situations with multiple outcomes or multiple treatments, however, the multivariate meta-analysis framework, which jointly models the different quantities of interest, may offer important advantages, such as increased statistical power and the ability to perform global tests. In this work we adapted the multivariate meta-analysis method and applied it to gene expression data. With this method we can test for pleiotropic effects, that is, for genes that influence both outcomes, or discover genes whose change in expression is not detectable with the univariate method. We tested this method on data regarding inflammatory bowel disease (IBD), whose two main forms, Crohn’s disease (CD) and ulcerative colitis (UC), share many clinical manifestations but differ in the location and extent of inflammation and in complications. The Stata code is given in the Appendix and it is available at:

  • Multivariate meta-analysis method for gene expression data.

  • Discover genes with pleiotropic effects.

  • Identification of differentially expressed genes (DEGs) in complex traits.
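The Stata code itself is not reproduced here. As a language-neutral illustration of the idea, the sketch below pools two outcomes for one gene (for example, log fold changes in CD and UC) with a fixed-effect generalized least-squares estimator. It assumes known within-study variances and covariances and computes a 2-df Wald statistic as a global test. The authors' implementation may use a random-effects model, and all study values are hypothetical.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def fixed_effect_bivariate(estimates, covariances):
    """GLS pooled estimate (sum W_i)^-1 * sum W_i y_i with W_i = S_i^-1.
    Returns the pooled 2-vector and its 2x2 covariance matrix."""
    w_sum = [[0.0, 0.0], [0.0, 0.0]]
    wy_sum = [0.0, 0.0]
    for y, s in zip(estimates, covariances):
        w = inv2(s)
        for r in range(2):
            for c in range(2):
                w_sum[r][c] += w[r][c]
            wy_sum[r] += w[r][0] * y[0] + w[r][1] * y[1]
    v = inv2(w_sum)
    beta = [v[r][0] * wy_sum[0] + v[r][1] * wy_sum[1] for r in range(2)]
    return beta, v

def wald_chi2(beta, v):
    """Global test of beta = (0, 0): beta' V^-1 beta, chi-square with 2 df."""
    w = inv2(v)
    return sum(beta[r] * w[r][c] * beta[c] for r in range(2) for c in range(2))

# Hypothetical per-study (CD, UC) estimates and within-study covariance matrices.
estimates = [(0.8, 0.5), (1.1, 0.7), (0.6, 0.2)]
covs = [[[0.04, 0.01], [0.01, 0.05]],
        [[0.09, 0.02], [0.02, 0.08]],
        [[0.06, 0.00], [0.00, 0.07]]]
beta, v = fixed_effect_bivariate(estimates, covs)
chi2 = wald_chi2(beta, v)
print(beta, chi2, math.exp(-chi2 / 2))  # exp(-x/2) is the 2-df upper-tail p-value
```

Applied gene by gene, the joint test can flag genes whose two effects are individually modest but jointly significant, which is one way the multivariate approach can gain power over separate univariate analyses.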

<![CDATA[Quantitative analysis of H2O2 transport through purified membrane proteins]]>

Hydrogen peroxide (H2O2) is an important signal molecule produced in animal and plant cells. The balance of H2O2 between the intra- and extracellular space is regulated by integral membrane proteins, which thereby modulate signaling. Several methods have been established to analyze aquaporin-mediated transport of H2O2 in whole cells, with the intrinsic limitation that the amount of protein responsible for a given activity cannot be standardized. As a consequence, transport rates and specific activity are difficult to quantify, which makes it problematic to compare isoforms and mutated variants of a specific target. Moreover, in cell-based assays, expression of the target protein may alter the physiological processes of the host cell, complicating interpretation and risking misleading results. To improve measurements of protein-mediated H2O2 transport, we have developed an assay that allows quantitative measurements.

  • Using purified aquaporin reconstituted in proteoliposomes, transport of H2O2 can be accurately measured.

  • Inside the liposomes, horseradish peroxidase (HRP) catalyzes the reaction between H2O2 and Amplex Red, giving rise to the fluorescent product resorufin.

  • Analysing pure protein provides direct biochemical evidence of specific transport, excluding putative cellular background.
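One way to turn the fluorescence readout into a specific transport rate is sketched below. The steps are: fit the initial slope of resorufin fluorescence, convert it to concentration with a resorufin standard curve, subtract the rate measured with protein-free liposomes, and normalize to the amount of reconstituted protein. This assumes 1:1 conversion of H2O2 to resorufin and a linear initial phase. The parameter names and all numbers are hypothetical, and the assay's actual data treatment may differ.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def specific_transport_rate(times_s, fluorescence, calib_slope,
                            protein_pmol, background_rate=0.0):
    """Initial resorufin formation rate per pmol protein.

    calib_slope:     fluorescence units per µM resorufin (standard curve)
    background_rate: µM/s measured with protein-free liposomes
    Assumes one resorufin per H2O2 that entered the liposome.
    """
    f_rate, _ = linear_fit(times_s, fluorescence)  # fluorescence units per s
    conc_rate = f_rate / calib_slope               # µM resorufin per s
    return (conc_rate - background_rate) / protein_pmol

# Hypothetical initial-phase readings (not data from the assay).
rate = specific_transport_rate([0, 10, 20, 30], [100, 150, 200, 250],
                               calib_slope=2.0, protein_pmol=4.0, background_rate=0.5)
print(rate)  # 0.5 (µM/s per pmol)
```

Normalizing to the reconstituted protein amount is what allows isoforms and mutated variants to be compared directly, which is the limitation of whole-cell assays described above.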

<![CDATA[Titration methods for rVSV-based vaccine manufacturing]]>

The recombinant Vesicular Stomatitis Virus (rVSV) is an emerging platform for viral vector-based vaccines. Promising results have been reported in clinical trials of the rVSV-ZEBOV vaccine for Ebola virus disease prevention. In this study, we describe the titration tools developed to determine the titre of rVSV-ZEBOV productions.

• A streamlined Median Tissue Culture Infectious Dose (TCID50) assay was established to determine the infectious titre of this vaccine.

• A digital polymerase chain reaction (dPCR) assay was developed to assess the total number of viral particles present in cell-free culture supernatants of rVSV productions.

• These assays are used to titre rVSV-ZEBOV samples and to characterize the ratio of total particles to infectious units for monitoring process robustness and product quality attributes; they can also be used to titre samples generated in the production of other rVSV vectors.
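The text does not specify how the TCID50 endpoint is calculated or how dPCR counts are converted. The sketch below shows one standard combination. The TCID50 endpoint uses the Spearman–Kärber method (Reed–Muench is a common alternative). The dPCR concentration uses the Poisson correction for partitions that received more than one copy. The total-particle to infectious-unit ratio is then taken from the two titres, assuming one genome copy per particle and that any reverse-transcription step is already accounted for. Plate read-outs, partition volume, reaction volumes and dilution are hypothetical.

```python
import math

def spearman_karber_log10_endpoint(first_log10_dilution, log10_step, positives, wells):
    """log10 of the 50% endpoint dilution (Spearman-Karber).

    first_log10_dilution: log10 of the most concentrated dilution (e.g. -1 for 1:10)
    log10_step: log10 of the dilution factor (1.0 for a ten-fold series)
    positives: infected wells per dilution, most to least concentrated
    Assumes the series starts at 100% and ends at 0% positive wells.
    """
    s = sum(p / wells for p in positives)
    return first_log10_dilution - log10_step * (s - 0.5)

def tcid50_per_ml(log10_endpoint, inoculum_ml):
    """Infectious titre: reciprocal of the endpoint dilution per mL of inoculum."""
    return 10 ** (-log10_endpoint) / inoculum_ml

def dpcr_copies_per_ul(positive, total, partition_nl):
    """Poisson-corrected copies per µL of dPCR reaction."""
    mean_copies = -math.log(1.0 - positive / total)  # copies per partition
    return mean_copies / (partition_nl * 1e-3)       # nL -> µL

def particles_per_ml(copies_per_ul_reaction, reaction_ul, template_ul, dilution):
    """Genome copies per mL of original sample (one genome per particle assumed)."""
    return copies_per_ul_reaction * reaction_ul / template_ul * dilution * 1000.0

# Hypothetical read-outs (not data from the study).
log_ep = spearman_karber_log10_endpoint(-1.0, 1.0, [8, 8, 8, 8, 6, 2, 0, 0], 8)
infectious = tcid50_per_ml(log_ep, inoculum_ml=0.1)
print(log_ep, f"{infectious:.3g} TCID50/mL")  # -5.5 3.16e+06 TCID50/mL

copies = dpcr_copies_per_ul(positive=4000, total=20000, partition_nl=0.85)
total = particles_per_ml(copies, reaction_ul=20.0, template_ul=5.0, dilution=100.0)
print(f"total:infectious ratio = {total / infectious:.1f}")
```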

<![CDATA[Dataset of allele, genotype and haplotype frequencies of four polymorphisms filaggrin gene in Russian patients with atopic dermatitis]]>

Data on the allele, genotype and haplotype frequencies of four single nucleotide polymorphisms (SNPs) (rs3126085, rs12144049, rs471144 and rs4363385) of the filaggrin (FLG) gene in Russian patients with atopic dermatitis (AD) are presented. Genome-wide association studies have identified these SNPs as potentially significant genetic markers associated with atopic dermatitis. The frequencies of alleles, genotypes and haplotypes of the four SNPs were calculated in 3 groups: the entire sample, females and males. No significant differences in allele, genotype or haplotype frequencies were observed between male and female AD patients.
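For a single biallelic SNP, allele and genotype frequencies follow directly from genotype counts. The text does not name the test used for the male/female comparison. The sketch below uses a Pearson chi-square on allele counts (1 df, no continuity correction) as one common option. Haplotype frequencies need an estimation step such as EM for unphased genotypes and are not covered. All counts are hypothetical.

```python
import math

def allele_frequencies(n_11, n_12, n_22):
    """Frequencies of allele 1 and allele 2 from biallelic genotype counts."""
    n = n_11 + n_12 + n_22
    p = (2 * n_11 + n_12) / (2 * n)
    return p, 1.0 - p

def genotype_frequencies(n_11, n_12, n_22):
    """Frequencies of the three genotypes."""
    n = n_11 + n_12 + n_22
    return n_11 / n, n_12 / n, n_22 / n

def allele_chi2(group_a, group_b):
    """Pearson chi-square (1 df, no continuity correction) comparing allele counts
    between two groups, each given as genotype counts (n_11, n_12, n_22)."""
    a1 = 2 * group_a[0] + group_a[1]
    a2 = 2 * group_a[2] + group_a[1]
    b1 = 2 * group_b[0] + group_b[1]
    b2 = 2 * group_b[2] + group_b[1]
    n = a1 + a2 + b1 + b2
    chi2 = n * (a1 * b2 - a2 * b1) ** 2 / ((a1 + a2) * (b1 + b2) * (a1 + b1) * (a2 + b2))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical genotype counts for one SNP (not data from the article).
females = (40, 45, 15)
males = (35, 50, 15)
print(allele_frequencies(*females), allele_frequencies(*males))
print(allele_chi2(females, males))
```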

<![CDATA[A protocol for metabolic characterization of human induced pluripotent stem cell-derived cardiomyocytes (iPS-CM)]]>

Graphical abstract

<![CDATA[Bioinformatic tools for tRNA gene analyses in mitochondrial DNA sequence data]]>

The data presented here are related to the research article entitled “Hidden cases of tRNA genes duplication and remolding in mitochondrial genomes of amphipods” (Romanova et al., 2020) [1]. Correct annotation of tRNA gene sequences in mitochondrial (mt) and nuclear genomes can sometimes be challenging because tRNA annotation/prediction programmes differ in performance. These programmes may produce false positive or false negative predictions. Moreover, annotation may be further complicated by the presence of duplicated tRNA genes and of genes encoding tRNAs whose identity has been altered by a mutation in the anticodon sequence (tRNA gene remolding/recruitment).

We developed an R script that automates the diagnosis of the ancestral coding specificity of a tRNA gene, regardless of its anticodon sequence, by comparing genetic distances. Some of the predicted tRNA genes from the mt genomes of amphipods are presented. We also developed an R script for estimating the best mode of sequence alignment, which was applied to determine the best alignment of tRNA genes in [1] but is also suitable for testing any set of nucleotide alignments used in phylogenetic inference.
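The R scripts are not reproduced here. The Python sketch below illustrates the distance-based idea under stated assumptions: sequences are already aligned, distance is the uncorrected p-distance, gap columns are skipped, and the anticodon columns are masked so the diagnosis does not depend on them. The query is assigned the specificity of its closest reference. The authors' script may use a different distance or model; the toy sequences, labels and mask positions are hypothetical.

```python
def p_distance(a, b, mask=frozenset()):
    """Uncorrected p-distance between two aligned sequences, skipping gap
    columns and any alignment columns listed in mask (e.g. the anticodon)."""
    sites = [(x, y) for i, (x, y) in enumerate(zip(a, b))
             if i not in mask and x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)

def ancestral_specificity(query, references, mask=frozenset()):
    """Label and distance of the closest reference tRNA gene."""
    return min(((label, p_distance(query, seq, mask)) for label, seq in references.items()),
               key=lambda item: item[1])

# Toy aligned fragments (hypothetical); column 5 stands in for the anticodon.
references = {"trnS": "GCATTGA", "trnL": "GCCCCGG"}
query = "GCATTCA"
print(ancestral_specificity(query, references, mask=frozenset({5})))  # ('trnS', 0.0)
```

A gene whose anticodon reads as one amino acid but whose body is closest to references of another would be flagged as a candidate case of remolding.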

<![CDATA[Data on genetic linkage of oxidative stress with cardiometabolic traits in an intercross derived from hyperlipidemic mouse strains]]>

The data presented here are related to the research article entitled “Genetic linkage of oxidative stress with cardiometabolic traits in an intercross derived from hyperlipidemic mouse strains”, published in Atherosclerosis 2019 Dec 3;293:1–10 (D. Fuller, A.T. Grainger, A. Manichaikul, W. Shi). The supporting materials include original genotypic and phenotypic data obtained from 266 female F2 mice derived from an intercross between C57BL/6 (B6) and BALB/cJ (BALB) Apoe−/− mice. F2 mice were fed a Western diet for 12 weeks, starting at 6 weeks of age. Plasma levels of HDL and LDL cholesterol, triglycerides, glucose and malondialdehyde (MDA), and atherosclerosis in the aortic root and the left carotid artery were measured. A total of 127 microsatellite markers across the entire genome were genotyped. The data are provided in a format ready for QTL analysis with J/qtl and MapManager QTX.
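J/qtl and MapManager QTX perform genome scans with interval mapping. As a simpler illustration of how linkage at a single marker is scored, the sketch below computes a LOD score from a genotype-group (ANOVA) model versus a single-mean null model, LOD = (n/2) log10(RSS0/RSS1). Genotype labels (e.g. "BB", "BC", "CC" for B6 and BALB alleles) and phenotype values are hypothetical, and individuals with missing genotypes (None) are skipped.

```python
import math

def marker_lod(genotypes, phenotypes):
    """LOD score for one marker from a genotype-group (ANOVA) model versus a
    single-mean null model: LOD = (n / 2) * log10(RSS0 / RSS1).
    Individuals with a missing genotype (None) are skipped."""
    pairs = [(g, y) for g, y in zip(genotypes, phenotypes) if g is not None]
    n = len(pairs)
    grand_mean = sum(y for _, y in pairs) / n
    rss0 = sum((y - grand_mean) ** 2 for _, y in pairs)
    groups = {}
    for g, y in pairs:
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)
    return (n / 2) * math.log10(rss0 / rss1)

# Hypothetical marker genotypes and plasma MDA values (not data from the article).
genotypes = ["BB", "BB", "BC", "BC", "CC", "CC", None]
mda = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2, 2.5]
print(round(marker_lod(genotypes, mda), 2))
```

A genome scan would repeat this at every marker (or, with interval mapping, at positions between markers) and compare the resulting LOD profile with a permutation-based significance threshold.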