ResearchPad - genomic-libraries https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Genome-wide association study of partial resistance to sclerotinia stem rot of cultivated soybean based on the detached leaf method]]> https://www.researchpad.co/article/elastic_article_15721 Sclerotinia stem rot (SSR) is a devastating fungal disease that causes severe yield losses of soybean worldwide. In the present study, a representative population of 185 soybean accessions was selected and utilized to identify the quantitative trait nucleotides (QTNs) of partial resistance to soybean SSR via a genome-wide association study (GWAS). A total of 22,048 single-nucleotide polymorphisms (SNPs) with minor allele frequencies (MAF) > 5% and missing data < 3% were used to assess linkage disequilibrium (LD) levels. Signals associated with SSR partial resistance were identified by two models: the compressed mixed linear model (CMLM) and the multi-locus random-SNP-effect mixed linear model (mrMLM). Finally, seven QTNs with major effects (a known locus and six novel loci) were detected via CMLM and nine novel QTNs with minor effects were detected via mrMLM in relation to partial resistance to SSR. One of the novel loci (Gm05:14834789 on Chr.05), which was co-located by both methods, might be a stable locus that showed high significance in SSR partial resistance. Additionally, a total of 71 major and 85 minor candidate genes located in the 200-kb genomic region of each peak SNP detected by CMLM and mrMLM, respectively, were found. By using a gene-based association, a total of six SNPs from three major-effect genes and eight SNPs from four minor-effect genes were identified. Of them, Glyma.18G012200 has been characterized as a significant element in controlling fungal disease in plants.
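
The SNP quality filter mentioned in this abstract (MAF > 5%, missing data < 3%) can be sketched as follows. This is only an illustration under stated assumptions: genotypes are taken to be diploid alternate-allele counts (0, 1 or 2) with None for a missing call, and the function names are hypothetical rather than taken from the study.

```python
from typing import Optional, Sequence

# Assumed encoding: 0, 1, 2 = alternate-allele count; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of accessions with no genotype call at this SNP."""
    if not calls:
        return 1.0  # treat an empty SNP as fully missing
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Minor allele frequency computed over the non-missing calls only."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))  # two alleles per accession
    return min(alt_freq, 1.0 - alt_freq)


def passes_filter(
    calls: Sequence[Genotype],
    min_maf: float = 0.05,
    max_missing: float = 0.03,
) -> bool:
    """Keep a SNP only if MAF > min_maf and missing rate < max_missing."""
    return (
        missing_rate(calls) < max_missing
        and minor_allele_frequency(calls) > min_maf
    )
```

Applied per SNP across all accessions, `passes_filter` would retain only markers that meet both thresholds; the defaults mirror the values quoted in the abstract.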

]]>
<![CDATA[Chloroplast genomes of Rubiaceae: Comparative genomics and molecular phylogeny in subfamily Ixoroideae]]> https://www.researchpad.co/article/elastic_article_11231 In Rubiaceae phylogenetics, the number of markers has often proved a limitation, with authors failing to provide well-supported trees at tribal and generic levels. A robust phylogeny is a prerequisite to study the evolutionary patterns of traits at different taxonomic levels. Advances in next-generation sequencing technologies have revolutionized biology by providing, at reduced cost, huge amounts of data for an increased number of species. Due to their highly conserved structure, general lack of recombination, and mostly uniparental inheritance, chloroplast DNA sequences have long been used as choice markers for plant phylogeny reconstruction. The main objectives of this study are: 1) to gain insight into chloroplast genome evolution in the Rubiaceae (Ixoroideae) through efficient methodology for de novo assembly of plastid genomes; and, 2) to test the efficiency of mining SNPs in the nuclear genome of Ixoroideae based on the use of a coffee reference genome to produce well-supported nuclear trees. We assembled whole chloroplast genome sequences for 27 species of the Rubiaceae subfamily Ixoroideae using next-generation sequences. Analysis of the plastid genome structure reveals a relatively good conservation of gene content and order. Generally, low variation was observed between taxa in the boundary regions with the exception of the inverted repeat at both the large and short single copy junctions for some taxa. An average of 79% of the SNPs determined in the Coffea genus are transferable to Ixoroideae, with variation ranging from 35% to 96%. In general, the plastid and the nuclear genome phylogenies are congruent with each other. They are well-resolved with well-supported branches. Generally, the tribes form well-identified clades but the tribe Sherbournieae is shown to be polyphyletic.
The results are discussed relative to the methodology used and the chloroplast genome features in Rubiaceae and compared to previous Rubiaceae phylogenies.

]]>
<![CDATA[Identification of a novel archaea virus, detected in hydrocarbon polluted Hungarian and Canadian samples]]> https://www.researchpad.co/article/N5489318a-3499-4862-9afc-2378cea7eecb

Metagenomics is a helpful tool for the analysis of unculturable organisms and viruses. Viruses that target bacteria and archaea play important roles in the microbial diversity of various ecosystems. Here we show that Methanosarcina virus MV (MetMV), the second Methanosarcina sp. virus with a completely determined genome, is characteristic of hydrocarbon pollution in environmental (soil and water) samples. It was highly abundant in Hungarian hydrocarbon polluted samples and its genome was also present in the NCBI SRA database containing reads from hydrocarbon polluted samples collected in Canada, indicating the stability of its niche and the marker feature of this virus. MetMV, as the only currently identified marker virus for pollution in environmental samples, could contribute to the understanding of the complicated network of prokaryotes and their viruses driving the decomposition of environmental pollutants.

]]>
<![CDATA[Nosocomial transmission of extensively drug resistant Acinetobacter baumannii strains in a tertiary level hospital]]> https://www.researchpad.co/article/N9f3b656c-39ce-49ef-bced-db8369f1110d

Acinetobacter baumannii is an opportunistic infectious agent that affects primarily immunocompromised individuals. A. baumannii is highly prevalent in hospital settings, where it is commonly associated with nosocomial transmission and drug resistance. Here, we report the identification and genetic characterization of A. baumannii strains among patients in a tertiary level hospital in Mexico. Whole genome sequencing analysis was performed to establish their genetic relationship and drug resistance mutation profile. Ten genetically different, extensively drug resistant strains were identified circulating among seven wards. The genetic profiles showed resistance primarily against aminoglycosides and beta-lactam antibiotics. Importantly, no mutants conferring resistance to colistin were observed. The results highlight the importance of implementing robust classification schemes for advanced genetic characterization of A. baumannii clinical isolates and simultaneous detection of drug resistance markers for adequate patient management in clinical settings.

]]>
<![CDATA[Identification of early fruit development reference genes in plum]]> https://www.researchpad.co/article/N34728444-bb7f-4d99-8469-dd5c2a1110fc

An RNAseq study of early fruit development and stone development in plum, Prunus domestica, was mined to identify sets of genes that could be used to normalize expression studies in early fruit development. The expression values of genes previously identified from Prunus as reference genes were first extracted and found to vary considerably in endocarp tissue relative to whole fruit tissue. Nine other genes were chosen that varied less than 2-fold amongst the 20 RNAseq libraries of early fruit development and endocarp tissues. These genes were tested on a series of developmental plum fruit samples to determine if any could be used as a reference gene in the analyses of fruit-based tissues in plum. The three most stable genes as determined using RefFinder were IPGD (imidazole glycerol-phosphate dehydratase), HAM1 (histone acetyltransferase) and SNX1 (sorting nexin 1). These were further tested to analyze genes expressed differentially in endocarp tissue between normal and minimal endocarp cultivars. To determine the universality of those nine genes as fruit development reference genes, three other RNAseq data sets from peach and apple were analyzed to determine the reference gene expression. Multiple genes exhibited tissue-specific patterns of expression, while one gene, SNX1, emerged as possessing a universal pattern across the Rosaceae species, developmental stages, and tissue types tested. The results suggest that the use of existing RNAseq data to identify standard genes can provide stable reference genes for the specific tissues or experimental conditions under exploration.

]]>
<![CDATA[Unusual genome expansion and transcription suppression in ectomycorrhizal Tricholoma matsutake by insertions of transposable elements]]> https://www.researchpad.co/article/Nd7412b83-0508-48a9-959e-b3aa8ede7a25

Genome sequencing of Tricholoma matsutake revealed its unusually large size as 189.0 Mbp, which is a consequence of extraordinarily high transposable element (TE) content. We identified that 702 genes were surrounded by TEs, and 83.2% of these genes were not transcribed at any developmental stage. This observation indicated that the insertion of TEs alters the transcription of the genes neighboring these TEs. Repeat-induced point mutation, such as C to T hypermutation with a bias over “CpG” dinucleotides, was also recognized in this genome, representing a typical defense mechanism against TEs during evolution. Many transcription factor genes were activated in both the primordia and fruiting body stages, which indicates that many regulatory processes are shared during the developmental stages. Small secreted protein genes (<300 aa) were dominantly transcribed in the hyphae, where symbiotic interactions occur with the hosts. Comparative analysis with 37 Agaricomycetes genomes revealed that IstB-like domains (PF01695) were conserved across taxonomically diverse mycorrhizal genomes, where the T. matsutake genome contained four copies of this domain. Three of the IstB-like genes were overexpressed in the hyphae. Similar to other ectomycorrhizal genomes, the CAZyme gene set was reduced in T. matsutake, including losses in the glycoside hydrolase genes. The T. matsutake genome sequence provides insight into the causes and consequences of genome size inflation.

]]>
<![CDATA[How “simple” methodological decisions affect interpretation of population structure based on reduced representation library DNA sequencing: A case study using the lake whitefish]]> https://www.researchpad.co/article/N3bb2bc39-24d6-4fe3-98ed-f97dea058c57

Reduced representation (RRL) sequencing approaches (e.g., RADSeq, genotyping by sequencing) require decisions about how much to invest in genome coverage and sequencing depth, as well as choices of values for adjustable bioinformatics parameters. To empirically explore the importance of these “simple” methodological decisions, we generated two independent sequencing libraries for the same 142 individual lake whitefish (Coregonus clupeaformis) using a nextRAD RRL approach: (1) a larger number of loci at low sequencing depth based on a 9mer (library A); and (2) fewer loci at higher sequencing depth based on a 10mer (library B). The fish were selected from populations with different levels of expected genetic subdivision. Each library was analyzed using the STACKS pipeline followed by three types of population structure assessment (FST, DAPC and ADMIXTURE) with iterative increases in the stringency of sequencing depth and missing data requirements, as well as more specific a priori population maps. Library B was always able to resolve strong population differentiation in all three types of assessment regardless of the selected parameters, largely due to retention of more loci in analyses. In contrast, library A produced more variable results; increasing the minimum sequencing depth threshold (-m) resulted in a reduced number of retained loci, and therefore lost resolution at high -m values for FST and ADMIXTURE, but not DAPC. When detecting fine population differentiation, the population map influenced the number of loci and missing data, which generated artefacts in all downstream analyses tested. Similarly, when examining fine scale population subdivision, library B was robust to changing parameters but library A lost resolution depending on the parameter set. We used library B to examine actual subdivision in our study populations. 
All three types of analysis found complete subdivision among populations in Lake Huron, ON and Dore Lake, SK, Canada using 10,640 SNP loci. Weak population subdivision was detected in Lake Huron with fish from sites in the north-west, Search Bay, North Point and Hammond Bay, showing slight differentiation. Overall, we show that apparently simple decisions about library construction and bioinformatics parameters can have important impacts on the interpretation of population subdivision. Although potentially more costly on a per-locus basis, early investment in striking a balance between the number of loci and sequencing effort is well worth the reduced genomic coverage for population genetics studies. More conservative stringency settings on STACKS parameters led to a final dataset that was more consistent and robust when examining both weak and strong population differentiation. Overall, we recommend that researchers approach “simple” methodological decisions with caution, especially when working on non-model species for the first time.

]]>
<![CDATA[Genome-wide DNA methylation analysis of pituitaries during the initiation of puberty in gilts]]> https://www.researchpad.co/article/5c8accc9d5eed0c48498ffd8

It is widely recognized that early or delayed puberty has harmful effects on adult health outcomes. During puberty, the pituitary responds to the hypothalamus and then induces the subsequent response of the ovaries in the hypothalamic-pituitary-gonadal axis. DNA methylation has recently been suggested to regulate the onset of puberty in female mammals. However, to date, the changes of DNA methylation in pituitaries have not been investigated during the pubertal transition. In this study, using gilts as the pubertal model, the genome-scale DNA methylation of pituitaries was profiled and compared across Pre-, In- and Post-puberty using reduced representation bisulfite sequencing. We found that the average methylation levels of each genomic feature were lower in the Post-pubertal stage than in the Pre- and In-pubertal stages in the CpG context, but higher in the In-pubertal stage than in the Pre- and Post-pubertal stages in the CpH (where H = A, T, or C) context. The methylation patterns of CpHs were more dynamic than those of CpGs at high-CpG-content and low-CpG-content promoter genes and at CGIs in different genomic regions. Furthermore, CGIs in different genomic regions tended to behave similarly in the CpG context but in a stage-specific manner in the CpH context across the Pre-, In- and Post-pubertal stages. Among these pubertal stages, the 5 kb regions upstream of the transcription start sites were protected from both CpG and CpH methylation changes. Of the detected CpGs, 12.65% were identified as differentially methylated CpGs, corresponding to 4301 genes involved in the fundamental functions of pituitaries. Of the detected CpHs, 0.35% were identified as differentially methylated CpHs, corresponding to 3691 genes involved in the biological functions of releasing gonadotropin hormones. These observations and analyses provide valuable insights into the epigenetic mechanisms of the initiation of puberty at the pituitary level.

]]>
<![CDATA[BioJava 5: A community driven open-source bioinformatics library]]> https://www.researchpad.co/article/5c6730bad5eed0c484f37fa8

BioJava is an open-source project that provides a Java library for processing biological data. The project aims to simplify bioinformatic analyses by implementing parsers, data structures, and algorithms for common tasks in genomics, structural biology, ontologies, phylogenetics, and more. Since 2012, we have released two major versions of the library (4 and 5) that include many new features to tackle challenges with increasingly complex macromolecular structure data. BioJava requires Java 8 or higher and is freely available under the LGPL 2.1 license. The project is hosted on GitHub at https://github.com/biojava/biojava. More information and documentation can be found online on the BioJava website (http://www.biojava.org) and tutorial (https://github.com/biojava/biojava-tutorial). All inquiries should be directed to the GitHub page or the BioJava mailing list (http://lists.open-bio.org/mailman/listinfo/biojava-l).

]]>
<![CDATA[Genomic insights into neonicotinoid sensitivity in the solitary bee Osmia bicornis]]> https://www.researchpad.co/article/5c61e8f0d5eed0c48496f48d

The impact of pesticides on the health of bee pollinators is determined in part by the capacity of bee detoxification systems to convert these compounds to less toxic forms. For example, recent work has shown that cytochrome P450s of the CYP9Q subfamily are critically important in defining the sensitivity of honey bees and bumblebees to pesticides, including neonicotinoid insecticides. However, it is currently unclear if solitary bees have functional equivalents of these enzymes with potentially serious implications in relation to their capacity to metabolise certain insecticides. To address this question, we sequenced the genome of the red mason bee, Osmia bicornis, the most abundant and economically important solitary bee species in Central Europe. We show that O. bicornis lacks the CYP9Q subfamily of P450s but, despite this, exhibits low acute toxicity to the N-cyanoamidine neonicotinoid thiacloprid. Functional studies revealed that variation in the sensitivity of O. bicornis to N-cyanoamidine and N-nitroguanidine neonicotinoids does not reside in differences in their affinity for the nicotinic acetylcholine receptor or speed of cuticular penetration. Rather, a P450 within the CYP9BU subfamily, with recent shared ancestry to the Apidae CYP9Q subfamily, metabolises thiacloprid in vitro and confers tolerance in vivo. Our data reveal conserved detoxification pathways in model solitary and eusocial bees despite key differences in the evolution of specific pesticide-metabolising enzymes in the two species groups. The discovery that P450 enzymes of solitary bees can act as metabolic defence systems against certain pesticides can be leveraged to avoid negative pesticide impacts on these important pollinators.

]]>
<![CDATA[Apollo: Democratizing genome annotation]]> https://www.researchpad.co/article/5c648d41d5eed0c484c823a0

Genome annotation is the process of identifying the location and function of a genome's encoded features. Improving the biological accuracy of annotation is a complex and iterative process requiring researchers to review and incorporate multiple sources of information such as transcriptome alignments, predictive models based on sequence profiles, and comparisons to features found in related organisms. Because rapidly decreasing costs are enabling an ever-growing number of scientists to incorporate sequencing as a routine laboratory technique, there is widespread demand for tools that can assist in the deliberative analytical review of genomic information. To this end, we present Apollo, an open source software package that enables researchers to efficiently inspect and refine the precise structure and role of genomic features in a graphical browser-based platform. Some of Apollo’s newer user interface features include support for real-time collaboration, allowing distributed users to simultaneously edit the same encoded features while also instantly seeing the updates made by other researchers on the same region in a manner similar to Google Docs. Its technical architecture enables Apollo to be integrated into multiple existing genomic analysis pipelines and heterogeneous laboratory workflow platforms. Finally, we consider the implications that Apollo and related applications may have on how the results of genome research are published and made accessible.

]]>
<![CDATA[elPrep 4: A multithreaded framework for sequence analysis]]> https://www.researchpad.co/article/5c6dc9a8d5eed0c484529f91

We present elPrep 4, a reimplementation from scratch of the elPrep framework for processing sequence alignment map files in the Go programming language. elPrep 4 includes multiple new features allowing us to process all of the preparation steps defined by the GATK Best Practice pipelines for variant calling. This includes new and improved functionality for sorting, (optical) duplicate marking, base quality score recalibration, BED and VCF parsing, and various filtering options. The implementations of these options in elPrep 4 faithfully reproduce the outcomes of their counterparts in GATK 4, SAMtools, and Picard, even though the underlying algorithms are redesigned to take advantage of elPrep’s parallel execution framework to vastly improve the runtime and resource use compared to these tools. Our benchmarks show that elPrep executes the preparation steps of the GATK Best Practices up to 13x faster on WES data, and up to 7.4x faster for WGS data compared to running the same pipeline with GATK 4, while utilizing fewer compute resources.

]]>
<![CDATA[Testing of library preparation methods for transcriptome sequencing of real life glioblastoma and brain tissue specimens: A comparative study with special focus on long non-coding RNAs]]> https://www.researchpad.co/article/5c6b26afd5eed0c484289e7d

Recent progress in next-generation transcriptome sequencing has contributed significantly to the study of various malignancies, including glioblastoma multiforme (GBM). Differential sequencing of transcriptomes of patients and non-tumor controls has the potential to reveal novel transcripts with a significant role in GBM. One such candidate group of molecules is the long non-coding RNAs (lncRNAs), which have been shown to be involved in processes such as carcinogenesis, epigenetic modifications and resistance to various therapeutic approaches. To maximize the value of transcriptome sequencing, a suitable protocol for library preparation from tissue-derived RNA is needed that produces high quality transcriptome sequencing data and increases the number of detected lncRNAs. The success of library preparation is determined by the quality of input RNA, which in real-life tissue specimens is very often altered compared with the high quality RNA commonly used by manufacturers to develop library preparation chemistry. In the present study, we used GBM and non-tumor brain tissue specimens and compared three different commercial library preparation kits, namely NEXTflex Rapid Directional qRNA-Seq Kit (Bioo Scientific), SENSE Total RNA-Seq Library Prep Kit (Lexogen) and NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (NEB). Libraries generated using the SENSE kit were characterized by the most normal distribution of normalized average GC content, the lowest amounts of over-represented sequences and ribosomal RNA reads (0.3–1.5%), and the highest numbers of uniquely mapped reads and reads aligning to coding regions. However, the NEBNext kit performed better, with relatively low duplication rates, even transcript coverage and the highest number of hits in the Ensembl database for every biotype of interest, including lncRNAs.
Our results indicate that, of the three approaches, the NEBNext library preparation kit was most suitable for the study of lncRNAs via transcriptome sequencing. This was further confirmed by highly consistent data obtained in an independent validation on an expanded cohort.

]]>
<![CDATA[Vgsc-interacting proteins are genetically associated with pyrethroid resistance in Aedes aegypti]]> https://www.researchpad.co/article/5c59feaed5eed0c4841352e6

Association mapping of factors that condition pyrethroid resistance in Aedes aegypti has consistently identified genes in multiple functional groups. Toward better understanding of the mechanisms involved, we examined high throughput sequencing (HTS) data from two Aedes aegypti aegypti collections from Merida, Yucatan, Mexico treated with either permethrin or deltamethrin. Exome capture enrichment for coding regions and the AaegL5 annotation were used to identify genes statistically associated with resistance. The frequencies of single nucleotide polymorphisms (SNPs) were compared between resistant and susceptible mosquito pools using a contingency χ2 analysis. The -log10(χ2 p value) was calculated at each SNP site, with a weighted average determined from all sites in each gene. Genes with -log10(χ2 p value) ≥ 4.0 and present among all 3 treatment groups were subjected to gene set enrichment analysis (GSEA). We found that several functional groups were enriched compared to all coding genes. These categories were transport, signal transduction and metabolism, in order from highest to lowest statistical significance. Strikingly, 21 genes with demonstrated association to synaptic function were identified. In the high association group (n = 1,053 genes), several genes were identified that also genetically or physically interact with the voltage-gated sodium channel (VGSC). These genes included CHARLATAN (CHL), a transcriptional regulator; several ankyrin-domain proteins; PUMILIO (PUM), a translational repressor; and NEDD4 (E3 ubiquitin-protein ligase). There were 13 genes that ranked among the top 10%: these included VGSC; CINGULIN, a predicted neuronal gap junction protein; and the aedine ortholog of NERVY (NVY), a transcriptional regulator. Silencing of CHL and NVY followed by standard permethrin bottle bioassays validated their association with permethrin resistance. Importantly, VGSC levels were also reduced by about 50% in chl- or nvy-dsRNA treated mosquitoes.
These results are consistent with the contribution of a variety of neuronal pathways to pyrethroid resistance in Ae. aegypti.
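
The per-SNP statistic described in this abstract can be illustrated with a short sketch. It assumes each SNP is summarized as a 2x2 table of allele counts (reference/alternate in the resistant and susceptible pools), uses the Pearson χ2 statistic without continuity correction (1 degree of freedom), and weights each site by its total allele count. The abstract does not state the weighting scheme or table layout, so those choices and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Assumed layout: (ref_resistant, alt_resistant, ref_susceptible, alt_susceptible)
Table = Tuple[int, int, int, int]


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column carries no information
    return n * (a * d - b * c) ** 2 / denom


def neg_log10_p(chi2: float) -> float:
    """-log10 of the upper-tail p value for chi-square with 1 degree of freedom."""
    p = math.erfc(math.sqrt(chi2 / 2.0))  # exact survival function for df = 1
    return -math.log10(p) if p > 0.0 else math.inf


def gene_score(sites: Sequence[Table]) -> float:
    """Weighted mean of per-site -log10(p); weights = total allele count (assumed)."""
    weights = [sum(site) for site in sites]
    total = sum(weights)
    if total == 0:
        return 0.0
    scores = [neg_log10_p(chi2_2x2(*site)) for site in sites]
    return sum(w * s for w, s in zip(weights, scores)) / total
```

Under these assumptions, a gene would be carried forward to enrichment analysis when `gene_score` reaches the 4.0 threshold quoted in the abstract.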

]]>
<![CDATA[Whole-genome sequence of the bovine blood fluke Schistosoma bovis supports interspecific hybridization with S. haematobium]]> https://www.researchpad.co/article/5c52186ad5eed0c4847981f8

Mesenteric infection by the parasitic blood fluke Schistosoma bovis is a common veterinary problem in Africa and the Middle East and occasionally in the Mediterranean Region. The species also has the ability to form interspecific hybrids with the human parasite S. haematobium with natural hybridisation observed in West Africa, presenting possible zoonotic transmission. Additionally, this exchange of alleles between species may dramatically influence disease dynamics and parasite evolution. We have generated a 374 Mb assembly of the S. bovis genome using Illumina and PacBio-based technologies. Despite infecting different hosts and organs, the genome sequences of S. bovis and S. haematobium appeared strikingly similar with 97% sequence identity. The two species share 98% of protein-coding genes, with an average sequence identity of 97.3% at the amino acid level. Genome comparison identified large continuous parts of the genome (up to several 100 kb) showing almost 100% sequence identity between S. bovis and S. haematobium. It is unlikely that this is a result of genome conservation and provides further evidence of natural interspecific hybridization between S. bovis and S. haematobium. Our results suggest that foreign DNA obtained by interspecific hybridization was maintained in the population through multiple meiosis cycles and that hybrids were sexually reproductive, producing viable offspring. The S. bovis genome assembly forms a highly valuable resource for studying schistosome evolution and exploring genetic regions that are associated with species-specific phenotypic traits.

]]>
<![CDATA[BRCA Challenge: BRCA Exchange as a global resource for variants in BRCA1 and BRCA2]]> https://www.researchpad.co/article/5c2d2eb3d5eed0c484d9b2c0

The BRCA Challenge is a long-term data-sharing project initiated within the Global Alliance for Genomics and Health (GA4GH) to aggregate BRCA1 and BRCA2 data to support highly collaborative research activities. Its goal is to generate an informed and current understanding of the impact of genetic variation on cancer risk across the iconic cancer predisposition genes, BRCA1 and BRCA2. Initially, reported variants in BRCA1 and BRCA2 available from public databases were integrated into a single, newly created site, www.brcaexchange.org. The purpose of the BRCA Exchange is to provide the community with a reliable and easily accessible record of variants interpreted for a high-penetrance phenotype. More than 20,000 variants have been aggregated, three times the number found in the next-largest public database at the project’s outset, of which approximately 7,250 have expert classifications. The data set is based on shared information from existing clinical databases—Breast Cancer Information Core (BIC), ClinVar, and the Leiden Open Variation Database (LOVD)—as well as population databases, all linked to a single point of access. The BRCA Challenge has brought together the existing international Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium expert panel, along with expert clinicians, diagnosticians, researchers, and database providers, all with a common goal of advancing our understanding of BRCA1 and BRCA2 variation. Ongoing work includes direct contact with national centers with access to BRCA1 and BRCA2 diagnostic data to encourage data sharing, development of methods suitable for extraction of genetic variation at the level of individual laboratory reports, and engagement with participant communities to enable a more comprehensive understanding of the clinical significance of genetic variation in BRCA1 and BRCA2.

]]>
<![CDATA[Imputation accuracy of wheat genotyping-by-sequencing (GBS) data using barley and wheat genome references]]> https://www.researchpad.co/article/5c3d0126d5eed0c484038b91

Genotyping-by-sequencing (GBS) provides high SNP coverage and has recently emerged as a popular technology for genetic and breeding applications in bread wheat (Triticum aestivum L.) and many other plant species. Although GBS can discover millions of SNPs, a high rate of missing data is a major concern for many applications. Accurate imputation of those missing data can significantly improve the utility of GBS data. This study compared imputation accuracies among four genome references including three wheat references (Chinese Spring survey sequence, W7984, and IWGSC RefSeq v1.0) and one barley reference genome by comparing imputed data derived from low-depth sequencing to actual data from high-depth sequencing. After imputation, the average number of imputed data points was the highest in the B genome (~48.99%). The D genome had the lowest imputed data points (~15.02%) but the highest imputation accuracy. Among the four reference genomes, IWGSC RefSeq v1.0 reference provided the most imputed data points, but the lowest imputation accuracy for the SNPs with < 10% minor allele frequency (MAF). The W7984 reference, however, provided the highest imputation accuracy for the SNPs with < 10% MAF.

]]>
<![CDATA[Transcription of human endogenous retroviruses in human brain by RNA-seq analysis]]> https://www.researchpad.co/article/5c37b793d5eed0c484490502

Background

Human endogenous retroviruses (HERV) comprise 8% of the human genome and can be classified into at least 31 families. Increased levels of transcripts from the W and H families of HERV have been observed in association with human diseases, such as multiple sclerosis and schizophrenia. Although HERV transcripts have been detected in many tissues and cell types based on microarray and PCR studies, the extent of HERV expression in different cell types and disease states has been less comprehensively studied.

Results

We examined overall transcription of HERV, and particularly of HERV-W and HERV-H elements, in human postmortem brain samples obtained from individuals with psychiatric diagnoses (n = 111) and healthy controls (n = 51) by analyzing publicly available RNA sequencing datasets. Sequence reads were aligned to prototypical sequences representing HERV, downloaded from Repbase. We found consistent expression (0.1~0.2% of mappable reads) of different HERV families across three regions of human brains. Spearman correlations revealed highly correlated expression levels between the three brain regions across 475 consensus sequences. By mapping sequences that aligned to the consensus sequences of HERV-W and HERV-H families to individual loci on chromosome 7, more than 60 loci from each family were identified, some of which are being transcribed. The ERVWE1 locus, located at chr7q21.2, exhibited high levels of transcription across the three datasets. Notably, we demonstrated a trend of increased expression of overall HERV, as well as of the HERV-W family, in samples from both schizophrenia and bipolar disorder patients.

Conclusions

The current analyses indicate that RNA sequencing is a useful approach for investigating global expression of repetitive elements, such as HERV, in the human genome. The tendency toward transcriptional up-regulation of HERV-W/H in patients suggests a potential involvement of HERV-W/H in psychiatric diseases.

]]>
<![CDATA[Walking along chromosomes with super-resolution imaging, contact maps, and integrative modeling]]> https://www.researchpad.co/article/5c2d2eb1d5eed0c484d9b21a

Chromosome organization is crucial for genome function. Here, we present a method for visualizing chromosomal DNA at super-resolution and then integrating Hi-C data to produce three-dimensional models of chromosome organization. Using the super-resolution microscopy methods of OligoSTORM and OligoDNA-PAINT, we trace 8 megabases of human chromosome 19, visualizing structures ranging in size from a few kilobases to over a megabase. Focusing on chromosomal regions that contribute to compartments, we discover distinct structures that, in spite of considerable variability, can predict whether such regions correspond to active (A-type) or inactive (B-type) compartments. Imaging through the depths of entire nuclei, we capture pairs of homologous regions in diploid cells, obtaining evidence that maternal and paternal homologous regions can be differentially organized. Finally, using restraint-based modeling to integrate imaging and Hi-C data, we implement a method–integrative modeling of genomic regions (IMGR)–to increase the genomic resolution of our traces to 10 kb.

]]>
<![CDATA[Far away from the lamppost]]> https://www.researchpad.co/article/5c196694d5eed0c484b524af

This Formal Comment responds to a recent Meta-Research Article by identifying initiatives that are already in place for funding risky exploratory research that illuminates mysteries of the dark genome.

]]>