ResearchPad - methods-online https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks

<![CDATA[Microfluidic automated plasmid library enrichment for biosynthetic gene cluster discovery]]> https://www.researchpad.co/article/N77bfd511-0a9f-457b-9ccb-92d6875a2225 Microbial biosynthetic gene clusters are a valuable source of bioactive molecules. However, because they typically represent a small fraction of genomic material in most metagenomic samples, it remains challenging to deeply sequence them. We present an approach to isolate and sequence gene clusters in metagenomic samples using microfluidic automated plasmid library enrichment. Our approach provides deep coverage of the target gene cluster, facilitating reassembly. We demonstrate the approach by isolating and sequencing type I polyketide synthase gene clusters from an Antarctic soil metagenome. Our method promotes the discovery of functionally related genes and biosynthetic pathways.

]]>
<![CDATA[S3norm: simultaneous normalization of sequencing depth and signal-to-noise ratio in epigenomic data]]> https://www.researchpad.co/article/N0506ba1c-997c-47e6-a79c-930edbb67ce4 Quantitative comparison of epigenomic data across multiple cell types or experimental conditions is a promising way to understand the biological functions of epigenetic modifications. However, differences in sequencing depth and signal-to-noise ratios in the data from different experiments can hinder our ability to identify real biological variation from raw epigenomic data. Proper normalization is required prior to data analysis to gain meaningful insights. Most existing methods for data normalization standardize signals by rescaling either background regions or peak regions, assuming that the same scale factor is applicable to both background and peak regions. While such methods adjust for differences in sequencing depths, they do not address differences in the signal-to-noise ratios across different experiments. We developed a new data normalization method, called S3norm, that normalizes the sequencing depths and signal-to-noise ratios across different data sets simultaneously by a monotonic nonlinear transformation. We show empirically that the epigenomic data normalized by our method, compared to existing methods, can better capture real biological variation, such as impact on gene expression regulation.
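The abstract describes a monotonic nonlinear transformation that matches both sequencing depth and signal-to-noise ratio across data sets. A minimal sketch of one way this could work is shown below, assuming a power-law form `alpha * x**beta` fitted so that the mean signal in shared peak regions and in shared background regions both match a reference. The power-law form, the function name `fit_power_transform`, and the bisection bounds are illustrative assumptions, not details stated in the abstract.

```python
def fit_power_transform(target_peak, target_bg, ref_peak, ref_bg,
                        lo=0.01, hi=10.0, iters=100):
    """Fit alpha, beta so that alpha * x**beta matches the reference means.

    Hypothetical sketch: all signals must be positive, and peak values are
    assumed to exceed background values so the peak/background ratio grows
    with beta (which makes bisection valid).
    """
    def mean(xs):
        return sum(xs) / len(xs)

    # Signal-to-noise target: ratio of peak mean to background mean in the reference.
    goal = mean(ref_peak) / mean(ref_bg)

    def ratio(beta):
        return (mean([x ** beta for x in target_peak])
                / mean([x ** beta for x in target_bg]))

    # Bisection on beta: the exponent sets the signal-to-noise ratio.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if ratio(mid) < goal:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2

    # The scale factor alpha then sets the overall depth.
    alpha = mean(ref_bg) / mean([x ** beta for x in target_bg])
    return alpha, beta


def apply_transform(signal, alpha, beta):
    """Apply the fitted monotonic transformation to every value."""
    return [alpha * x ** beta for x in signal]
```

Because a single scale factor cannot change the peak-to-background ratio, the exponent is the piece that adjusts signal-to-noise, while the scale factor handles depth.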

]]>
<![CDATA[Exonuclease combinations reduce noises in 3D genomics technologies]]> https://www.researchpad.co/article/Nd17c3c3c-63e4-4718-a7fb-220bef67ac3c Chromosome conformation-capture technologies are widely used in 3D genomics; however, experimentally, such methods have high-noise limitations and, therefore, require significant bioinformatics efforts to extract reliable distal interactions. Miscellaneous undesired linear DNAs, present during proximity-ligation, represent a main noise source, which needs to be minimized or eliminated. In this study, different exonuclease combinations were tested to remove linear DNA fragments from a circularized DNA preparation. This method efficiently removed linear DNAs, raised the proportion of annulation and increased the valid-pairs ratio from ∼40% to ∼80% for enhanced interaction detection in standard Hi-C. This strategy is applicable for development of various 3D genomics technologies, or optimization of Hi-C sequencing efficiency.

]]>
<![CDATA[Quantitative comparison of within-sample heterogeneity scores for DNA methylation data]]> https://www.researchpad.co/article/N75e6ae97-0171-4a6f-8b66-f518ee86dacf DNA methylation is an epigenetic mark with important regulatory roles in cellular identity and can be quantified at base resolution using bisulfite sequencing. Most studies are limited to the average DNA methylation levels of individual CpGs and thus neglect heterogeneity within the profiled cell populations. To assess this within-sample heterogeneity (WSH), several window-based scores that quantify variability in DNA methylation in sequencing reads have been proposed. We performed the first systematic comparison of four published WSH scores based on simulated and publicly available datasets. Moreover, we propose two new scores and provide guidelines for selecting appropriate scores to address cell-type heterogeneity, cellular contamination and allele-specific methylation. Most of the measures were sensitive in detecting DNA methylation heterogeneity in these scenarios, while we detected differences in susceptibility to technical bias. Using recently published DNA methylation profiles of Ewing sarcoma samples, we show that DNA methylation heterogeneity provides information complementary to the DNA methylation level. WSH scores are powerful tools for estimating variance in DNA methylation patterns and have the potential for detecting novel disease-associated genomic loci not captured by established statistics. We provide an R-package implementing the WSH scores for integration into analysis workflows.

]]>
<![CDATA[A user-friendly, high-throughput tool for the precise fluorescent quantification of deoxyribonucleoside triphosphates from biological samples]]> https://www.researchpad.co/article/Nb4c51b4b-2039-4b16-b950-76549442337e Cells maintain a fine-tuned, dynamic concentration balance in the pool of deoxyribonucleoside 5′-triphosphates (dNTPs). This balance is essential for physiological processes including cell cycle control or antiviral defense. Its perturbation results in increased mutation frequencies, replication arrest and may promote cancer development. An easily accessible and relatively high-throughput method would greatly accelerate the exploration of the diversified consequences of dNTP imbalances. The dNTP incorporation based, fluorescent TaqMan-like assay published by Wilson et al. has the aforementioned advantages over mass spectrometry, radioactive or chromatography based dNTP quantification methods. Nevertheless, the assay failed to produce reliable data in several biological samples. Therefore, we applied enzyme kinetics analysis on the fluorescent dNTP incorporation curves and found that the Taq polymerase exhibits a dNTP independent exonuclease activity that decouples signal generation from dNTP incorporation. Furthermore, we found that both polymerization and exonuclease activities are unpredictably inhibited by the sample matrix. To resolve these issues, we established a kinetics based data analysis method which identifies the signal generated by dNTP incorporation. We automated the analysis process in the nucleoTIDY software which enables even the inexperienced user to calculate the final and accurate dNTP amounts in a 96-well-plate setup within minutes.

]]>
<![CDATA[A novel NGS library preparation method to characterize native termini of fragmented DNA]]> https://www.researchpad.co/article/N1775440d-da06-4afe-95a5-bd1acb30829e Biological and chemical DNA fragmentation generates DNA molecules with a variety of termini, including blunt ends and single-stranded overhangs. We have developed a Next Generation Sequencing (NGS) assay, XACTLY, to interrogate the termini of fragmented DNA, information traditionally lost in standard NGS library preparation methods. Here we describe the XACTLY method, showcase its sensitivity and specificity, and demonstrate its utility in in vitro experiments. The XACTLY assay is able to report relative abundances of all lengths and types (5′ and 3′) of single-stranded overhangs, if present, on each DNA fragment with an overall accuracy between 80–90%. In addition, XACTLY retains the sequence of each native DNA molecule after fragmentation and can capture the genomic landscape of cleavage events at single nucleotide resolution. The XACTLY assay can be applied as a novel research and discovery tool for fragmentation analyses and in cell-free DNA.

]]>
<![CDATA[A comparative analysis of genome-wide chromatin immunoprecipitation data for mammalian transcription factors]]> https://www.researchpad.co/article/5b7c00da463d7e0599100333

Genome-wide location analysis (ChIP-chip, ChIP-PET) is a powerful technique to study mammalian transcriptional regulation. In order to obtain a basic understanding of the location data generated for mammalian transcription factors and potential issues in their analysis, we conducted a comparative study of eight independent ChIP experiments involving six different transcription factors in human and mouse. Our cross-study comparisons, to the best of our knowledge the first to analyze multiple datasets, revealed the importance of carefully chosen genomic controls in the de novo identification of key transcription factor binding motifs, raised issues about the interpretation of ubiquitously occurring sequence motifs, and demonstrated the clustering tendency of protein-binding regions for certain transcription factors.

]]>
<![CDATA[Absolute enrichment: gene set enrichment analysis for homeostatic systems]]> https://www.researchpad.co/article/5b7bff8d463d7e050fa0c25d

The Gene Set Enrichment Analysis (GSEA) identifies sets of genes that are differentially regulated in one direction. Many homeostatic systems will include one limb that is upregulated in response to a downregulation of another limb and vice versa. Such patterns are poorly captured by the standard formulation of GSEA. We describe a technique to identify groups of genes (which sometimes can be pathways) that include both up- and down-regulated components. This approach lends insights into the feedback mechanisms that may operate, especially when integrated with protein interaction databases.
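To make the contrast with standard GSEA concrete, the sketch below ranks genes either by signed score or by magnitude and computes an unweighted running-sum enrichment score. Ranking by absolute value is one plausible reading of the approach and is an assumption here, as are the function name `enrichment_score` and the unweighted step sizes; the abstract does not spell out the statistic.

```python
def enrichment_score(scores, gene_set, use_absolute=True):
    """Unweighted running-sum enrichment score for a gene set.

    scores: dict mapping gene name to a signed differential-expression score.
    use_absolute: rank by |score| (illustrative 'absolute' variant) instead
    of by signed score (standard one-directional ranking).
    """
    if use_absolute:
        ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    else:
        ranked = sorted(scores, key=lambda g: scores[g], reverse=True)

    n_hit = sum(1 for g in ranked if g in gene_set)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0

    running = 0.0
    best = 0.0
    for gene in ranked:
        # Step up on set members, step down on everything else.
        running += 1.0 / n_hit if gene in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

With a set containing one strongly up-regulated and one strongly down-regulated gene, the signed ranking places the two members at opposite ends of the list and the score is diluted, while the magnitude ranking places both at the top.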

]]>
<![CDATA[TypeTE: a tool to genotype mobile element insertions from whole genome resequencing data]]> https://www.researchpad.co/article/N7a29cdcb-85ee-4e7d-a0ed-c60567d34603

Abstract

Alu retrotransposons account for more than 10% of the human genome, and insertions of these elements create structural variants segregating in human populations. Such polymorphic Alus are powerful markers to understand population structure, and they represent variants that can greatly impact genome function, including gene expression. Accurate genotyping of Alus and other mobile elements has been challenging. Indeed, we found that Alu genotypes previously called for the 1000 Genomes Project are sometimes erroneous, which poses significant problems for phasing these insertions with other variants that comprise the haplotype. To ameliorate this issue, we introduce a new pipeline – TypeTE – which genotypes Alu insertions from whole-genome sequencing data. Starting from a list of polymorphic Alus, TypeTE identifies the hallmarks (poly-A tail and target site duplication) and orientation of Alu insertions using local re-assembly to reconstruct presence and absence alleles. Genotype likelihoods are then computed after re-mapping sequencing reads to the reconstructed alleles. Using a high-quality set of PCR-based genotyping of >200 loci, we show that TypeTE improves genotype accuracy from 83% to 92% in the 1000 Genomes dataset. TypeTE can be readily adapted to other retrotransposon families and brings a valuable toolbox addition for population genomics.
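The genotype likelihood step can be illustrated with a standard binomial model over reads that support the presence allele versus the absence allele. This is a simplified sketch under stated assumptions: the fixed per-read error rate, the function names, and the use of raw read counts (rather than mapping qualities) are hypothetical and are not taken from the TypeTE description.

```python
import math


def genotype_likelihoods(n_presence, n_absence, error=0.01):
    """Binomial likelihood of the read counts under each diploid genotype.

    n_presence: reads best matching the reconstructed insertion (presence) allele.
    n_absence:  reads best matching the empty-site (absence) allele.
    error:      assumed probability that a read supports the wrong allele.
    """
    # Probability that a single read supports the presence allele, per genotype.
    p_by_genotype = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    n = n_presence + n_absence
    coeff = math.comb(n, n_presence)
    return {
        genotype: coeff * p ** n_presence * (1.0 - p) ** n_absence
        for genotype, p in p_by_genotype.items()
    }


def call_genotype(n_presence, n_absence, error=0.01):
    """Return the genotype with the highest likelihood."""
    likelihoods = genotype_likelihoods(n_presence, n_absence, error)
    return max(likelihoods, key=likelihoods.get)
```

For example, `call_genotype(10, 9)` favors the heterozygous genotype because roughly half of the reads support each allele.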

]]>
<![CDATA[Imaging unlabeled proteins on DNA with super-resolution]]> https://www.researchpad.co/article/N22442a84-cdb5-4341-9fbd-48a5adeebd15

Abstract

Fluorescence microscopy is invaluable to a range of biomolecular analysis approaches. The required labeling of proteins of interest, however, can be challenging and potentially perturb biomolecular functionality as well as cause imaging artefacts and photo bleaching issues. Here, we introduce inverse (super-resolution) imaging of unlabeled proteins bound to DNA. In this new method, we use DNA-binding fluorophores that transiently label bare DNA but not protein-bound DNA. In addition to demonstrating diffraction-limited inverse imaging, we show that inverse Binding-Activated Localization Microscopy or ‘iBALM’ can resolve biomolecular features smaller than the diffraction limit. The current detection limit is estimated to lie at features between 5 and 15 nm in size. Although the current image-acquisition times preclude super-resolving fast dynamics, we show that diffraction-limited inverse imaging can reveal molecular mobility at ∼0.2 s temporal resolution and that the method works both with DNA-intercalating and non-intercalating dyes. Our experiments show that such inverse imaging approaches are valuable additions to the single-molecule toolkit that relieve potential limitations posed by labeling.

]]>
<![CDATA[Measuring mRNA translation in neuronal processes and somata by tRNA-FRET]]> https://www.researchpad.co/article/N26eefdd3-9a89-43fc-a876-adf8b7011d56

Abstract

In neurons, the specific spatial and temporal localization of protein synthesis is of great importance for function and survival. Here, we visualized tRNA and protein synthesis events in fixed and live mouse primary cortical culture using fluorescently-labeled tRNAs. We were able to characterize the distribution and transport of tRNAs in different neuronal sub-compartments and to study their association with the ribosome. We found that tRNA mobility in neural processes is lower than in somata and corresponds to patterns of slow transport mechanisms, and that larger tRNA puncta co-localize with translational machinery components and are likely the functional fraction. Furthermore, chemical induction of long-term potentiation (LTP) in culture revealed up-regulation of mRNA translation with a similar effect in dendrites and somata, which appeared to be GluR-dependent 6 h post-activation. Importantly, measurement of protein synthesis in neurons with high resolutions offers new insights into neuronal function in health and disease states.

]]>
<![CDATA[A single-component light sensor system allows highly tunable and direct activation of gene expression in bacterial cells]]> https://www.researchpad.co/article/N44f5afe9-130b-43a0-868a-767e209c3c1e

Abstract

Light-regulated modules offer unprecedented new ways to control cellular behaviour with precise spatial and temporal resolution. Among a variety of bacterial light-switchable gene expression systems, single-component systems consisting of single transcription factors would be more useful due to the advantages of speed, simplicity, and versatility. In the present study, we developed a single-component light-activated bacterial gene expression system (eLightOn) based on a novel LOV domain from Rhodobacter sphaeroides (RsLOV). The eLightOn system showed significant improvements over the existing single-component bacterial light-activated expression systems, with benefits including a high ON/OFF ratio of >500-fold, a high activation level, fast activation kinetics, and/or good adaptability. Additionally, the induction characteristics, including regulatory windows, activation kinetics and light sensitivities, were highly tunable by altering the expression level of LexRO. We demonstrated the usefulness of the eLightOn system in regulating cell division and swimming by controlling the expression of the FtsZ and CheZ genes, respectively, as well as constructing synthetic Boolean logic gates using light and arabinose as the two inputs. Taken together, our data indicate that the eLightOn system is a robust and highly tunable tool for quantitative and spatiotemporal control of bacterial gene expression.

]]>
<![CDATA[Direct sequencing of RNA with MinION Nanopore: detecting mutations based on associations]]> https://www.researchpad.co/article/N1488a7bb-141d-4980-b78d-18cf16d99964

Abstract

One of the key challenges in the field of genetics is the inference of haplotypes from next generation sequencing data. The MinION Oxford Nanopore sequencer allows sequencing long reads, with the potential of sequencing complete genes, and even complete genomes of viruses, in individual reads. However, MinION suffers from high error rates, rendering the detection of true variants difficult. Here, we propose a new statistical approach named AssociVar, which differentiates between true mutations and sequencing errors from direct RNA/DNA sequencing using MinION. Our strategy relies on the assumption that sequencing errors will be dispersed randomly along sequencing reads, and hence will not be associated with each other, whereas real mutations will display a non-random pattern of association with other mutations. We demonstrate our approach using direct RNA sequencing data from evolved populations of the MS2 bacteriophage, whose small genome makes it ideal for MinION sequencing. AssociVar inferred several mutations in the phage genome, which were corroborated using parallel Illumina sequencing. This allowed us to reconstruct full genome viral haplotypes constituting different strains that were present in the sample. Our approach is applicable to long read sequencing data from any organism for accurate detection of bona fide mutations and inter-strain polymorphisms.
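The core assumption, that real mutations co-occur on the same reads while sequencing errors do not, can be checked with a test of association between two positions. The sketch below uses a chi-square statistic on a 2x2 table of read counts; the choice of chi-square, the representation of each read as a set of mutated positions, and the function names are assumptions made for illustration, since the abstract does not name the test.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for a 2x2 table.

    a: both mutations present, b: only the first, c: only the second, d: neither.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so association is undefined
    return n * (a * d - b * c) ** 2 / denom


def association(reads, pos_i, pos_j):
    """Association between mutations at two positions across long reads.

    reads: iterable of sets, each holding the mutated positions seen in one read.
    """
    a = b = c = d = 0
    for mutated in reads:
        has_i = pos_i in mutated
        has_j = pos_j in mutated
        if has_i and has_j:
            a += 1
        elif has_i:
            b += 1
        elif has_j:
            c += 1
        else:
            d += 1
    return chi_square_2x2(a, b, c, d)
```

Positions whose mutations always travel together give a large statistic, whereas mutations scattered independently across reads give a value near zero.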

]]>
<![CDATA[IDR2D identifies reproducible genomic interactions]]> https://www.researchpad.co/article/Nef0ffebe-61d3-4baf-b291-7020b4b04c56

Abstract

Chromatin interaction data from protocols such as ChIA-PET, HiChIP and Hi-C provide valuable insights into genome organization and gene regulation, but can include spurious interactions that do not reflect underlying genome biology. We introduce an extension of the Irreproducible Discovery Rate (IDR) method called IDR2D that identifies replicable interactions shared by chromatin interaction experiments. IDR2D provides a principled set of interactions and eliminates artifacts from single experiments. The method is available as a Bioconductor package for the R community, as well as an online service at https://idr2d.mit.edu.

]]>
<![CDATA[Haplotyping by CRISPR-mediated DNA circularization (CRISPR-hapC) broadens allele-specific gene editing]]> https://www.researchpad.co/article/Nc7fe24af-a28a-4545-be22-eaf8c87b7545

Abstract

Allele-specific protospacer adjacent motif (asPAM)-positioning SNPs and CRISPRs are valuable resources for gene therapy of dominant disorders. However, one technical hurdle is to identify the haplotype comprising the disease-causing allele and the distal asPAM SNPs. Here, we describe a novel CRISPR-based method (CRISPR-hapC) for haplotyping. Based on the generation (with a pair of CRISPRs) of extrachromosomal circular DNA in cells, the CRISPR-hapC can map haplotypes from a few hundred bases to over 200 Mb. To streamline and demonstrate the applicability of the CRISPR-hapC and asPAM CRISPR for allele-specific gene editing, we reanalyzed the 1000 human pan-genome and generated a high frequency asPAM SNP and CRISPR database (www.crispratlas.com/knockout) for four CRISPR systems (SaCas9, SpCas9, xCas9 and Cas12a). Using the huntingtin (HTT) CAG expansion and transthyretin (TTR) exon 2 mutation as examples, we showed that the asPAM CRISPRs can specifically discriminate active and dead PAMs for all 23 loci tested. Combination of the CRISPR-hapC and asPAM CRISPRs further demonstrated the capability for achieving highly accurate and haplotype-specific deletion of the HTT CAG expansion allele and TTR exon 2 mutation in human cells. Taken together, our study provides a new approach and an important resource for genome research and allele-specific (haplotype-specific) gene therapy.

]]>
<![CDATA[Scalable and cost-effective ribonuclease-based rRNA depletion for transcriptomics]]> https://www.researchpad.co/article/N8593e718-06d0-477a-8821-b48b3e422bd8

Abstract

Bacterial RNA sequencing (RNA-seq) is a powerful approach for quantitatively delineating the global transcriptional profiles of microbes in order to gain deeper understanding of their physiology and function. Cost-effective bacterial RNA-seq requires efficient physical removal of ribosomal RNA (rRNA), which otherwise dominates transcriptomic reads. However, current methods to effectively deplete rRNA of diverse non-model bacterial species are lacking. Here, we describe a probe and ribonuclease based strategy for bacterial rRNA removal. We implemented the method using either chemically synthesized oligonucleotides or amplicon-based single-stranded DNA probes and validated the technique on three novel gut microbiota isolates from three distinct phyla. We further showed that different probe sets can be used on closely related species. We provide a detailed methods protocol, probe sets for >5000 common microbes from RefSeq, and an online tool to generate custom probe libraries. This approach lays the groundwork for large-scale and cost-effective bacterial transcriptomics studies.

]]>
<![CDATA[High-yield fabrication of DNA and RNA constructs for single molecule force and torque spectroscopy experiments]]> https://www.researchpad.co/article/N9c3b14db-12b7-43ba-a836-0b32e4fb030a

Abstract

Single molecule biophysics experiments have enabled the observation of biomolecules with a great deal of precision in space and time, e.g. the mechanical properties of nucleic acids and protein–nucleic acid interactions using force and torque spectroscopy techniques. The success of these experiments strongly depends on the capacity of the researcher to design and fabricate complex nucleic acid structures, as the outcome and the yield of the experiment also strongly depend on the high quality and purity of the final construct. Though the molecular biology techniques involved are well known, the fabrication of nucleic acid constructs for single molecule experiments still remains a difficult task. Here, we present new protocols to generate high quality coilable double-stranded DNA and RNA, as well as DNA and RNA hairpins with ∼500–1000 bp long stems. Importantly, we present a new approach based on single-stranded DNA (ssDNA) annealing and we use magnetic tweezers to show that this approach simplifies the fabrication of complex DNA constructs, such as hairpins, and converts the input DNA into construct more efficiently than the standard PCR-digestion-ligation approach. The protocols we describe here enable the design of a large range of nucleic acid constructs for single molecule biophysics experiments.

]]>
<![CDATA[Click-encoded rolling FISH for visualizing single-cell RNA polyadenylation and structures]]> https://www.researchpad.co/article/Nb23e7f02-ef29-4d52-b06d-801483c85e3c

Abstract

Spatially resolved visualization of RNA processing and structures is important for better studying single-cell RNA function and landscape. However, currently available RNA imaging methods are limited to sequence analysis, and not capable of identifying RNA processing events and structures. Here, we developed click-encoded rolling FISH (ClickerFISH) for visualizing RNA polyadenylation and structures in single cells. In ClickerFISH, RNA 3′ polyadenylation tails, single-stranded and duplex regions are chemically labeled with different clickable DNA barcodes. These barcodes then initiate DNA rolling amplification, generating repetitive templates for FISH to image their subcellular distributions. Combined with single-molecule FISH, the proposed strategy can also obtain quantitative information of RNA of interest. Finally, we found that RNA poly(A) tailing and higher-order structures are spatially organized in a cell type-specific style with cell-to-cell heterogeneity. We also explored their spatiotemporal patterns during cell cycle stages, and revealed the highly dynamic organization especially in S phase. This method will help clarify the spatiotemporal architecture of RNA polyadenylation and structures.

]]>
<![CDATA[Convolutional neural network model to predict causal risk factors that share complex regulatory features]]> https://www.researchpad.co/article/N813f7056-933f-4e72-94f1-7f655b20d0b3

Abstract

Major progress in disease genetics has been made through genome-wide association studies (GWASs). One of the key tasks for post-GWAS analyses is to identify causal noncoding variants with regulatory function. Here, on the basis of >2000 functional features, we developed a convolutional neural network framework for combinatorial, nonlinear modeling of complex patterns shared by risk variants scattered among multiple associated loci. When applied for major psychiatric disorders and autoimmune diseases, neural and immune features, respectively, exhibited high explanatory power while reflecting the pathophysiology of the relevant disease. The predicted causal variants were concentrated in active regulatory regions of relevant cell types and tended to be in physical contact with transcription factors while residing in evolutionarily conserved regions and resulting in expression changes of genes related to the given disease. We demonstrate some examples of novel candidate causal variants and associated genes. Our method is expected to contribute to the identification and functional interpretation of potential causal noncoding variants in post-GWAS analyses.

]]>
<![CDATA[Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data]]> https://www.researchpad.co/article/N0d02c09c-33b7-47cb-b1fa-924efb782ee0

Abstract

Single-cell RNA sequencing (scRNA-seq) is a powerful tool in basic and translational biological research for characterizing cell-to-cell variation and cellular dynamics in populations that otherwise appear homogeneous. However, significant challenges arise in the analysis of scRNA-seq data, including a low signal-to-noise ratio with high data sparsity, potential batch effects, and scalability problems when hundreds of thousands of cells are to be analyzed, among others. The inherent complexities of scRNA-seq data and the dynamic nature of cellular processes lead to suboptimal performance of many currently available algorithms, even for basic tasks such as identifying biologically meaningful heterogeneous subpopulations. In this study, we developed the Latent Cellular Analysis (LCA), a machine learning–based analytical pipeline that combines cosine-similarity measurement by latent cellular states with a graph-based clustering algorithm. LCA provides heuristic solutions for population number inference, dimension reduction, feature selection, and control of technical variations without explicit gene filtering. We show that LCA is robust, accurate, and powerful by comparison with multiple state-of-the-art computational methods when applied to large-scale real and simulated scRNA-seq data. Importantly, the ability of LCA to learn from representative subsets of the data provides scalability, thereby addressing a significant challenge posed by growing sample sizes in scRNA-seq data analysis.
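The combination of cosine similarity with a graph-based method can be sketched as follows: cells are represented by vectors in some latent space, pairwise cosine similarity is computed, and pairs above a cutoff become edges of a graph that a clustering algorithm could then partition. The vector representation, the threshold value, and the function names are hypothetical; this is not the LCA implementation, and the clustering step itself is left out.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity to an all-zero vector is treated as zero
    return dot / (norm_u * norm_v)


def similarity_graph(cells, threshold=0.9):
    """Edges between cells whose latent vectors are at least `threshold` similar.

    cells: dict mapping a cell identifier to its latent-state vector.
    Returns a list of (cell_a, cell_b) pairs in sorted order.
    """
    names = sorted(cells)
    edges = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if cosine_similarity(cells[first], cells[second]) >= threshold:
                edges.append((first, second))
    return edges
```

Cosine similarity ignores vector length, so two cells with the same expression profile but different total read counts are treated as similar, which is one reason it is a common choice for sparse count data.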

]]>