ResearchPad - method https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Prediction of cell position using single-cell transcriptomic data: an iterative procedure]]> https://www.researchpad.co/article/elastic_article_10981 Single-cell sequencing reveals cellular heterogeneity but not cell localization. However, by combining single-cell transcriptomic data with a reference atlas of a small set of genes, it would be possible to predict the position of individual cells and reconstruct the spatial expression profile of thousands of genes reported in the single-cell study. With the purpose of developing new algorithms, the Dialogue for Reverse Engineering Assessments and Methods (DREAM) consortium organized a crowd-sourced competition known as DREAM Single Cell Transcriptomics Challenge (SCTC). Within this context, we describe here our proposed procedures for selecting an adequate set of reference genes, and an iterative procedure to predict the spatial expression profiles of other genes.
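
As a rough illustration of matching a cell to a reference atlas, the sketch below scores a binarized expression vector against each atlas bin and picks the best match. It is not the iterative procedure the abstract describes: the use of the Matthews correlation coefficient as the similarity score, the function names `mcc` and `predict_position`, and the toy atlas are all assumptions made for this example.

```python
import math


def mcc(a, b):
    """Matthews correlation coefficient between two equal-length binary vectors."""
    tp = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    tn = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    fp = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    fn = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def predict_position(cell, atlas):
    """Return the index of the atlas bin whose profile best matches the cell."""
    scores = [mcc(cell, bin_profile) for bin_profile in atlas]
    return max(range(len(atlas)), key=scores.__getitem__)


# Hypothetical atlas: 3 spatial bins x 4 reference genes (1 = expressed).
atlas = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
cell = [0, 1, 1, 0]  # binarized expression of the same 4 genes in one cell
best_bin = predict_position(cell, atlas)
```

With more reference genes the scores discriminate more finely between bins, which is one reason the choice of reference genes matters.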

]]>
<![CDATA[Protease‐resistant streptavidin for interaction proteomics]]> https://www.researchpad.co/article/elastic_article_7953 Many proteomic studies rely on streptavidin‐based purifications. To avoid streptavidin contamination, this study presents a straightforward protocol to prevent its proteolytic digestion. Protein identification rates are improved in various applications.

]]>
<![CDATA[A Computational Approach for Modeling the Allele Frequency Spectrum of Populations with Arbitrarily Varying Size]]> https://www.researchpad.co/article/elastic_article_6258 The allele frequency spectrum (AFS), or site frequency spectrum, is commonly used to summarize the genomic polymorphism pattern of a sample, which is informative for inferring population history and detecting natural selection. In 2013, Chen and Chen developed a method for analytically deriving the AFS for populations with temporally varying size through the coalescence time-scaling function. However, their approach is only applicable to population history scenarios in which the analytical form of the time-scaling function is tractable. In this paper, we propose a computational approach to extend the method to populations with arbitrary complex varying size by numerically approximating the time-scaling function. We demonstrate the performance of the approach by constructing the AFS for two population history scenarios: the logistic growth model and the Gompertz growth model, for which the AFS are unavailable with existing approaches. Software for implementing the algorithm can be downloaded at http://chenlab.big.ac.cn/software/.
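
The idea of numerically approximating the time-scaling function can be sketched as follows. This is a minimal illustration rather than the authors' software: it assumes the time-scaling function takes the form Λ(t) = ∫₀ᵗ N(0)/N(s) ds with t measured backward from the present, and the logistic history in `logistic_size` and all of its parameter values are hypothetical.

```python
import math


def logistic_size(t, n0=10000.0, n_anc=1000.0, rate=0.01, t_mid=500.0):
    """Population size t generations before present (hypothetical logistic history)."""
    return n_anc + (n0 - n_anc) / (1.0 + math.exp(rate * (t - t_mid)))


def time_scaling(t, size, steps=1000):
    """Trapezoidal approximation of the integral of size(0)/size(s) for s in [0, t]."""
    if t == 0:
        return 0.0
    h = t / steps
    n_now = size(0.0)
    # End points carry weight 1/2; the integrand equals 1 at s = 0.
    total = 0.5 * (1.0 + n_now / size(t))
    for i in range(1, steps):
        total += n_now / size(i * h)
    return total * h


scaled = time_scaling(1000.0, logistic_size)
```

For a constant-size population the function returns t itself, which gives a quick sanity check; for a population that was smaller in the past, the scaled time runs ahead of real time.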

]]>
<![CDATA[SPOT-Disorder2: Improved Protein Intrinsic Disorder Prediction by Ensembled Deep Learning]]> https://www.researchpad.co/article/elastic_article_6256 Intrinsically disordered or unstructured proteins (or regions in proteins) have been found to be important in a wide range of biological functions and implicated in many diseases. Due to the high cost and low efficiency of experimental determination of intrinsic disorder and the exponential increase of unannotated protein sequences, developing complementary computational prediction methods has been an active area of research for several decades. Here, we employed an ensemble of deep Squeeze-and-Excitation residual inception and long short-term memory (LSTM) networks for predicting protein intrinsic disorder with input from evolutionary information and predicted one-dimensional structural properties. The method, called SPOT-Disorder2, offers substantial and consistent improvement not only over our previous technique based on LSTM networks alone, but also over other state-of-the-art techniques in three independent tests with different ratios of disordered to ordered amino acid residues, and for sequences with either rich or limited evolutionary information. More importantly, semi-disordered regions predicted in SPOT-Disorder2 are more accurate in identifying molecular recognition features (MoRFs) than methods directly designed for MoRFs prediction. SPOT-Disorder2 is available as a web server and as a standalone program at https://sparks-lab.org/server/spot-disorder2/.

]]>
<![CDATA[nanoMLST: accurate multilocus sequence typing using Oxford Nanopore Technologies MinION with a dual-barcode approach to multiplex large numbers of samples]]> https://www.researchpad.co/article/N9c129733-73ca-419a-a4e4-cc8ba9c6cd17 Multilocus sequence typing (MLST) is one of the most commonly used methods for studying microbial lineage worldwide. However, the traditional MLST process using Sanger sequencing is time-consuming and expensive. We have designed a workflow that simultaneously sequenced seven full-length housekeeping genes of 96 meticillin-resistant isolates with dual-barcode multiplexing using just a single flow cell of an Oxford Nanopore Technologies MinION system, and then we performed bioinformatic analysis for strain typing. Fifty-one of the isolates comprising 34 sequence types had been characterized using Sanger sequencing. We demonstrate that the allele assignments obtained by our nanopore workflow (nanoMLST, available at https://github.com/jade-nhri/nanoMLST) were identical to those obtained by Sanger sequencing (359/359, with 100 % agreement rate). In addition, we estimate that our multiplex system is able to perform MLST for up to 1000 samples simultaneously; thus, providing a rapid and cost-effective solution for molecular typing.

]]>
<![CDATA[DEN-IM: dengue virus genotyping from amplicon and shotgun metagenomic sequencing]]> https://www.researchpad.co/article/Naff780e0-abf1-4148-a63f-d2882aa976e3 Dengue virus (DENV) represents a public health threat and economic burden in affected countries. The availability of genomic data is key to understanding viral evolution and dynamics, supporting improved control strategies. Currently, the use of high-throughput sequencing (HTS) technologies, which can be applied both directly to patient samples (shotgun metagenomics) and to PCR-amplified viral sequences (amplicon sequencing), is potentially the most informative approach to monitor viral dissemination and genetic diversity by providing, in a single methodological step, identification and characterization of the whole viral genome at the nucleotide level. Despite many advantages, these technologies require bioinformatics expertise and appropriate infrastructure for the analysis and interpretation of the resulting data. In addition, the many software solutions available can hamper the reproducibility and comparison of results. Here we present DEN-IM, a one-stop, user-friendly, containerized and reproducible workflow for the analysis of DENV short-read sequencing data from both amplicon and shotgun metagenomics approaches. It is able to infer the DENV coding sequence (CDS), identify the serotype and genotype, and generate a phylogenetic tree. It can easily be run on any UNIX-like system, from local machines to high-performance computing clusters, performing a comprehensive analysis without the requirement for extensive bioinformatics expertise. Using DEN-IM, we successfully analysed two types of DENV datasets. The first comprised 25 shotgun metagenomic sequencing samples from patients with variable serotypes and genotypes, including an in vitro spiked sample containing the four known serotypes. 
The second consisted of 106 paired-end and 76 single-end amplicon sequences of DENV 3 genotype III and DENV 1 genotype I, respectively, where DEN-IM allowed detection of the intra-genotype diversity. The DEN-IM workflow, parameters and execution configuration files, and documentation are freely available at https://github.com/B-UMMI/DEN-IM.

]]>
<![CDATA[DeepMILO: a deep learning approach to predict the impact of non-coding sequence variants on 3D chromatin structure]]> https://www.researchpad.co/article/N2b6bcf10-1e15-4d8c-bf27-8804c045bd44

Non-coding variants have been shown to be related to disease by alteration of 3D genome structures. We propose a deep learning method, DeepMILO, to predict the effects of variants on CTCF/cohesin-mediated insulator loops. Application of DeepMILO on variants from whole-genome sequences of 1834 patients of twelve cancer types revealed 672 insulator loops disrupted in at least 10% of patients. Our results show mutations at loop anchors are associated with upregulation of the cancer driver genes BCL2 and MYC in malignant lymphoma thus pointing to a possible new mechanism for their dysregulation via alteration of insulator loops.

]]>
<![CDATA[First derivative ATR-FTIR spectroscopic method as a green tool for the quantitative determination of diclofenac sodium tablets]]> https://www.researchpad.co/article/N41f18827-c4a4-4b86-828d-7019fe85f8f6

Background: Attenuated total reflection-Fourier transform infrared (ATR-FTIR) spectroscopy is a rapid quantitative method which has been applied for pharmaceutical analysis. This work describes the utility of first derivative ATR-FTIR spectroscopy in the quantitative determination of diclofenac sodium tablets.

Methods: This analytical quantitative technique depends on a first derivative measurement of the area of infrared bands corresponding to the CO stretching range of 1550–1605 cm⁻¹. The specificity, linearity, detection limits, precision and accuracy of the calibration curve, the infrared analysis and data manipulation were determined in order to validate the method. The statistical results were compared with other methods for the quantification of diclofenac sodium.

Results: The excipients in the commercial tablet preparation did not interfere with the assay. Excellent linearity was found for drug concentrations in the range 0.2–1.5% w/w (r² = 0.9994). Precision of the method was assessed by repeated analysis of diclofenac sodium tablets; the results showed small standard deviation and relative standard deviation values, which indicates that the method is precise. The high percentage recovery of diclofenac sodium from tablets (99.81, 101.54 and 99.41%) demonstrates compliance with the pharmacopeial percent recovery. The small limit of detection and limit of quantification values (0.0528 and 0.1599% w/w, respectively) obtained by this method indicate its high sensitivity.

Conclusions: First derivative ATR-FTIR spectroscopy showed high accuracy and precision, is nondestructive, green, low-cost and rapid, and can be applied easily to the pharmaceutical quantitative determination of diclofenac sodium tablet formulations.
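
The detection and quantification limits reported in the Results can be related to a calibration line with a short calculation. The sketch below assumes the common ICH-style definitions LOD = 3.3σ/S and LOQ = 10σ/S, where σ is the residual standard deviation of the calibration and S its slope; the abstract does not state which definition was used, although the reported ratio 0.1599/0.0528 ≈ 3.03 is consistent with it. The calibration data and function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def detection_limits(x, y):
    """Return (LOD, LOQ) as 3.3*sigma/S and 10*sigma/S (assumed definitions)."""
    slope, intercept = fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))
    return 3.3 * sigma / abs(slope), 10.0 * sigma / abs(slope)


# Hypothetical calibration: concentration (% w/w) vs. first-derivative band area.
conc = [0.2, 0.5, 0.8, 1.1, 1.5]
area = [0.41, 1.02, 1.58, 2.21, 3.01]
lod, loq = detection_limits(conc, area)
```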

]]>
<![CDATA[Optimization and clinical validation of a pathogen detection microarray]]> https://www.researchpad.co/article/5b7cc0b1463d7e2123decada

New design and optimization of pathogen detection microarrays is shown to allow robust and accurate detection of a range of pathogens. The customized microarray platform includes a method for reducing PCR bias during DNA amplification.

]]>
<![CDATA[A universal method for automated gene mapping]]> https://www.researchpad.co/article/5b79efed463d7e26194e6fcb

A high-throughput method for genotyping by mapping InDels. This method has been used to create fragment-length polymorphism maps for Drosophila and C. elegans.

]]>
<![CDATA[Identification of ciliated sensory neuron-expressed genes in Caenorhabditis elegans using targeted pull-down of poly(A) tails]]> https://www.researchpad.co/article/5b79efeb463d7e26194e6fc9

An mRNA-tagging method was used to selectively isolate mRNA from a small number of cells for subsequent cDNA microarray analysis. The approach was used to identify genes specifically expressed in ciliated sensory neurons of Caenorhabditis elegans.

]]>
<![CDATA[Large-scale exploration of growth inhibition caused by overexpression of genomic fragments in Saccharomyces cerevisiae]]> https://www.researchpad.co/article/5b79bf30463d7e168ee8f702

A screen of the Saccharomyces cerevisiae genome for fragments conferring a growth-impairment phenotype identified 714 fragments in about 84,000 clones tested.

]]>
<![CDATA[Organizing and running bioinformatics hackathons within Africa: The H3ABioNet cloud computing experience]]> https://www.researchpad.co/article/N9c2aff50-994f-4b23-a214-541b6d60fb9e

The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the growth of collaborative projects like the Human Health and Heredity in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines adapted to heterogeneous computing environments, and for the nurturing of technical expertise in workflow languages and containerization technologies. Building on the network’s Standard Operating Procedures (SOPs) for common genomic analyses, H3ABioNet arranged its first Cloud Computing and Reproducible Workflows Hackathon in 2016, with the purpose of translating those SOPs into analysis pipelines able to run on heterogeneous computing environments and meeting the needs of H3Africa research projects. This paper describes the preparations for this hackathon and reflects upon the lessons learned about its impact on building the technical and scientific expertise of African researchers. The workflows developed were made publicly available in GitHub repositories and deposited as container images on Quay.io.

]]>
<![CDATA[Avocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenome]]> https://www.researchpad.co/article/Nb5ba923b-1c31-4245-aa96-2da317c7ad43

The human epigenome has been experimentally characterized by thousands of measurements for every basepair in the human genome. We propose a deep neural network tensor factorization method, Avocado, that compresses this epigenomic data into a dense, information-rich representation. We use this learned representation to impute epigenomic data more accurately than previous methods, and we show that machine learning models that exploit this representation outperform those trained directly on epigenomic data on a variety of genomics tasks. These tasks include predicting gene expression, promoter-enhancer interactions, replication timing, and an element of 3D chromatin architecture.

]]>
<![CDATA[Completing the ENCODE3 compendium yields accurate imputations across a variety of assays and human biosamples]]> https://www.researchpad.co/article/Nb6afeebf-0ec0-4042-bf20-1846fc38fe67

Recent efforts to describe the human epigenome have yielded thousands of epigenomic and transcriptomic datasets. However, due primarily to cost, the total number of such assays that can be performed is limited. Accordingly, we applied an imputation approach, Avocado, to a dataset of 3814 tracks of data derived from the ENCODE compendium, including measurements of chromatin accessibility, histone modification, transcription, and protein binding. Avocado shows significant improvements in imputing protein binding compared to the top models in the ENCODE-DREAM challenge. Additionally, we show that the Avocado model allows for efficient addition of new assays and biosamples to a pre-trained model.

]]>
<![CDATA[Improved detection of differentially represented DNA barcodes for high‐throughput clonal phenomics]]> https://www.researchpad.co/article/Ncaa82658-f2c6-48a2-af7a-2c061e1803e7

Abstract

Cellular DNA barcoding has become a popular approach to study heterogeneity of cell populations and to identify clones with differential response to cellular stimuli. However, there is a lack of reliable methods for statistical inference of differentially responding clones. Here, we used mixtures of DNA‐barcoded cell pools to generate a realistic benchmark read count dataset for modelling a range of outcomes of clone‐tracing experiments. By accounting for the statistical properties intrinsic to the DNA barcode read count data, we implemented an improved algorithm that results in a significantly lower false‐positive rate, compared to current RNA‐seq data analysis algorithms, especially when detecting differentially responding clones in experiments with strong selection pressure. Building on the reliable statistical methodology, we illustrate how multidimensional phenotypic profiling enables one to deconvolute phenotypically distinct clonal subpopulations within a cancer cell line. The mixture control dataset and our analysis results provide a foundation for benchmarking and improving algorithms for clone‐tracing experiments.

]]>
<![CDATA[Genome sequencing in cytogenetics: Comparison of short‐read and linked‐read approaches for germline structural variant detection and characterization]]> https://www.researchpad.co/article/Nd445a98e-ab41-4c4b-8a50-dee6210ad218

Abstract

Background

Structural variants (SVs) include copy number variants (CNVs) and apparently balanced chromosomal rearrangements (ABCRs). Genome sequencing (GS) enables SV detection at base‐pair resolution, but the use of short‐read sequencing is limited by repetitive sequences, and long‐read approaches are not yet validated for diagnosis. Recently, 10X Genomics proposed Chromium, a technology providing linked‐reads to reconstruct long DNA fragments and which could represent a good alternative. No study has compared short‐read to linked‐read technologies to detect SVs in a constitutional diagnostic setting yet. The aim of this work was to determine whether the 10X Genomics technology enables better detection and comprehension of SVs than short‐read WGS.

Methods

We included 13 patients carrying various SVs. Whole genome analyses were performed using paired‐end HiSeq X sequencing with (linked‐read strategy) or without (short‐read strategy) Chromium library preparation. Two different bioinformatic pipelines were used: variants were called using BreakDancer for the short‐read strategy and LongRanger for the linked‐read strategy. Variant interpretations were first blinded.

Results

The short‐read strategy allowed diagnosis of the known SVs in 10/13 patients. After unblinding, the linked‐read strategy identified 10/13 SVs, including one (patient 7) missed by the short‐read strategy.

Conclusion

Based on the results of this study, the 10X Genomics solution did not improve the detection and characterization of SVs.

]]>
<![CDATA[KCML: a machine‐learning framework for inference of multi‐scale gene functions from genetic perturbation screens]]> https://www.researchpad.co/article/N928f1cb3-fcba-4cf1-8e2a-e4c3f717f812

Abstract

Characterising context‐dependent gene functions is crucial for understanding the genetic bases of health and disease. To date, inference of gene functions from large‐scale genetic perturbation screens is based on ad hoc analysis pipelines involving unsupervised clustering and functional enrichment. We present Knowledge‐ and Context‐driven Machine Learning (KCML), a framework that systematically predicts multiple context‐specific functions for a given gene based on the similarity of its perturbation phenotype to those of genes with known function. As a proof of concept, we test KCML on three datasets describing phenotypes at the molecular, cellular and population levels and show that it outperforms traditional analysis pipelines. In particular, KCML identified an abnormal multicellular organisation phenotype associated with the depletion of olfactory receptors, and TGFβ and WNT signalling genes in colorectal cancer cells. We validate these predictions in colorectal cancer patients and show that olfactory receptor expression is predictive of worse patient outcomes. These results highlight KCML as a systematic framework for discovering novel scale‐crossing and context‐dependent gene functions. KCML is highly generalisable and applicable to various large‐scale genetic perturbation screens.

]]>
<![CDATA[gscreend: modelling asymmetric count ratios in CRISPR screens to decrease experiment size and improve phenotype detection]]> https://www.researchpad.co/article/Nbfd67b40-411b-4b44-8932-f0a67b2e7027

Pooled CRISPR screens are a powerful tool to probe genotype-phenotype relationships at genome-wide scale. However, criteria for optimal design are missing, and it remains unclear how experimental parameters affect results. Here, we report that random decreases in gRNA abundance are more likely than increases due to bottle-neck effects during the cell proliferation phase. Failure to consider this asymmetry leads to loss of detection power. We provide a new statistical test that addresses this problem and improves hit detection at reduced experiment size. The method is implemented in the R package gscreend, which is available at http://bioconductor.org/packages/gscreend.
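
The asymmetry described above can be reproduced with a toy model. The sketch below is not the statistical test implemented in gscreend; it assumes that each cell carrying a given gRNA passes a bottleneck independently with a fixed probability (binomial sampling), uses a pseudocount of 1, and computes the exact skewness of the resulting log2 count ratio. The function name and parameter values are hypothetical.

```python
import math


def log_ratio_skewness(cells=200, survival=0.025):
    """Exact skewness of log2((k + 1) / (E[k] + 1)) for k ~ Binomial(cells, survival)."""
    expected = cells * survival
    probs, values = [], []
    for k in range(cells + 1):
        # Binomial probability that exactly k cells pass the bottleneck.
        p = math.comb(cells, k) * survival**k * (1.0 - survival) ** (cells - k)
        probs.append(p)
        values.append(math.log2((k + 1) / (expected + 1)))
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    third = sum(p * (v - mean) ** 3 for p, v in zip(probs, values))
    return third / var**1.5


skew = log_ratio_skewness()
```

A negative skewness means the lower tail of the log ratio distribution is heavier than the upper tail, so large random decreases are more likely than equally large increases even when the gRNA has no phenotype.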

]]>
<![CDATA[Hunting to Feel Human, the Process of Women’s Help-Seeking for Suicidality After Intimate Partner Violence: A Feminist Grounded Theory and Photovoice Study]]> https://www.researchpad.co/article/Nc3398f5a-f9f5-4499-a342-6c5906a804c2

Women reach out to health care providers for a multitude of health problems in the aftermath of intimate partner violence, including suicidality; however, little is known about how they seek help. The purpose of this study was to explore how women seek help for suicidality after intimate partner violence using a feminist grounded theory and photovoice multiple qualitative research design. Interviews were conducted with 32 women from New Brunswick, Canada, and seven from this sample also participated in five photovoice meetings where they critically reflected on self-generated photos of their help-seeking experiences. Data were analyzed using the constant comparative analysis of grounded theory. Hunting to Feel Human involves fighting for a sense of belonging and personal value by perceiving validation from health care providers. Women battled System Entrapment, a feeling of being dehumanized, by Gauging for Validation and Taking the Path of Least Entrapment. Implications for health care providers include prioritizing validating interactions and adopting a relational approach to practice.

]]>