ResearchPad - sequencing-techniques https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[Low LEF1 expression is a biomarker of early T-cell precursor, an aggressive subtype of T-cell lymphoblastic leukemia]]> https://www.researchpad.co/article/elastic_article_13868

Early T-cell precursor (ETP) is the only subtype of acute T-cell lymphoblastic leukemia (T-ALL) listed in the World Health Organization (WHO) classification of myeloid neoplasms and acute leukemia. Patients with ETP tend to have worse disease outcomes. ETP is defined by a series of immune markers. The diagnosis of ETP status can be vague due to the limitations of the current measurement. In this study, we performed unsupervised clustering and supervised prediction to investigate whether a molecular biomarker can be used to identify the ETP status in order to stratify risk groups. We found that the ETP status can be predicted by the expression level of lymphoid enhancer binding factor 1 (LEF1) with high accuracy (AUC of ROC = 0.957 and 0.933 in two T-ALL cohorts). Patients with the ETP subtype have a lower level of LEF1 compared with those without ETP. We suggest that incorporating the biomarker LEF1 with traditional immune-phenotyping will improve the diagnosis of ETP.
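As a rough illustration of how a single expression marker can be scored as a classifier, the sketch below computes the area under the ROC curve with the rank-based (Mann-Whitney) formulation. The expression values and labels are hypothetical and are not data from the study; the function name `roc_auc` is also an assumption made for this example. Because lower LEF1 is reported to indicate ETP, the expression values are negated before scoring.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical LEF1 expression values (arbitrary units), not study data.
lef1 = [0.4, 0.9, 1.1, 3.2, 4.0, 2.8, 3.6, 0.7]
is_etp = [True, True, True, False, False, False, False, False]

# Lower LEF1 suggests ETP, so use the negated expression as the score.
auc = roc_auc([-x for x in lef1], is_etp)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a marker with no discriminating power and 1.0 to perfect separation, which is the scale on which the reported 0.957 and 0.933 should be read.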

]]>
<![CDATA[Inferring the immune response from repertoire sequencing]]> https://www.researchpad.co/article/elastic_article_7765

High-throughput immune repertoire sequencing (RepSeq) experiments are becoming a common way to study the diversity, structure and composition of lymphocyte repertoires, promising to yield unique insight into individuals’ past infection history. However, the analysis of these sequences remains challenging, especially when comparing two different temporal or tissue samples. Here we develop a new theoretical approach and methodology to extract the characteristics of the lymphocyte repertoire response from different samples. The method is specifically tailored to RepSeq experiments and accounts for the multiple sources of noise present in these experiments. Its output provides expansion parameters, as well as a list of potentially responding clonotypes. We apply the method to describe the response to yellow fever vaccine obtained from samples taken at different time points. We also use our results to estimate the diversity and clone size statistics from data.

]]>
<![CDATA[Paleogenetic study on the 17th century Korean mummy with atherosclerotic cardiovascular disease]]> https://www.researchpad.co/article/5aafccf3463d7e7f05234537

While atherosclerotic cardiovascular disease (ASCVD) is known to be common among modern people exposed to various risk factors, recent paleopathological studies have shown that it affected ancient populations much more frequently than expected. In 2010, we investigated a 17th century Korean female mummy with presumptive ASCVD signs. Although the resulting report was a rare and invaluable conjecture on the disease status of an ancient East Asian population, the diagnosis had been based only on anatomical and radiological techniques, and so could not confirm the existence of ASCVD in the mummy. In the present study, we thus performed a paleogenetic analysis to supplement the previous conventional diagnosis of ASCVD. In aDNA extracted from the same Korean mummy, we identified the risk alleles of seven different SNPs (rs5351, rs10757274, rs2383206, rs2383207, rs10757278, rs4380028 and rs1333049) that had already been revealed to be the major risk loci of ASCVD in East Asian populations. The reliability of this study could be enhanced by cross-validation using two different analyses: Sanger and SNaPshot techniques. We were able to establish that the 17th century Korean female had a strong genetic predisposition to increased risk of ASCVD. The current paleogenetic diagnosis, the first of its kind outside Europe, re-confirms its utility as an adjunct modality for confirmatory diagnosis of ancient ASCVD.

]]>
<![CDATA[Physicochemical and biological evaluation of JR-131 as a biosimilar to a long-acting erythropoiesis-stimulating agent darbepoetin alfa]]> https://www.researchpad.co/article/Na789b0ff-1b14-409c-afa6-0c7a70fc7c42

Renal anemia is predominantly caused by a relative deficiency in erythropoietin (EPO). Conventional treatment for renal anemia includes the use of recombinant human EPO (rhEPO) or a long-acting erythropoiesis-activating agent named darbepoetin alfa, which is a modified rhEPO with a carbohydrate chain structure that differs from native hEPO. We have developed a biosimilar to darbepoetin alfa designated JR-131. Here, we comprehensively compare the physicochemical and biological characteristics of JR-131 to darbepoetin alfa. JR-131 demonstrated similar protein structure to the originator, darbepoetin alfa, by peptide mapping and circular dichroism spectroscopy. Additionally, mass spectroscopic analyses and capillary zone electrophoresis revealed similar glycosylation patterns between the two products. Human bone marrow-derived erythroblasts differentiated and proliferated to form colonies with JR-131 to a similar degree as darbepoetin alfa. Finally, JR-131 stimulated erythropoiesis and improved anemia in rats similarly to darbepoetin alfa. Our data show the similarity in physicochemical and biological properties of JR-131 to those of darbepoetin alfa, and JR-131 therefore represents a biosimilar for use in the treatment of renal anemia.

]]>
<![CDATA[Variants encoding a restricted carboxy-terminal domain of SLC12A2 cause hereditary hearing loss in humans]]> https://www.researchpad.co/article/Nd1837fa5-7737-42fc-aa07-ce2092d99c03

Hereditary hearing loss is challenging to diagnose because of the heterogeneity of the causative genes. Further, some genes involved in hereditary hearing loss have yet to be identified. Using whole-exome analysis of three families with congenital, severe-to-profound hearing loss, we identified a missense variant of SLC12A2 in five affected members of one family showing a dominant inheritance mode, along with de novo splice-site and missense variants of SLC12A2 in two sporadic cases, as promising candidates associated with hearing loss. Furthermore, we detected another de novo missense variant of SLC12A2 in a sporadic case. SLC12A2 encodes Na+, K+, 2Cl− cotransporter (NKCC) 1 and plays critical roles in the homeostasis of K+-enriched endolymph. Slc12a2-deficient mice have congenital, profound deafness; however, no human variant of SLC12A2 has been reported as associated with hearing loss. All identified SLC12A2 variants mapped to exon 21 or its 3’-splice site. In vitro analysis indicated that the splice-site variant generates an exon 21-skipped SLC12A2 mRNA transcript expressed at much lower levels than the exon 21-included transcript in the cochlea, suggesting a tissue-specific role for the exon 21-encoded region in the carboxy-terminal domain. In vitro functional analysis demonstrated that Cl− influx was significantly decreased in all SLC12A2 variants studied. Immunohistochemistry revealed that SLC12A2 is located on the plasma membrane of several types of cells in the cochlea, including the strial marginal cells, which are critical for endolymph homeostasis. Overall, this study suggests that variants affecting exon 21 of the SLC12A2 transcript are responsible for hereditary hearing loss in humans.

]]>
<![CDATA[First description of a herpesvirus infection in genus Lepus]]> https://www.researchpad.co/article/N2b9a02c7-7220-4716-8700-9456c07e4236

During the necropsies of Iberian hares obtained in 2018/2019, along with signs of the nodular form of myxomatosis, other unexpected external lesions were also observed. Histopathology revealed nuclear inclusion bodies in stromal cells, suggesting the additional presence of a nuclear replicating virus. Transmission electron microscopy further demonstrated the presence of herpesvirus particles in the tissues of affected hares. We confirmed the presence of herpesvirus in 13 MYXV-positive hares by PCR and sequencing analysis. Herpesvirus DNA was also detected in seven healthy hares, suggesting its asymptomatic circulation. Phylogenetic analysis based on concatenated partial sequences of the DNA polymerase gene and the glycoprotein B gene enabled greater resolution than analysing the sequences individually. The hare virus was classified close to herpesviruses from rodents within the Rhadinovirus genus of the gammaherpesvirus subfamily. We propose to name this new virus Leporid gammaherpesvirus 5 (LeHV-5), according to the International Committee on Taxonomy of Viruses standards. The impact of herpesvirus infection on the reproduction and mortality of the Iberian hare is as yet unknown, but it may aggravate the decline of wild populations caused by the recently emerged natural recombinant myxoma virus.

]]>
<![CDATA[Detection of microbial cell-free DNA in maternal and umbilical cord plasma in patients with chorioamnionitis using next generation sequencing]]> https://www.researchpad.co/article/N85cfbb28-a074-423a-88cd-d5e05af52830

Background

Chorioamnionitis has been linked to spontaneous preterm labor and complications such as neonatal sepsis. We hypothesized that microbial cell-free (cf) DNA would be detectable in maternal plasma in patients with chorioamnionitis and could be the basis for a non-invasive method to detect fetal exposure to microorganisms.

Objective

The purpose of this study was to determine whether next generation sequencing could detect microbial cfDNA in maternal plasma in patients with chorioamnionitis.

Study design

Maternal plasma (n = 94) and umbilical cord plasma (n = 120) were collected during delivery at gestational age 28–41 weeks. cfDNA was extracted and sequenced. Umbilical cord plasma samples with evidence of contamination were excluded. The prevalence of microorganisms previously implicated in chorioamnionitis, neonatal sepsis and intra-amniotic infections, as described in the literature, was examined to determine if there was enrichment of these microorganisms in this cohort. Specific microbial cfDNA associated with chorioamnionitis was first detected in umbilical cord plasma and confirmed in the matched maternal plasma samples (n = 77 matched pairs) among 14 cases of histologically confirmed chorioamnionitis and one case of clinical chorioamnionitis; 63 paired samples were used as controls. A correlation of rank of a given microorganism across maternal plasma and matched umbilical cord plasma was used to assess whether signals found in umbilical cord plasma were also present in maternal plasma.
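One way to picture the rank correlation step is a Spearman correlation of a single microorganism's signal across matched maternal and cord samples. The sketch below is an assumption about the general form of such an analysis, not the study's pipeline; the read counts are invented, and the function names are chosen for this example only.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical read counts for one microorganism in five matched pairs.
maternal = [5, 0, 12, 3, 40]
cord = [7, 1, 9, 2, 55]
rho = spearman(maternal, cord)
print(rho)
```

A rank-based statistic is a natural choice here because read counts from plasma cfDNA span orders of magnitude, and ranks are insensitive to that scale.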

Results

Microbial DNA sequences associated with clinical and/or histological chorioamnionitis were enriched in maternal plasma in cases with suspected chorioamnionitis when compared to controls (12/14 microorganisms, p = 0.02). Analysis of the microbial cfDNA in umbilical cord plasma among the 1,251 microorganisms detectable with this assay identified Streptococcus mitis, Ureaplasma spp., and Mycoplasma spp. in cases of suspected chorioamnionitis. This assay also detected cfDNA from Lactobacillus spp. in controls. Comparison between maternal plasma and umbilical cord plasma confirmed these signatures were also present in maternal plasma. Unbiased analysis of microorganisms with significantly correlated signal between matched maternal plasma and umbilical cord plasma identified the above listed 3 microorganisms, all of which have previously been implicated in patients with chorioamnionitis (Mycoplasma hominis p = 0.0001; Ureaplasma parvum p = 0.002; Streptococcus mitis p = 0.007). These data show that the pathogen signal relevant for chorioamnionitis can be identified in both maternal and umbilical cord plasma.

Conclusion

This is the first report showing the detection of relevant microbial cfDNA in maternal plasma and umbilical cord plasma in patients with clinical and/or histological chorioamnionitis. These results may lead to the development of a specific assay to detect perinatal infections for targeted therapy to reduce early neonatal sepsis complications.

]]>
<![CDATA[Functional dynamics of bacterial species in the mouse gut microbiome revealed by metagenomic and metatranscriptomic analyses]]> https://www.researchpad.co/article/N74c1e0c6-8f1d-4282-af8e-273065d64236

Background

Microbial communities of the mouse gut have been extensively studied; however, their functional roles and regulation are yet to be elucidated. Metagenomic and metatranscriptomic analyses may allow us a comprehensive profiling of bacterial composition and functions of the complex gut microbiota. The present study aimed to investigate the active functions of the microbial communities in the murine cecum by analyzing both metagenomic and metatranscriptomic data on specific bacterial species within the microbial communities, in addition to the whole microbiome.

Results

Bacterial composition of the healthy mouse gut microbiome was profiled using the following three different approaches: 16S rRNA-based profiling based on amplicon and shotgun sequencing data, and genome-based profiling based on shotgun sequencing data. Consistently, Bacteroidetes, Firmicutes, and Deferribacteres emerged as the major phyla. Based on NCBI taxonomy, Muribaculaceae, Lachnospiraceae, and Deferribacteraceae were the predominant families identified in each phylum. The genes for carbohydrate metabolism were upregulated in Muribaculaceae, while genes for cofactors and vitamin metabolism and amino acid metabolism were upregulated in Deferribacteraceae. The genes for translation were commonly enhanced in all three families. Notably, combined analysis of metagenomic and metatranscriptomic sequencing data revealed that the functions of translation and metabolism were largely upregulated in all three families in the mouse gut environment. The ratio of the genes in the metagenome and their expression in the metatranscriptome indicated higher expression of carbohydrate metabolism in Muribaculum, Duncaniella, and Mucispirillum.

Conclusions

We demonstrated a fundamental methodology for linking genomic and transcriptomic datasets to examine functional activities of specific bacterial species in a complicated microbial environment. We investigated the normal flora of the mouse gut using three different approaches and identified Muribaculaceae, Lachnospiraceae, and Deferribacteraceae as the predominant families. The functional distribution of these families was reflected in the entire microbiome. By comparing the metagenomic and metatranscriptomic data, we found that the expression rates differed for different functional categories in the mouse gut environment. Application of these methods to track microbial transcription in individuals over time, or before and after administration of a specific stimulus will significantly facilitate future development of diagnostics and treatments.

]]>
<![CDATA[A graph-based algorithm for RNA-seq data normalization]]> https://www.researchpad.co/article/N0b813aa9-b155-4778-93ba-b0f37d26ae8a

The use of RNA-sequencing has garnered much attention in recent years for characterizing and understanding various biological systems. However, it remains a major challenge to gain insights from a large number of RNA-seq experiments collectively, due to the normalization problem. Normalization has been challenging due to an inherent circularity, requiring that RNA-seq data be normalized before any pattern of differential (or non-differential) expression can be ascertained; meanwhile, the prior knowledge of non-differential transcripts is crucial to the normalization process. Some methods have successfully overcome this problem by the assumption that most transcripts are not differentially expressed. However, when RNA-seq profiles become more abundant and heterogeneous, this assumption fails to hold, leading to erroneous normalization. We present a normalization procedure that does not rely on this assumption, nor prior knowledge about the reference transcripts. This algorithm is based on a graph constructed from intrinsic correlations among RNA-seq transcripts and seeks to identify a set of densely connected vertices as references. Application of this algorithm on our synthesized validation data showed that it could recover the reference transcripts with high precision, thus resulting in high-quality normalization. On a realistic data set from the ENCODE project, this algorithm gave good results and could finish in a reasonable time. These preliminary results imply that we may be able to break the long persisting circularity problem in RNA-seq normalization.
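The idea of picking reference transcripts from a correlation graph can be sketched as follows. This is a deliberately simplified stand-in, not the authors' algorithm: it assumes strictly positive counts, links transcripts whose log-counts correlate above a hypothetical threshold `min_corr`, and finds a densely connected set by greedily peeling off the least-connected vertex until the remainder is a clique. All names and the toy counts are assumptions made for illustration.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation, returning 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def dense_reference_set(log_counts, min_corr=0.95):
    """Greedy heuristic: peel low-degree vertices until the rest form a clique."""
    names = list(log_counts)
    adj = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(log_counts[a], log_counts[b]) >= min_corr:
                adj[a].add(b)
                adj[b].add(a)
    alive = set(names)
    while alive:
        degree = {n: len(adj[n] & alive) for n in alive}
        worst = min(sorted(alive), key=lambda n: degree[n])
        if degree[worst] == len(alive) - 1:  # every vertex linked to all others
            break
        alive.remove(worst)
    return sorted(alive)


def size_factors(log_counts, references):
    """Per-sample scale factors from the references, with geometric mean 1."""
    n_samples = len(next(iter(log_counts.values())))
    per_sample = [mean(log_counts[r][s] for r in references)
                  for s in range(n_samples)]
    centre = mean(per_sample)
    return [math.exp(v - centre) for v in per_sample]


# Toy counts: three transcripts that only follow sequencing depth
# (depth factors 1, 2, 0.5, 4) and two that vary independently of depth.
counts = {
    "ref_a": [100, 200, 50, 400],
    "ref_b": [50, 100, 25, 200],
    "ref_c": [200, 400, 100, 800],
    "de_1": [500, 40, 900, 30],
    "de_2": [20, 400, 30, 25],
}
log_counts = {n: [math.log(c) for c in v] for n, v in counts.items()}
references = dense_reference_set(log_counts)
factors = size_factors(log_counts, references)
print(references, [round(f, 3) for f in factors])
```

The sketch relies on the observation that transcripts unaffected by the biology should rise and fall together with sequencing depth, so they form a tightly connected group in the correlation graph without any prior assumption that most transcripts are non-differential.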

]]>
<![CDATA[How “simple” methodological decisions affect interpretation of population structure based on reduced representation library DNA sequencing: A case study using the lake whitefish]]> https://www.researchpad.co/article/N3bb2bc39-24d6-4fe3-98ed-f97dea058c57

Reduced representation library (RRL) sequencing approaches (e.g., RADSeq, genotyping by sequencing) require decisions about how much to invest in genome coverage and sequencing depth, as well as choices of values for adjustable bioinformatics parameters. To empirically explore the importance of these “simple” methodological decisions, we generated two independent sequencing libraries for the same 142 individual lake whitefish (Coregonus clupeaformis) using a nextRAD RRL approach: (1) a larger number of loci at low sequencing depth based on a 9mer (library A); and (2) fewer loci at higher sequencing depth based on a 10mer (library B). The fish were selected from populations with different levels of expected genetic subdivision. Each library was analyzed using the STACKS pipeline followed by three types of population structure assessment (FST, DAPC and ADMIXTURE) with iterative increases in the stringency of sequencing depth and missing data requirements, as well as more specific a priori population maps. Library B was always able to resolve strong population differentiation in all three types of assessment regardless of the selected parameters, largely due to retention of more loci in analyses. In contrast, library A produced more variable results; increasing the minimum sequencing depth threshold (-m) resulted in a reduced number of retained loci, and therefore lost resolution at high -m values for FST and ADMIXTURE, but not DAPC. When detecting fine population differentiation, the population map influenced the number of loci and missing data, which generated artefacts in all downstream analyses tested. Similarly, when examining fine scale population subdivision, library B was robust to changing parameters but library A lost resolution depending on the parameter set. We used library B to examine actual subdivision in our study populations. All three types of analysis found complete subdivision among populations in Lake Huron, ON and Dore Lake, SK, Canada using 10,640 SNP loci. Weak population subdivision was detected in Lake Huron with fish from sites in the north-west, Search Bay, North Point and Hammond Bay, showing slight differentiation. Overall, we show that apparently simple decisions about library construction and bioinformatics parameters can have important impacts on the interpretation of population subdivision. Although potentially more costly on a per-locus basis, early investment in striking a balance between the number of loci and sequencing effort is well worth the reduced genomic coverage for population genetics studies. More conservative stringency settings on STACKS parameters led to a final dataset that was more consistent and robust when examining both weak and strong population differentiation. Overall, we recommend that researchers approach “simple” methodological decisions with caution, especially when working on non-model species for the first time.

]]>
<![CDATA[Identification and expression profiling of miRNAs in two color variants of carrot (Daucus carota L.) using deep sequencing]]> https://www.researchpad.co/article/5c8accd2d5eed0c48499009d

microRNAs are small endogenous RNAs known to play a crucial role in various plant metabolic processes. Carrot, an important vegetable crop, is one of the richest sources of carotenoids and anthocyanins. Most studies on microRNAs have been conducted in the aerial parts of plants. However, carrot has the rare distinction of storing these compounds in roots. Therefore, carrot represents a good model system to unveil the regulatory roles of miRNAs in the underground edible part of the plant. For the first time, we report the genome-wide identification and expression profiling of miRNAs in two contrasting color variants of carrot, namely Orange Red and Purple Black, using RNA-seq. Illumina sequencing resulted in the generation of 25.5M and 18.9M reads in Orange Red and Purple Black libraries, respectively. In total, 144 and 98 conserved microRNAs (read count >10) and 36 and 66 novel microRNAs were identified in Orange Red and Purple Black, respectively. Functional categorization and differential gene expression revealed the presence of several miRNA genes targeting various secondary metabolic pathways including carotenoid and anthocyanin biosynthetic pathways in the two libraries. Eleven known and two novel microRNAs were further validated using Stem-Loop PCR and qRT-PCR. Also, target validation was performed for selected miRNA genes using the RLM-RACE approach. The present work has laid a foundation towards understanding of various metabolic processes, particularly color development in carrot. This information can be further employed in targeted gene expression for increasing the carotenoid and anthocyanin content in crop plants.

]]>
<![CDATA[Determination of essential phenotypic elements of clusters in high-dimensional entities—DEPECHE]]> https://www.researchpad.co/article/5c8accc7d5eed0c48498ffa7

Technological advances have facilitated an exponential increase in the amount of information that can be derived from single cells, necessitating new computational tools that can make such highly complex data interpretable. Here, we introduce DEPECHE, a rapid, parameter-free, sparse k-means-based algorithm for clustering of multi- and megavariate single-cell data. In a number of computational benchmarks aimed at evaluating the capacity to form biologically relevant clusters, including flow/mass-cytometry and single-cell RNA sequencing data sets with manually curated gold standard solutions, DEPECHE clusters as well as or better than the currently available best performing clustering algorithms. However, the main advantage of DEPECHE, compared to the state-of-the-art, is its unique ability to enhance interpretability of the formed clusters, in that it only retains variables relevant for cluster separation, thereby facilitating computationally efficient analyses as well as understanding of complex datasets. DEPECHE is implemented in the open source R package DepecheR currently available at github.com/Theorell/DepecheR.

]]>
<![CDATA[Molecular analyses and phylogeny of the herpes simplex virus 2 US9 and glycoproteins gE/gI obtained from infected subjects during the Herpevac Trial for Women]]> https://www.researchpad.co/article/5c8c193cd5eed0c484b4d241

Herpes simplex virus 2 (HSV-2) is a large double-stranded DNA virus that causes genital sores when spread by sexual contact and is a principal cause of viral encephalitis in newborns and infants. Viral glycoproteins enable virion entry into and spread between cells, making glycoproteins a prime target for vaccine development. A truncated glycoprotein D2 (gD2) vaccine candidate, recently tested in the phase 3 Herpevac Trial for Women, did not prevent HSV-2 infection in initially seronegative women. Some women who became infected experienced multiple recurrences during the trial. The HSV US7, US8, and US9 genes encode glycoprotein I (gI), glycoprotein E (gE), and the US9 type II membrane protein, respectively. These proteins participate in viral spread across cell junctions and facilitate anterograde transport of virion components in neurons, prompting us to investigate whether sequence variants in these genes could be associated with frequent recurrence. The nucleotide sequences and dN/dS ratios of the US7-US9 region from viral isolates of individuals who experienced multiple recurrences were compared with those who had had a single episode of disease. No consistent polymorphism(s) distinguished the recurrent isolates. In frequently recurring isolates, the dN/dS ratio of US7 was low while greater variation (higher dN/dS ratio) occurred in US8, suggesting conserved function of the former during reactivation. Phylogenetic reconstruction of the US7-US9 region revealed eight strongly supported clusters within the 55 U.S. HSV-2 strains sampled, which were preserved in a second global phylogeny. Thus, although we have demonstrated evolutionary diversity in the US7-US9 complex, we found no molecular evidence of sequence variation in US7-US9 that distinguishes isolates from subjects with frequently recurrent episodes of disease.

]]>
<![CDATA[Profile of the tprK gene in primary syphilis patients based on next-generation sequencing]]> https://www.researchpad.co/article/5c784fecd5eed0c484007915

Background

The highly variable tprK gene of Treponema pallidum has been acknowledged to be one of the mechanisms that causes persistent infection. Previous studies have mainly focused on the heterogeneity in tprK in propagated strains using a clone-based Sanger approach. Few studies have investigated tprK directly from clinical samples using deep sequencing.

Methods/Principal findings

We conducted a comprehensive analysis of 14 primary syphilis clinical isolates of T. pallidum via next-generation sequencing to gain better insight into the profile of tprK in primary syphilis patients. Our results showed that there was a mixture of distinct sequences within each V region of tprK. In addition to the predominant sequence for each V region, as previously reported using the clone-based Sanger approach, there were many minor variants in all strains, mainly observed at frequencies of 1–5%. Interestingly, the identified distinct sequences within the regions were variable in length and differed by only 3 bp or multiples of 3 bp. In addition, amino acid sequence consistency within each V region was found among the 14 strains. Among the regions, the sequence IASDGGAIKH in V1 and the sequence DVGHKKENAANVNGTVGA in V4 showed a high stability of inter-strain redundancy.

Conclusions

The seven V regions of the tprK gene in primary syphilis infection demonstrated high diversity; they generally contained one high-proportion sequence and numerous low-frequency minor variants, most of which are far below the detection limit of Sanger sequencing. The rampant variation in each V region was regulated by a strict gene conversion mechanism that kept length differences at 3 bp or multiples of 3 bp. The highly stable sequences of inter-strain redundancy may indicate that these sequences play a critical role in T. pallidum virulence. These highly stable peptides are also likely to be potential targets for vaccine development.

]]>
<![CDATA[A new highly sensitive real-time quantitative-PCR method for detection of BCR-ABL1 to monitor minimal residual disease in chronic myeloid leukemia after discontinuation of imatinib]]> https://www.researchpad.co/article/5c8823f1d5eed0c4846393bf

Tyrosine kinase inhibitors (TKIs) targeting the BCR-ABL1 fusion protein, encoded by the Philadelphia chromosome, have drastically improved the outcomes for patients with chronic myeloid leukemia (CML). Although several real-time quantitative polymerase chain reaction (RQ-PCR) kits for the detection of BCR-ABL1 transcripts are commercially available, their accuracy and efficiency in laboratory practice require reevaluation. We have developed a new in-house RQ-PCR method to detect minimal residual disease (MRD) in CML cases. MRD was analyzed in 102 patients with CML from the DOMEST study, a clinical trial to study the rationale for imatinib mesylate discontinuation in Japan. The BCR-ABL1/ABL1 ratio was evaluated using the international standard (IS) ratio, where IS < 0.1% was defined as a major molecular response (MMR). At enrollment, BCR-ABL1 transcripts were undetectable in all samples using a widely applied RQ-PCR method performed in the commercial laboratory, BML (BML Inc., Tokyo, Japan); however, the in-house method detected the BCR-ABL1 transcripts in five samples (5%) (mean IS ratio: 0.0062 ± 0.0010%). After discontinuation of imatinib, BCR-ABL1 transcripts were detected using the in-house RQ-PCR in 21 patients (21%) that were not positive using the BML method. Nineteen samples were also tested using a commercially available RQ-PCR assay kit with a detection limit of IS ratio, 0.0032 (ODK-1201, Otsuka Pharmaceutical Co., Tokyo, Japan). This method detected low levels of BCR-ABL1 transcripts in 14 samples (74%), but scored negative for five samples (26%) that were positive using the in-house method. With the in-house RQ-PCR method, loss of MMR was confirmed in four patients. These data suggest that our new in-house RQ-PCR method is effective for monitoring MRD in CML.
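The arithmetic behind an IS ratio can be illustrated with a short sketch. It assumes the common convention that the BCR-ABL1/ABL1 copy ratio is expressed as a percentage and multiplied by a laboratory-specific conversion factor; the copy numbers, the default factor of 1.0, and the function names are hypothetical and do not describe the in-house assay. The MMR cut-off follows the definition given in the abstract (IS < 0.1%).

```python
def is_ratio_percent(bcr_abl1_copies, abl1_copies, conversion_factor=1.0):
    """BCR-ABL1/ABL1 ratio as a percentage, scaled by a lab conversion factor."""
    if abl1_copies <= 0:
        raise ValueError("ABL1 control copies must be positive")
    return bcr_abl1_copies / abl1_copies * 100.0 * conversion_factor


def in_major_molecular_response(is_percent, threshold=0.1):
    """MMR as defined in the abstract: IS ratio below 0.1%."""
    return is_percent < threshold


# Hypothetical copy numbers from one sample.
ratio = is_ratio_percent(bcr_abl1_copies=6, abl1_copies=100_000)
print(f"{ratio:.4f}%", in_major_molecular_response(ratio))
```

With these invented inputs the ratio is about 0.006%, which is the order of magnitude of the low-level positives reported for the in-house method.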

]]>
<![CDATA[HCV transmission in high-risk communities in Bulgaria]]> https://www.researchpad.co/article/5c882406d5eed0c4846395b0

Background

The rate of HIV infection in Bulgaria is low. However, the rate of HCV-HIV-coinfection and HCV infection is high, especially among high-risk communities. The molecular epidemiology of those infections has not been studied before.

Methods

Consensus Sanger sequences of HVR1 and NS5B from 125 cases of HIV/HCV coinfections, collected during 2010–2014 in 15 different Bulgarian cities, were used for preliminary phylogenetic evaluation. Next-generation sequencing (NGS) data of the hypervariable region 1 (HVR1) analyzed via the Global Hepatitis Outbreak and Surveillance Technology (GHOST) were used to evaluate genetic heterogeneity and possible transmission linkages. Links between pairs that were below and above the established genetic distance threshold, indicative of transmission, were further examined by generating k-step networks.
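Threshold-based linkage of this kind can be sketched as follows. This is a simplified illustration and not the GHOST implementation: it assumes aligned, equal-length HVR1 variants per case, uses the minimal Hamming distance between any two variants as the distance between cases, and groups linked cases into connected components. The toy sequences, the threshold value and all function names are hypothetical.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def min_distance(variants_a, variants_b):
    """Smallest distance between any variant of one case and any of the other."""
    return min(hamming_fraction(a, b) for a in variants_a for b in variants_b)


def transmission_clusters(samples, threshold):
    """Link cases at or below the threshold and return connected components."""
    links = {name: set() for name in samples}
    for a, b in combinations(samples, 2):
        if min_distance(samples[a], samples[b]) <= threshold:
            links[a].add(b)
            links[b].add(a)
    clusters, seen = [], set()
    for start in sorted(samples):
        if start in seen or not links[start]:
            continue  # skip cases already clustered and unlinked singletons
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(links[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Toy 20-nt "HVR1" variants per case; real HVR1 amplicons are much longer.
samples = {
    "case_1": ["ACGTACGTACGTACGTACGT"],
    "case_2": ["ACGTACGTACGTACGTACGA", "TTTTACGTACGTACGTACGT"],
    "case_3": ["GGGGGGGGGGCCCCCCCCCC"],
}
clusters = transmission_clusters(samples, threshold=0.06)
print(clusters)
```

Using the minimal distance between intra-host variants, rather than the distance between consensus sequences, lets a link be detected even when the shared strain is a minor variant in one of the two cases.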

Results

Preliminary genetic analyses showed predominance of HCV genotype 1a (54%), followed by 1b (20.8%), 2a (1.4%), 3a (22.3%) and 4a (1.4%), indicating ongoing transmission of many HCV strains of different genotypes. NGS of HVR1 from 72 cases showed significant genetic heterogeneity of intra-host HCV populations, with 5 cases being infected with 2 different genotypes or subtypes and 6 cases being infected with 2 strains of same subtype. GHOST revealed 8 transmission clusters involving 30 cases (41.7%), indicating a high rate of transmission.

Four transmission clusters were found in Sofia, three in Plovdiv, and one in Peshtera. The main risk factor for the clusters was injection drug use. Close genetic proximity among HCV strains from the 3 Sofia clusters, and between HCV strains from Peshtera and one of the two Plovdiv clusters confirms a long and extensive transmission history of these strains in Bulgaria.

Conclusions

Identification of several HCV genotypes and many HCV strains suggests a frequent introduction of HCV to the studied high-risk communities. GHOST detected a broad transmission network, which sustains circulation of several HCV strains since their early introduction in the 3 cities. This is the first report on the molecular epidemiology of HIV/HCV coinfections in Bulgaria.

]]>
<![CDATA[Targeted next generation sequencing can serve as an alternative to conventional tests in myeloid neoplasms]]> https://www.researchpad.co/article/5c897760d5eed0c4847d2b67

The 2016 World Health Organization classification introduced a number of genes with somatic mutations and a category for germline predisposition syndromes in myeloid neoplasms. We have designed a comprehensive next-generation sequencing assay to detect somatic mutations, translocations, and germline mutations in a single assay and have evaluated its clinical utility in patients with myeloid neoplasms. Extensive and specified bioinformatics analyses were undertaken to detect single nucleotide variations, FLT3 internal tandem duplication, genic copy number variations, and chromosomal copy number variations. This enabled us to maximize the clinical utility of the assay, and we concluded that, as a single assay, it can be a good supplement for many conventional tests, including Sanger sequencing, RT-PCR, and cytogenetics. Of note, we found that 8.4–11.6% of patients with acute myeloid leukemia and 12.9% of patients with myeloproliferative neoplasms had germline mutations, and most were heterozygous carriers for autosomal recessive marrow failure syndromes. These patients often did not respond to standard chemotherapy, suggesting that germline predisposition may have distinct and significant clinical implications.

]]>
<![CDATA[16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses]]> https://www.researchpad.co/article/5c7ee7c5d5eed0c4848f4d9c

Advances in high-throughput sequencing have increased the availability of microbiome sequencing data that can be exploited to characterize microbiome community structure in situ. We explore using word and sentence embedding approaches for nucleotide sequences since they may be a suitable numerical representation for downstream machine learning applications (especially deep learning). This work involves first encoding (“embedding”) each sequence into a dense, low-dimensional, numeric vector space. Here, we use Skip-Gram word2vec to embed k-mers, obtained from 16S rRNA amplicon surveys, and then leverage an existing sentence embedding technique to embed all sequences belonging to specific body sites or samples. We demonstrate that these representations are meaningful, and hence the embedding space can be exploited as a form of feature extraction for exploratory analysis. We show that sequence embeddings preserve relevant information about the sequencing data such as k-mer context, sequence taxonomy, and sample class. Specifically, the sequence embedding space resolved differences among phyla, as well as differences among genera within the same family. Distances between sequence embeddings had similar qualities to distances between alignment identities, and embedding multiple sequences can be thought of as generating a consensus sequence. In addition, embeddings are versatile features that can be used for many downstream tasks, such as taxonomic and sample classification. Using sample embeddings for body site classification resulted in negligible performance loss compared to using OTU abundance data, and clustering embeddings yielded high fidelity species clusters. Lastly, the k-mer embedding space captured distinct k-mer profiles that mapped to specific regions of the 16S rRNA gene and corresponded with particular body sites. 
Together, our results show that embedding sequences results in meaningful representations that can be used for exploratory analyses or for downstream machine learning applications that require numeric data. Moreover, because the embeddings are trained in an unsupervised manner, unlabeled data can be embedded and used to bolster supervised machine learning tasks.
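
As a rough illustration of the embedding idea described above, the following Python sketch splits a sequence into overlapping k-mers and pools per-k-mer vectors into a single sequence vector. It is not the authors' implementation: the hash-based `kmer_vector` function is a toy stand-in for a trained Skip-Gram word2vec lookup, plain averaging stands in for the sentence embedding technique (which the abstract does not name), and `k = 4` and `DIM = 8` are arbitrary illustrative choices.

```python
import hashlib

DIM = 8  # illustrative embedding size (assumption, not from the study)


def kmers(seq, k=4):
    """Return the overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vector(kmer):
    """Toy stand-in for a trained word2vec lookup: a deterministic
    vector with components in [-1, 1) derived from a hash of the k-mer."""
    digest = hashlib.sha256(kmer.encode("ascii")).digest()
    return [b / 128.0 - 1.0 for b in digest[:DIM]]


def embed_sequence(seq, k=4):
    """Average the k-mer vectors of seq (a simple baseline pooling)."""
    words = kmers(seq, k)
    if not words:
        raise ValueError("sequence is shorter than k")
    vectors = [kmer_vector(w) for w in words]
    return [sum(col) / len(vectors) for col in zip(*vectors)]
```

In a real pipeline the lookup would come from a model trained on the k-mer "sentences" of many sequences, so that k-mers sharing context end up close together; the pooling step would stay structurally the same.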

]]>
<![CDATA[Presence, persistence and effects of pre-treatment HIV-1 drug resistance variants detected using next generation sequencing: A Retrospective longitudinal study from rural coastal Kenya]]> https://www.researchpad.co/article/5c6dc9f3d5eed0c48452a5bd

Background

The epidemiology of HIV-1 drug resistance (HIVDR) determined by Sanger capillary sequencing has been widely studied. However, much less is known about HIVDR detected using next generation sequencing (NGS) methods. We aimed to determine the presence, persistence and effect of pre-treatment HIVDR variants detected using NGS in HIV-1 infected, antiretroviral treatment (ART)-naïve participants from rural Coastal Kenya.

Methods

In a retrospective longitudinal study, samples from HIV-1 infected participants collected before (n = 2 time-points) and after (n = 1 time-point) ART initiation were considered. An ultra-deep amplicon-based NGS assay, calling nucleotide variants present at >2.0% frequency in the viral population, was used. Suspected virologic failure (sVF) was defined as a one-off HIV-1 viral load of >1000 copies/ml whilst on ART.
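
The frequency cut-off in the assay description can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the study's variant caller: the function name `call_variants`, the per-position `base_counts` dictionary, and the definition of frequency as variant reads divided by total depth at that position are all hypothetical choices made here.

```python
def call_variants(base_counts, reference_base, min_freq=0.02):
    """Return {base: frequency} for non-reference bases above min_freq.

    base_counts maps each base observed at one position to its read count;
    frequency is that count divided by the total depth at the position.
    """
    depth = sum(base_counts.values())
    if depth == 0:
        return {}
    return {
        base: count / depth
        for base, count in base_counts.items()
        if base != reference_base and count / depth > min_freq
    }
```

For example, with counts `{"A": 9700, "G": 250, "T": 50}` and reference `"A"`, only `G` (250 of 10,000 reads) passes the strict >2.0% threshold, while `T` (50 of 10,000 reads) does not.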

Results

Of the 50 eligible participants, 12 (24.0% [95% CI: 13.1–38.2]) had at least one detectable pre-treatment HIVDR variant against Protease Inhibitors (PIs, n = 6 [12.0%]), Nucleoside Reverse Transcriptase Inhibitors (NRTIs, n = 4 [8.0%]) and Non-NRTIs (NNRTIs, n = 3 [6.0%]). Overall, 15 pre-treatment resistance variants were detected (frequency, range: 2.3–92.0%). A positive correlation was observed between mutation frequency and absolute load for NRTI and/or NNRTI variants (r = 0.761 [p = 0.028]), but not for PI variants (r = -0.117 [p = 0.803]). Participants with pre-treatment NRTI and/or NNRTI resistance had increased odds of sVF (OR = 6.0; 95% CI = 1.0–36.9; p = 0.054).
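
For readers who want to recheck the interval around the 12-of-50 proportion, the Python sketch below computes an exact (Clopper-Pearson) binomial confidence interval by bisection. The abstract does not say which interval method was used, so choosing Clopper-Pearson here is an assumption; the helper names are illustrative. Running `clopper_pearson(12, 50)` should give bounds close to the reported interval if an exact method was indeed used.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _bisect(f, lo, hi, iters=100):
    """Root of a decreasing function that is positive at lo, negative at hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)), 0.0, 1.0)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper
```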

Conclusions

Using NGS, pre-treatment resistance variants were common, although the observed PI variants were unlikely to have been transmitted and were more probably generated de novo. Even when detected at a low frequency, pre-treatment NRTI and/or NNRTI resistance variants may adversely affect treatment outcomes.

]]>
<![CDATA[Information about variations in multiple copies of bacterial 16S rRNA genes may aid in species identification]]> https://www.researchpad.co/article/5c706762d5eed0c4847c6f93

Variable region analysis of 16S rRNA gene sequences is the most common tool in bacterial taxonomic studies. Although it is used to distinguish bacterial species, its usefulness is limited because genomes carry variable numbers of 16S rRNA gene copies whose sequences can differ. In this study, we compared 16S rRNA gene sequences of Serratia fonticola GS2 obtained from the completely assembled whole genome with those obtained by Sanger electrophoresis sequencing of cloned PCR products. Sanger sequencing produced a combination of sequences from multiple copies of 16S rRNA genes. To determine whether the variant copies of 16S rRNA genes affected Sanger sequencing, cloned 16S rRNA genes were mixed at two ratios (5:5 and 8:2); the more copies shared a similar sequence, the higher the chance that this sequence was amplified. The effect of multiple copies on the taxonomic classification of 16S rRNA gene sequences was investigated using strain GS2 as a model. The 16S rRNA copies with the greatest variation had a minimum pairwise similarity of 99.42%, and this did not affect species identification. Thus, PCR products from genomes containing variable 16S rRNA gene copies can provide sufficient information for species identification, except for species whose 16S rRNA gene sequences are highly similar to one another, as is the case for Bacillus thuringiensis and Bacillus cereus. In silico analysis of 1,616 bacterial genomes from long-read sequencing was also performed. For each phylum, we report the average minimum pairwise similarity together with the average genome size and the average number of “unique copies” of 16S rRNA genes; the phyla Proteobacteria and Firmicutes showed the most variation among their 16S rRNA gene copies. Overall, our results shed light on how variations in the multiple copies of bacterial 16S rRNA genes can aid in appropriate species identification.
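
The "minimum pairwise similarity" statistic mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not the study's pipeline: it assumes the gene copies are already aligned to the same length and counts every differing column (including gaps) as a mismatch, whereas the abstract does not state how similarity was computed. The function names are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of matching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def min_pairwise_similarity(copies):
    """Lowest percent identity over all pairs of gene copies."""
    if len(copies) < 2:
        raise ValueError("need at least two copies")
    return min(percent_identity(a, b) for a, b in combinations(copies, 2))
```

Applied to the 16S rRNA copies of one genome, a value near 100 means the copies are nearly interchangeable for identification, while lower values flag genomes whose copies diverge.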

]]>