ResearchPad - sequence-assembly-tools https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Within-patient plasmid dynamics in <i>Klebsiella pneumoniae</i> during an outbreak of a carbapenemase-producing <i>Klebsiella pneumoniae</i>]]> https://www.researchpad.co/article/elastic_article_15752 Knowledge of within-patient dynamics of resistance plasmids during outbreaks is important for understanding the persistence and transmission of plasmid-mediated antimicrobial resistance. During an outbreak of a Klebsiella pneumoniae carbapenemase-producing (KPC) K. pneumoniae, the plasmid and chromosomal dynamics of K. pneumoniae within patients were investigated. Methods: During the outbreak, all K. pneumoniae isolates of colonized or infected patients were collected, regardless of their susceptibility pattern. A selection of isolates was short-read and long-read sequenced. A hybrid assembly of the short- and long-read sequence data was performed. Plasmid contigs were extracted from the hybrid assembly and annotated, and within-patient plasmid comparisons were performed. Results: Fifteen K. pneumoniae isolates of six patients were short-read whole-genome sequenced. Whole-genome multi-locus sequence typing revealed a maximum of 4 allele differences between the sequenced isolates. Within patients 1 and 2, the resistance gene and plasmid replicon content differed between the isolates sequenced. Long-read sequencing and hybrid assembly of 4 isolates revealed loss of the entire KPC-gene containing plasmid in the isolates of patient 2 and a recombination event between the plasmids in the isolates of patient 1. This resulted in two different KPC-gene containing plasmids being simultaneously present during the outbreak. Conclusion: During a hospital outbreak of a KPC-producing K. pneumoniae isolate, loss of the KPC-gene carrying plasmid and plasmid recombination were detected within the isolates from two patients. 
When investigating outbreaks, one should be aware that plasmid transmission can occur and the possibility of within- and between-patient plasmid variation needs to be considered. ]]> <![CDATA[Rediscovering an old foe: Optimised molecular methods for DNA extraction and sequencing applications for fungarium specimens of powdery mildew (Erysiphales)]]> https://www.researchpad.co/article/elastic_article_14476 The purpose of this study was to identify a reliable DNA extraction protocol to use on 25-year-old powdery mildew specimens from the reference collection VPRI in order to produce high quality sequences suitable to address taxonomic phylogenetic questions. We tested 13 extraction protocols and two library preparation kits and found the combination of the E.Z.N.A.® Forensic DNA kit for DNA extraction and the NuGen Ovation® Ultralow System library preparation kit was the most suitable for this purpose.

]]>
<![CDATA[Genome reconstruction of the non-culturable spinach downy mildew <i>Peronospora effusa</i> by metagenome filtering]]> https://www.researchpad.co/article/elastic_article_13800 Peronospora effusa (previously known as P. farinosa f. sp. spinaciae, and here referred to as Pfs) is an obligate biotrophic oomycete that causes downy mildew on spinach (Spinacia oleracea). To combat this destructive disease, many resistant cultivars have been bred and used. However, new Pfs races rapidly break the employed resistance genes. To get insight into the gene repertoire of Pfs and identify infection-related genes, the genome of the first reference race, Pfs1, was sequenced, assembled, and annotated. Due to the obligate biotrophic nature of this pathogen, material for DNA isolation can only be collected from infected spinach leaves that, however, also contain many other microorganisms. The obtained sequences can, therefore, be considered a metagenome. To filter and obtain Pfs sequences, we utilized the CAT tool to taxonomically annotate ORFs residing on long sequences of a genome pre-assembly. This study is the first to show that CAT filtering performs well on eukaryotic contigs. Based on the taxonomy, determined on multiple ORFs, contaminating long sequences and corresponding reads were removed from the metagenome. Filtered reads were re-assembled to provide a clean and improved Pfs genome sequence of 32.4 Mbp consisting of 8,635 scaffolds. Transcript sequencing of a range of infection time points aided the prediction of a total of 13,277 gene models, including 99 RxLR(-like) effector and 14 putative Crinkler genes. Comparative analysis identified common features in the predicted secretomes of different obligate biotrophic oomycetes, regardless of their phylogenetic distance. Their secretomes are generally smaller than those of hemi-biotrophic and necrotrophic oomycete species. 
We observe a reduction in proteins involved in cell wall degradation, in Nep1-like proteins (NLPs), proteins with PAN/apple domains, and host translocated effectors. The genome of Pfs1 will be instrumental in studying downy mildew virulence and for understanding the molecular adaptations by which new isolates break spinach resistance.
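The filtering idea described above (label each long sequence from the taxonomy of its ORFs, then drop contaminants) can be illustrated with a rough sketch. This is not the CAT algorithm itself: the majority-vote rule, the `min_support` threshold, the function names, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter

def classify_contig(orf_taxa, min_support=0.5):
    """Assign a contig the most common ORF-level label if its support is high enough.

    orf_taxa: one taxon label per ORF, or None for ORFs without a hit.
    min_support: hypothetical threshold, not a CAT parameter.
    """
    labelled = [t for t in orf_taxa if t is not None]
    if not labelled:
        return "unclassified"
    taxon, count = Counter(labelled).most_common(1)[0]
    return taxon if count / len(labelled) > min_support else "unclassified"

def keep_target_contigs(contig_orfs, target="Oomycota"):
    """Drop contigs confidently assigned to a non-target taxon; keep the rest."""
    kept = []
    for name, orf_taxa in contig_orfs.items():
        label = classify_contig(orf_taxa)
        if label in (target, "unclassified"):
            kept.append(name)
    return sorted(kept)

# Toy input: contig name -> ORF-level labels (made-up values).
contigs = {
    "c1": ["Oomycota", "Oomycota", None],
    "c2": ["Bacteria", "Bacteria", "Bacteria", "Oomycota"],
    "c3": [None, None],
    "c4": ["Bacteria", "Oomycota"],
}
print(keep_target_contigs(contigs))
```

Keeping unclassified contigs is a conservative choice in this sketch; a stricter filter could drop them instead.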

]]>
<![CDATA[Transcriptomic analysis of polyketide synthases in a highly ciguatoxic dinoflagellate, Gambierdiscus polynesiensis and low toxicity Gambierdiscus pacificus, from French Polynesia]]> https://www.researchpad.co/article/Nca210627-69b7-4a50-96ce-ecb4ce1a2ae1

Marine dinoflagellates produce a diversity of polyketide toxins that are accumulated in marine food webs and are responsible for a variety of seafood poisonings. Reef-associated dinoflagellates of the genus Gambierdiscus produce toxins responsible for ciguatera poisoning (CP), which causes over 50,000 cases of illness annually worldwide. The biosynthetic machinery for dinoflagellate polyketides remains poorly understood. Recent transcriptomic and genomic sequencing projects have revealed the presence of Type I modular polyketide synthases in dinoflagellates, as well as a plethora of single domain transcripts with Type I sequence homology. The current transcriptome analysis compares polyketide synthase (PKS) gene transcripts expressed in two species of Gambierdiscus from French Polynesia: a highly toxic ciguatoxin producer, G. polynesiensis, versus a non-ciguatoxic species G. pacificus, each assembled from approximately 180 million Illumina 125 nt reads using Trinity, and compares their PKS content with previously published data from other Gambierdiscus species and more distantly related dinoflagellates. Both modular and single-domain PKS transcripts were present. Single domain β-ketoacyl synthase (KS) transcripts were highly amplified in both species (98 in G. polynesiensis, 99 in G. pacificus), with smaller numbers of standalone acyl transferase (AT), ketoacyl reductase (KR), dehydratase (DH), enoyl reductase (ER), and thioesterase (TE) domains. G. polynesiensis expressed both a larger number of multidomain PKSs, and larger numbers of modules per transcript, than the non-ciguatoxic G. pacificus. The largest PKS transcript in G. polynesiensis encoded a 10,516 aa, 7 module protein, predicted to synthesize part of the polyether backbone. Transcripts and gene models representing portions of this PKS are present in other species, suggesting that its function may be performed in those species by multiple interacting proteins. 
This study contributes to the building consensus that dinoflagellates utilize a combination of Type I modular and single domain PKS proteins, in an as yet undefined manner, to synthesize polyketides.

]]>
<![CDATA[A precedented nuclear genetic code with all three termination codons reassigned as sense codons in the syndinean Amoebophrya sp. ex Karlodinium veneficum]]> https://www.researchpad.co/article/5c818e8fd5eed0c484cc2557

Amoebophrya is part of an enigmatic, diverse, and ubiquitous marine alveolate lineage known almost entirely from anonymous environmental sequencing. Two cultured Amoebophrya strains grown on core dinoflagellate hosts were used for transcriptome sequencing. BLASTx using different genetic codes suggests that Amoebophrya sp. ex Karlodinium veneficum uses the three typical stop codons (UAA, UAG, and UGA) to encode amino acids. When UAA and UAG are translated as glutamine, about half of the alignments have better BLASTx scores, and when UGA is translated as tryptophan, one fifth have better scores. However, the sole stop codon appears to be UGA based on conserved genes, suggesting contingent translation of UGA. Neither host sequences nor sequences from the second strain, Amoebophrya sp. ex Akashiwo sanguinea, had similar results in BLASTx searches. A genome survey of Amoebophrya sp. ex K. veneficum showed no evidence for transcript editing aside from mitochondrial transcripts. The dynein heavy chain (DHC) gene family was surveyed, and of 14 transcripts only two did not use UAA, UAG, or UGA in a coding context. Overall, the transcriptome displayed strong bias for A or U in third codon positions, while the tRNA genome survey showed bias against codons ending in U, particularly for amino acids with two codons ending in either C or U. Together, these clues suggest contingent translation mechanisms in Amoebophrya sp. ex K. veneficum and a phylogenetically distinct instance of genetic code modification.
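The comparison of genetic codes described above comes down to translating the same transcript under different codon tables. The following is a minimal sketch, assuming the reassignments named in the abstract (UAA and UAG as glutamine, UGA as tryptophan). The helper `translate` and the example sequence are hypothetical, and the sketch ignores the contingent use of UGA as a stop.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical variant table: all three stop codons read as sense codons.
REASSIGNED_CODE = {**STANDARD_CODE, "TAA": "Q", "TAG": "Q", "TGA": "W"}

def translate(seq, code):
    """Translate codon by codon; unknown codons become 'X', trailing bases are dropped."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(code.get(codon, "X") for codon in codons)

example = "AUGUAAGGAUGAUAGAAA"  # made-up sequence with in-frame UAA, UGA and UAG
print(translate(example, STANDARD_CODE))    # stops shown as '*'
print(translate(example, REASSIGNED_CODE))  # stops read through as Q or W
```

Under the standard table the example is interrupted by three stops, while the reassigned table yields one uninterrupted peptide, which is the kind of difference that changes BLASTx alignment scores.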

]]>
<![CDATA[Nitrogen- and phosphorus-starved Triticum aestivum show distinct belowground microbiome profiles]]> https://www.researchpad.co/article/5c76fe27d5eed0c484e5b5dd

Many plants have natural partnerships with microbes that can boost their nitrogen (N) and/or phosphorus (P) acquisition. To assess whether wheat may have undiscovered associations of these types, we tested if N/P-starved Triticum aestivum show microbiome profiles that are simultaneously different from those of N/P-amended plants and those of their own bulk soils. The bacterial and fungal communities of root, rhizosphere, and bulk soil samples from the Historical Dryland Plots (Lethbridge, Canada), which hold T. aestivum that is grown both under N/P fertilization and in conditions of extreme N/P-starvation, were taxonomically described and compared (bacterial 16S rRNA genes and fungal Internal Transcribed Spacers—ITS). As the list may include novel N- and/or P-providing wheat partners, we then identified all the operational taxonomic units (OTUs) that were proportionally enriched in one or more of the nutrient starvation- and plant-specific communities. These analyses revealed: a) distinct N-starvation root and rhizosphere bacterial communities that were proportionally enriched, among others, in OTUs belonging to families Enterobacteriaceae, Chitinophagaceae, Comamonadaceae, Caulobacteraceae, Cytophagaceae, Streptomycetaceae, b) distinct N-starvation root fungal communities that were proportionally enriched in OTUs belonging to taxa Lulworthia, Sordariomycetes, Apodus, Conocybe, Ascomycota, Crocicreas, c) a distinct P-starvation rhizosphere bacterial community that was proportionally enriched in an OTU belonging to genus Agrobacterium, and d) a distinct P-starvation root fungal community that was proportionally enriched in OTUs belonging to genera Parastagonospora and Phaeosphaeriopsis. Our study might have exposed wheat-microbe connections that can form the basis of novel complementary yield-boosting tools.

]]>
<![CDATA[Secondary contact between diverged host lineages entails ecological speciation in a European hantavirus]]> https://www.researchpad.co/article/5c76fdefd5eed0c484e5b0f1

The diversity of viruses probably exceeds the biodiversity of eukaryotes, but little is known about the origin and emergence of novel virus species. Experimentation and disease outbreak investigations have allowed the characterization of rapid molecular virus adaptation. However, the processes leading to the establishment of functionally distinct virus taxa in nature remain obscure. Here, we demonstrate that incipient speciation in a natural host species has generated distinct ecological niches leading to adaptive isolation in an RNA virus. We found a very strong association between the distributions of two major phylogenetic clades in Tula orthohantavirus (TULV) and the rodent host lineages in a natural hybrid zone of the European common vole (Microtus arvalis). The spatial transition between the virus clades in replicated geographic clines is at least eight times narrower than between the hybridizing host lineages. This suggests a strong barrier for effective virus transmission despite frequent dispersal and gene flow among local host populations, and translates to a complete turnover of the adaptive background of TULV within a few hundred meters in the open, unobstructed landscape. Genetic differences between TULV clades are homogeneously distributed in the genomes and mostly synonymous (93.1%), except for a cluster of nonsynonymous changes in the 5′ region of the viral envelope glycoprotein gene, potentially involved in host-driven isolation. Evolutionary relationships between TULV clades indicate an emergence of these viruses through rapid differential adaptation to the previously diverged host lineages that resulted in levels of ecological isolation exceeding the progress of speciation in their vertebrate hosts.

]]>
<![CDATA[The heterogeneity of plasma miRNA profiles in hepatocellular carcinoma patients and the exploration of diagnostic circulating miRNAs for hepatocellular carcinoma]]> https://www.researchpad.co/article/5c63397ad5eed0c484ae6867

Heterogeneity is prevalent in cancer both between and within individuals. Although a few studies have identified several circulating microRNAs (miRNAs) for cancer diagnosis, the complete plasma miRNA profile for hepatocellular carcinoma (HCC) remains undefined, and whether the plasma miRNA profiles are heterogeneous is unknown. Here, we obtained individualized plasma miRNA profiles of both healthy subjects and HCC patients via genome-wide deep sequencing. Compared with the highly stable miRNA profile of the healthy subjects, the profile of the HCC patients was highly variable. Seven miRNAs were optimized as potential plasma-based biomarkers for HCC diagnosis. Combined with the clinical data of The Cancer Genome Atlas (TCGA) cohort, three out of the seven miRNAs were correlated with the survival of the HCC patients. To investigate the effect of cancer cells on the plasma miRNA profile, we compared the most differentially expressed miRNAs between plasma and tissues. Furthermore, miRNAseq data of HCC patients from TCGA were included for comparison. We found that the differences between plasma and tissue were inconsistent, suggesting that other cells in addition to cancer cells also contribute to plasma miRNAs. Using two HCC cancer cell lines, we examined the levels of seven differentially expressed miRNAs. The reverse direction of certain miRNA alterations between cancer cells and media further confirmed that miRNAs may be selectively pumped out by cancer cells.

]]>
<![CDATA[Miniaturization and optimization of 384-well compatible RNA sequencing library preparation]]> https://www.researchpad.co/article/5c40f7c2d5eed0c48438688b

Preparation of high-quality sequencing libraries is a costly and time-consuming component of metagenomic next generation sequencing (mNGS). While the overall cost of sequencing has dropped significantly over recent years, the reagents needed to prepare sequencing samples are likely to become the dominant expense in the process. Furthermore, libraries prepared by hand are subject to human variability and needless waste due to limitations of manual pipetting volumes. Reduction of reaction volumes, combined with sub-microliter automated dispensing of reagents without consumable pipette tips, has the potential to provide significant advantages. Here, we describe the integration of several instruments, including the Labcyte Echo 525 acoustic liquid handler and the iSeq and NovaSeq Illumina sequencing platforms, to miniaturize and automate mNGS library preparation, significantly reducing the cost and the time required to prepare samples. Through the use of External RNA Controls Consortium (ERCC) spike-in RNAs, we demonstrated the fidelity of the miniaturized preparation to be equivalent to full volume reactions. Furthermore, detection of viral and microbial species from cell culture and patient samples was also maintained in the miniaturized libraries. For 384-well mNGS library preparations, we achieved cost savings of over 80% in materials and reagents alone, and reduced preparation time by 90% compared to manual approaches, without compromising quality or representation within the library.

]]>
<![CDATA[De novo transcriptome sequencing and SSR markers development for Cedrela balansae C.DC., a native tree species of northwest Argentina]]> https://www.researchpad.co/article/5c141ea3d5eed0c484d27882

The endangered Cedrela balansae C.DC. (Meliaceae) is a high-value timber species with great potential for forest plantations that inhabits the tropical forests in Northwestern Argentina. Research on this species is scarce because of the limited genetic and genomic information available. Here, we explored the transcriptome of C. balansae using 454 GS FLX Titanium next-generation sequencing (NGS) technology. Following de novo assembly, we identified 27,111 non-redundant unigenes longer than 200 bp, and considered these transcripts for further downstream analysis. The functional annotation was performed by searching the 27,111 unigenes against the NR-Protein and the Interproscan databases. This analysis revealed 26,977 genes with homology in at least one of the databases analyzed. Furthermore, 7,774 unigenes in 142 different active biological pathways in C. balansae were identified with the KEGG database. Moreover, after in silico analyses, we detected 2,663 simple sequence repeat (SSR) markers. A subset of 70 SSRs related to important “stress tolerance” traits based on functional annotation evidence were selected for wet PCR-validation in C. balansae and other Cedrela species inhabiting the northwest and northeast of Argentina (C. fissilis, C. saltensis and C. angustifolia). Successful transferability was between 77% and 93%, and thanks to this study, 32 polymorphic functional SSRs for all analyzed Cedrela species are now available. The gene catalog and molecular markers obtained here represent a starting point for further research, which will assist genetic breeding programs in the Cedrela genus and will contribute to identifying key populations for its preservation.
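In silico SSR detection of the kind mentioned above usually means scanning sequences for short motifs repeated in tandem. The sketch below shows one simple way to do this with a regular expression. The function name `find_ssrs`, the motif length range, and the minimum repeat count are illustrative assumptions, not the settings or software used in the study.

```python
import re

def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Thresholds are hypothetical defaults. The lazy quantifier prefers the
    shortest motif, and homopolymer-derived motifs such as 'AA' are not filtered.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Made-up unigene fragment with an (AG)6 and an (ATC)5 repeat.
fragment = "TTC" + "AG" * 6 + "TGG" + "ATC" * 5 + "GG"
for start, motif, count in find_ssrs(fragment):
    print(start, motif, count)
```

Primer design and the PCR validation step described in the abstract would follow from the flanking sequence of each hit, which this sketch does not attempt.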

]]>
<![CDATA[Rapid and highly-specific generation of targeted DNA sequencing libraries enabled by linking capture probes with universal primers]]> https://www.researchpad.co/article/5c117be8d5eed0c48469adc4

Targeted Next Generation Sequencing (NGS) is being adopted increasingly broadly in many research, commercial and clinical settings. Currently used target capture methods, however, typically require complex and lengthy (sometimes multi-day) workflows that complicate their use in certain applications. In addition, small panels for high sequencing depth applications such as liquid biopsy typically have low on-target rates, resulting in unnecessarily high sequencing cost. We have developed a novel targeted sequencing library preparation method, named Linked Target Capture (LTC), which replaces typical multi-day target capture workflows with a single-day, combined ‘target-capture-PCR’ workflow. This approach uses physically linked capture probes and PCR primers and is expected to work with panel sizes from 100 bp to >10 Mbp. It reduces the time and complexity of the capture workflow, eliminates long hybridization and wash steps and enables rapid library construction and target capture. High on-target read fractions are achievable due to repeated sequence selection in the target-capture-PCR step, thus lowering sequencing cost. We have demonstrated this technology on sample types including cell-free DNA (cfDNA) and formalin-fixed, paraffin-embedded (FFPE) derived DNA, capturing a 35-gene pan-cancer panel, and therein detecting single nucleotide variants, copy number variants, insertions, deletions and gene fusions. With the integration of unique molecular identifiers (UMIs), variants as low as 0.25% abundance were detected, limited by input mass and sequencing depth. Additionally, sequencing libraries were prepared in less than eight hours from extracted DNA to loaded sequencer, demonstrating that LTC holds promise as a broadly applicable tool for rapid, cost-effective and high performance targeted sequencing.

]]>
<![CDATA[Population sequencing reveals clonal diversity and ancestral inbreeding in the grapevine cultivar Chardonnay]]> https://www.researchpad.co/article/5bfdb375d5eed0c4845c9a5b

Chardonnay is the basis of some of the world’s most iconic wines and its success is underpinned by a historic program of clonal selection. There are numerous clones of Chardonnay available that exhibit differences in key viticultural and oenological traits that have arisen from the accumulation of somatic mutations during centuries of asexual propagation. However, the genetic variation that underlies these differences remains largely unknown. To address this knowledge gap, a high-quality, diploid-phased Chardonnay genome assembly was produced from single-molecule real time sequencing, and combined with re-sequencing data from 15 different Chardonnay clones. There were 1620 markers identified that distinguish the 15 clones. These markers were reliably used for clonal identification of independently sourced genomic material, as well as in identifying a potential genetic basis for some clonal phenotypic differences. The predicted parentage of the Chardonnay haplomes was elucidated by mapping sequence data from the predicted parents of Chardonnay (Gouais blanc and Pinot noir) against the Chardonnay reference genome. This enabled the detection of instances of heterosis, with differentially-expanded gene families being inherited from the parents of Chardonnay. Most surprisingly however, the patterns of nucleotide variation present in the Chardonnay genome indicate that Pinot noir and Gouais blanc share an extremely high degree of kinship that has resulted in the Chardonnay genome displaying characteristics that are indicative of inbreeding.

]]>
<![CDATA[SNPSelect: A scalable and flexible targeted sequence-based genotyping solution]]> https://www.researchpad.co/article/5bca48fb40307c051665641c

In plant breeding the use of molecular markers has resulted in tremendous improvement of the speed with which new crop varieties are introduced into the market. Single Nucleotide Polymorphism (SNP) genotyping is routinely used for association studies, Linkage Disequilibrium (LD) and Quantitative Trait Locus (QTL) mapping studies, marker-assisted backcrosses and validation of large numbers of novel SNPs. Here we present the KeyGene SNPSelect technology, a scalable and flexible multiplexed, targeted sequence-based, genotyping solution. The multiplex composition of SNPSelect assays can be easily changed between experiments by adding or removing loci, demonstrating their content flexibility. To demonstrate this versatility, we first designed a 1,056-plex maize assay and genotyped a total of 374 samples originating from an F2 and a Recombinant Inbred Line (RIL) population and a maize germplasm collection. Next, subsets of the most informative SNP loci were assembled in 384-plex and 768-plex assays for further genotyping. Indeed, selection of the most informative SNPs allows cost-efficient yet highly informative genotyping in a custom-made fashion, with average call rates between 88.1% (1,056-plex assay) and 99.4% (384-plex assay), and average reproducibility rates between duplicate samples ranging from 98.2% (1056-plex assay) to 99.9% (384-plex assay). The SNPSelect workflow can be completed from a DNA sample to a genotype dataset in less than three days. We propose SNPSelect as an attractive and competitive genotyping solution to meet the targeted genotyping needs in fields such as plant breeding.
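The abstract reports call rates and reproducibility between duplicate samples without defining them. One common reading is sketched below: call rate as the fraction of loci with a genotype call, and reproducibility as the concordance between duplicates at loci called in both. These definitions, the function names, and the toy genotypes are assumptions rather than the SNPSelect implementation.

```python
def call_rate(genotypes, missing=None):
    """Fraction of loci with a genotype call (assumed definition)."""
    if not genotypes:
        raise ValueError("no loci supplied")
    called = [g for g in genotypes if g != missing]
    return len(called) / len(genotypes)

def reproducibility(rep_a, rep_b, missing=None):
    """Concordance between duplicates at loci called in both (assumed definition)."""
    pairs = [
        (a, b) for a, b in zip(rep_a, rep_b)
        if a != missing and b != missing
    ]
    if not pairs:
        raise ValueError("no loci called in both duplicates")
    return sum(a == b for a, b in pairs) / len(pairs)

# Toy duplicate samples over five loci (made-up calls, None = no call).
rep_a = ["AA", "AB", None, "BB", "AB"]
rep_b = ["AA", "AB", "AB", "BB", "AA"]
print(call_rate(rep_a))
print(reproducibility(rep_a, rep_b))
```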

]]>
<![CDATA[Developmental Transcriptomic Features of the Carcinogenic Liver Fluke, Clonorchis sinensis]]> https://www.researchpad.co/article/5989d9f0ab0ee8fa60b6e20a

Clonorchis sinensis is the causative agent of the life-threatening disease endemic to China, Korea, and Vietnam. It is estimated that about 15 million people are infected with this fluke. C. sinensis provokes inflammation, epithelial hyperplasia, and periductal fibrosis in bile ducts, and may cause cholangiocarcinoma in chronically infected individuals. Accumulation of a large amount of biological information about the adult stage of this liver fluke in recent years has advanced our understanding of the pathological interplay between this parasite and its hosts. However, no developmental gene expression profiles of C. sinensis have been published. In this study, we generated gene expression profiles of three developmental stages of C. sinensis by analyzing expressed sequence tags (ESTs). Complementary DNA libraries were constructed from the adult, metacercaria, and egg developmental stages of C. sinensis. A total of 52,745 ESTs were generated and assembled into 12,830 C. sinensis assembled EST sequences, and then these assemblies were further categorized into groups according to biological functions and developmental stages. Most of the genes that were differentially expressed in the different stages were consistent with the biological and physical features of the particular developmental stage; high energy metabolism, motility and reproduction genes were differentially expressed in adults, minimal metabolism and final host adaptation genes were differentially expressed in metacercariae, and embryonic genes were differentially expressed in eggs. The higher expression of glucose transporters, proteases, and antioxidant enzymes in the adults accounts for active uptake of nutrients and defense against host immune attacks. The types of ion channels present in C. sinensis are consistent with its parasitic nature and phylogenetic placement in the tree of life. 
We anticipate that the transcriptomic information on essential regulators of development, bile chemotaxis, and physico-metabolic pathways in C. sinensis presented in this study will guide further studies to identify novel drug targets and diagnostic antigens.

]]>
<![CDATA[Genome Analysis of Bacillus amyloliquefaciens Subsp. plantarum UCMB5113: A Rhizobacterium That Improves Plant Growth and Stress Management]]> https://www.researchpad.co/article/5989db00ab0ee8fa60bc678f

The Bacillus amyloliquefaciens subsp. plantarum strain UCMB5113 is a Gram-positive rhizobacterium that can colonize plant roots and stimulate plant growth and defense based on unknown mechanisms. This reinforcement of plants may provide protection against various forms of biotic and abiotic stress. To determine the genetic traits involved in the mechanism of plant-bacteria association, the genome sequence of UCMB5113 was obtained by assembling paired-end Illumina reads. The assembled chromosome of 3,889,532 bp was predicted to encode 3,656 proteins. Genes that potentially contribute to plant growth promotion, such as those for indole-3-acetic acid (IAA) biosynthesis, acetoin synthesis and siderophore production, were identified. Moreover, annotation identified putative genes responsible for non-ribosomal synthesis of secondary metabolites and genes supporting environmental fitness of UCMB5113, including drug and metal resistance. A large number of genes encoding a diverse set of secretory proteins, enzymes of primary and secondary metabolism and carbohydrate active enzymes were found, reflecting a high capacity to degrade various rhizosphere macromolecules. Additionally, many predicted membrane transporters provide the bacterium with efficient uptake capabilities for several nutrients. Although UCMB5113 has the possibility to produce antibiotics and biosurfactants, the protection of plants against pathogens seems to be indirect and due to priming of plant induced systemic resistance. The availability of the genome enables identification of genes and their function underpinning beneficial interactions of UCMB5113 with plants.

]]>
<![CDATA[An Improved Protocol for Sequencing of Repetitive Genomic Regions and Structural Variations Using Mutagenesis and Next Generation Sequencing]]> https://www.researchpad.co/article/5989db0cab0ee8fa60bca851

The rise of Next Generation Sequencing (NGS) technologies has transformed de novo genome sequencing into an accessible research tool, but obtaining high quality eukaryotic genome assemblies remains a challenge, mostly due to the abundance of repetitive elements. These also make it difficult to study nucleotide polymorphism in repetitive regions, including certain types of structural variations. One solution proposed for resolving such regions is Sequence Assembly aided by Mutagenesis (SAM), which relies on the fact that introducing enough random mutations breaks the repetitive structure, making assembly possible. Sequencing many different mutated copies permits the sequence of the repetitive region to be inferred by consensus methods. However, this approach relies on molecular cloning in order to isolate and amplify individual mutant copies, making it hard to scale-up the approach for use in conjunction with high-throughput sequencing technologies. To address this problem, we propose NG-SAM, a modified version of the SAM protocol that relies on PCR and dilution steps only, coupled to a NGS workflow. NG-SAM therefore has the potential to be scaled-up, e.g. using emerging microfluidics technologies. We built a realistic simulation pipeline to study the feasibility of NG-SAM, and our results suggest that under appropriate experimental conditions the approach might be successfully put into practice. Moreover, our simulations suggest that NG-SAM is capable of reconstructing robustly a wide range of potential target sequences of varying lengths and repetitive structures.
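The consensus step described above can be illustrated in isolation: once independently mutated copies have been assembled and aligned, a per-column majority vote recovers the original sequence as long as mutations at any one position are rare. The sketch assumes pre-aligned, equal-length copies and uses made-up sequences; it is not the NG-SAM simulation pipeline.

```python
from collections import Counter

def consensus(aligned_copies):
    """Majority-vote consensus over equal-length, pre-aligned sequences."""
    if len({len(copy) for copy in aligned_copies}) != 1:
        raise ValueError("copies must be non-empty and aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_copies)
    )

# Toy repetitive target and five copies, each with one random-looking mutation.
original = "ACACACACGT"
mutated = [
    "ACTCACACGT",
    "ACACAGACGT",
    "GCACACACGT",
    "ACACACACGA",
    "ACACACTCGT",
]
print(consensus(mutated) == original)
```

In the protocol itself the harder part is upstream of this step: the mutations are what make each copy assemblable, and the copies must then be aligned back to one another before any vote is possible.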

]]>
<![CDATA[qpure: A Tool to Estimate Tumor Cellularity from Genome-Wide Single-Nucleotide Polymorphism Profiles]]> https://www.researchpad.co/article/5989db4bab0ee8fa60bda2c6

Tumour cellularity, the relative proportion of tumour and normal cells in a sample, affects the sensitivity of mutation detection, copy number analysis, cancer gene expression and methylation profiling. Tumour cellularity is traditionally estimated by pathological review of sectioned specimens; however, this method is both subjective and prone to error due to heterogeneity within lesions and cellularity differences between the sample viewed during pathological review and tissue used for research purposes. In this paper we describe a statistical model to estimate tumour cellularity from SNP array profiles of paired tumour and normal samples using shifts in SNP allele frequency at regions of loss of heterozygosity (LOH) in the tumour. We also provide qpure, a software implementation of the method. Our experiments showed a medium correlation of 0.42 (p-value = 0.0001) between tumour cellularity estimated by qpure and pathology review. Interestingly, there is a high correlation of 0.87 (p-value 2.2e-16) between cellularity estimates by qpure and deep Ion Torrent sequencing of known somatic KRAS mutations, and a weaker correlation of 0.32 (p-value = 0.004) between Ion Torrent sequencing and pathology review. This suggests that qpure may be a more accurate predictor of tumour cellularity than pathology review. qpure can be downloaded from https://sourceforge.net/projects/qpure/.

]]>
<![CDATA[Linkage Mapping and Comparative Genomics Using Next-Generation RAD Sequencing of a Non-Model Organism]]> https://www.researchpad.co/article/5989da0dab0ee8fa60b785f6

Restriction-site associated DNA (RAD) sequencing is a powerful new method for targeted sequencing across the genomes of many individuals. This approach has broad potential for genetic analysis of non-model organisms including genotype-phenotype association mapping, phylogeography, population genetics and scaffolding genome assemblies through linkage mapping. We constructed a RAD library using genomic DNA from a Plutella xylostella (diamondback moth) backcross that segregated for resistance to the insecticide spinosad. Sequencing of 24 individuals was performed on a single Illumina GAIIx lane (51 base paired-end reads). Taking advantage of the lack of crossing over in homologous chromosomes in female Lepidoptera, 3,177 maternally inherited RAD alleles were assigned to the 31 chromosomes, enabling identification of the spinosad resistance and W/Z sex chromosomes. Paired-end reads for each RAD allele were assembled into contigs and compared to the genome of Bombyx mori (n = 28) using BLAST, revealing 28 homologous matches plus 3 expected fusion/breakage events which account for the difference in chromosome number. A genome-wide linkage map (1292 cM) was inferred with 2,878 segregating RAD alleles inherited from the backcross father, producing chromosome and location specific sequenced RAD markers. Here we have used RAD sequencing to construct a genetic linkage map de novo for an organism that has no previous genome data. Comparative analysis of P. xylostella linkage groups with B. mori chromosomes shows, for the first time, that genetic synteny appears common beyond the Macrolepidoptera. RAD sequencing is a powerful system capable of rapidly generating chromosome specific data for non-model organisms.

]]>
<![CDATA[An Improved Method for Including Upper Size Range Plasmids in Metamobilomes]]> https://www.researchpad.co/article/5989db46ab0ee8fa60bd8830

Two recently developed isolation methods have shown promise when recovering pure community plasmid DNA (metamobilomes/plasmidomes), which is useful in conducting culture-independent investigations into plasmid ecology. However, both methods employ multiple displacement amplification (MDA) to ensure suitable quantities of plasmid DNA for high-throughput sequencing. This study demonstrates that MDA greatly favors smaller circular DNA elements (<10 Kbp), which, in turn, leads to stark underrepresentation of upper size range plasmids (>10 Kbp). Throughout the study, we used two model plasmids, a 4.4 Kbp cloning vector (pBR322), and a 56 Kbp conjugative plasmid (pKJK10), to represent lower- and upper plasmid size ranges, respectively. Subjecting a mixture of these plasmids to the overall isolation protocol revealed a 34-fold over-amplification of pBR322 after MDA. To address this bias, we propose the addition of an electroelution step that separates different plasmid size ranges prior to MDA in order to reduce size-dependent competition during incubation. Subsequent analyses of metamobilome data from wastewater spiked with the model plasmids showed in silico recovery of pKJK10 to be very poor with the established method and a 1,300-fold overrepresentation of pBR322. Conversely, complete recovery of pKJK10 was enabled with the new modified protocol, although considerable care must be taken during electroelution to minimize cross-contamination between samples. For further validation, non-spiked wastewater metamobilomes were mapped to more than 2,500 known plasmid genomes. This displayed an overall recovery of plasmids well into the upper size range (median size: 30 kilobases) with the modified protocol. Analysis of de novo assembled metamobilome data also suggested distinctly better recovery of larger plasmids, as gene functions associated with these plasmids, such as conjugation, were exclusively encoded in the data output generated through the modified protocol. 
Thus, with the suggested modification, access to a large uncharacterized pool of accessory elements that reside on medium-to-large plasmids has been improved.

]]>
<![CDATA[Development of microsatellite markers and assembly of the plastid genome in Cistanthe longiscapa (Montiaceae) based on low-coverage whole genome sequencing]]> https://www.researchpad.co/article/5989db5cab0ee8fa60be0299

Cistanthe longiscapa is an endemic annual herb and characteristic element of the Chilean Atacama Desert. Principal threats are the destruction of its seed deposits by human activities and reduced germination rates due to the decreasing occurrence of precipitation events. To enable population genetic and phylogeographic analyses in this species we performed paired-end shotgun sequencing (2x100 bp) of genomic DNA on the Illumina HiSeq platform and identified microsatellite (SSR) loci in the resulting sequences. From 29 million quality-filtered read pairs we obtained 549,174 contigs (average length 614 bp; N50 = 904). Searching for SSRs revealed 10,336 loci with microsatellite motifs. Initially, we designed primers for 96 loci, which were tested for PCR amplification on three C. longiscapa individuals. Successfully amplifying loci were further tested on eight individuals to screen for length variation in the resulting amplicons, and representative alleles were sequenced to infer the basis for the observed length variation. Finally, we arrived at 26 validated SSR loci for population studies in C. longiscapa, which resulted in 146 bi-allelic SSR markers in our test sample of eight individuals. The genomic sequences were also used to assemble the plastid genome of C. longiscapa, which provides an additional set of maternally inherited genetic markers.
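
For readers unfamiliar with the N50 statistic quoted for the contigs, the following is a minimal sketch of the common definition: the length of the contig at which the cumulative length, counted from the longest contig downwards, first reaches half of the total assembly length. The function name `n50` and the example lengths are illustrative; the abstract does not state which tool computed its value.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length of the contig at which the running
    total first reaches at least half of the summed length.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")

    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Illustrative lengths only (not data from the study).
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

Because N50 weights contigs by their length, it can sit well above the mean contig length when an assembly contains many short contigs, as with the 614 bp average and N50 of 904 reported here.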

]]>