ResearchPad - dna-library-construction https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[Comparative analysis of plastid genomes within the Campanulaceae and phylogenetic implications]]> https://www.researchpad.co/article/elastic_article_14639

Conflicts exist between phylogenies of the Campanulaceae based on nuclear ITS sequences and those based on plastid markers, particularly in the subdivision of Cyanantheae (Campanulaceae). In addition, plastid genome structures in species of the Campanulaceae are varied and complex. However, the limited availability of genomic information largely hinders studies of the molecular evolution and phylogeny of the Campanulaceae. We reported the complete plastid genomes of three Cyanantheae species, compared them to eight published Campanulaceae plastomes, and shed light on the applicability of plastomes. We found clear differences in gene order, GC content, gene composition and the IR junctions at LSC/IRa. Almost all protein-coding genes and amino acid sequences showed clear codon preferences. We identified 14 genes with highly positively selected sites, and the branch-site model indicated 96 sites under potentially positive selection on three lineages of the phylogenetic tree. Phylogenetic analyses showed that Cyananthus was more closely related to Codonopsis than to Cyclocodon and also clearly illustrated the relationships among the Cyanantheae species. We also found six coding regions with high nucleotide divergence. These hotspot regions were considered to be useful molecular markers for resolving phylogenetic relationships and for species authentication in the Campanulaceae.

]]>
<![CDATA[Discovery of Leptospira spp. seroreactive peptides using ORFeome phage display]]> https://www.researchpad.co/article/5c536b32d5eed0c484a482f9

Background

Leptospirosis is the most common zoonotic disease worldwide. The diagnostic performance of a serological test for human leptospirosis is mainly influenced by the antigen used in the test assay. An ideal serological test should cover all serovars of pathogenic leptospires with high sensitivity and specificity and use reagents that are relatively inexpensive to produce and can be used in tropical climates. Peptide-based tests fulfil at least the latter two requirements, and ORFeome phage display has been successfully used to identify immunogenic peptides from other pathogens.

Methodology/Principal findings

Two ORFeome phage display libraries of the entire Leptospira spp. genomes from five local strains isolated in Malaysia and seven WHO reference strains were constructed. Subsequently, 18 unique Leptospira peptides were identified in a screen using a pool of sera from patients with acute leptospirosis. Five of these were validated by titration ELISA using different pools of patient or control sera. The diagnostic performance of these five peptides was then assessed against 16 individual sera from patients with acute leptospirosis and 16 healthy donors and was compared to that of two recombinant reference proteins from L. interrogans. This analysis revealed two peptides (SIR16-D1 and SIR16-H1) from the local isolates with good accuracy for the detection of acute leptospirosis (area under the ROC curve: 0.86 and 0.78, respectively; sensitivity: 0.88 and 0.94; specificity: 0.81 and 0.69), which was close to that of the reference proteins LipL32 and Loa22 (area under the ROC curve: 0.91 and 0.80; sensitivity: 0.94 and 0.81; specificity: 0.75 and 0.75).
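The sensitivity and specificity values above can be related back to the panel size of 16 patient and 16 healthy donor sera. The sketch below is illustrative only: the per-peptide counts are not given in the abstract and are assumed here because they reproduce the reported rates after rounding; `rate` is a hypothetical helper name.

```python
def rate(correct: int, total: int) -> float:
    """Fraction of sera classified correctly (sensitivity or specificity)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total

N_PATIENTS = 16  # acute leptospirosis sera (from the abstract)
N_HEALTHY = 16   # healthy donor sera (from the abstract)

# ASSUMED counts, not reported in the abstract: chosen so that the
# rounded rates match the published values for SIR16-D1 and SIR16-H1.
assumed_counts = {
    "SIR16-D1": {"true_pos": 14, "true_neg": 13},
    "SIR16-H1": {"true_pos": 15, "true_neg": 11},
}

for peptide, c in assumed_counts.items():
    sens = rate(c["true_pos"], N_PATIENTS)
    spec = rate(c["true_neg"], N_HEALTHY)
    print(f"{peptide}: sensitivity={sens:.4f} specificity={spec:.4f}")
```

With only 16 sera per group, each serum shifts either rate by 0.0625, which is worth keeping in mind when comparing peptides whose values differ by a few hundredths.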

Conclusions/Significance

This analysis lends further support for using ORFeome phage display to identify pathogen-associated immunogenic peptides, and it suggests that this technique holds promise for the development of peptide-based diagnostics for leptospirosis and, possibly, of vaccines against this pathogen.

]]>
<![CDATA[Functional screening for triclosan resistance in a wastewater metagenome and isolates of Escherichia coli and Enterococcus spp. from a large Canadian healthcare region]]> https://www.researchpad.co/article/5c536bccd5eed0c484a49249

The biocide triclosan is in many consumer products and is a frequent contaminant of wastewater (WW) such that there is concern that triclosan promotes resistance to important antibiotics. This study identified functional mechanisms of triclosan resistance (TCSR) in WW metagenomes, and assessed the frequency of TCSR in WW-derived and clinical isolates of Escherichia coli and Enterococcus spp. Metagenomic DNA extracted from WW was used to profile the microbiome and construct large-insert cosmid libraries, which were screened for TCSR. Resistant cosmids were sequenced and the TCSR determinant identified by transposon mutagenesis. Wastewater Enterococcus spp. (N = 94) and E. coli (N = 99) and clinical Enterococcus spp. (N = 146) and vancomycin-resistant E. faecium (VRE; N = 149) were collected and tested for resistance to triclosan and a comprehensive drug panel. Functional metagenomic screening revealed diverse FabV homologs as major WW TCSR determinants. Resistant clones harboured sequences likely originating from Aeromonas spp., a common WW microbe. The triclosan MIC90s for E. coli, E. faecalis, and E. faecium isolates were 0.125, 32, and 32 mg/L, respectively. For E. coli, there was no correlation between the triclosan MIC and any drug tested. Negative correlations were detected between the triclosan MIC and levofloxacin resistance for E. faecalis, and between triclosan and vancomycin, teicoplanin, and ampicillin resistance for E. faecium. Thus, FabV homologs were the major contributor to the WW triclosan resistome and high-level TCSR was not observed in WW or clinical isolates. Elevated triclosan MICs were not positively correlated with antimicrobial resistance to any drug tested.
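For readers unfamiliar with the MIC90 statistic reported above, the following minimal sketch shows one common way to compute it: the lowest tested concentration at which at least 90% of isolates are inhibited. The MIC values in the example are hypothetical and are not data from the study; the rank convention used (ceiling of 0.9 × n) is an assumption, since the abstract does not state the method.

```python
def mic_percentile(mics, pct=90):
    """Return the MIC at or below which at least `pct` percent of isolates fall.

    Uses the 1-based rank ceil(pct * n / 100) in the sorted MIC list,
    computed with integer arithmetic to avoid floating-point surprises.
    """
    if not mics:
        raise ValueError("need at least one MIC value")
    ordered = sorted(mics)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct*n/100
    return ordered[rank - 1]

# Hypothetical triclosan MICs (mg/L) for ten isolates, illustration only.
hypothetical_mics = [0.03, 0.06, 0.06, 0.06, 0.125, 0.125, 0.06, 0.03, 0.125, 0.25]

print("MIC50:", mic_percentile(hypothetical_mics, 50))
print("MIC90:", mic_percentile(hypothetical_mics, 90))
```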

]]>
<![CDATA[Imputation accuracy of wheat genotyping-by-sequencing (GBS) data using barley and wheat genome references]]> https://www.researchpad.co/article/5c3d0126d5eed0c484038b91

Genotyping-by-sequencing (GBS) provides high SNP coverage and has recently emerged as a popular technology for genetic and breeding applications in bread wheat (Triticum aestivum L.) and many other plant species. Although GBS can discover millions of SNPs, a high rate of missing data is a major concern for many applications. Accurate imputation of those missing data can significantly improve the utility of GBS data. This study compared imputation accuracies among four genome references including three wheat references (Chinese Spring survey sequence, W7984, and IWGSC RefSeq v1.0) and one barley reference genome by comparing imputed data derived from low-depth sequencing to actual data from high-depth sequencing. After imputation, the average number of imputed data points was the highest in the B genome (~48.99%). The D genome had the lowest imputed data points (~15.02%) but the highest imputation accuracy. Among the four reference genomes, IWGSC RefSeq v1.0 reference provided the most imputed data points, but the lowest imputation accuracy for the SNPs with < 10% minor allele frequency (MAF). The W7984 reference, however, provided the highest imputation accuracy for the SNPs with < 10% MAF.
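The comparison described above, imputed low-depth calls checked against high-depth calls, can be sketched as a simple concordance measure. This is an assumed formulation for illustration; the abstract does not define the exact accuracy metric, and the genotype coding and function name here are hypothetical.

```python
def imputation_accuracy(low_depth, imputed, high_depth):
    """Fraction of imputed genotypes that match the high-depth call.

    Only sites that were missing (None) in the low-depth data and that
    have a call in the high-depth data are scored.
    """
    scored = [
        i for i, g in enumerate(low_depth)
        if g is None and high_depth[i] is not None
    ]
    if not scored:
        return float("nan")
    matches = sum(imputed[i] == high_depth[i] for i in scored)
    return matches / len(scored)

# Hypothetical genotypes for one line at six SNPs (0/1/2 = alternate allele count).
low_depth  = [0, None, 2, None, None, 1]
imputed    = [0, 2,    2, 0,    1,    1]
high_depth = [0, 2,    2, 0,    2,    1]

print(imputation_accuracy(low_depth, imputed, high_depth))
```

Restricting the comparison to a minor allele frequency class, as the study does for SNPs with < 10% MAF, would amount to filtering the scored sites before counting matches.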

]]>
<![CDATA[Whole genome sequencing of Moraxella bovoculi reveals high genetic diversity and evidence for interspecies recombination at multiple loci]]> https://www.researchpad.co/article/5c2151b8d5eed0c4843fb90f

Moraxella bovoculi is frequently cultured from the ocular secretions and conjunctiva of cattle with Infectious Bovine Keratoconjunctivitis (IBK). Previous work has shown that single nucleotide polymorphism (SNP) diversity in this species is quite high, with 81,284 SNPs identified in eight genomes representing two distinct genotypes isolated from IBK affected eyes (genotype 1) and the nasopharynx of cattle without clinical IBK signs (genotype 2), respectively. The goals of this study were to identify SNPs from a collection of geographically diverse and epidemiologically unlinked M. bovoculi strains from the eyes of IBK positive cattle (n = 183) and another from the eyes of cattle (most from a single population at a single time-point) without signs of IBK (n = 63) and to characterize the genetic diversity. Strains of both genotypes were identified from the eyes of cattle without IBK signs. Only genotype 1 strains were identified from IBK affected eyes; however, these strains were isolated before the discovery of genotype 2, and the protocol for their isolation would have preferentially selected genotype 1 M. bovoculi. The core genome comprised ~74% of the whole and contained >127,000 filtered SNPs. More than 80% of these characterize diversity within genotype 1 while 23,611 SNPs (~18%) delimit the two major genotypes. Genotype 2 strains lacked a repeats-in-toxin (RTX) putative pathogenesis factor and any of ten putative antibiotic resistance genes carried within a genomic island. Within genotype 1, prevalence of these elements was 0.85 and 0.12 respectively in strains from eyes that were IBK positive. Recombination appears to be an important source of genetic diversity for genotype 1 and undermines the utility of ribosomal-locus-based species identification. The extremely high genetic diversity in genotype 1 presents a challenge to the development of an efficacious vaccine directed against them; however, several low-diversity pilin-like genes were identified. Finally, the genotype-defining SNPs described in this study are a resource that can facilitate the development of more accurate M. bovoculi diagnostic tests.

]]>
<![CDATA[Human plague associated with Tibetan sheep originates in marmots]]> https://www.researchpad.co/article/5b8b29ea40307c405292ca55

The Qinghai-Tibet plateau is a natural plague focus and is the largest such focus in China. In this area, while Marmota himalayana is the primary host, a total of 18 human plague outbreaks associated with Tibetan sheep (78 cases with 47 deaths) have been reported on the Qinghai-Tibet plateau since 1956. All of the index infectious cases had an exposure history of slaughtering or skinning diseased or dead Tibetan sheep. In this study, we sequenced and compared 38 strains of Yersinia pestis isolated from different hosts, including humans, Tibetan sheep, and M. himalayana. Phylogenetic relationships were reconstructed based on genome-wide single-nucleotide polymorphisms identified from our isolates and reference strains. The phylogenetic relationships illustrated in our study, together with the finding that the Tibetan sheep plague clearly lagged behind the M. himalayana plague, and a previous study that identified the Tibetan sheep as a plague reservoir with high susceptibility and moderate sensitivity, indicated that the human plague was transmitted from Tibetan sheep, while the Tibetan sheep plague originated from marmots. Tibetan sheep may encounter this infection by contact with dead rodents or through being bitten by fleas originating from M. himalayana during local epizootics.

]]>
<![CDATA[Optimized double-digest genotyping by sequencing (ddGBS) method with high-density SNP markers and high genotyping accuracy for chickens]]> https://www.researchpad.co/article/5989db5dab0ee8fa60be04f9

High-density single nucleotide polymorphism (SNP) markers are crucial to improve the resolution and accuracy of genome-wide association study (GWAS) and genomic selection (GS). Numerous approaches, including whole genome sequencing, genome sampling sequencing, and SNP chips, are able to discover or genotype markers at different densities and costs. Achieving an optimal balance between sequencing resolution and budgets, especially in large-scale population genetics research, constitutes a major challenge. Here, we performed improved double-enzyme digestion genotyping by sequencing (ddGBS) on chicken. We evaluated eight double-enzyme digestion combinations, and EcoR I-Mse I was chosen as the optimal combination for the chicken genome. We propose, for the first time, that two parameters, optimal read-count point (ORP) and saturated read-count point (SRP), could be utilized to determine the optimal sequencing volume. A total of 291,772 high-density SNPs from 824 animals were identified. By validation using the SNP chip, we found that the consistency between ddGBS data and the SNP chip is over 99%. The approach that we developed in chickens, which is high-quality, high-density, cost-effective (300 K, $30/sample), and time-saving (within 48 h), will have broad applications in animal breeding programs.

]]>
<![CDATA[Step-Wise Increase in Tigecycline Resistance in Klebsiella pneumoniae Associated with Mutations in ramR, lon and rpsJ]]> https://www.researchpad.co/article/5989d9e8ab0ee8fa60b6bf4d

Klebsiella pneumoniae is a gram-negative bacterium that causes numerous diseases, including pneumonia and urinary tract infections. An increase in multidrug resistance has complicated the treatment of these bacterial infections, and although tigecycline shows activity against a broad spectrum of bacteria, resistant strains have emerged. In this study, the whole genomes of two clinical and six laboratory-evolved strains were sequenced to identify putative mutations related to tigecycline resistance. Of seven tigecycline-resistant strains, seven (100%) had ramR mutations, five (71.4%) had lon mutations, one (14.2%) had a ramA mutation, and one (14.2%) had an rpsJ mutation. A higher fitness cost was observed in the laboratory-evolved strains but not in the clinical strains. A transcriptome analysis demonstrated high expression of the ramR operon and acrA in all tigecycline-resistant strains. Genes involved in nitrogen metabolism were induced in the laboratory-evolved strains compared with the wild-type and clinical strains, and this difference in nitrogen metabolism reflected the variation between the laboratory-evolved and the clinical strains. Complementation experiments showed that both the wild-type ramR and the lon genes could partially restore the tigecycline sensitivity of K. pneumoniae. We believe that this manuscript describes the first construction of a lon mutant in K. pneumoniae, which allowed confirmation of its association with tigecycline resistance. Our findings illustrate the importance of the ramR operon and the lon and rpsJ genes in K. pneumoniae resistance to tigecycline.

]]>
<![CDATA[Genomic Characteristics of Chinese Borrelia burgdorferi Isolates]]> https://www.researchpad.co/article/5989daa7ab0ee8fa60ba7fca

In China, B. burgdorferi, B. garinii, B. afzelii and B. yangtze sp. nov have been reported; B. garinii and B. afzelii are the main pathogenic genotypes. Until now, however, a whole genome sequence had been reported for only one Chinese strain. To further understand the genomic characteristics and diversity of Chinese Borrelia strains, 5 isolates from China were sequenced and compared with the whole genome sequences of strains from other areas. The results showed a high degree of conservation within the linear chromosome of Chinese strains, whereas the plasmids showed much greater diversity, based on the majority of the plasmid genomic information. The genome sequence of each of the five Chinese strains was compared with the corresponding reference strain of the same genospecies. Pairwise analysis demonstrated that there are only 70 SNPs between the genomes of CS4 and B31. However, there are many more SNPs between the genomes of QX-S13 and VS116, PD91 and PBi, FP1 and PKo, and R9 and PKo. Gene comparison revealed several genes with important differences, including OspA. Comparative genomic analysis found that the OspA gene sequences of PD91 and R9 differed greatly from the sequence of B31. The OspA gene sequence of R9 had a 96 bp deletion; the OspA gene of PD91 had two deletions, of 9 bp and 10 bp. To conclude, we described the genomic characteristics of Chinese B. burgdorferi strains of four genotypes. The genomic sequence of B. yangtze sp. nov and its differences from B. valaisiana are reported for the first time. Comparative analysis of Chinese strains with the different Borrelia species from other areas will help us to understand the evolution and pathogenesis of Chinese Borrelia burgdorferi strains.

]]>
<![CDATA[Development of microsatellite markers and assembly of the plastid genome in Cistanthe longiscapa (Montiaceae) based on low-coverage whole genome sequencing]]> https://www.researchpad.co/article/5989db5cab0ee8fa60be0299

Cistanthe longiscapa is an endemic annual herb and characteristic element of the Chilean Atacama Desert. Principal threats are the destruction of its seed deposits by human activities and reduced germination rates due to the decreasing occurrence of precipitation events. To enable population genetic and phylogeographic analyses in this species we performed paired-end shotgun sequencing (2x100 bp) of genomic DNA on the Illumina HiSeq platform and identified microsatellite (SSR) loci in the resulting sequences. From 29 million quality-filtered read pairs we obtained 549,174 contigs (average length 614 bp; N50 = 904). Searching for SSRs revealed 10,336 loci with microsatellite motifs. Initially, we designed primers for 96 loci, which were tested for PCR amplification on three C. longiscapa individuals. Successfully amplifying loci were further tested on eight individuals to screen for length variation in the resulting amplicons, and the alleles were exemplarily sequenced to infer the basis for the observed length variation. Finally we arrived at 26 validated SSR loci for population studies in C. longiscapa, which resulted in 146 bi-allelic SSR markers in our test sample of eight individuals. The genomic sequences were also used to assemble the plastid genome of C. longiscapa, which provides an additional set of maternally inherited genetic markers.
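As a rough illustration of what searching contigs for microsatellite motifs involves, the sketch below scans a sequence for perfect tandem repeats with a regular expression. It is not the tool or the thresholds used in the study, which the abstract does not name; the motif length range (2 to 6 bp), the minimum repeat count, and the function name `find_ssrs` are all assumptions.

```python
import re

def find_ssrs(sequence, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Motifs of 2-6 bp are considered; the lazy quantifier makes the regex
    prefer the shortest motif that satisfies the repeat threshold.
    Homopolymer runs (e.g. 'AAAAAAAA') are skipped.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:  # skip mononucleotide runs
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits

# Hypothetical contig fragments, illustration only.
print(find_ssrs("TTAGAGAGAGAGCC"))
print(find_ssrs("GAAGAAGAAGAAT"))
```

A production SSR search would also handle imperfect repeats and flanking-sequence requirements for primer design, which this sketch leaves out.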

]]>
<![CDATA[Comparison of Sample Preparation Methods Used for the Next-Generation Sequencing of Mycobacterium tuberculosis]]> https://www.researchpad.co/article/5989da2fab0ee8fa60b83c9f

The advent and widespread application of next-generation sequencing (NGS) technologies to the study of microbial genomes has led to a substantial increase in the number of studies in which whole genome sequencing (WGS) is applied to the analysis of microbial genomic epidemiology. However, microorganisms such as Mycobacterium tuberculosis (MTB) present unique problems for sequencing and downstream analysis based on their unique physiology and the composition of their genomes. In this study, we compare the quality of sequence data generated using the Nextera and TruSeq isolate preparation kits for library construction prior to Illumina sequencing-by-synthesis. Our results confirm that MTB NGS data quality is highly dependent on the purity of the DNA sample submitted for sequencing and its guanine-cytosine content (or GC-content). Our data additionally demonstrate that the choice of library preparation method plays an important role in mitigating downstream sequencing quality issues. Importantly for MTB, the Illumina TruSeq library preparation kit produces more uniform data quality than the Nextera XT method, regardless of the quality of the input DNA. Furthermore, specific genomic sequence motifs are commonly missed by the Nextera XT method, as are regions of especially high GC-content relative to the rest of the MTB genome. As coverage bias is highly undesirable, this study illustrates the importance of appropriate protocol selection when performing NGS studies in order to ensure that sound inferences can be made regarding mycobacterial genomes.

]]>
<![CDATA[Signatures of Crested Ibis MHC Revealed by Recombination Screening and Short-Reads Assembly Strategy]]> https://www.researchpad.co/article/5989dafeab0ee8fa60bc5bbc

Whole-genome shotgun (WGS) sequencing has become a routine method in genome research over the past decade. However, the assembly of highly polymorphic regions in WGS projects remains a challenge, especially for large genomes. The traditional strategy, which employs BAC library construction, PCR screening and Sanger sequencing, is laborious and expensive, which hampers research on polymorphic genomic regions. As one of the most highly polymorphic regions, the major histocompatibility complex (MHC) plays a central role in the adaptive immunity of all jawed vertebrates. In this study, we introduced an efficient procedure based on recombination screening and short-reads assembly. With this procedure, we constructed a high quality 488-kb region of crested ibis MHC that consists of 3 superscaffolds and contains 50 genes. Our sequence showed comparable quality (97.29% identity) to traditional Sanger assembly, while the workload was reduced almost 7-fold. Comparative study revealed distinctive features of crested ibis by exhibiting the COL11A2-BLA-BLB-BRD2 cluster and presenting both ADPRH and odorant receptor (OR) gene in the MHC region. Furthermore, the conservation of the BF-TAP1-TAP2 structure in crested ibis and other vertebrate lineages is interesting in light of the hypothesis that coevolution of functionally related genes in the primordial MHC is responsible for the appearance of the antigen presentation pathways at the birth of the adaptive immune system.

]]>
<![CDATA[A Phylogenomic Approach Based on PCR Target Enrichment and High Throughput Sequencing: Resolving the Diversity within the South American Species of Bartsia L. (Orobanchaceae)]]> https://www.researchpad.co/article/5989da94ab0ee8fa60ba15b0

Advances in high-throughput sequencing (HTS) have allowed researchers to obtain large amounts of biological sequence information at speeds and costs unimaginable only a decade ago. Phylogenetics, and the study of evolution in general, is quickly migrating towards using HTS to generate larger and more complex molecular datasets. In this paper, we present a method that utilizes microfluidic PCR and HTS to generate large amounts of sequence data suitable for phylogenetic analyses. The approach uses the Fluidigm Access Array System (Fluidigm, San Francisco, CA, USA) and two sets of PCR primers to simultaneously amplify 48 target regions across 48 samples, incorporating sample-specific barcodes and HTS adapters (2,304 unique amplicons per Access Array). The final product is a pooled set of amplicons ready to be sequenced, and thus, there is no need to construct separate, costly genomic libraries for each sample. Further, we present a bioinformatics pipeline to process the raw HTS reads to either generate consensus sequences (with or without ambiguities) for every locus in every sample or—more importantly—recover the separate alleles from heterozygous target regions in each sample. This is important because it adds allelic information that is well suited for coalescent-based phylogenetic analyses that are becoming very common in conservation and evolutionary biology. To test our approach and bioinformatics pipeline, we sequenced 576 samples across 96 target regions belonging to the South American clade of the genus Bartsia L. in the plant family Orobanchaceae. After sequencing cleanup and alignment, the experiment resulted in ~25,300bp across 486 samples for a set of 48 primer pairs targeting the plastome, and ~13,500bp for 363 samples for a set of primers targeting regions in the nuclear genome. Finally, we constructed a combined concatenated matrix from all 96 primer combinations, resulting in a combined aligned length of ~40,500bp for 349 samples.
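The figure of 2,304 unique amplicons per Access Array follows from pairing each of 48 samples with each of 48 target regions. The short sketch below enumerates those combinations; the sample and locus labels are hypothetical placeholders, not identifiers from the study.

```python
from itertools import product

SAMPLES_PER_ARRAY = 48  # from the abstract
TARGETS_PER_ARRAY = 48  # from the abstract

def amplicon_labels(samples, primer_pairs):
    """Label every sample x primer-pair combination produced on one array."""
    return [f"{sample}|{locus}" for sample, locus in product(samples, primer_pairs)]

# Hypothetical names standing in for barcoded samples and primer pairs.
samples = [f"sample_{i:02d}" for i in range(1, SAMPLES_PER_ARRAY + 1)]
primer_pairs = [f"locus_{j:02d}" for j in range(1, TARGETS_PER_ARRAY + 1)]

labels = amplicon_labels(samples, primer_pairs)
print(len(labels))
```

Because each amplicon carries a sample-specific barcode, the pooled reads can later be assigned back to one of these sample and locus pairs.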

]]>
<![CDATA[Genome Survey Sequencing for the Characterization of the Genetic Background of Rosa roxburghii Tratt and Leaf Ascorbate Metabolism Genes]]> https://www.researchpad.co/article/5989da83ab0ee8fa60b9b41c

Rosa roxburghii Tratt is an important commercial horticultural crop in China that is recognized for its nutritional and medicinal values. In spite of its economic significance, genomic information on this rose species is currently unavailable. In the present research, a genome survey of R. roxburghii was carried out using next-generation sequencing (NGS) technologies. A total of 30.29 Gb of sequence data was obtained by HiSeq 2500 sequencing, and the genome size of R. roxburghii was estimated at 480.97 Mb, with a guanine plus cytosine (GC) content of 38.63%. All of these reads were assembled, yielding a total of 627,554 contigs with an N50 length of 1.484 kb and, furthermore, 335,902 scaffolds with a total length of 409.36 Mb. Transposable element (TE) sequences of 90.84 Mb, which comprised 29.20% of the genome, and 167,859 simple sequence repeats (SSRs) were identified from the scaffolds. Among these, the mono- (66.30%), di- (25.67%), and tri- (6.64%) nucleotide repeats contributed nearly 99% of the SSRs, and the sequence motifs AG/CT (28.81%) and GAA/TTC (14.76%) were the most abundant among the dinucleotide and trinucleotide repeat motifs, respectively. Genome analysis predicted a total of 22,721 genes, with an average length of 2311.52 bp, an average exon length of 228.15 bp, and an average intron length of 401.18 bp. Eleven genes putatively involved in ascorbate metabolism were identified and their expression in R. roxburghii leaves was validated by quantitative real-time PCR (qRT-PCR). This is the first report of genome-wide characterization of this rose species.
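The contig N50 quoted above is a standard assembly statistic: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch follows; the contig lengths in the example are hypothetical and unrelated to the R. roxburghii assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")

# Hypothetical contig lengths (arbitrary units), illustration only.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```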

]]>
<![CDATA[Automated high throughput nucleic acid purification from formalin-fixed paraffin-embedded tissue samples for next generation sequence analysis]]> https://www.researchpad.co/article/5989db5cab0ee8fa60be01fe

Curation and storage of formalin-fixed, paraffin-embedded (FFPE) samples are standard procedures in hospital pathology laboratories around the world. Many thousands of such samples exist and could be used for next generation sequencing analysis. Retrospective analyses of such samples are important for identifying molecular correlates of carcinogenesis, treatment history and disease outcomes. Two major hurdles in using FFPE material for sequencing are the damaged nature of the nucleic acids and the labor-intensive nature of nucleic acid purification. These limitations and a number of other issues that span multiple steps from nucleic acid purification to library construction are addressed here. We optimized and automated a 96-well magnetic bead-based extraction protocol that can be scaled to large cohorts and is compatible with automation. Using sets of 32 and 91 individual FFPE samples respectively, we generated libraries from 100 ng of total RNA and DNA starting amounts with 95–100% success rate. The use of the resulting RNA in micro-RNA sequencing was also demonstrated. In addition to offering the potential of scalability and rapid throughput, the yield obtained with lower input requirements makes these methods applicable to clinical samples where tissue abundance is limiting.

]]>
<![CDATA[Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data]]> https://www.researchpad.co/article/5989dab6ab0ee8fa60baccdd

A good physical map is essential to guide sequence assembly in de novo whole genome sequencing, especially when sequences are produced by high-throughput sequencing such as next-generation-sequencing (NGS) technology. We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, a physical map and draft whole genome sequences can be generated, anchored and integrated using the same data set of NGS sequences, independent of restriction digestion. A model of the method was created and its parameters were examined by simulations using the Arabidopsis genome sequence. In the simulations, when a BAC library of ~4.8X genome coverage comprising 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected to physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. This method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb of sequence selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb of sequence was integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences.

]]>
<![CDATA[Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data]]> https://www.researchpad.co/article/5989daf0ab0ee8fa60bc0f89

Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.
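Agreement between callers on the same sequencing data can be quantified in several ways; one simple option is the Jaccard index of the two call sets. The sketch below is an assumed illustration, not the comparison method used in the study, and the variant calls and caller labels are hypothetical.

```python
def jaccard(calls_a, calls_b):
    """Jaccard index of two variant call sets: shared calls / all calls."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 1.0  # two empty call sets agree trivially
    return len(a & b) / len(union)

# Hypothetical somatic SNV calls as (chromosome, position, ref, alt).
caller_one = {("chr1", 1000, "A", "T"), ("chr2", 2000, "G", "C"), ("chr3", 3000, "C", "T")}
caller_two = {("chr1", 1000, "A", "T"), ("chr3", 3000, "C", "T"), ("chr4", 4000, "T", "G")}

print(jaccard(caller_one, caller_two))
```

Representing each call as a tuple of chromosome, position, reference and alternate allele means that two callers only agree when they report exactly the same change at the same site.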

]]>
<![CDATA[The Ebola virus VP35 protein binds viral immunostimulatory and host RNAs identified through deep sequencing]]> https://www.researchpad.co/article/5989db5fab0ee8fa60be1190

Ebola virus and Marburg virus are members of the Filoviridae family and causative agents of hemorrhagic fever with high fatality rates in humans. Filovirus virulence is partially attributed to the VP35 protein, a well-characterized inhibitor of the RIG-I-like receptor pathway that triggers the antiviral interferon (IFN) response. Prior work demonstrates the ability of VP35 to block potent RIG-I activators, such as Sendai virus (SeV), and this IFN-antagonist activity is directly correlated with its ability to bind RNA. Several structural studies demonstrate that VP35 binds short synthetic dsRNAs; yet, there are no data that identify viral immunostimulatory RNAs (isRNA) or host RNAs bound to VP35 in cells. Utilizing a SeV infection model, we demonstrate that both viral isRNA and host RNAs are bound to Ebola and Marburg VP35s in cells. By deep sequencing the purified VP35-bound RNA, we identified the SeV copy-back defective interfering (DI) RNA, previously identified as a robust RIG-I activator, as the isRNA bound by multiple filovirus VP35 proteins, including the VP35 protein from the West African outbreak strain (Makona EBOV). Moreover, RNAs isolated from a VP35 RNA-binding mutant were not immunostimulatory and did not include the SeV DI RNA. Strikingly, an analysis of host RNAs bound by wild-type, but not mutant, VP35 revealed that select host RNAs are preferentially bound by VP35 in cell culture. Taken together, these data support a model in which VP35 sequesters isRNA in virus-infected cells to avert RIG-I like receptor (RLR) activation.

]]>
<![CDATA[Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding]]> https://www.researchpad.co/article/5989da19ab0ee8fa60b7c3cb

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species’ native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.

]]>
<![CDATA[De Novo Genome Assembly Shows Genome Wide Similarity between Trypanosoma brucei brucei and Trypanosoma brucei rhodesiense]]> https://www.researchpad.co/article/5989d9e7ab0ee8fa60b6b952

Background

Trypanosoma brucei is a eukaryotic pathogen which causes African trypanosomiasis. It is notable for its variant surface glycoprotein (VSG) coat, which undergoes antigenic variation enabled by a large suite of VSG pseudogenes, allowing for persistent evasion of host adaptive immunity. While Trypanosoma brucei rhodesiense (Tbr) and T. b. gambiense (Tbg) are human infective, the related T. b. brucei (Tbb) is cleared by human sera. A single gene, the Serum Resistance Associated (SRA) gene, confers on Tbr its human infectivity phenotype. Potential genetic recombination of this gene between Tbr and non-human infective Tbb strains has significant epidemiological consequences for Human African Trypanosomiasis outbreaks.

Results

Using long and short read whole genome sequencing, we generated a hybrid de novo assembly of a Tbr strain, producing 4,210 scaffolds totaling approximately 38.8 megabases. These scaffolds comprise a significant proportion of the Tbr genome and thus represent a valuable tool for comparative genomic analyses among human and non-human infective T. brucei and for future complete genome assembly. We detected 5,970 putative genes, of which two, an alcohol oxidoreductase and a pentatricopeptide repeat-containing protein, were members of gene families common to all T. brucei subspecies but were variants specific to the Tbr strain sequenced in this study. Our findings confirmed the extremely high level of genomic similarity between the two parasite subspecies found in other studies.

Conclusions

We confirm at the whole genome level high similarity between the two Tbb and Tbr strains studied. The discovery of extremely minor genomic differentiation between Tbb and Tbr suggests that the transfer of the SRA gene via genetic recombination could potentially result in novel human infective strains; thus, all genetic backgrounds of T. brucei should be considered potentially human infective in regions where Tbr is prevalent.

]]>