ResearchPad - sequence-analysis https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[TIM, a targeted insertional mutagenesis method utilizing CRISPR/Cas9 in <i>Chlamydomonas reinhardtii</i>]]> https://www.researchpad.co/article/elastic_article_13864 Generation and subsequent analysis of mutants is critical to understanding the functions of genes and proteins. Here we describe TIM, an efficient, cost-effective, CRISPR-based targeted insertional mutagenesis method for the model organism Chlamydomonas reinhardtii. TIM utilizes delivery into the cell of a Cas9-guide RNA (gRNA) ribonucleoprotein (RNP) together with exogenous double-stranded (donor) DNA. The donor DNA contains gene-specific homology arms and an integral antibiotic-resistance gene that inserts at the double-stranded break generated by Cas9. After optimizing multiple parameters of this method, we were able to generate mutants for six out of six different genes in two different cell-walled strains with mutation efficiencies ranging from 40% to 95%. Furthermore, these high efficiencies allowed simultaneous targeting of two separate genes in a single experiment. TIM is flexible with regard to many parameters and can be carried out using either electroporation or the glass-bead method for delivery of the RNP and donor DNA. TIM achieves a far higher mutation rate than any previously reported for CRISPR-based methods in C. reinhardtii and promises to be effective for many, if not all, non-essential nuclear genes.

]]>
<![CDATA[Using case-level context to classify cancer pathology reports]]> https://www.researchpad.co/article/elastic_article_7869 Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence—for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information regarding the entire cancer case based off case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks—site, subsite, laterality, histology, behavior, and grade. We expect that with minimal modifications, our add-on can be applied towards a wide range of other clinical text-based tasks.

]]>
<![CDATA[ <i>BioSeqZip</i>: a collapser of NGS redundant reads for the optimization of sequence analysis]]> https://www.researchpad.co/article/N57483afe-1e29-4ccb-a124-b3461a285839 High-throughput next-generation sequencing can generate huge sequence files, whose analysis requires alignment algorithms that are typically very demanding in terms of memory and computational resources. This is a significant issue, especially for machines with limited hardware capabilities. As the redundancy of the sequences typically increases with coverage, collapsing such files into compact sets of non-redundant reads has the two-fold advantage of reducing file size and speeding up the alignment by avoiding mapping the same sequence multiple times. Method: BioSeqZip generates compact and sorted lists of alignment-ready non-redundant sequences, keeping track of their occurrences in the raw files as well as of their quality score information. By exploiting a memory-constrained external sorting algorithm, it can be executed on either single- or multi-sample datasets even on computers with medium computational capabilities. On request, it can even re-expand the compacted files to their original state. Results: Our extensive experiments on RNA-Seq data show that BioSeqZip considerably reduces the computational costs of a standard sequence analysis pipeline, with particular benefits for the alignment procedures that typically have the highest requirements in terms of memory and execution time. In our tests, BioSeqZip was able to compact 2.7 billion reads into 963 million unique tags, reducing the size of sequence files by up to 70% and speeding up the alignment by at least 50%. Availability and implementation: BioSeqZip is available at https://github.com/bioinformatics-polito/BioSeqZip. Supplementary information: Supplementary data are available at Bioinformatics online.
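The idea of collapsing redundant reads while keeping their occurrence counts can be illustrated with a minimal in-memory Python sketch. This is not BioSeqZip's implementation: the tool is described as using a memory-constrained external sort and as tracking quality scores, both of which are omitted here, and the function names and example reads are hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse duplicate reads into a sorted list of (sequence, count) pairs."""
    counts = Counter(reads)
    return sorted(counts.items())


def expand_reads(collapsed):
    """Re-expand collapsed pairs into one entry per original occurrence."""
    return [seq for seq, count in collapsed for _ in range(count)]


# Hypothetical toy input: six reads, three distinct sequences.
reads = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA", "GGCC"]
collapsed = collapse_reads(reads)
print(collapsed)
```

An aligner would then map each distinct sequence once and weight the result by its count. Note that this sketch restores the multiset of reads on expansion, not their original order.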
]]> <![CDATA[Bivartect: accurate and memory-saving breakpoint detection by direct read comparison]]> https://www.researchpad.co/article/Nef0c678a-8e44-48b4-b23e-6ad52ea03f7a Genetic variant calling with high-throughput sequencing data has been recognized as a useful tool for a better understanding of disease mechanisms and for detection of potential off-target sites in genome editing. Since most variant calling algorithms rely on initial mapping onto a reference genome and tend to predict many variant candidates, variant calling remains challenging in terms of predicting variants with few false positives. Results: Here we present Bivartect, a simple yet versatile variant caller based on direct comparison of short sequence reads between normal and mutated samples. Bivartect can detect not only single nucleotide variants but also insertions/deletions, inversions and their complexes. Bivartect achieves high predictive performance with an elaborate memory-saving mechanism, which allows Bivartect to run on a computer with a single node for analyzing small omics data. Tests with simulated benchmark and real genome-editing data indicate that Bivartect was comparable to state-of-the-art variant callers in positive predictive value for detection of single nucleotide variants, even though it yielded a substantially smaller number of candidates. These results suggest that Bivartect, a reference-free approach, will contribute to the identification of germline mutations as well as off-target sites introduced during genome editing with high accuracy. Availability and implementation: Bivartect is implemented in C++ and available along with in silico simulated data at https://github.com/ykat0/bivartect. Supplementary information: Supplementary data are available at Bioinformatics online.
]]> <![CDATA[MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs]]> https://www.researchpad.co/article/N2b7a7074-1354-4430-9fc5-152fc1131146 Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. Results: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. Availability and implementation: Software implementation is available from https://github.com/jttoivon/moder2. Supplementary information: Supplementary data are available at Bioinformatics online.
]]> <![CDATA[LAMPA, LArge Multidomain Protein Annotator, and its application to RNA virus polyproteins]]> https://www.researchpad.co/article/N73e7f13f-c395-44d7-9bdc-13b11c06733e To facilitate accurate estimation of statistical significance of sequence similarity in profile–profile searches, queries should ideally correspond to protein domains. For multidomain proteins, using domains as queries depends on delineation of domain borders, which may be unknown. Thus, whole proteins are commonly used as queries, which complicates establishing homology for similarities close to cutoff levels of statistical significance. Results: In this article, we describe an iterative approach, called LAMPA, LArge Multidomain Protein Annotator, that resolves the above conundrum by gradual expansion of hit coverage of multidomain proteins through re-evaluating statistical significance of hit similarity using ever smaller queries defined at each iteration. LAMPA employs TMHMM and HHsearch for recognition of transmembrane regions and homology, respectively. We used the Pfam database for annotating 2985 multidomain proteins (polyproteins) composed of >1000 amino acid residues, which dominate proteomes of RNA viruses. Under strict cutoffs, LAMPA outperformed HHsearch-mediated runs using intact polyproteins as queries by three measures: number of and coverage by identified homologous regions, and number of hit Pfam profiles. Compared to HHsearch, LAMPA identified 507 extra homologous regions in 14.4% of polyproteins. This Pfam-based annotation of RNA virus polyproteins by LAMPA was also superior to RefSeq expert annotation by two measures, region number and annotated length, for 69.3% of RNA virus polyprotein entries.
We rationalized the obtained results based on the dependence of HHsearch hit statistical significance for local alignment similarity scores on the lengths and diversities of query-target pairs in computational experiments. Availability and implementation: The LAMPA 1.0.0 R package is available on GitHub (https://github.com/Gorbalenya-Lab/LAMPA). Supplementary information: Supplementary data are available at Bioinformatics online. ]]> <![CDATA[Mining SARS-CoV protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection]]> https://www.researchpad.co/article/N2042bd9f-55ef-4b5f-abfd-004f60140823 Motivation: Although the outbreak of the severe acute respiratory syndrome (SARS) is currently over, it is expected that it will return to attack human beings. A critical challenge to scientists from various disciplines worldwide is to study the specificity of cleavage activity of SARS-related coronavirus (SARS-CoV) and use the knowledge obtained from the study for effective inhibitor design to fight the disease. The most commonly used inductive programming methods for knowledge discovery from data assume that the elements of input patterns are orthogonal to each other. If a sub-sequence is denoted as P2P1P1′P2′, the conventional inductive programming method may produce a rule like ‘if P1 = Q, then the sub-sequence is cleaved, otherwise non-cleaved’. If the site P1 is not orthogonal to the others (for instance, P2, P1′ and P2′), the prediction power of this kind of rule may be limited. Therefore, this study aims to develop a novel method for constructing non-orthogonal decision trees for mining protease data.

Result: Eighteen sequences of coronavirus polyprotein were downloaded from NCBI (http://www.ncbi.nlm.nih.gov). Among these sequences, 252 cleavage sites were experimentally determined. These sequences were scanned using a sliding window of size k to generate about 50 000 k-mer sub-sequences (for short, k-mers). The value of k varies from 4 to 12 in steps of two. The bio-basis function proposed by Thomson et al. is used to transform the k-mers to a high-dimensional numerical space on which an inductive programming method is applied for the purpose of deriving a decision tree for decision-making. This transformation is referred to as bio-mapping. The constructed decision trees select about 10 out of 50 000 k-mers. This small set of selected k-mers is regarded as a set of decisive templates. By doing so, non-orthogonal decision trees are constructed using the selected templates and the prediction accuracy is significantly improved.
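The sliding-window step described above can be sketched in a few lines of Python. This is only an illustration of k-mer extraction for the stated window sizes (k from 4 to 12 in steps of two). The peptide string and function name are hypothetical, and the bio-basis transformation and decision-tree construction are not shown.

```python
def extract_kmers(sequence, k):
    """Return every length-k sub-sequence obtained by sliding a window by one residue."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical toy peptide; a real polyprotein is thousands of residues long.
peptide = "AVLQSGFRKMAF"

# Window sizes stated in the abstract: 4, 6, 8, 10, 12.
for k in range(4, 13, 2):
    kmers = extract_kmers(peptide, k)
    print(k, len(kmers), kmers[0])
```

A sequence of length n yields n - k + 1 k-mers for each window size, which is how a few polyprotein sequences can produce tens of thousands of sub-sequences.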

Availability: The program for bio-mapping can be obtained by request to the author.

Contact: z.r.yang@exeter.ac.uk

]]>
<![CDATA[NAP (davunetide) preferential interaction with dynamic 3-repeat Tau explains differential protection in selected tauopathies]]> https://www.researchpad.co/article/5c92b379d5eed0c4843a4107

The microtubule (MT) associated protein Tau is instrumental for the regulation of MT assembly and dynamic instability, orchestrating MT-dependent cellular processes. Aberrations in Tau post-translational modifications and deviations in the ratio of spliced Tau isoforms with 3 or 4 MT-binding repeats (3R/4R) have been implicated in neurodegenerative tauopathies. Activity-dependent neuroprotective protein (ADNP) is vital for brain formation and cognitive function. ADNP deficiency in mice causes pathological Tau hyperphosphorylation and aggregation, correlated with impaired cognitive functions. It has been previously shown that the ADNP-derived peptide NAP protects against ADNP deficiency, exhibiting neuroprotection, MT interaction and memory protection. NAP prevents MT degradation by recruitment of Tau and end-binding proteins to MTs and expression of these proteins is required for NAP activity. Clinically, NAP (davunetide, CP201) exhibited efficacy in prodromal Alzheimer’s disease patients (Tau3R/4R tauopathy) but not in progressive supranuclear palsy (increased Tau4R tauopathy). Here, we examined the potential preferential interaction of NAP with 3R vs. 4R Tau, toward personalized treatment of tauopathies. Affinity-chromatography showed that NAP preferentially interacted with Tau3R protein from rat brain extracts and fluorescence recovery after photobleaching assay indicated that NAP induced increased recruitment of human Tau3R to MTs under zinc intoxication, in comparison to Tau4R. Furthermore, we showed that NAP interaction with tubulin (MTs) was inhibited by obstruction of Tau-binding sites on MTs, confirming the requirement of Tau-MT interaction for NAP activity. The preferential interaction of NAP with Tau3R may explain clinical efficacy in mixed vs. Tau4R pathologies, and suggest effectiveness in Tau3R neurodevelopmental disorders.

]]>
<![CDATA[Specific clones of Trichomonas tenax are associated with periodontitis]]> https://www.researchpad.co/article/5c900d3bd5eed0c48407e3b6

Trichomonas tenax, an anaerobic protist that is difficult to cultivate and lacks a reliable molecular identification method, has been suspected of involvement in periodontitis, a multifactorial inflammatory dental disease affecting the soft tissue and bone of the periodontium. A cohort of 106 periodontitis patients classified by stages of severity and 85 healthy adult control patients was assembled. An efficient culture protocol, a new identification tool by real-time qPCR of T. tenax and a Multi-Locus Sequence Typing system (MLST) based on the T. tenax NIH4 reference strain were created. Fifty-three strains of Trichomonas sp. were obtained from periodontal samples. T. tenax was detected by culture in 37/106 (34.90%) patients with periodontitis and in 16/85 (18.80%) control patients (p = 0.018). Sixty of the 191 samples tested positive for T. tenax by qPCR: 24/85 (28%) controls and 36/106 (34%) periodontitis patients (p = 0.089). By combining both results, 45/106 (42.5%) patients were positive by culture and/or PCR, as compared to 24/85 (28.2%) controls (p = 0.042). A link was established between the carriage of Trichomonas tenax in patients and the severity of the disease. Genotyping demonstrates the presence of strain diversity with three major different clusters and a relation between disease strains and periodontitis severity (p<0.05). More frequently detected in periodontal cases, T. tenax is likely to be related to the onset and/or evolution of periodontal diseases.

]]>
<![CDATA[Quantitative real-time PCR as a promising tool for the detection and quantification of leaf-associated fungal species – A proof-of-concept using Alatospora pulchella]]> https://www.researchpad.co/article/5989db52ab0ee8fa60bdc5cf

Traditional methods to identify aquatic hyphomycetes rely on the morphology of released conidia, which can lead to misidentifications or underestimates of species richness due to convergent morphological evolution and the presence of non-sporulating mycelia. Molecular methods allow fungal identification irrespective of the presence of conidia or their morphology. As a proof-of-concept, we established a quantitative real-time polymerase chain reaction (qPCR) assay to accurately quantify the amount of DNA as a proxy for the biomass of an aquatic hyphomycete species (Alatospora pulchella). Our study showed discrimination even among genetically closely-related species, with a high sensitivity and a reliable quantification down to 9.9 fg DNA (3 PCR forming units; LoD) and 155.0 fg DNA (47 PCR forming units; LoQ), respectively. The assay’s specificity was validated for environmental samples that harboured diverse microbial communities and likely contained PCR-inhibiting substances. This makes qPCR a promising tool to gain deeper insights into the ecological roles of aquatic hyphomycetes and other microorganisms.

]]>
<![CDATA[UDSMProt: universal deep sequence models for protein classification]]> https://www.researchpad.co/article/N06ec7b02-5f84-40e3-9693-1aa1a2b9830a

Abstract

Motivation

Inferring the properties of a protein from its amino acid sequence is one of the key problems in bioinformatics. Most state-of-the-art approaches for protein classification are tailored to single classification tasks and rely on handcrafted features, such as position-specific-scoring matrices from expensive database searches. We argue that this level of performance can be reached or even be surpassed by learning a task-agnostic representation once, using self-supervised language modeling, and transferring it to specific tasks by a simple fine-tuning step.

Results

We put forward a universal deep sequence model that is pre-trained on unlabeled protein sequences from Swiss-Prot and fine-tuned on protein classification tasks. We apply it to three prototypical tasks, namely enzyme class prediction, gene ontology prediction and remote homology and fold detection. The proposed method performs on par with state-of-the-art algorithms that were tailored to these specific tasks or, for two out of three tasks, even outperforms them. These results stress the possibility of inferring protein properties from the sequence alone and, on more general grounds, the prospects of modern natural language processing methods in omics. Moreover, we illustrate the prospects for explainable machine learning methods in this field by selected case studies.

Availability and implementation

Source code is available under https://github.com/nstrodt/UDSMProt.

Supplementary information

Supplementary data are available at Bioinformatics online.

]]>
<![CDATA[FilTar: using RNA-Seq data to improve microRNA target prediction accuracy in animals]]> https://www.researchpad.co/article/N0870a0ef-ad7f-485c-8234-e0ef66109b19

Abstract

Motivation

MicroRNA (miRNA) target prediction algorithms do not generally consider biological context and therefore generic target prediction based on seed binding can lead to a high level of false-positive predictions. Here, we present FilTar, a method that incorporates RNA-Seq data to make miRNA target prediction specific to a given cell type or tissue of interest.

Results

We demonstrate that FilTar can be used to: (i) provide sample-specific 3′-UTR reannotation, extending or truncating default annotations based on RNA-Seq read evidence, and (ii) filter putative miRNA target predictions by transcript expression level, thus removing putative interactions where the target transcript is not expressed in the tissue or cell line of interest. We test the method on a variety of miRNA transfection datasets and demonstrate increased accuracy versus generic miRNA target prediction methods.
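The expression-based filtering in step (ii) can be pictured with a minimal Python sketch. It is not FilTar's code: the function name, the transcript and miRNA identifiers, the TPM units and the 1.0 threshold are all assumptions chosen for illustration.

```python
def filter_by_expression(predictions, expression_tpm, min_tpm=1.0):
    """Keep predicted (miRNA, transcript) pairs whose transcript is expressed.

    Transcripts missing from the expression table are treated as unexpressed.
    The min_tpm cutoff is a hypothetical default, not a value from the paper.
    """
    return [
        (mirna, transcript)
        for mirna, transcript in predictions
        if expression_tpm.get(transcript, 0.0) >= min_tpm
    ]


# Hypothetical generic predictions and sample-specific expression values.
predictions = [("miR-1", "TX_A"), ("miR-1", "TX_B"), ("miR-21", "TX_C")]
expression_tpm = {"TX_A": 12.5, "TX_B": 0.0}

kept = filter_by_expression(predictions, expression_tpm)
print(kept)
```

Only the interaction with the expressed transcript survives, which mirrors the stated goal of removing predictions whose target is absent from the tissue or cell line of interest.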

Availability and implementation

FilTar is freely available and can be downloaded from https://github.com/TBradley27/FilTar. The tool is implemented using the Python and R programming languages, and is supported on GNU/Linux operating systems.

Supplementary information

Supplementary data are available at Bioinformatics online.

]]>
<![CDATA[Identification of a novel archaea virus, detected in hydrocarbon polluted Hungarian and Canadian samples]]> https://www.researchpad.co/article/N5489318a-3499-4862-9afc-2378cea7eecb

Metagenomics is a helpful tool for the analysis of unculturable organisms and viruses. Viruses that target bacteria and archaea play important roles in the microbial diversity of various ecosystems. Here we show that Methanosarcina virus MV (MetMV), the second Methanosarcina sp. virus with a completely determined genome, is characteristic of hydrocarbon pollution in environmental (soil and water) samples. It was highly abundant in Hungarian hydrocarbon polluted samples and its genome was also present in the NCBI SRA database containing reads from hydrocarbon polluted samples collected in Canada, indicating the stability of its niche and the marker feature of this virus. MetMV, as the only currently identified marker virus for pollution in environmental samples, could contribute to the understanding of the complicated network of prokaryotes and their viruses driving the decomposition of environmental pollutants.

]]>
<![CDATA[Transcriptomic analysis of polyketide synthases in a highly ciguatoxic dinoflagellate, Gambierdiscus polynesiensis and low toxicity Gambierdiscus pacificus, from French Polynesia]]> https://www.researchpad.co/article/Nca210627-69b7-4a50-96ce-ecb4ce1a2ae1

Marine dinoflagellates produce a diversity of polyketide toxins that are accumulated in marine food webs and are responsible for a variety of seafood poisonings. Reef-associated dinoflagellates of the genus Gambierdiscus produce toxins responsible for ciguatera poisoning (CP), which causes over 50,000 cases of illness annually worldwide. The biosynthetic machinery for dinoflagellate polyketides remains poorly understood. Recent transcriptomic and genomic sequencing projects have revealed the presence of Type I modular polyketide synthases in dinoflagellates, as well as a plethora of single domain transcripts with Type I sequence homology. The current transcriptome analysis compares polyketide synthase (PKS) gene transcripts expressed in two species of Gambierdiscus from French Polynesia: a highly toxic ciguatoxin producer, G. polynesiensis, versus a non-ciguatoxic species G. pacificus, each assembled from approximately 180 million Illumina 125 nt reads using Trinity, and compares their PKS content with previously published data from other Gambierdiscus species and more distantly related dinoflagellates. Both modular and single-domain PKS transcripts were present. Single domain β-ketoacyl synthase (KS) transcripts were highly amplified in both species (98 in G. polynesiensis, 99 in G. pacificus), with smaller numbers of standalone acyl transferase (AT), ketoacyl reductase (KR), dehydratase (DH), enoyl reductase (ER), and thioesterase (TE) domains. G. polynesiensis expressed both a larger number of multidomain PKSs, and larger numbers of modules per transcript, than the non-ciguatoxic G. pacificus. The largest PKS transcript in G. polynesiensis encoded a 10,516 aa, 7 module protein, predicted to synthesize part of the polyether backbone. Transcripts and gene models representing portions of this PKS are present in other species, suggesting that its function may be performed in those species by multiple interacting proteins. 
This study contributes to the building consensus that dinoflagellates utilize a combination of Type I modular and single domain PKS proteins, in an as yet undefined manner, to synthesize polyketides.

]]>
<![CDATA[Chalcone synthase (CHS) family members analysis from eggplant (Solanum melongena L.) in the flavonoid biosynthetic pathway and expression patterns in response to heat stress]]> https://www.researchpad.co/article/N0c4703df-5c43-4557-a077-ba839b092c8d

Enzymes of the chalcone synthase (CHS) family participate in the synthesis of multiple secondary metabolites in plants, fungi and bacteria. CHS showed a significant correlation with the accumulation patterns of anthocyanin. The peel color, which is primarily determined by the content of anthocyanin, is an economically important trait for eggplants that is affected by heat stress. A total of 7 putative CHS genes (SmCHS1-7) were identified in a genome-wide analysis of eggplants (S. melongena L.). The SmCHS genes were distributed on 7 scaffolds and were classified into 3 clusters. Phylogenetic relationship analysis showed that 73 CHS genes from 7 Solanaceae species were classified into 10 groups. SmCHS5, SmCHS6 and SmCHS7 were continuously down-regulated under 38°C and 45°C treatment, while SmCHS4 was up-regulated under 38°C but showed little change at 45°C in the peel. Expression profiles of key anthocyanin biosynthesis gene families showed that the PAL, 4CL and AN11 genes were primarily expressed in all five tissues. The CHI, F3H, F3’5’H, DFR, 3GT and bHLH1 genes were expressed in flower and peel. Under heat stress, the expression levels of 52 key genes were reduced. In contrast, eight key genes with expression patterns similar to SmCHS4 were up-regulated after treatment at 38°C for 3 hours. Comparative analysis of putative CHS protein evolutionary relationships, cis-regulatory elements, and regulatory networks indicated that the SmCHS gene family has a conserved gene structure and functional diversification. SmCHS genes showed two or more expression patterns. The results of this study may facilitate further research to understand the regulatory mechanism governing peel color in eggplants.

]]>
<![CDATA[Evidence for both sequential mutations and recombination in the evolution of kdr alleles in Aedes aegypti]]> https://www.researchpad.co/article/N8479e8f6-b6ad-4aa7-91b1-bf6bde90184a

Background

Aedes aegypti is a globally distributed vector of human diseases including dengue, yellow fever, chikungunya, and Zika. Pyrethroid insecticides are the primary means of controlling adult A. aegypti populations to suppress arbovirus outbreaks, but resistance to pyrethroid insecticides has become a global problem. Mutations in the voltage-sensitive sodium channel (Vssc) gene are a major mechanism of pyrethroid resistance in A. aegypti. Vssc resistance alleles in A. aegypti commonly have more than one mutation. However, the evolutionary dynamics of how alleles with multiple mutations arose are poorly understood.

Methodology/Principal findings

We examined the geographic distribution and association between the common Vssc mutations (V410L, S989P, V1016G/I and F1534C) in A. aegypti by analyzing the relevant Vssc fragments in 25 collections, mainly from Asia and the Americas. Our results showed all 11 Asian populations had two types of resistance alleles: 1534C and 989P+1016G. The 1534C allele was more common with frequencies ranging from 0.31 to 0.88, while the 989P+1016G frequency ranged from 0.13 to 0.50. Four distinct alleles (410L, 1534C, 410L+1534C and 410L+1016I+1534C) were detected in populations from the Americas. The most common was 410L+1016I+1534C with frequencies ranging from 0.50 to 1.00, followed by 1534C with frequencies ranging from 0.13 to 0.50. Our phylogenetic analysis of Vssc supported multiple independent origins of the F1534C mutation. Our results indicated the 410L+1534C allele may have arisen by addition of the V410L mutation to the 1534C allele, or by a crossover event. The 410L+1016I+1534C allele was the result of one or two mutational steps from a 1534C background.

Conclusions/Significance

Our data corroborated previous geographic distributions of resistance mutations and provided evidence for both recombination and sequential accumulation of mutations contributing to the molecular evolution of resistance alleles in A. aegypti.

]]>
<![CDATA[Analysis of the nucleocytoplasmic shuttling RNA-binding protein HNRNPU using optimized HITS-CLIP method]]> https://www.researchpad.co/article/Nb5a6160c-8969-498c-b6ff-671487ce7810

RNA-binding proteins (RBPs) control many types of post-transcriptional regulation, including mRNA splicing, mRNA stability, and translational efficiency, by directly binding to their target RNAs, and their mutation and dysfunction are often associated with several human neurological diseases and tumorigenesis. Crosslinking immunoprecipitation (CLIP), coupled with high-throughput sequencing (HITS-CLIP), is a powerful technique for investigating the molecular mechanisms underlying disease pathogenesis by comprehensive identification of RBP target sequences at the transcriptome level. However, the HITS-CLIP protocol still requires optimization because it is experimentally complicated, inefficient and time-consuming, and its library has to be generated from very small amounts of RNA. Here we developed a more efficient, rapid, and reproducible CLIP method by optimizing BrdU-CLIP. Our protocol produced a 10-fold greater yield of pre-amplified CLIP library, which resulted in a low duplicate rate of CLIP-tag reads because the number of PCR cycles required for library amplification was reduced. Variance of the yields was also reduced, and the experimental period was shortened by 2 days. Using this method, we validated IL-6 expression by a nuclear RBP, HNRNPU, which directly binds the 3’-UTR of IL-6 mRNA in HeLa cells. Importantly, this interaction was only observed in the cytoplasmic fraction, suggesting a role of cytoplasmic HNRNPU in mRNA stability control. This optimized method enables us to accurately identify target genes and provides a snapshot of the protein-RNA interactions of nucleocytoplasmic shuttling RBPs.

]]>
<![CDATA[First description of a herpesvirus infection in genus Lepus]]> https://www.researchpad.co/article/N2b9a02c7-7220-4716-8700-9456c07e4236

During the necropsies of Iberian hares obtained in 2018/2019, along with signs of the nodular form of myxomatosis, other unexpected external lesions were also observed. Histopathology revealed nuclear inclusion bodies in stromal cells suggesting the additional presence of a nuclear replicating virus. Transmission electron microscopy further demonstrated the presence of herpesvirus particles in the tissues of affected hares. We confirmed the presence of herpesvirus in 13 MYXV-positive hares by PCR and sequencing analysis. Herpesvirus-DNA was also detected in seven healthy hares, suggesting its asymptomatic circulation. Phylogenetic analysis based on concatenated partial sequences of DNA polymerase gene and glycoprotein B gene enabled greater resolution than analysing the sequences individually. The hare virus was classified close to herpesviruses from rodents within the Rhadinovirus genus of the gammaherpesvirus subfamily. We propose to name this new virus Leporid gammaherpesvirus 5 (LeHV-5), according to the International Committee on Taxonomy of Viruses standards. The impact of herpesvirus infection on the reproduction and mortality of the Iberian hare is as yet unknown, but it may aggravate the decline of wild populations caused by the recently emerged natural recombinant myxoma virus.

]]>
<![CDATA[Detection of novel coronaviruses in bats in Myanmar]]> https://www.researchpad.co/article/N3669ab46-787e-4c30-a451-397d479219b9

The recent emergence of bat-borne zoonotic viruses warrants vigilant surveillance in their natural hosts. Of particular concern is the family of coronaviruses, which includes the causative agents of severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), and most recently, Coronavirus Disease 2019 (COVID-19), an epidemic of acute respiratory illness originating from Wuhan, China in December 2019. Viral detection, discovery, and surveillance activities were undertaken in Myanmar to identify viruses in animals at high risk contact interfaces with people. Free-ranging bats were captured, and rectal and oral swabs and guano samples collected for coronaviral screening using broadly reactive consensus conventional polymerase chain reaction. Sequences from positives were compared to known coronaviruses. Three novel alphacoronaviruses, three novel betacoronaviruses, and one known alphacoronavirus previously identified in other southeast Asian countries were detected for the first time in bats in Myanmar. Ongoing land use change remains a prominent driver of zoonotic disease emergence in Myanmar, bringing humans into ever closer contact with wildlife, and justifying continued surveillance and vigilance at broad scales.

]]>
<![CDATA[SPAAN: a software program for prediction of adhesins and adhesin-like proteins using neural networks]]> https://www.researchpad.co/article/N6075cbe7-4812-4f09-ace7-d31be376e7f5

Abstract

Motivation: The adhesion of microbial pathogens to host cells is mediated by adhesins. Experimental methods used for characterizing adhesins are time-consuming and demand large resources. The availability of specialized software can rapidly aid experimenters in simplifying this problem. We have employed 105 compositional properties and artificial neural networks to develop SPAAN, which predicts the probability of a protein being an adhesin (Pad).

Results: SPAAN had optimal sensitivity of 89% and specificity of 100% on a defined test set and could identify 97.4% of known adhesins at high Pad value from a wide range of bacteria. Furthermore, SPAAN facilitated improved annotation of several proteins as adhesins. Novel adhesins were identified in 17 pathogenic organisms causing diseases in humans and plants. In the severe acute respiratory syndrome (SARS) associated human corona virus, the spike glycoprotein and nsps (nsp2, nsp5, nsp6 and nsp7) were identified as having adhesin-like characteristics. These results offer new leads for rapid experimental testing.

Availability: SPAAN is freely available through ftp://203.195.151.45

Contact: ramu@igib.res.in

]]>