ResearchPad - nucleotide-sequencing https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[iterb-PPse: Identification of transcriptional terminators in bacterial by incorporating nucleotide properties into PseKNC]]> https://www.researchpad.co/article/elastic_article_14750 A terminator is a DNA sequence that gives RNA polymerase the signal to terminate transcription. Identifying terminators correctly can improve genome annotation; more importantly, it has considerable application value in disease diagnosis and therapy. However, accurate prediction methods are lacking and urgently needed. We therefore proposed a terminator prediction method, “iterb-PPse”, which incorporates 47 nucleotide properties into PseKNC-Ⅰ and PseKNC-Ⅱ and uses Extreme Gradient Boosting to predict terminators based on data from Escherichia coli and Bacillus subtilis. In addition to these encodings, we employed three new feature extraction methods, K-pwm, Base-content and Nucleotidepro, to formulate raw samples. A two-step method was applied to select features. When identifying terminators based on the optimized features, we compared five single models and 16 ensemble models. On the benchmark dataset, our method achieved an accuracy of 99.88% in 100 repetitions of five-fold cross-validation, higher than the existing state-of-the-art predictor iTerm-PseKNC. Its prediction accuracy on two independent datasets reached 94.24% and 99.45%, respectively. For the convenience of users, we developed software of the same name based on “iterb-PPse”. The open software and source code of “iterb-PPse” are available at https://github.com/Sarahyouzi/iterb-PPse.
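The K-tuple nucleotide composition underlying PseKNC-style encodings, together with a simple base-content feature, can be sketched as follows. The function names (`kmer_frequencies`, `base_content`, `knc_features`) are hypothetical; this is not the iterb-PPse code, and it omits the pseudo-components built from the 47 nucleotide properties.

```python
from itertools import product

# Hypothetical helpers: K-tuple nucleotide composition (the "KNC" part of
# PseKNC) and a base-content feature. Not the iterb-PPse implementation.


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Normalized counts of all 4**k DNA k-mers, listed in lexicographic ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # windows containing ambiguous bases (e.g. N) are skipped
            counts[j] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def base_content(seq: str) -> dict[str, float]:
    """Fraction of each nucleotide A, C, G, T in the sequence."""
    seq = seq.upper()
    n = len(seq)
    return {b: (seq.count(b) / n if n else 0.0) for b in "ACGT"}


def knc_features(seq: str, max_k: int = 3) -> list[float]:
    """Concatenate 1-mer to max_k-mer frequencies (4 + 16 + 64 = 84 values for max_k=3)."""
    features: list[float] = []
    for k in range(1, max_k + 1):
        features.extend(kmer_frequencies(seq, k))
    return features
```

For example, `knc_features("ATGCGTTTAA")` returns 84 values, and `base_content("ATGCGTTTAA")["T"]` is 0.4. A real pipeline would append property-based correlation terms and then pass the vectors to a classifier such as XGBoost.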

]]>
<![CDATA[Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)?]]> https://www.researchpad.co/article/elastic_article_14545 Recently, a novel coronavirus, SARS-CoV-2, has caused an ongoing pandemic. An epidemiological study suggested that this virus was associated with a wet market in Wuhan, China. However, the exact source of this virus is still unknown. In this study, we attempted to assemble the complete genome of a coronavirus identified in two groups of sick Malayan pangolins, which had likely been smuggled for black-market trade. Molecular and evolutionary analyses showed that the pangolin coronavirus we assembled was genetically associated with SARS-CoV-2 but was unlikely to be its precursor. This study suggests that pangolins are natural hosts of coronaviruses. Determining the spectrum of coronaviruses in pangolins can help us understand the natural history of coronaviruses in wildlife and at the animal-human interface, and facilitate the prevention and control of coronavirus-associated emerging diseases.

]]>
<![CDATA[First description of a herpesvirus infection in genus Lepus]]> https://www.researchpad.co/article/N2b9a02c7-7220-4716-8700-9456c07e4236

During necropsies of Iberian hares obtained in 2018/2019, unexpected external lesions were observed in addition to signs of the nodular form of myxomatosis. Histopathology revealed nuclear inclusion bodies in stromal cells, suggesting the additional presence of a virus that replicates in the nucleus. Transmission electron microscopy further demonstrated herpesvirus particles in the tissues of affected hares. We confirmed the presence of herpesvirus in 13 MYXV-positive hares by PCR and sequencing analysis. Herpesvirus DNA was also detected in seven healthy hares, suggesting asymptomatic circulation. Phylogenetic analysis based on concatenated partial sequences of the DNA polymerase and glycoprotein B genes provided greater resolution than analysing the sequences individually. The hare virus clustered close to rodent herpesviruses within the Rhadinovirus genus of the gammaherpesvirus subfamily. In accordance with International Committee on Taxonomy of Viruses standards, we propose naming this new virus Leporid gammaherpesvirus 5 (LeHV-5). The impact of herpesvirus infection on the reproduction and mortality of the Iberian hare is still unknown, but it may aggravate the decline of wild populations caused by the recently emerged natural recombinant myxoma virus.

]]>
<![CDATA[Molecular analyses and phylogeny of the herpes simplex virus 2 US9 and glycoproteins gE/gI obtained from infected subjects during the Herpevac Trial for Women]]> https://www.researchpad.co/article/5c8c193cd5eed0c484b4d241

Herpes simplex virus 2 (HSV-2) is a large double-stranded DNA virus that causes genital sores when spread by sexual contact and is a principal cause of viral encephalitis in newborns and infants. Viral glycoproteins enable virion entry into and spread between cells, making glycoproteins a prime target for vaccine development. A truncated glycoprotein D2 (gD2) vaccine candidate, recently tested in the phase 3 Herpevac Trial for Women, did not prevent HSV-2 infection in initially seronegative women. Some women who became infected experienced multiple recurrences during the trial. The HSV US7, US8, and US9 genes encode glycoprotein I (gI), glycoprotein E (gE), and the US9 type II membrane protein, respectively. These proteins participate in viral spread across cell junctions and facilitate anterograde transport of virion components in neurons, prompting us to investigate whether sequence variants in these genes could be associated with frequent recurrence. The nucleotide sequences and dN/dS ratios of the US7-US9 region from viral isolates of individuals who experienced multiple recurrences were compared with those who had had a single episode of disease. No consistent polymorphism(s) distinguished the recurrent isolates. In frequently recurring isolates, the dN/dS ratio of US7 was low while greater variation (higher dN/dS ratio) occurred in US8, suggesting conserved function of the former during reactivation. Phylogenetic reconstruction of the US7-US9 region revealed eight strongly supported clusters within the 55 U.S. HSV-2 strains sampled, which were preserved in a second global phylogeny. Thus, although we have demonstrated evolutionary diversity in the US7-US9 complex, we found no molecular evidence of sequence variation in US7-US9 that distinguishes isolates from subjects with frequently recurrent episodes of disease.

]]>
<![CDATA[Profile of the tprK gene in primary syphilis patients based on next-generation sequencing]]> https://www.researchpad.co/article/5c784fecd5eed0c484007915

Background

The highly variable tprK gene of Treponema pallidum has been acknowledged to be one of the mechanisms that causes persistent infection. Previous studies have mainly focused on the heterogeneity in tprK in propagated strains using a clone-based Sanger approach. Few studies have investigated tprK directly from clinical samples using deep sequencing.

Methods/Principal findings

We conducted a comprehensive analysis of 14 primary syphilis clinical isolates of T. pallidum via next-generation sequencing to gain better insight into the profile of tprK in primary syphilis patients. Our results showed that each V region of tprK contained a mixture of distinct sequences. In addition to the predominant sequence for each V region, as previously reported using the clone-based Sanger approach, all strains carried many minor variants, mainly at frequencies of 1–5%. Interestingly, the distinct sequences identified within each region varied in length and differed by only 3 bp or multiples of 3 bp. In addition, amino acid sequence consistency within each V region was found among the 14 strains. Among the regions, the sequence IASDGGAIKH in V1 and the sequence DVGHKKENAANVNGTVGA in V4 were highly stable and redundant across strains.

Conclusions

The seven V regions of the tprK gene in primary syphilis infection demonstrated high diversity; each generally contained one high-proportion sequence and numerous low-frequency minor variants, most of which fall far below the detection limit of Sanger sequencing. The rampant variation in each V region was regulated by a strict gene conversion mechanism that constrained length differences to 3 bp or multiples of 3 bp. The highly stable sequences shared across strains may play a critical role in T. pallidum virulence, and these stable peptides may also be targets for vaccine development.
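Summarizing one V region from deep-sequencing reads, as described above, amounts to tallying distinct sequences, separating the predominant sequence from 1–5% minor variants, and checking that length differences are multiples of 3 bp. The helper below is a hypothetical sketch that assumes reads have already been trimmed to a single V region and error-corrected; it is not the authors' pipeline.

```python
from collections import Counter


def variant_profile(reads: list[str], minor_low: float = 0.01, minor_high: float = 0.05):
    """Hypothetical summary of the distinct sequences observed for one V region.

    Returns (predominant sequence, its frequency, minor variants whose frequency
    lies in [minor_low, minor_high], and whether every distinct sequence differs
    in length from the predominant one by a multiple of 3 bp).
    """
    counts = Counter(reads)
    total = sum(counts.values())
    freqs = {seq: n / total for seq, n in counts.items()}
    predominant, top_freq = max(freqs.items(), key=lambda item: item[1])
    minor = {s: f for s, f in freqs.items() if minor_low <= f <= minor_high}
    # Python's % returns a non-negative remainder, so shorter variants work too.
    in_frame = all((len(s) - len(predominant)) % 3 == 0 for s in freqs)
    return predominant, top_freq, minor, in_frame
```

A length difference that is a multiple of 3 bp preserves the reading frame, which is why the check is framed in codons rather than single bases.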

]]>
<![CDATA[16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses]]> https://www.researchpad.co/article/5c7ee7c5d5eed0c4848f4d9c

Advances in high-throughput sequencing have increased the availability of microbiome sequencing data that can be exploited to characterize microbiome community structure in situ. We explore using word and sentence embedding approaches for nucleotide sequences since they may be a suitable numerical representation for downstream machine learning applications (especially deep learning). This work involves first encoding (“embedding”) each sequence into a dense, low-dimensional, numeric vector space. Here, we use Skip-Gram word2vec to embed k-mers, obtained from 16S rRNA amplicon surveys, and then leverage an existing sentence embedding technique to embed all sequences belonging to specific body sites or samples. We demonstrate that these representations are meaningful, and hence the embedding space can be exploited as a form of feature extraction for exploratory analysis. We show that sequence embeddings preserve relevant information about the sequencing data such as k-mer context, sequence taxonomy, and sample class. Specifically, the sequence embedding space resolved differences among phyla, as well as differences among genera within the same family. Distances between sequence embeddings had similar qualities to distances between alignment identities, and embedding multiple sequences can be thought of as generating a consensus sequence. In addition, embeddings are versatile features that can be used for many downstream tasks, such as taxonomic and sample classification. Using sample embeddings for body site classification resulted in negligible performance loss compared to using OTU abundance data, and clustering embeddings yielded high-fidelity species clusters. Lastly, the k-mer embedding space captured distinct k-mer profiles that mapped to specific regions of the 16S rRNA gene and corresponded with particular body sites.
Together, our results show that embedding sequences results in meaningful representations that can be used for exploratory analyses or for downstream machine learning applications that require numeric data. Moreover, because the embeddings are trained in an unsupervised manner, unlabeled data can be embedded and used to bolster supervised machine learning tasks.
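Before Skip-Gram training, each sequence must be split into k-mer "words", and each word is paired with its neighbours inside a context window. The sketch below shows only that preprocessing; the function names and window size are illustrative assumptions, and training the embeddings themselves (for example with a word2vec library such as gensim) is not shown.

```python
def kmer_tokens(seq: str, k: int) -> list[str]:
    """Split a sequence into overlapping k-mers ("words") with stride 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(tokens: list[str], window: int) -> list[tuple[str, str]]:
    """(center, context) training pairs within +/- `window` positions of each token.

    Skip-Gram learns embeddings by predicting each context word from its center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

For example, `kmer_tokens("ACGTA", 3)` gives `["ACG", "CGT", "GTA"]`, and with `window=1` the middle k-mer `"CGT"` is paired with both of its neighbours.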

]]>
<![CDATA[Secondary contact between diverged host lineages entails ecological speciation in a European hantavirus]]> https://www.researchpad.co/article/5c76fdefd5eed0c484e5b0f1

The diversity of viruses probably exceeds that of eukaryotes, but little is known about the origin and emergence of novel virus species. Experimentation and disease outbreak investigations have allowed the characterization of rapid molecular virus adaptation. However, the processes leading to the establishment of functionally distinct virus taxa in nature remain obscure. Here, we demonstrate that incipient speciation in a natural host species has generated distinct ecological niches, leading to adaptive isolation in an RNA virus. We found a very strong association between the distributions of two major phylogenetic clades of Tula orthohantavirus (TULV) and the rodent host lineages in a natural hybrid zone of the European common vole (Microtus arvalis). The spatial transition between the virus clades in replicated geographic clines is at least eight times narrower than that between the hybridizing host lineages. This suggests a strong barrier to effective virus transmission despite frequent dispersal and gene flow among local host populations, and translates into a complete turnover of the adaptive background of TULV within a few hundred meters in the open, unobstructed landscape. Genetic differences between TULV clades are homogeneously distributed across the genome and mostly synonymous (93.1%), except for a cluster of nonsynonymous changes in the 5′ region of the viral envelope glycoprotein gene, which is potentially involved in host-driven isolation. Evolutionary relationships between TULV clades indicate that these viruses emerged through rapid differential adaptation to the previously diverged host lineages, resulting in levels of ecological isolation that exceed the progress of speciation in their vertebrate hosts.

]]>
<![CDATA[Pathogen diversity drives the evolution of generalist MHC-II alleles in human populations]]> https://www.researchpad.co/article/5c5ca310d5eed0c48441f094

Central players of the adaptive immune system are the groups of proteins encoded in the major histocompatibility complex (MHC), which shape the immune response against pathogens and tolerance to self-peptides. The corresponding genomic region is of particular interest, as it harbors more disease associations than any other region in the human genome, including associations with infectious diseases, autoimmune disorders, cancers, and neuropsychiatric diseases. Certain MHC molecules can bind to a much wider range of epitopes than others, but the functional implication of such an elevated epitope-binding repertoire has remained largely unclear. It has been suggested that by recognizing more peptide segments, such promiscuous MHC molecules promote immune response against a broader range of pathogens. If so, the geographical distribution of MHC promiscuity level should be shaped by pathogen diversity. Three lines of evidence support the hypothesis. First, we found that in pathogen-rich geographical regions, humans are more likely to carry highly promiscuous MHC class II DRB1 alleles. Second, the switch between specialist and generalist antigen presentation has occurred repeatedly and in a rapid manner during human evolution. Third, molecular positions that define promiscuity level of MHC class II molecules are especially diverse and are under positive selection in human populations. Taken together, our work indicates that pathogen load maintains generalist adaptive immune recognition, with implications for medical genetics and epidemiology.

]]>
<![CDATA[Identification and characterization of nonpolio enterovirus associated with nonpolio-acute flaccid paralysis in polio endemic state of Uttar Pradesh, Northern India]]> https://www.researchpad.co/article/5c5b5244d5eed0c4842bc5c5

Despite polio eradication, the detection of nonpolio enteroviruses (NPEVs) during polio surveillance requires attention, because these viruses are thought to have implications for paralysis. The attributes of NPEV infections in nonpolio-AFP (NPAFP) cases from Uttar Pradesh (UP), India, remain undetermined and are investigated here. A total of 1839 stool samples collected between January 2010 and October 2011 from patients with acute flaccid paralysis (AFP) in UP, India, were analyzed according to the WHO algorithm. NPEVs were isolated from 359 NPAFP cases and subjected to microneutralization assay, partial VP1 gene-based molecular serotyping and phylogenetic analysis. Demographic and clinical-epidemiological features were also ascertained. Echoviruses (29%) and Coxsackievirus (CV)-B (17%) were the most common viruses identified by the microneutralization assay. Molecular genotyping classified the NPEVs into 34 different serotypes, belonging to the species Enterovirus (EV)-A (1.6%), EV-B (94%) and EV-C (5.3%). Rarely described EV serotypes, such as EV-C95, CV-A20, EV-C105, EV-B75, EV-B101, and EV-B107, were also identified. NPEV-associated AFP was more prevalent in younger male children, peaked in the monsoon months and was found predominantly in the central part of the state. The NPEV strains isolated in this study were genetically distinct from those isolated in other countries, forming separate clusters or subclusters that cocirculate and are limited to India. This study augments the understanding of the epidemiological features of NPEV infection and demonstrates the extensive diversity of NPEV strains in NPAFP cases from the polio-endemic region. It also underscores the need for effective long-term strategies to monitor NPEV circulation and its associated health risks in the post-polio eradication era.

]]>
<![CDATA[Comprehensive profiling of translation initiation in influenza virus infected cells]]> https://www.researchpad.co/article/5c521870d5eed0c4847983a1

Translation can initiate at alternate, non-canonical start codons in response to stressful stimuli in mammalian cells. Recent studies suggest that viral infection and anti-viral responses alter sites of translation initiation, and in some cases, lead to production of novel immune epitopes. Here we systematically investigate the extent and impact of alternate translation initiation in cells infected with influenza virus. We perform evolutionary analyses that suggest selection against non-canonical initiation at CUG codons in influenza virus lineages that have adapted to mammalian hosts. We then use ribosome profiling with the initiation inhibitor lactimidomycin to experimentally delineate translation initiation sites in a human lung epithelial cell line infected with influenza virus. We identify several candidate sites of alternate initiation in influenza mRNAs, all of which occur at AUG codons that are downstream of canonical initiation codons. One of these candidate downstream start sites truncates 14 amino acids from the N-terminus of the N1 neuraminidase protein, resulting in loss of its cytoplasmic tail and a portion of the transmembrane domain. This truncated neuraminidase protein is expressed on the cell surface during influenza virus infection, is enzymatically active, and is conserved in most N1 viral lineages. We do not detect globally higher levels of alternate translation initiation on host transcripts upon influenza infection or during the anti-viral response, but the subset of host transcripts induced by the anti-viral response is enriched for alternate initiation sites. Together, our results systematically map the landscape of translation initiation during influenza virus infection, and shed light on the evolutionary forces shaping this landscape.
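A downstream in-frame AUG, like the one described for N1 neuraminidase, produces an N-terminally truncated protein: an AUG at codon index 14 (counting the canonical start as codon 0) removes the first 14 residues. The hypothetical helper below scans a transcript for such in-frame AUGs before the first in-frame stop codon; it illustrates the arithmetic only and does not model ribosome profiling or lactimidomycin data.

```python
def downstream_inframe_aug(mrna: str, start: int) -> list[tuple[int, int]]:
    """Hypothetical scan for in-frame AUG codons downstream of a canonical start.

    `start` is the 0-based index of the canonical AUG. Returns (position,
    number of N-terminal residues lost) for each in-frame AUG found before
    the first in-frame stop codon. DNA input (T) is converted to RNA (U).
    """
    stops = {"UAA", "UAG", "UGA"}
    seq = mrna.upper().replace("T", "U")
    hits = []
    for pos in range(start + 3, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if codon in stops:
            break
        if codon == "AUG":
            # Each codon between the two start sites is one residue lost.
            hits.append((pos, (pos - start) // 3))
    return hits
```

For instance, a toy ORF of `"AUG" + "GCU" * 13 + "AUG" + "GCU" + "UAA"` yields `[(42, 14)]`, that is, an alternate start that truncates 14 amino acids.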

]]>
<![CDATA[Genome constellations of 24 porcine rotavirus group A strains circulating on commercial Thai swine farms between 2011 and 2016]]> https://www.researchpad.co/article/5c5217dbd5eed0c484794705

Rotavirus A (RVA) infection is a major cause of diarrhea-related illness in young children. RVA is also one of the most common enteric viruses detected on pig farms and contributes to substantial morbidity and mortality in piglets. Long-term multi-site surveillance of RVA on Thai swine farms to determine the diversity of RVA strains in circulation is currently lacking. In this study, we characterized the 11 segments of the RVA genome from 24 Thai porcine RVA strains circulating between 2011 and 2016. We identified G9 (15/24) and P[13] (12/24) as the dominant genotypes. The dominant G and P combinations were G9P[13] (n = 6), G9P[23] (n = 6), G3P[13] (n = 5), G9P[19] (n = 3), G4P[6] (n = 2), G4P[19] (n = 1), and G5P[13] (n = 1). Genome constellations of the Thai strains showed a predominance of the Wa-like genotype (Gx-P[x]-I1/I5-R1-C1-M1-A8-N1-T1/T7-E1/E9-H1), with evidence of reassortment between porcine and human RVA strains (e.g., G4-P[6]-I1-R1-C1-M1-A8-N1-T1-E1-H1 and G9-P[19]-I5-R1-C1-M1-A8-N1-T7-E9-H1). To assess the potential effectiveness of rotavirus vaccination, the Thai RVA strains were compared with the RVA strains represented in the swine rotavirus vaccine; this comparison showed residue variations in the antigenic epitope on VP7 and amino acid identity below 90% for the G4 and G5 strains. Several previous studies have suggested that such variations might affect virus neutralization specificity and vaccine efficacy. Our study illustrates the importance of RVA surveillance beyond G/P genotyping on commercial swine farms, which is crucial for controlling viral transmission.
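Amino acid identity figures such as the "below 90%" threshold above are computed from a pairwise alignment. The sketch below assumes the two sequences are already aligned to equal length; the function name is hypothetical, and the denominator convention (here, columns where at least one sequence has a residue) varies between tools, so reported values can differ slightly.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

With this convention, two 10-residue VP7 fragments differing at one position score exactly 90.0, which would not fall below a 90% threshold.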

]]>
<![CDATA[Genetic diversity of the enteroviruses detected from cerebrospinal fluid (CSF) samples of patients with suspected aseptic meningitis in northern West Bank, Palestine in 2017]]> https://www.researchpad.co/article/5c1813b8d5eed0c484775a65

Background

The human enterovirus (HEV) genus shows a wide range of genetic diversity.

Objectives

To investigate the genetic diversity of the enteroviruses isolated in 2017 in northern West Bank, Palestine.

Study design

A total of 249 CSF samples from aseptic meningitis cases were tested for HEV using two RT-PCR protocols targeting the 5’ NCR and the VP1 region of the HEV genome. The phylogenetic characteristics of the sequenced VP1 region of Echovirus 18 (E18) and Coxsackievirus B5 (CVB5) isolated in Palestine are described, together with 27 E18 and 27 CVB5 sequences available from GenBank.

Results

E18 and CVB5 accounted for 50% and 35% of the successfully typed HEV isolates, respectively. The phylogenetic trees of E18 and CVB5 each showed three main clusters; all Palestinian E18 isolates clustered uniquely with isolates from China, and all Palestinian CVB5 isolates clustered with isolates from different countries. Cluster I of E18, with 13 Palestinian and 6 Chinese isolates, showed the lowest haplotype-to-sequence ratio (0.6:1), haplotype diversity (Hd), nucleotide diversity (π), and number of segregating sites (S) compared with clusters II and III. Furthermore, cluster I showed negative Tajima’s D and Fu and Li’s F statistics, with a statistically significant departure from neutrality (P<0.01). In both the E18 and CVB5 populations, haplotype diversity was high but genetic diversity was low. Inter-population pairwise genetic distance (Fst) and gene flow (Nm) showed that the Palestinian E18 and CVB5 clusters were highly differentiated from the other clusters.

Conclusions

The study revealed close genetic relationships among Palestinian HEV strains, as confirmed by population genetics and phylogenetic analyses.
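The summary statistics reported above (haplotype-to-sequence ratio, Hd, π, S and Tajima's D) follow standard formulas: Hd = n/(n-1)·(1 - Σp_i²), π is the mean number of pairwise differences per site, and Tajima's D compares the mean pairwise differences with S/a1 using Tajima's (1989) variance constants. The function below is a minimal sketch for gap-free, equal-length alignments; the name is hypothetical, and dedicated population genetics software also handles gaps, missing data, Fu and Li's tests and significance testing, which this sketch omits.

```python
import math
from collections import Counter
from itertools import combinations


def diversity_stats(seqs: list[str]) -> dict:
    """Hypothetical sketch: Hd, pi, S and Tajima's D for gap-free aligned sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    # Haplotype diversity: Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)
    freqs = [c / n for c in Counter(seqs).values()]
    hd = n / (n - 1) * (1 - sum(p * p for p in freqs))
    # Mean pairwise differences (k); nucleotide diversity pi = k / sequence length
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    k = sum(diffs) / len(diffs)
    # Segregating sites: alignment columns with more than one state
    s = sum(len(set(col)) > 1 for col in zip(*seqs))
    # Tajima's D with the standard constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    denom = math.sqrt(e1 * s + e2 * s * (s - 1))
    tajima_d = (k - s / a1) / denom if denom > 0 else float("nan")
    return {
        "haplotypes": len(freqs),
        "hap_to_seq": len(freqs) / n,
        "Hd": hd,
        "pi": k / length,
        "S": s,
        "TajimaD": tajima_d,
    }
```

For the toy alignment `["AAAA", "AAAA", "AAAT", "AATT"]` this gives 3 haplotypes, Hd = 5/6, π = 7/24 and S = 2. A negative D, as reported for cluster I, indicates an excess of rare variants relative to neutral expectations.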

]]>
<![CDATA[Guinea pig immunoglobulin VH and VL naïve repertoire analysis]]> https://www.researchpad.co/article/5c1c0af3d5eed0c484426f9b

The guinea pig has been used as a model to study various human infectious diseases because of its similarity to humans regarding symptoms and immune response, but little is known about its humoral immune response. To better understand the mechanism underlying the generation of the antibody repertoire in guinea pigs, we performed deep sequencing of full-length immunoglobulin variable chains from naïve B and plasma cells. We gathered nearly 16,000 full-length VH, Vκ and Vλ genes and analyzed V and J gene segment usage profiles and mutation status by annotating recently reported genome data of guinea pig immunoglobulin genes. We found that approximately 70% of heavy, 73% of kappa and 81% of lambda functional germline V gene segments are integrated into actual V(D)J recombination events. We also found preferential use of a particular V gene segment and accumulated mutations in CDRs 1 and 2 in antigen-specific plasma cells. Our study represents the first attempt to characterize sequence diversity in the expressed guinea pig antibody repertoire and provides significant insight into antibody repertoire generation and Ig-based immunity of guinea pigs.

]]>
<![CDATA[Synthetic STARR-seq reveals how DNA shape and sequence modulate transcriptional output and noise]]> https://www.researchpad.co/article/5c256c83d5eed0c484474f5a

The binding of transcription factors to short recognition sequences plays a pivotal role in controlling the expression of genes. The sequence and shape characteristics of binding sites influence DNA binding specificity and have also been implicated in modulating the activity of transcription factors downstream of binding. To quantitatively assess the transcriptional activity of tens of thousands of designed synthetic sites in parallel, we developed a synthetic version of STARR-seq (synSTARR-seq). We used the approach to systematically analyze how variations in the recognition sequence of the glucocorticoid receptor (GR) affect transcriptional regulation. Our approach resulted in the identification of a novel, highly active functional GR binding sequence and revealed that sequence variation both within and flanking GR’s core binding site can modulate GR activity without apparent changes in DNA binding affinity. Notably, we found that the sequence composition of variants with similar activity profiles was highly diverse. In contrast, groups of variants with similar activity profiles showed specific DNA shape characteristics, indicating that DNA shape may be a better predictor of activity than DNA sequence. Finally, using single-cell experiments with individual enhancer variants, we obtained clues indicating that the architecture of the response element can independently tune the expression mean and the cell-to-cell variability in gene expression (noise). Together, our studies establish synSTARR as a powerful method to systematically study how DNA sequence and shape modulate transcriptional output and noise.

]]>
<![CDATA[G-quadruplexes formation in the 5’UTRs of mRNAs associated with colorectal cancer pathways]]> https://www.researchpad.co/article/5c0ed76ad5eed0c484f140d7

RNA G-quadruplexes (rG4) are stable non-canonical secondary structures composed of G-rich sequences. Many rG4 structures located in the 5’UTRs of mRNAs act as translation repressors due to their high stability which is thought to impede ribosomal scanning. That said, it is not known if these are mRNA-specific examples, or if they are indicative of a global expression regulation mechanism of the mRNAs involved in a common pathway based on structure folding recognition. Gene-ontology analysis of mRNAs bearing a predicted rG4 motif in their 5’UTRs revealed an enrichment for mRNAs associated with the colorectal cancer pathway. Bioinformatic tools for rG4 prediction, and experimental in vitro validations were used to confirm and compare the folding of the predicted rG4s of the mRNAs associated with dysregulated pathways in colorectal cancer. The rG4 folding was confirmed for the first time for 9 mRNAs. A repressive effect of 3 rG4 candidates on the expression of a reporter gene was also measured in colorectal cancer cell lines. This work highlights the fact that rG4 prediction is not yet accurate, and that experimental characterization is still essential in order to identify the precise rG4 folding sequences and the possible common features shared between the rG4 overrepresented in important biological pathways.
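Sequence-based rG4 prediction often starts from a pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The regex below is a minimal sketch of that common heuristic, not the specific prediction tools used in the study. As the abstract notes, such predictions are imperfect: the pattern misses non-canonical rG4s (for example two-tier structures, long loops or bulges) and can flag sequences that do not fold in vitro.

```python
import re

# Canonical G4 heuristic: G{3,} followed by three repeats of (loop of 1-7 nt, G{3,}).
G4_PATTERN = re.compile(r"G{3,}(?:[ACGU]{1,7}G{3,}){3}")


def find_g4_motifs(utr: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for non-overlapping putative rG4 motifs in a 5'UTR.

    DNA input (T) is converted to RNA (U) before matching.
    """
    seq = utr.upper().replace("T", "U")
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]
```

For example, `find_g4_motifs("AAGGGAGGGAGGGAGGGAA")` returns `[(2, 17, "GGGAGGGAGGGAGGG")]`, while a sequence with only three G-runs returns an empty list. Because `finditer` reports non-overlapping matches, alternative foldings within one G-rich tract are not enumerated.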

]]>
<![CDATA[Multi locus sequence typing of Burkholderia pseudomallei isolates from India unveils molecular diversity and confers regional association in Southeast Asia]]> https://www.researchpad.co/article/5b49f0ae463d7e3adec7b97a

Objectives

Burkholderia pseudomallei, the causative agent of melioidosis, has become a public health problem in India and across the world. Melioidosis can be difficult to diagnose because of its variable clinical presentation. This study aims to determine the genetic diversity among clinical isolates of B. pseudomallei from India in order to establish their molecular epidemiology and elucidate their association with Southeast Asia.

Methods

Multi locus sequence typing was performed on thirty-one archived B. pseudomallei clinical isolates, previously characterised from specimens obtained from patients admitted to the Christian Medical College & Hospital, Vellore, from 2015 to 2016. Genetic heterogeneity and evolution at the regional and global levels were further investigated using in silico tools.

Results

Multi locus sequence typing (MLST) of the isolates from systemic and localized forms of melioidosis, including blood, pus, tissue, and urine specimens, revealed twenty isolates with novel sequence types and eleven with previously reported sequence types. High genetic diversity was observed using MLST with a strong association within the Southeast Asian region.

Conclusions

Molecular typing of B. pseudomallei clinical isolates using MLST revealed high genetic diversity and provided a baseline molecular epidemiology of the disease in India, with a strong Southeast Asian association of the strains. Future studies should focus on whole-genome single-nucleotide polymorphism (SNP) analysis, which offers high discriminatory power, to further understand the novel sequence types reported in this study.
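In MLST, each isolate's allele numbers at the scheme's housekeeping loci form a profile, and the profile is looked up in a curated database (such as PubMLST) to obtain a sequence type (ST); a profile with no match is a novel ST. The sketch below uses the seven loci of the B. pseudomallei scheme, but the function name, allele numbers and ST labels in the example table are hypothetical and do not reflect real database entries.

```python
from typing import Optional

# Seven housekeeping loci of the B. pseudomallei MLST scheme, in profile order.
LOCI = ("ace", "gltB", "gmhD", "lepA", "lipA", "narK", "ndh")

# Hypothetical profile table: allele numbers and ST labels are made up for illustration.
EXAMPLE_PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): 101,
    (1, 2, 1, 1, 3, 1, 1): 102,
}


def assign_st(alleles: dict[str, int], profiles: dict[tuple[int, ...], int]) -> Optional[int]:
    """Return the ST for an isolate's allele profile, or None if the profile is novel."""
    key = tuple(alleles[locus] for locus in LOCI)
    return profiles.get(key)
```

With this table, an isolate carrying alleles `(1, 2, 1, 1, 3, 1, 1)` is assigned the hypothetical ST 102, while any unlisted combination returns `None`. Isolates with a previously unseen allele at any locus need a new allele number before a profile can be formed at all.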

]]>
<![CDATA[First Description of Hepacivirus and Pegivirus Infection in Domestic Horses in China: A Study in Guangdong Province, Heilongjiang Province and Hong Kong District]]> https://www.researchpad.co/article/5989daa7ab0ee8fa60ba7d02

Since 2012, three viruses, known as equine hepacivirus (EqHV), equine pegivirus (EPgV) and Theiler’s disease-associated virus (TDAV), have been discovered in equines. Because these viruses are the newest members of the Flaviviridae family, genomic information on circulating EqHV, EPgV and TDAV strains around the world is limited. To date, no genetic surveillance studies of these three viruses have been performed in the equine population of China. Here, a total of 177 serum samples were collected from equines across China between 2014 and 2015. Using PCR, we detected viral RNA in eight serum samples: six were EqHV positive and two were EPgV positive. Co-infection with the two viruses was not observed among the Chinese equines studied, and TDAV RNA was not detected in the equine serum samples collected for this study. Phylogenetic analysis of partial NS5B open reading frame (ORF), NS3 ORF, and 5’ untranslated region nucleotide sequences from EqHV, as well as partial NS3 ORF sequences from EPgV, indicated that EqHV and EPgV have each evolved into two main clades, both of which are circulating in China. Based on the partial NS5B and NS3 ORF sequences of EqHV, the sequences of one clade were further split into two subclades. This study enriches our knowledge of the geographic distribution of these three equine viruses.

]]>
<![CDATA[Characterization of MazF-Mediated Sequence-Specific RNA Cleavage in Pseudomonas putida Using Massive Parallel Sequencing]]> https://www.researchpad.co/article/5989da31ab0ee8fa60b849d2

Under environmental stress, microbes are known to alter their translation patterns using sequence-specific endoribonucleases that we call RNA interferases. However, there has been limited insight into which RNAs are specifically cleaved by these RNA interferases, so their physiological functions remain unknown. In the current study, we developed a novel method to effectively identify cleavage specificities using massively parallel sequencing. This approach uses artificially designed RNAs composed of diverse sequences that do not form extensive secondary structures, and it correctly identified ACA as the cleavage sequence of MazF, a well-characterized Escherichia coli RNA interferase. In addition, we determined that an uncharacterized MazF homologue isolated from Pseudomonas putida specifically recognizes the unique triplet UAC. Using a real-time fluorescence resonance energy transfer assay, we further showed that the UAC triplet is essential for cleavage by P. putida MazF. These results highlight an effective method for determining the cleavage specificity of RNA interferases.
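Once cleavage sites have been mapped from sequencing reads, a candidate recognition triplet can be checked by asking what fraction of observed cuts fall at or within occurrences of that triplet. The sketch below assumes cut positions are already known (a cut at position p lies between `rna[p-1]` and `rna[p]`); the function names are hypothetical, and the window deliberately accepts cuts 5′ of, inside, or 3′ of the motif rather than assuming exactly where MazF cuts.

```python
def motif_positions(rna: str, motif: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) motif occurrence."""
    rna = rna.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    return [i for i in range(len(rna) - len(motif) + 1) if rna.startswith(motif, i)]


def fraction_of_cuts_at_motif(rna: str, cut_sites: list[int], motif: str) -> float:
    """Hypothetical check: share of observed cut sites lying at or within a motif.

    A cut at p counts if start <= p <= start + len(motif) for some motif occurrence,
    i.e. at its 5' boundary, inside it, or at its 3' boundary.
    """
    if not cut_sites:
        return 0.0
    starts = motif_positions(rna, motif)
    hits = sum(any(s <= p <= s + len(motif) for s in starts) for p in cut_sites)
    return hits / len(cut_sites)
```

A motif that explains nearly all cuts in designed, weakly structured RNAs, as ACA did for E. coli MazF, is a strong candidate; in practice this fraction should be compared with the value expected by chance for other triplets.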

]]>
<![CDATA[Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features]]> https://www.researchpad.co/article/5989da97ab0ee8fa60ba2643

Background

Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA molecules. Predicting transposon-derived piRNAs can broaden research on small ncRNAs and help further elucidate the mechanism of gamete generation.

Methods

In this paper, we attempt to differentiate transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features using machine learning methods. We explore six sequence-derived features, i.e., spectrum profile, mismatch profile, subsequence profile, position-specific scoring matrix, pseudo dinucleotide composition and local structure-sequence triplet elements, and systematically evaluate their performance for transposon-derived piRNA prediction. Finally, we consider two approaches, direct combination and ensemble learning, to integrate useful features and build high-accuracy prediction models.
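The spectrum and mismatch profiles listed above are closely related: the (k, m)-mismatch profile counts, for every possible k-mer u, the k-mer windows of a sequence that differ from u at no more than m positions, and with m = 0 it reduces to the spectrum profile (plain k-mer counts). The brute-force sketch below is hypothetical and intended for small k; it is not the authors' feature extraction code.

```python
from itertools import product


def mismatch_profile(seq: str, k: int, m: int, alphabet: str = "ACGU") -> dict[str, int]:
    """Hypothetical (k, m)-mismatch profile for an RNA sequence.

    For every possible k-mer u over `alphabet`, count the k-mer windows of `seq`
    whose Hamming distance to u is at most m. m = 0 gives the spectrum profile.
    """
    seq = seq.upper().replace("T", "U")
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    profile = {}
    for letters in product(alphabet, repeat=k):
        u = "".join(letters)
        profile[u] = sum(
            1 for w in windows if sum(a != b for a, b in zip(u, w)) <= m
        )
    return profile
```

The loop costs O(4^k × windows × k), which is fine for the short k used in illustrations; production tools use tries or precomputed neighbourhoods for larger k.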

Results

We construct three datasets, covering three species: Human, Mouse and Drosophila, and evaluate the performances of prediction models by 10-fold cross validation. In the computational experiments, direct combination models achieve AUC of 0.917, 0.922 and 0.992 on Human, Mouse and Drosophila, respectively; ensemble learning models achieve AUC of 0.922, 0.926 and 0.994 on the three datasets.
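The AUC values above can be read as the probability that a randomly chosen piRNA receives a higher score than a randomly chosen non-piRNA. The sketch below computes AUC from that Mann-Whitney definition (ties count one half); the function name is hypothetical, and this is not the evaluation code used in the study, which additionally averaged over 10-fold cross-validation.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC via the Mann-Whitney statistic: P(score of a positive > score of a negative).

    Labels are 1 (positive) or 0 (negative); tied scores contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, scores that rank every positive above every negative give 1.0, and `roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1])` gives 0.75 because one of the four positive/negative pairs is misordered. The pairwise loop is quadratic; rank-based formulas are preferable for large datasets.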

Conclusions

Compared with other state-of-the-art methods, our methods achieve better performance. In conclusion, the proposed methods are promising for transposon-derived piRNA prediction. The source code and datasets are available in S1 File.

]]>
<![CDATA[Loss of genes related to Nucleotide Excision Repair (NER) and implications for reductive genome evolution in symbionts of deep-sea vesicomyid clams]]> https://www.researchpad.co/article/5989db52ab0ee8fa60bdc68e

Intracellular thioautotrophic symbionts of deep-sea vesicomyid clams lack some DNA repair genes and are thought to be undergoing reductive genome evolution (RGE). In this study, we addressed two questions: 1) how these symbionts lost their DNA repair genes and 2) how such losses affect RGE. For the first question, we examined genes associated with nucleotide excision repair (NER; uvrA, uvrB, uvrC, uvrD, uvrD paralog [uvrDp] and mfd) in 12 symbionts of vesicomyid clams belonging to two clades (5 clade I and 7 clade II symbionts). While uvrA, uvrDp and mfd were conserved in all symbionts, uvrB and uvrC were degraded in all clade I symbionts but were apparently intact in clade II symbionts. The uvrD gene was disrupted in two clade II symbionts. Among the intact genes in Ca. Vesicomyosocius okutanii (clade I), expression of uvrD and mfd was detected by reverse transcription-polymerase chain reaction (RT-PCR), but expression of uvrA and uvrDp was not. In contrast, all intact genes were expressed in the symbiont of Calyptogena pacifica (clade II). To assess how gene losses affect RGE (question 2), we compared the examined genes with those of the symbiont of Bathymodiolus septemdierum; genetic distances were larger for clade I than for clade II symbionts. In addition, these genes had lower guanine+cytosine (GC) content and higher repeat sequence densities in clade I than in clade II. Our results suggest that NER genes are currently being lost from the extant lineages of vesicomyid clam symbionts. The loss of NER genes and mutY in these symbionts is likely to promote increases in genetic distance and repeat sequence density as well as reduced GC content in genomic genes, and may have facilitated reductive evolution of the genome.
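The GC content comparison between clade I and clade II genes reduces to counting G and C among unambiguous bases in each gene and averaging per group. The sketch below uses hypothetical function names and ignores ambiguous bases (such as N) in the denominator; it does not attempt the repeat-density measure, whose exact definition the abstract does not give.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T/U); N and others are ignored."""
    seq = seq.upper()
    gc = sum(seq.count(b) for b in "GC")
    at = sum(seq.count(b) for b in "ATU")
    return gc / (gc + at) if gc + at else 0.0


def mean_gc(genes: dict[str, str]) -> float:
    """Average per-gene GC content across a set of genes (e.g. one symbiont clade)."""
    return sum(gc_content(s) for s in genes.values()) / len(genes)
```

Averaging per gene weights every gene equally; computing GC on the concatenated sequences instead would weight longer genes more heavily, so the choice should match how the comparison is reported.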

]]>