ResearchPad - sequence-databases https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[Within-patient plasmid dynamics in <i>Klebsiella pneumoniae</i> during an outbreak of a carbapenemase-producing <i>Klebsiella pneumoniae</i>]]> https://www.researchpad.co/article/elastic_article_15752

Knowledge of within-patient dynamics of resistance plasmids during outbreaks is important for understanding the persistence and transmission of plasmid-mediated antimicrobial resistance. During an outbreak of a Klebsiella pneumoniae carbapenemase-producing (KPC) K. pneumoniae, the plasmid and chromosomal dynamics of K. pneumoniae within patients were investigated.

Methods

During the outbreak, all K. pneumoniae isolates of colonized or infected patients were collected, regardless of their susceptibility pattern. A selection of isolates was short-read and long-read sequenced. A hybrid assembly of the short- and long-read sequence data was performed. Plasmid contigs were extracted from the hybrid assembly and annotated, and within-patient plasmid comparisons were performed.

Results

Fifteen K. pneumoniae isolates from six patients were short-read whole-genome sequenced. Whole-genome multi-locus sequence typing revealed a maximum of 4 allele differences between the sequenced isolates. Within patients 1 and 2, the resistance gene and plasmid replicon content differed between the isolates sequenced. Long-read sequencing and hybrid assembly of 4 isolates revealed loss of the entire KPC gene-containing plasmid in the isolates of patient 2 and a recombination event between the plasmids in the isolates of patient 1. This resulted in two different KPC gene-containing plasmids being simultaneously present during the outbreak.

Conclusion

During a hospital outbreak of a KPC-producing K. pneumoniae isolate, loss of the KPC gene-carrying plasmid and plasmid recombination were detected within the isolates from two patients.
When investigating outbreaks, one should be aware that plasmid transmission can occur, and the possibility of within- and between-patient plasmid variation needs to be considered.

]]>
<![CDATA[iterb-PPse: Identification of transcriptional terminators in bacterial by incorporating nucleotide properties into PseKNC]]> https://www.researchpad.co/article/elastic_article_14750 A terminator is a DNA sequence that gives the RNA polymerase the transcriptional termination signal. Identifying terminators correctly can improve genome annotation and, more importantly, has considerable application value in disease diagnosis and therapy. However, accurate prediction methods are lacking and urgently needed. Therefore, we proposed a prediction method for terminators, “iterb-PPse”, which incorporates 47 nucleotide properties into PseKNC-Ⅰ and PseKNC-Ⅱ and uses Extreme Gradient Boosting to predict terminators based on Escherichia coli and Bacillus subtilis. In combination with the preceding methods, we employed three new feature extraction methods (K-pwm, Base-content, and Nucleotidepro) to formulate raw samples. A two-step method was applied to select features. When identifying terminators based on the optimized features, we compared five single models as well as 16 ensemble models. As a result, the accuracy of our method on the benchmark dataset reached 99.88%, higher than that of the existing state-of-the-art predictor iTerm-PseKNC in 100 repetitions of five-fold cross-validation. Its prediction accuracy for two independent datasets reached 94.24% and 99.45%, respectively. For the convenience of users, we developed software based on “iterb-PPse” with the same name. The open software and source code of “iterb-PPse” are available at https://github.com/Sarahyouzi/iterb-PPse.

]]>
<![CDATA[Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)?]]> https://www.researchpad.co/article/elastic_article_14545 Recently, a novel coronavirus, SARS-CoV-2, has caused an ongoing pandemic. Epidemiological studies suggested that this virus was associated with a wet market in Wuhan, China. However, the exact source of this virus is still unknown. In this study, we attempted to assemble the complete genome of a coronavirus identified from two groups of sick Malayan pangolins, which were likely to have been smuggled for black market trade. The molecular and evolutionary analyses showed that the pangolin coronavirus we assembled was genetically associated with SARS-CoV-2 but was not likely its precursor. This study suggested that pangolins are natural hosts of coronaviruses. Determining the spectrum of coronaviruses in pangolins can help understand the natural history of coronaviruses in wildlife and at the animal-human interface, and facilitate the prevention and control of coronavirus-associated emerging diseases.

]]>
<![CDATA[Evidence of recombination of vaccine strains of lumpy skin disease virus with field strains, causing disease]]> https://www.researchpad.co/article/elastic_article_14489 Vaccination against lumpy skin disease (LSD) is crucial for maintaining the health of animals and the economic sustainability of farming. Either homologous vaccines consisting of live attenuated LSD virus (LSDV) or heterologous vaccines consisting of live attenuated sheeppox or goatpox virus (SPPV/GTPV) can be used for control of LSDV. Although SPPV/GTPV-based vaccines exhibit slightly lower efficacy than live attenuated LSDV vaccines, they do not cause the vaccine-induced viremia, fever, and clinical symptoms of the disease following vaccination that result from the replication capacity of live attenuated LSDVs. Recombination of capripoxviruses in the field was a long-standing hypothesis until a naturally occurring recombinant LSDV vaccine isolate was detected in Russia, where the sheeppox vaccine alone is used. This occurred after the initiation of vaccination campaigns using LSDV vaccines in the neighboring countries in 2017, when the first cases of presumed vaccine-like isolate circulation were documented with concurrent detection of a recombinant vaccine isolate in the field. The follow-up findings presented herein show that during the period from 2015 to 2018, the molecular epidemiology of LSDV in Russia split into two independent waves. The 2015–2016 epidemic was attributable to the field isolate, whereas the 2017 epidemic and, in particular, the 2018 epidemic represented novel disease importations that were not genetically linked to the 2015–2016 field-type incursions. This demonstrated a new emergence rather than the continuation of the field-type epidemic. Since recombinant vaccine-like LSDV isolates appear to have entrenched across the country’s border, the policy of using certain live vaccines requires revision in the context of the biosafety threat it presents.

]]>
<![CDATA[Isolation of a novel species in the genus <i>Cupriavidus</i> from a patient with sepsis using whole genome sequencing]]> https://www.researchpad.co/article/elastic_article_14469 Whole genome sequencing (WGS) has become an accessible tool in clinical microbiology, and it allowed us to identify a novel Cupriavidus species. We isolated a Gram-negative bacillus from the blood of an immunocompromised patient, and phenotypic and molecular identifications were performed. Phenotypic identification discrepancies were noted between the Vitek 2 (bioMérieux, Marcy-l’Étoile, France) and Vitek MS systems (bioMérieux). Using 16S rRNA gene sequencing, it was impossible to identify the pathogen to the species level. WGS was performed using the Illumina MiSeq platform (Illumina, San Diego, CA), and genomic sequence database searching with the TrueBac™ ID-Genome system (ChunLab, Inc., Seoul, Republic of Korea) showed no strains with average nucleotide identity values higher than 95.0%, which is the cut-off for species-level identification. Phylogenetic analysis indicated that the bacterium was a new Cupriavidus species that formed a subcluster with Cupriavidus gilardii. WGS holds great promise for accurate molecular identification beyond 16S rRNA gene sequencing in clinical microbiology.

]]>
<![CDATA[Specific clones of Trichomonas tenax are associated with periodontitis]]> https://www.researchpad.co/article/5c900d3bd5eed0c48407e3b6

Trichomonas tenax, an anaerobic protist that is difficult to cultivate and lacks a reliable molecular identification method, has been suspected of involvement in periodontitis, a multifactorial inflammatory dental disease affecting the soft tissue and bone of the periodontium. A cohort of 106 periodontitis patients classified by stages of severity and 85 healthy adult control patients was constituted. An efficient culture protocol, a new identification tool based on real-time qPCR of T. tenax, and a Multi-Locus Sequence Typing (MLST) system based on the T. tenax NIH4 reference strain were created. Fifty-three strains of Trichomonas sp. were obtained from periodontal samples. T. tenax was detected by culture in 37/106 (34.90%) patients with periodontitis and in 16/85 (18.80%) control patients (p = 0.018). Sixty of the 191 samples tested positive for T. tenax by qPCR: 24/85 (28%) controls and 36/106 (34%) periodontitis patients (p = 0.089). By combining both results, 45/106 (42.5%) patients were positive by culture and/or PCR, as compared to 24/85 (28.2%) controls (p = 0.042). A link was established between the carriage of Trichomonas tenax in patients and the severity of the disease. Genotyping demonstrates the presence of strain diversity with three major different clusters and a relation between disease strains and periodontitis severity (p<0.05). More frequently detected in periodontal cases, T. tenax is likely to be related to the onset and/or evolution of periodontal diseases.

]]>
<![CDATA[Identification of a novel archaea virus, detected in hydrocarbon polluted Hungarian and Canadian samples]]> https://www.researchpad.co/article/N5489318a-3499-4862-9afc-2378cea7eecb

Metagenomics is a helpful tool for the analysis of unculturable organisms and viruses. Viruses that target bacteria and archaea play important roles in the microbial diversity of various ecosystems. Here we show that Methanosarcina virus MV (MetMV), the second Methanosarcina sp. virus with a completely determined genome, is characteristic of hydrocarbon pollution in environmental (soil and water) samples. It was highly abundant in Hungarian hydrocarbon polluted samples and its genome was also present in the NCBI SRA database containing reads from hydrocarbon polluted samples collected in Canada, indicating the stability of its niche and the marker feature of this virus. MetMV, as the only currently identified marker virus for pollution in environmental samples, could contribute to the understanding of the complicated network of prokaryotes and their viruses driving the decomposition of environmental pollutants.

]]>
<![CDATA[Detection of novel coronaviruses in bats in Myanmar]]> https://www.researchpad.co/article/N3669ab46-787e-4c30-a451-397d479219b9

The recent emergence of bat-borne zoonotic viruses warrants vigilant surveillance in their natural hosts. Of particular concern is the family of coronaviruses, which includes the causative agents of severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), and most recently, Coronavirus Disease 2019 (COVID-19), an epidemic of acute respiratory illness originating from Wuhan, China in December 2019. Viral detection, discovery, and surveillance activities were undertaken in Myanmar to identify viruses in animals at high risk contact interfaces with people. Free-ranging bats were captured, and rectal and oral swabs and guano samples collected for coronaviral screening using broadly reactive consensus conventional polymerase chain reaction. Sequences from positives were compared to known coronaviruses. Three novel alphacoronaviruses, three novel betacoronaviruses, and one known alphacoronavirus previously identified in other southeast Asian countries were detected for the first time in bats in Myanmar. Ongoing land use change remains a prominent driver of zoonotic disease emergence in Myanmar, bringing humans into ever closer contact with wildlife, and justifying continued surveillance and vigilance at broad scales.

]]>
<![CDATA[RNAmountAlign: Efficient software for local, global, semiglobal pairwise and multiple RNA sequence/structure alignment]]> https://www.researchpad.co/article/N67fc2065-7e6a-4783-aab9-eb74d3ac0a95

Alignment of structural RNAs is an important problem with a wide range of applications. Since function is often determined by molecular structure, RNA alignment programs should take into account both sequence and base-pairing information for structural homology identification. This paper describes C++ software, RNAmountAlign, for RNA sequence/structure alignment that runs in O(n^3) time and O(n^2) space for two sequences of length n; moreover, our software returns a p-value (transformable to expect value E) based on Karlin-Altschul statistics for local alignment, as well as parameter fitting for local and global alignment. Using incremental mountain height, a representation of structural information computable in cubic time, RNAmountAlign implements quadratic time pairwise local, global and global/semiglobal (query search) alignment using a weighted combination of sequence and structural similarity. RNAmountAlign is capable of performing progressive multiple alignment as well. Benchmarking of RNAmountAlign against LocARNA, LARA, FOLDALIGN, DYNALIGN, STRAL, MXSCARNA, and MUSCLE shows that RNAmountAlign has reasonably good accuracy and faster run time supporting all alignment types. Additionally, our extension of RNAmountAlign, called RNAmountAlignScan, which scans a target genome sequence to find hits having high sequence and structural similarity to a given query sequence, outperforms RSEARCH and sequence-only query scans and runs faster than FOLDALIGN query scan.
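
To make the mountain-height idea in the abstract above more concrete, here is a minimal Python sketch. It is a simplification under stated assumptions: it works on a single dot-bracket secondary structure, whereas the abstract does not spell out the exact definition RNAmountAlign uses, and the function names are hypothetical rather than taken from the software.

```python
def incremental_mountain_height(structure: str) -> list[int]:
    """Per-position increment for a dot-bracket string (assumed encoding):
    +1 for '(', -1 for ')', 0 for '.'."""
    steps = {"(": 1, ")": -1, ".": 0}
    return [steps[ch] for ch in structure]


def mountain_height(structure: str) -> list[int]:
    """Running sum of the increments, i.e. the number of base pairs
    still open after each position."""
    heights = []
    level = 0
    for step in incremental_mountain_height(structure):
        level += step
        if level < 0:
            raise ValueError("unbalanced structure: ')' without matching '('")
        heights.append(level)
    if level != 0:
        raise ValueError("unbalanced structure: unclosed '('")
    return heights


if __name__ == "__main__":
    print(incremental_mountain_height("((..))"))
    print(mountain_height("((..))"))
```

Because each position is reduced to a single number, two structures can be compared position by position in the same dynamic-programming recurrence used for sequence alignment, which is consistent with the quadratic-time pairwise alignment the abstract describes.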

]]>
<![CDATA[MLST-based genetic relatedness of Campylobacter jejuni isolated from chickens and humans in Poland]]> https://www.researchpad.co/article/N2eb0d267-f054-40f4-b445-0c8d9725ee43

Campylobacter jejuni infection is one of the most frequently reported foodborne bacterial diseases worldwide. The main transmission route of these microorganisms to humans is consumption of contaminated food, especially of chicken origin. The aim of this study was to analyze the genetic relatedness of C. jejuni from chicken sources (feces, carcasses, and meat) and from humans with diarrhea as well as to subtype the isolates to gain better insight into their population structure present in Poland. C. jejuni were genotyped using multilocus sequence typing (MLST) and sequence types (STs) were assigned in the MLST database. Among 602 isolates tested, a total of 121 different STs, including 70 (57.9%) unique to the isolates' origin, and 32 STs that were not present in the MLST database were identified. The most prevalent STs were ST464 and ST257, with 58 (9.6%) and 52 (8.6%) C. jejuni isolates, respectively. Isolates with some STs (464, 6411, 257, 50) were shown to be common in chickens, whereas others (e.g. ST21 and ST572) were more often identified among human C. jejuni. It was shown that of 47 human sequence types, 26 STs (106 isolates), 23 STs (102 isolates), and 29 STs (100 isolates) were also identified in chicken feces, meat, and carcasses, respectively. These results, together with the high and similar proportional similarity indexes (PSI) calculated for C. jejuni isolated from patients and chickens, may suggest that human campylobacteriosis was associated with contaminated chicken meat or meat products or other kinds of food cross-contaminated with campylobacters of chicken origin. The frequency of various sequence types identified in the present study generally reflects the prevalence of STs in other countries, which may suggest that C. jejuni with some STs have a global distribution, while other genotypes may be more restricted to certain countries.
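
The proportional similarity index mentioned above can be sketched in a few lines of Python. The abstract does not give the formula, so this assumes the commonly used definition, PSI = 1 - 0.5 * Σ|p_i - q_i|, where p_i and q_i are the proportions of sequence type i in the two sources. The function name and the toy isolate lists are hypothetical and are not data from the study.

```python
from collections import Counter


def proportional_similarity_index(types_a, types_b):
    """PSI between two collections of sequence-type labels.

    Returns 1.0 for identical frequency distributions and 0.0 when the
    two sources share no sequence types (assumed standard definition).
    """
    counts_a, counts_b = Counter(types_a), Counter(types_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both sources need at least one isolate")
    all_types = set(counts_a) | set(counts_b)
    # Counter returns 0 for sequence types absent from one source.
    difference = sum(
        abs(counts_a[st] / total_a - counts_b[st] / total_b) for st in all_types
    )
    return 1.0 - 0.5 * difference


if __name__ == "__main__":
    # Made-up example isolates, for illustration only.
    human = ["ST464", "ST464", "ST257", "ST21"]
    chicken = ["ST464", "ST257", "ST257", "ST50"]
    print(proportional_similarity_index(human, chicken))
```

A value close to 1 indicates that the two sources have nearly the same ST frequency distribution, which is the kind of evidence the abstract uses to link human cases to chicken sources.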

]]>
<![CDATA[All of gene expression (AOE): An integrated index for public gene expression databases]]> https://www.researchpad.co/article/N65b3f432-723a-4d59-a70d-2c0d696b62b7

Gene expression data have been archived as microarray and RNA-seq datasets in two public databases, Gene Expression Omnibus (GEO) and ArrayExpress (AE). In 2018, the DNA DataBank of Japan started a similar repository called the Genomic Expression Archive (GEA). These databases are useful resources for the functional interpretation of genes, but have been separately maintained and may lack RNA-seq data, while the original sequence data are available in the Sequence Read Archive (SRA). We constructed an index for those gene expression data repositories, called All Of gene Expression (AOE), to integrate publicly available gene expression data. The web interface of AOE can graphically query data in addition to the application programming interface. By collecting gene expression data from RNA-seq in the SRA, AOE also includes data not included in GEO and AE. AOE is accessible as a search tool from the GEA website and is freely available at https://aoe.dbcls.jp/.

]]>
<![CDATA[The genetic diversity and population structure of Sophora alopecuroides (Faboideae) as determined by microsatellite markers developed from transcriptome]]> https://www.researchpad.co/article/N8ed88142-6689-430c-b82a-b033b4ff58ac

Sophora alopecuroides (Faboideae) is an endemic species, mainly distributed in northwest China. However, the limited range of molecular markers for this species hinders breeding and genetic studies. A total of 20,324 simple sequence repeat (SSR) markers were identified from 118,197 assembled transcripts and 18 highly polymorphic SSR markers were used to explore the genetic diversity and population structure of S. alopecuroides from 23 different geographical populations. A relatively low genetic diversity was found in S. alopecuroides based on mean values of the number of effective alleles (Ne = 1.81), expected heterozygosity (He = 0.39) and observed heterozygosity (Ho = 0.55). The results of AMOVA indicated higher levels of variation within populations than between populations. Bayesian-based cluster analysis, principal coordinates analysis and Neighbor-Joining phylogeny analysis roughly divided all genotypes into four major groups with some admixtures. Meanwhile, geographic barriers would have restricted gene flow between the northern and southern regions (separated by the Tianshan Mountains), wherein the two relatively ancestral and independent clusters of S. alopecuroides occur. Historical trade and migration along the Silk Road would together have promoted the spread of S. alopecuroides from the western to the eastern regions of the northwest plateau in China, resulting in the current genetic diversity and population structure. The transcriptomic SSR markers provide a valuable resource for understanding the genetic diversity and population structure of S. alopecuroides, and will assist effective conservation management.
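
For readers unfamiliar with the diversity statistics quoted above, the following Python sketch shows how they are usually computed for one locus. It assumes the textbook definitions (He = 1 - Σp_i², Ne = 1 / Σp_i², Ho = fraction of heterozygous individuals) without small-sample correction; the abstract does not state which estimators or software the authors used, and the function name and genotype data are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Diversity statistics for one diploid locus.

    genotypes: list of (allele_1, allele_2) tuples, one per individual.
    Returns a dict with Ne, He, and Ho (assumed textbook definitions).
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    n_alleles = len(alleles)
    # Sum of squared allele frequencies (expected homozygosity).
    homozygosity = sum((c / n_alleles) ** 2 for c in counts.values())
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return {
        "Ne": 1.0 / homozygosity,
        "He": 1.0 - homozygosity,
        "Ho": heterozygous / len(genotypes),
    }


if __name__ == "__main__":
    # Made-up SSR allele sizes (in bp) for four individuals.
    example = [(150, 150), (150, 154), (154, 154), (150, 154)]
    print(locus_diversity(example))
```

The values reported in the abstract are means over the 18 SSR loci, so they would be obtained by averaging such per-locus results rather than from a single call.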

]]>
<![CDATA[Profile of the tprK gene in primary syphilis patients based on next-generation sequencing]]> https://www.researchpad.co/article/5c784fecd5eed0c484007915

Background

The highly variable tprK gene of Treponema pallidum has been acknowledged to be one of the mechanisms that causes persistent infection. Previous studies have mainly focused on the heterogeneity in tprK in propagated strains using a clone-based Sanger approach. Few studies have investigated tprK directly from clinical samples using deep sequencing.

Methods/Principal findings

We conducted a comprehensive analysis of 14 primary syphilis clinical isolates of T. pallidum via next-generation sequencing to gain better insight into the profile of tprK in primary syphilis patients. Our results showed that there was a mixture of distinct sequences within each V region of tprK. Except for the predominant sequence for each V region as previously reported using the clone-based Sanger approach, there were many minor variants of all strains that were mainly observed at a frequency of 1–5%. Interestingly, the identified distinct sequences within the regions were variable in length and differed by only 3 bp or multiples of 3 bp. In addition, amino acid sequence consistency within each V region was found among the 14 strains. Among the regions, the sequence IASDGGAIKH in V1 and the sequence DVGHKKENAANVNGTVGA in V4 showed a high stability of inter-strain redundancy.

Conclusions

The seven V regions of the tprK gene in primary syphilis infection demonstrated high diversity; they generally contained one high-proportion sequence and numerous low-frequency minor variants, most of which are far below the detection limit of Sanger sequencing. The rampant variation in each V region was regulated by a strict gene conversion mechanism that maintained the length difference to 3 bp or multiples of 3 bp. The highly stable sequence of inter-strain redundancy may indicate that the sequences play a critical role in T. pallidum virulence. These highly stable peptides are also likely to be potential targets for vaccine development.

]]>
<![CDATA[Genome-wide analysis, expansion and expression of the NAC family under drought and heat stresses in bread wheat (T. aestivum L.)]]> https://www.researchpad.co/article/5c897798d5eed0c4847d30f2

The NAC family is one of the largest plant-specific transcription factor families, and some of its members are known to play major roles in plant development and response to biotic and abiotic stresses. Here, we inventoried 488 NAC members in bread wheat (Triticum aestivum). Using the recent release of the wheat genome (IWGS RefSeq v1.0), we studied duplication events focusing on genomic regions from 4B-4D-5A chromosomes as an example of the family expansion and neofunctionalization of TaNAC members. Differentially expressed TaNAC genes in organs and in response to abiotic stresses were identified using publicly available RNAseq data. Expression profiling of 23 selected candidate TaNAC genes was studied in leaf and grain from two bread wheat genotypes at two developmental stages in field drought conditions and revealed insights into their specific and/or overlapping expression patterns. This study showed that, of the 23 TaNAC genes, seven have a leaf-specific expression and five have a grain-specific expression. In addition, the expression profiles of the grain-specific genes in response to drought depend on the genotype. These genes may be considered as potential candidates for further functional validation and could be of interest for crop improvement programs in response to climate change. Overall, the present study provides new insights into the evolution, divergence and functional analysis of the NAC gene family in bread wheat.

]]>
<![CDATA[PhyloPi: An affordable, purpose built phylogenetic pipeline for the HIV drug resistance testing facility]]> https://www.researchpad.co/article/5c8823b3d5eed0c484638e7d

Introduction

Phylogenetic analysis plays a crucial role in quality control in the HIV drug resistance testing laboratory. If previous patient sequence data are available, sample swaps can be detected and investigated. As antiretroviral treatment coverage is increasing in many developing countries, so is the need for HIV drug resistance testing. In countries with multiple languages, transcription errors are easily made with patient identifiers. Here a self-contained, blastn-integrated phylogenetic pipeline can be especially useful. Even though our pipeline can run on any Unix-based system, a Raspberry Pi 3 is used here as a very affordable and integrated solution.

Performance benchmarks

The computational capability of this single board computer is demonstrated as well as the utility thereof in the HIV drug resistance laboratory. Benchmarking analysis against a large public database shows excellent time performance with minimal user intervention. This pipeline also contains utilities to find previous sequences as well as phylogenetic analysis and a graphical sequence mapping utility against the pol area of the HIV HXB2 reference genome. Sequence data from the Los Alamos HIV database was analyzed for inter- and intra-patient diversity and logistic regression was conducted on the calculated genetic distances. These findings show that allowable clustering and genetic distance between viral sequences from different patients is very dependent on subtype as well as the area of the viral genome being analyzed.

Availability

The Raspberry Pi image for PhyloPi, source code of the pipeline, sequence data, bash-, python- and R-scripts for the logistic regression, benchmarking as well as helper scripts are available at http://scholar.ufs.ac.za:8080/xmlui/handle/11660/7638 and https://github.com/ArmandBester/phylopi. The PhyloPi image and the source code are published under the GPLv3 license. A demo version of the PhyloPi pipeline is available at http://phylopi.hpc.ufs.ac.za/.

]]>
<![CDATA[Assessing the role of transmission chains in the spread of HIV-1 among men who have sex with men in Quebec, Canada]]> https://www.researchpad.co/article/5c89773dd5eed0c4847d27bf

Background

Phylogenetics has been used to investigate HIV transmission among men who have sex with men. This study compares several methodologies to elucidate the role of transmission chains in the dynamics of HIV spread in Quebec, Canada.

Methods

The Quebec Human Immunodeficiency Virus (HIV) genotyping program database now includes viral sequences from close to 4,000 HIV-positive individuals classified as Men who have Sex with Men (MSMs), collected between 1996 and early 2016. Assessment of chain expansion may depend on the partitioning scheme used, and so, we produce estimates from several methods: the conventional Bayesian and maximum likelihood-bootstrap methods, in combination with a variety of schemes for applying a maximum distance criterion, and two other algorithms, DM-PhyClus, a Bayesian algorithm that produces a measure of uncertainty for proposed partitions, and the Gap Procedure, a fast non-phylogenetic approach. Sequences obtained from individuals in the Primary HIV Infection (PHI) stage serve to identify incident cases. We focus on the period ranging from January 1st 2012 to February 1st 2016.

Results and conclusion

The analyses reveal considerable overlap between chain estimates obtained from conventional methods, thus leading to similar estimates of recent temporal expansion. The Gap Procedure and DM-PhyClus, however, suggest moderately different chains. Nevertheless, all estimates stress that longer, older chains are responsible for a sizeable proportion of the sampled incident cases among MSMs. Curbing the HIV epidemic will require strategies aimed specifically at preventing such growth.

]]>
<![CDATA[16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses]]> https://www.researchpad.co/article/5c7ee7c5d5eed0c4848f4d9c

Advances in high-throughput sequencing have increased the availability of microbiome sequencing data that can be exploited to characterize microbiome community structure in situ. We explore using word and sentence embedding approaches for nucleotide sequences since they may be a suitable numerical representation for downstream machine learning applications (especially deep learning). This work involves first encoding (“embedding”) each sequence into a dense, low-dimensional, numeric vector space. Here, we use Skip-Gram word2vec to embed k-mers, obtained from 16S rRNA amplicon surveys, and then leverage an existing sentence embedding technique to embed all sequences belonging to specific body sites or samples. We demonstrate that these representations are meaningful, and hence the embedding space can be exploited as a form of feature extraction for exploratory analysis. We show that sequence embeddings preserve relevant information about the sequencing data such as k-mer context, sequence taxonomy, and sample class. Specifically, the sequence embedding space resolved differences among phyla, as well as differences among genera within the same family. Distances between sequence embeddings had similar qualities to distances between alignment identities, and embedding multiple sequences can be thought of as generating a consensus sequence. In addition, embeddings are versatile features that can be used for many downstream tasks, such as taxonomic and sample classification. Using sample embeddings for body site classification resulted in negligible performance loss compared to using OTU abundance data, and clustering embeddings yielded high fidelity species clusters. Lastly, the k-mer embedding space captured distinct k-mer profiles that mapped to specific regions of the 16S rRNA gene and corresponded with particular body sites. 
Together, our results show that embedding sequences results in meaningful representations that can be used for exploratory analyses or for downstream machine learning applications that require numeric data. Moreover, because the embeddings are trained in an unsupervised manner, unlabeled data can be embedded and used to bolster supervised machine learning tasks.
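
As a small illustration of the first step described above, the Python sketch below turns a nucleotide sequence into a "sentence" of overlapping k-mers, the form a word2vec-style model would consume. This is only a sketch under assumptions: the function name is hypothetical, the value of k is arbitrary (the abstract does not state the one used), and the embedding training itself is not shown.

```python
def kmer_sentence(sequence: str, k: int = 6) -> list[str]:
    """Split a sequence into overlapping k-mers ("words").

    RNA input is converted to the DNA alphabet, and k-mers containing
    ambiguous bases (anything other than A, C, G, T) are skipped.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper().replace("U", "T")
    valid = set("ACGT")
    words = []
    for start in range(len(seq) - k + 1):
        word = seq[start:start + k]
        if set(word) <= valid:
            words.append(word)
    return words


if __name__ == "__main__":
    print(kmer_sentence("ACGTAC", k=4))
```

Each read then plays the role of a sentence and each k-mer the role of a word, so a standard Skip-Gram implementation can be trained on the resulting lists without any sequence alignment.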

]]>
<![CDATA[An unprecedented nuclear genetic code with all three termination codons reassigned as sense codons in the syndinean Amoebophrya sp. ex Karlodinium veneficum]]> https://www.researchpad.co/article/5c818e8fd5eed0c484cc2557

Amoebophrya is part of an enigmatic, diverse, and ubiquitous marine alveolate lineage known almost entirely from anonymous environmental sequencing. Two cultured Amoebophrya strains grown on core dinoflagellate hosts were used for transcriptome sequencing. BLASTx using different genetic codes suggests that Amoebophrya sp. ex Karlodinium veneficum uses the three typical stop codons (UAA, UAG, and UGA) to encode amino acids. When UAA and UAG are translated as glutamine, about half of the alignments have better BLASTx scores, and when UGA is translated as tryptophan, one fifth have better scores. However, the sole stop codon appears to be UGA based on conserved genes, suggesting contingent translation of UGA. Neither host sequences nor sequences from the second strain, Amoebophrya sp. ex Akashiwo sanguinea, had similar results in BLASTx searches. A genome survey of Amoebophrya sp. ex K. veneficum showed no evidence for transcript editing aside from mitochondrial transcripts. The dynein heavy chain (DHC) gene family was surveyed, and of 14 transcripts only two did not use UAA, UAG, or UGA in a coding context. Overall, the transcriptome displayed strong bias for A or U in third codon positions, while the tRNA genome survey showed bias against codons ending in U, particularly for amino acids with two codons ending in either C or U. Together these clues suggest contingent translation mechanisms in Amoebophrya sp. ex K. veneficum and a phylogenetically distinct instance of genetic code modification.
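
The effect of the codon reassignments described above can be illustrated with a short Python sketch that translates a sequence under the standard code and under a modified code. The reassignment dictionary follows the abstract (UAA and UAG as glutamine, UGA as tryptophan); everything else, including the function and variable names and the toy sequence, is a hypothetical illustration, and the sketch ignores the context-dependent termination at UGA that the abstract suggests.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reassignments taken from the abstract; written in the DNA alphabet.
REASSIGNED_STOPS = {"TAA": "Q", "TAG": "Q", "TGA": "W"}


def translate(sequence: str, reassignments=None) -> str:
    """Translate in frame 1; '*' marks a stop codon.

    Trailing bases that do not fill a codon are ignored.
    """
    code = dict(STANDARD_CODE)
    if reassignments:
        code.update(reassignments)
    seq = sequence.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    return "".join(code[codon] for codon in codons)


if __name__ == "__main__":
    toy = "ATGTAACAATGA"  # made-up sequence, not from the study
    print(translate(toy))
    print(translate(toy, REASSIGNED_STOPS))
```

Comparing the two translations of the same transcript, and checking which one aligns better to known proteins, mirrors the BLASTx comparison the abstract relies on.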

]]>
<![CDATA[Identification of French Guiana sand flies using MALDI-TOF mass spectrometry with a new mass spectra library]]> https://www.researchpad.co/article/5c5df366d5eed0c48458120f

Phlebotomine sand flies are insects that are highly relevant in medicine, particularly as the sole proven vectors of leishmaniasis. Accurate identification of sand fly species is an essential prerequisite for eco-epidemiological studies aiming to better understand the disease. Traditional morphological identification is painstaking and time-consuming, and molecular methods for extensive screening remain expensive. Recent studies have shown that matrix-assisted laser desorption and ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a promising tool for rapid and cost-effective identification of arthropod vectors, including sand flies. The aim of this study was to validate the use of MALDI-TOF MS for the identification of Northern Amazonian sand flies. We constituted a MALDI-TOF MS reference database comprising 29 species of sand flies that were field-collected in French Guiana, which are expected to cover many of the more common species of the Northern Amazonian region, including known vectors of leishmaniasis. Carrying out a blind test, all the sand flies tested (n = 157) with a log (score) threshold greater than 1.7 were correctly identified at the species level. We confirmed that MALDI-TOF MS protein profiling is a useful tool for the study of sand flies, including neotropical species, known for their great diversity. An application that includes the spectra generated here will be available to the scientific community in the near future via an online platform.

]]>
<![CDATA[Long live the queen, the king and the commoner? Transcript expression differences between old and young in the termite Cryptotermes secundus]]> https://www.researchpad.co/article/5c6dc99ad5eed0c484529eb1

Social insects provide promising new avenues for aging research. Within a colony, individuals that share the same genetic background can differ in lifespan by up to two orders of magnitude. Reproducing queens (and in termites also kings) can live for more than 20 years, extraordinary lifespans for insects. We studied aging in a termite species, Cryptotermes secundus, which lives in less socially complex societies with a few hundred colony members. Reproductives develop from workers which are totipotent immatures. Comparing transcriptomes of young and old individuals, we found evidence for aging in reproductives that was especially associated with DNA and protein damage and the activity of transposable elements. By contrast, workers seemed to be better protected against aging. Thus our results differed from those obtained for social insects that live in more complex societies. Yet, they are in agreement with lifespan estimates for the study species. Our data are also in line with expectations from evolutionary theory. For individuals that are able to reproduce, it predicts that aging should only start after reaching maturity. As C. secundus workers are immatures with full reproductive options we expect them to invest into anti-aging processes. Our study illustrates that the degree of aging can differ between social insects and that it may be associated with caste-specific opportunities for reproduction.

]]>