ResearchPad - sequence-alignment https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Polyploidy breaks speciation barriers in Australian burrowing frogs <i>Neobatrachus</i>]]> https://www.researchpad.co/article/elastic_article_16332 Polyploidy or whole genome duplication is rare in animals and usually polyploid animals reproduce asexually. The Australian burrowing frogs of the genus Neobatrachus form an interesting exception amongst vertebrates with multiple independently originated autotetraploid sexual species. We generated population genomic data from 87 animals representing all six diploid and three tetraploid species of Neobatrachus. We show that, while diploid Neobatrachus species seem to be isolated from each other, their sister tetraploid species experience substantial levels of gene flow, and have wider distributions. Furthermore, we observe asymmetric gene flow from diploids to tetraploids. Based on our genomic and climate analyses we suggest that such inter-specific hybridization mediated by whole genome duplication rescues species diversity and allows tetraploids to more easily avoid impacts of climate-induced habitat loss.

]]>
<![CDATA[Sequence-structure-function relationships in class I MHC: A local frustration perspective]]> https://www.researchpad.co/article/elastic_article_15751 Class I Major Histocompatibility Complex (MHC) binds short antigenic peptides with the help of Peptide Loading Complex (PLC), and presents them to T-cell Receptors (TCRs) of cytotoxic T-cells and Killer-cell Immunoglobulin-like Receptors (KIRs) of Natural Killer (NK) cells. With more than 10000 alleles, human MHC (Human Leukocyte Antigen, HLA) is the most polymorphic protein in humans. This allelic diversity provides a wide coverage of peptide sequence space, yet does not affect the three-dimensional structure of the complex. Moreover, TCRs mostly interact with HLA in a common diagonal binding mode, and KIR-HLA interaction is allele-dependent. With the aim of establishing a framework for understanding the relationships between polymorphism (sequence), structure (conserved fold) and function (protein interactions) of the human MHC, we performed here a local frustration analysis on pMHC homology models covering 1436 HLA I alleles. An analysis of local frustration profiles indicated that (1) variations in MHC fold are unlikely due to minimally-frustrated and relatively conserved residues within the HLA peptide-binding groove, (2) high frustration patches on HLA helices are either involved in or near interaction sites of MHC with the TCR, KIR, or tapasin of the PLC, and (3) peptide ligands mainly stabilize the F-pocket of HLA binding groove.

]]>
<![CDATA[Single-cell amplicon sequencing reveals community structures and transmission trends of protist-associated bacteria in a termite host]]> https://www.researchpad.co/article/elastic_article_14746 The hindgut protists of wood-feeding termites are usually colonized by prokaryotic symbionts. Many of the hurdles that have prevented a better understanding of these symbionts arise from variation among protist and termite host species and the inability to maintain prominent community members in culture. These issues have made it difficult to study the fidelity, acquisition, and differences in colonization of protists by bacterial symbionts. In this study, we use high throughput amplicon sequencing of the V4 region of 16S rRNA genes to determine the composition of bacterial communities associated with single protist cells of six protist species, from the genera Pyrsonympha, Dinenympha, and Trichonympha that are present in the hindgut of the termite Reticulitermes flavipes. By analyzing amplicon sequence variants (ASVs), the diversity and distribution of protist-associated bacteria was compared within and across these six different protist species. ASV analysis showed that, in general, each protist genus associated with a distinct community of bacterial symbionts which were conserved across different termite colonies. However, some ASVs corresponding to ectosymbionts (Spirochaetes) were shared between different Dinenympha species and to a lesser extent with Pyrsonympha and Trichonympha hosts. This suggested that certain bacterial symbionts may be cosmopolitan to some degree and perhaps acquired by horizontal transmission. Using a fluorescence-based cell assay, we could observe the horizontal acquisition of surface-bound bacteria. This acquisition was shown to be time-dependent, involve active processes, and was non-random with respect to binding locations on some protists.

]]>
<![CDATA[Differences in study workload stress and its associated factors between transfer students and freshmen entrants in an Asian higher education context]]> https://www.researchpad.co/article/elastic_article_14715 Unlike the studies of freshmen entrants, the learning experiences of community college transfer (CCT) students in the receiving university are a topic that has only started to gain attention in recent decades. Little is known about the differences between CCT and freshmen entrants with regard to their study workload stress and its relationship with their perceptions of the teaching and learning environment, approaches to learning, self-efficacy and generic skills. The purpose of our study was to address this gap. This was a cross-sectional survey study conducted from April 2018 to November 2018 in a university in Hong Kong. The HowULearn questionnaire was adapted to local usage and validated for data collection. In total, 841 CCT students and 978 freshmen entrants completed the survey. The respondents were aged between 19 and 52 years (mean = 21.6, SD = 1.92), and 66.0% were women. The HowULearn questionnaire was determined by factor analyses to have eight factors. The reliabilities of the eight factors were found to be acceptable (Cronbach alphas = 0.709–0.918). The CCT students scored significantly higher than the freshmen entrants for perceived study workload stress and surface approaches to learning, but lower on teaching for understanding & encouraging learning, peer support, and self-efficacy beliefs. The surface approach to learning, deep & organized studying, alignment & constructive feedback, and generic skills were found to be predictors of study workload stress in both groups of students, and in the overall student data. This study has shown that CCT students and freshmen entrants differed with regard to their study workload stress and learning experiences. 
Our findings provide a message, both for educators in higher education and policy makers in the government—there is not a one-size-fits-all approach to different student populations when it comes to enhancing their learning experiences.

]]>
<![CDATA[A new neuropeptide insect parathyroid hormone iPTH in the red flour beetle <i>Tribolium castaneum</i>]]> https://www.researchpad.co/article/elastic_article_14647 Vertebrate parathyroid hormone (PTH) and its receptors have been extensively studied with respect to their function in bone remodeling and calcium metabolism. Insect parathyroid hormone receptors (iPTHRs) have been previously described as counterparts of vertebrate PTHRs, however, they are still orphan receptors for which the authentic ligands and biological functions remain unknown. We describe an insect form of parathyroid hormone (iPTH) by analyzing its interactions with iPTHRs. Identification of this new insect peptidergic system proved that the PTH system is an ancestral signaling system dating back to the evolutionary time before the divergence of protostomes and deuterostomes. We also investigated the functions of the iPTH system in a model beetle Tribolium castaneum by using RNA interference. RNA interference of iPTHR resulted in defects in wing exoskeleton maturation and fecundity. Based on the differential gene expression patterns and the phenotype induced by RNAi, we propose that the iPTH system is likely involved in the regulation of exoskeletal cuticle formation and fecundity in insects.

]]>
<![CDATA[Comparative mitochondrial genome analysis of <i>Dendrolimus houi</i> (Lepidoptera: Lasiocampidae) and phylogenetic relationship among Lasiocampidae species]]> https://www.researchpad.co/article/elastic_article_14575 Dendrolimus houi is one of the most common caterpillars infesting Gymnosperm trees. It is widely distributed in several countries in Southeast Asia and exists solely or coexists with several congeners and some Lasiocampidae species in various forest habitats. However, natural hybrids occasionally occur among some closely related species in the same habitat, and host preference, extreme climate stress, and geographic isolation probably lead to their uncertain taxonomic consensus. The mitochondrial DNA (mtDNA) of D. houi was extracted and sequenced using high-throughput technology, and the mitogenome composition and characteristics of these species were compared and analyzed. The phylogenetic relationship was then constructed using the maximum likelihood method (ML) and the Bayesian method (BI) based on their 13 protein-coding genes (PCGs) dataset, which were combined and made available to download among global Lasiocampidae species data. Mitogenome of D. houi was 15,373 bp in length, with 37 genes, including 13 PCGs, 22 tRNA genes (tRNAs) and 2 rRNA genes (rRNAs). The positions and sequences of genes were consistent with those of most known Lasiocampidae species. The nucleotide composition was highly A+T biased, accounting for ~80% of the whole mitogenome. All start codons of PCGs belonged to typical start codons ATN except for COI which used CGA, and most stop codons ended with standard TAA or TAG, while COI, COII, ND4 ended with incomplete T. Only tRNASer (AGN) lacked DHU arm, while the remainder formed a typical “clover-shaped” secondary structure. 
For Lasiocampidae species, their complete mitochondrial genomes ranged from 15,281 to 15,570 bp in length, and all first genes started from trnM in the same direction. Base composition was biased toward A and T. Finally, the two methods (ML and BI) separately revealed the same phylogenetic relationship of D. spp., ((((D. punctatus + D. tabulaeformis) + D. spectabilis) + D. superans) + (D. kikuchii of Hunan population + D. houi)), as in previous research, but results were different in that D. kikuchii from a Yunnan population was included, indicating that different geographical populations of insects have differentiated. The phylogenetic relationship among Lasiocampidae species was ((((Dendrolimus) + Kunugia) + Euthrix) + Trabala). This provides a better theoretical basis for Lasiocampidae evolution and classification for future research directions.

]]>
<![CDATA[Codon Pairs are Phylogenetically Conserved: A comprehensive analysis of codon pairing conservation across the Tree of Life]]> https://www.researchpad.co/article/elastic_article_14487 Identical codon pairing and co-tRNA codon pairing increase translational efficiency within genes when two codons that encode the same amino acid are translated by the same tRNA before it diffuses from the ribosome. We examine the phylogenetic signal in both identical and co-tRNA codon pairing across 23 428 species using alignment-free and parsimony methods. We determined that conserved codon pairing typically has a smaller window size than the length of a ribosome, and codon pairing tracks phylogenies across various taxonomic groups. We report a comprehensive analysis of codon pairing, including the extent to which each codon pairs. Our parsimony method generally recovers phylogenies that are more congruent with the established phylogenies than our alignment-free method. However, four of the ten taxonomic groups did not have sufficient orthologous codon pairings and were therefore analyzed using only the alignment-free methods. Since the recovered phylogenies using only codon pairing largely match phylogenies from the Open Tree of Life and the NCBI taxonomy, and are comparable to trees recovered by other algorithms, we propose that codon pairing biases are phylogenetically conserved and should be considered in conjunction with other phylogenomic techniques.
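The notion of identical codon pairing described above can be illustrated with a minimal Python sketch. It counts how often a codon reappears within a fixed number of codons downstream in a coding sequence. The function name `identical_codon_pairs` and the default `window` of 5 codons are assumptions made for this illustration and are not taken from the article; co-tRNA pairing would additionally need a codon-to-tRNA mapping, which is not attempted here.

```python
from collections import Counter


def identical_codon_pairs(cds, window=5):
    """Count codons that reappear within `window` codons downstream.

    `cds` is an in-frame coding sequence; trailing bases that do not
    fill a complete codon are ignored. The window size is a hypothetical
    parameter chosen for illustration only.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter()
    for i, codon in enumerate(codons):
        # Look ahead at most `window` codons for a repeat of the same codon.
        if codon in codons[i + 1:i + 1 + window]:
            counts[codon] += 1
    return counts
```

A per-species vector of such counts is one plausible input for an alignment-free comparison, although the article's actual feature definition may differ.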

]]>
<![CDATA[Rediscovering an old foe: Optimised molecular methods for DNA extraction and sequencing applications for fungarium specimens of powdery mildew (Erysiphales)]]> https://www.researchpad.co/article/elastic_article_14476 The purpose of this study was to identify a reliable DNA extraction protocol to use on 25-year-old powdery mildew specimens from the reference collection VPRI in order to produce high quality sequences suitable to address taxonomic phylogenetic questions. We tested 13 extraction protocols and two library preparation kits and found the combination of the E.Z.N.A.® Forensic DNA kit for DNA extraction and the NuGen Ovation® Ultralow System library preparation kit was the most suitable for this purpose.

]]>
<![CDATA[Specific clones of Trichomonas tenax are associated with periodontitis]]> https://www.researchpad.co/article/5c900d3bd5eed0c48407e3b6

Trichomonas tenax, an anaerobic protist difficult to cultivate with an unreliable molecular identification, has been suspected of involvement in periodontitis, a multifactorial inflammatory dental disease affecting the soft tissue and bone of the periodontium. A cohort of 106 periodontitis patients classified by stages of severity and 85 healthy adult control patients was constituted. An efficient culture protocol, a new identification tool by real-time qPCR of T. tenax and a Multi-Locus Sequence Typing system (MLST) based on T. tenax NIH4 reference strain were created. Fifty-three strains of Trichomonas sp. were obtained from periodontal samples. 37/106 (34.90%) T. tenax from patients with periodontitis and 16/85 (18.80%) T. tenax from control patients were detected by culture (p = 0.018). Sixty of the 191 samples tested positive for T. tenax by qPCR, 24/85 (28%) controls and 36/106 (34%) periodontitis patients (p = 0.089). By combining both results, 45/106 (42.5%) patients were positive by culture and/or PCR, as compared to 24/85 (28.2%) controls (p = 0.042). A link was established between the carriage in patients of Trichomonas tenax and the severity of the disease. Genotyping demonstrates the presence of strain diversity with three major different clusters and a relation between disease strains and the periodontitis severity (p<0.05). More frequently detected in periodontal cases, T. tenax is likely to be related to the onset and/or evolution of periodontal diseases.

]]>
<![CDATA[Quantitative real-time PCR as a promising tool for the detection and quantification of leaf-associated fungal species – A proof-of-concept using Alatospora pulchella]]> https://www.researchpad.co/article/5989db52ab0ee8fa60bdc5cf

Traditional methods to identify aquatic hyphomycetes rely on the morphology of released conidia, which can lead to misidentifications or underestimates of species richness due to convergent morphological evolution and the presence of non-sporulating mycelia. Molecular methods allow fungal identification irrespective of the presence of conidia or their morphology. As a proof-of-concept, we established a quantitative real-time polymerase chain reaction (qPCR) assay to accurately quantify the amount of DNA as a proxy for the biomass of an aquatic hyphomycete species (Alatospora pulchella). Our study showed discrimination even among genetically closely-related species, with a high sensitivity and a reliable quantification down to 9.9 fg DNA (3 PCR forming units; LoD) and 155.0 fg DNA (47 PCR forming units; LoQ), respectively. The assay’s specificity was validated for environmental samples that harboured diverse microbial communities and likely contained PCR-inhibiting substances. This makes qPCR a promising tool to gain deeper insights into the ecological roles of aquatic hyphomycetes and other microorganisms.

]]>
<![CDATA[RNAmountAlign: Efficient software for local, global, semiglobal pairwise and multiple RNA sequence/structure alignment]]> https://www.researchpad.co/article/N67fc2065-7e6a-4783-aab9-eb74d3ac0a95

Alignment of structural RNAs is an important problem with a wide range of applications. Since function is often determined by molecular structure, RNA alignment programs should take into account both sequence and base-pairing information for structural homology identification. This paper describes C++ software, RNAmountAlign, for RNA sequence/structure alignment that runs in O(n^3) time and O(n^2) space for two sequences of length n; moreover, our software returns a p-value (transformable to expect value E) based on Karlin-Altschul statistics for local alignment, as well as parameter fitting for local and global alignment. Using incremental mountain height, a representation of structural information computable in cubic time, RNAmountAlign implements quadratic time pairwise local, global and global/semiglobal (query search) alignment using a weighted combination of sequence and structural similarity. RNAmountAlign is capable of performing progressive multiple alignment as well. Benchmarking of RNAmountAlign against LocARNA, LARA, FOLDALIGN, DYNALIGN, STRAL, MXSCARNA, and MUSCLE shows that RNAmountAlign has reasonably good accuracy and faster run time supporting all alignment types. Additionally, our extension of RNAmountAlign, called RNAmountAlignScan, which scans a target genome sequence to find hits having high sequence and structural similarity to a given query sequence, outperforms RSEARCH and sequence-only query scans and runs faster than FOLDALIGN query scan.
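To make the mountain height idea concrete, here is a minimal Python sketch under a simplifying assumption: it derives the heights from a single secondary structure written in dot-bracket notation. The cubic-time computation mentioned in the abstract suggests the software works from more than one fixed structure, so this is an illustration of the representation only, not of RNAmountAlign's implementation. Both function names are hypothetical.

```python
from itertools import accumulate

# Per-position step: an opening bracket raises the mountain by one,
# a closing bracket lowers it by one, an unpaired base leaves it flat.
_STEP = {"(": 1, ")": -1, ".": 0}


def incremental_mountain_height(dot_bracket):
    """Return the per-position height change for a dot-bracket string."""
    return [_STEP[symbol] for symbol in dot_bracket]


def mountain_height(dot_bracket):
    """Return the cumulative mountain height at each position."""
    return list(accumulate(incremental_mountain_height(dot_bracket)))
```

Because each position is reduced to a single number, structural similarity between two positions can be scored like a substitution cost, which is what would allow an ordinary quadratic-time dynamic programming alignment to combine it with sequence similarity.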

]]>
<![CDATA[A graph-based algorithm for RNA-seq data normalization]]> https://www.researchpad.co/article/N0b813aa9-b155-4778-93ba-b0f37d26ae8a

The use of RNA-sequencing has garnered much attention in recent years for characterizing and understanding various biological systems. However, it remains a major challenge to gain insights from a large number of RNA-seq experiments collectively, due to the normalization problem. Normalization has been challenging due to an inherent circularity, requiring that RNA-seq data be normalized before any pattern of differential (or non-differential) expression can be ascertained; meanwhile, the prior knowledge of non-differential transcripts is crucial to the normalization process. Some methods have successfully overcome this problem by the assumption that most transcripts are not differentially expressed. However, when RNA-seq profiles become more abundant and heterogeneous, this assumption fails to hold, leading to erroneous normalization. We present a normalization procedure that does not rely on this assumption, nor prior knowledge about the reference transcripts. This algorithm is based on a graph constructed from intrinsic correlations among RNA-seq transcripts and seeks to identify a set of densely connected vertices as references. Application of this algorithm on our synthesized validation data showed that it could recover the reference transcripts with high precision, thus resulting in high-quality normalization. On a realistic data set from the ENCODE project, this algorithm gave good results and could finish in a reasonable time. These preliminary results imply that we may be able to break the long persisting circularity problem in RNA-seq normalization.

]]>
<![CDATA[Disease-relevant mutations alter amino acid co-evolution networks in the second nucleotide binding domain of CFTR]]> https://www.researchpad.co/article/N211c75a7-eaac-4644-b655-cac4e239c2e4

Cystic Fibrosis (CF) is an inherited disease caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) ion channel. Mutations in CFTR cause impaired chloride ion transport in the epithelial tissues of patients leading to cardiopulmonary decline and pancreatic insufficiency in the most severely affected patients. CFTR is composed of twelve membrane-spanning domains, two nucleotide-binding domains (NBDs), and a regulatory domain. The most common mutation in CFTR is a deletion of phenylalanine at position 508 (ΔF508) in NBD1. Previous research has primarily concentrated on the structure and dynamics of the NBD1 domain; however, numerous pathological mutations have also been found in the lesser-studied NBD2 domain. We have investigated the amino acid co-evolved network of interactions in NBD2, and the changes that occur in that network upon the introduction of CF and CF-related mutations (S1251N(T), S1235R, D1270N, N1303K(T)). Extensive coupling between the α- and β-subdomains was identified with residues in or near the Walker A, Walker B, H-loop and C-loop motifs. Alterations in the predicted residue network varied from moderate for the S1251T perturbation to more severe for N1303T. The S1235R and D1270N networks varied greatly compared to the wildtype, but these CF mutations only affect ion transport preference and do not severely disrupt CFTR function, suggesting dynamic flexibility in the network of interactions in NBD2. Our results also suggest that inappropriate interactions between the β-subdomain and Q-loop could be detrimental. We also identified mutations predicted to stabilize the NBD2 residue network upon introduction of the CF and CF-related mutations, and these predicted mutations are scored as benign by the MUTPRED2 algorithm. Our results suggest the level of disruption of the co-evolution predictions of the amino acid networks in NBD2 does not have a straightforward correlation with the severity of the CF phenotypes observed.

]]>
<![CDATA[Reticulate evolution in eukaryotes: Origin and evolution of the nitrate assimilation pathway]]> https://www.researchpad.co/article/5c784fefd5eed0c484007967

Genes and genomes can evolve through interchanging genetic material, which leads to reticular evolutionary patterns. However, the importance of reticulate evolution in eukaryotes, and in particular of horizontal gene transfer (HGT), remains controversial. Given that metabolic pathways with taxonomically-patchy distributions can be indicative of HGT events, the eukaryotic nitrate assimilation pathway is an ideal object of investigation, as previous results revealed a patchy distribution and suggested that the nitrate assimilation cluster of dikaryotic fungi (Opisthokonta) could have originated in, and been transferred from, a lineage leading to Oomycota (Stramenopiles). We studied the origin and evolution of this pathway through both multi-scale bioinformatic and experimental approaches. Our taxon-rich genomic screening shows that nitrate assimilation is present in more lineages than previously reported, although it is restricted to autotrophs and osmotrophs. The phylogenies indicate a pervasive role of HGT, with three bacterial transfers contributing to the pathway origin, and at least seven well-supported transfers between eukaryotes. In particular, we propose a distinct and more complex HGT path between Opisthokonta and Stramenopiles than the one previously suggested, involving at least two transfers of a nitrate assimilation gene cluster. We also found that gene fusion played an essential role in this evolutionary history, underlying the origin of the canonical eukaryotic nitrate reductase, and of a chimeric nitrate reductase in Ichthyosporea (Opisthokonta). We show that the ichthyosporean pathway, including this novel nitrate reductase, is physiologically active and transcriptionally co-regulated, responding to different nitrogen sources, similarly to distant eukaryotes with independent HGT-acquisitions of the pathway. 
This indicates that this pattern of transcriptional control evolved convergently in eukaryotes, favoring the proper integration of the pathway in the metabolic landscape. Our results highlight the importance of reticulate evolution in eukaryotes, by showing the crucial contribution of HGT and gene fusion in the evolutionary history of the nitrate assimilation pathway.

]]>
<![CDATA[Characterization of mammalian Lipocalin UTRs in silico: Predictions for their role in post-transcriptional regulation]]> https://www.researchpad.co/article/5c897780d5eed0c4847d2e76

The Lipocalin family is a group of homologous proteins characterized by its wide array of functional capabilities. As extracellular proteins, they can bind small hydrophobic ligands through a well-conserved β-barrel folding. Lipocalin evolutionary history sprawls across many different taxa and shows great divergence even within chordates. This variability is also found in their heterogeneous tissue expression pattern. Although a handful of promoter regions have been previously described, studies on UTR regulatory roles in Lipocalin gene expression are scarce. Here we report a comprehensive bioinformatic analysis showing that complex post-transcriptional regulation exists in Lipocalin genes, as suggested by the presence of alternative UTRs with substantial sequence conservation in mammals, alongside a high diversity of transcription start sites and alternative promoters. Strong selective pressure could have operated upon Lipocalin UTRs, leading to an enrichment in particular sequence motifs that limit the choice of secondary structures. Mapping these regulatory features to the expression pattern of early and late diverging Lipocalins suggests that UTRs represent an additional phylogenetic signal, which may help to uncover how functional pleiotropy originated within the Lipocalin family.

]]>
<![CDATA[Molecular analyses and phylogeny of the herpes simplex virus 2 US9 and glycoproteins gE/gI obtained from infected subjects during the Herpevac Trial for Women]]> https://www.researchpad.co/article/5c8c193cd5eed0c484b4d241

Herpes simplex virus 2 (HSV-2) is a large double-stranded DNA virus that causes genital sores when spread by sexual contact and is a principal cause of viral encephalitis in newborns and infants. Viral glycoproteins enable virion entry into and spread between cells, making glycoproteins a prime target for vaccine development. A truncated glycoprotein D2 (gD2) vaccine candidate, recently tested in the phase 3 Herpevac Trial for Women, did not prevent HSV-2 infection in initially seronegative women. Some women who became infected experienced multiple recurrences during the trial. The HSV US7, US8, and US9 genes encode glycoprotein I (gI), glycoprotein E (gE), and the US9 type II membrane protein, respectively. These proteins participate in viral spread across cell junctions and facilitate anterograde transport of virion components in neurons, prompting us to investigate whether sequence variants in these genes could be associated with frequent recurrence. The nucleotide sequences and dN/dS ratios of the US7-US9 region from viral isolates of individuals who experienced multiple recurrences were compared with those who had had a single episode of disease. No consistent polymorphism(s) distinguished the recurrent isolates. In frequently recurring isolates, the dN/dS ratio of US7 was low while greater variation (higher dN/dS ratio) occurred in US8, suggesting conserved function of the former during reactivation. Phylogenetic reconstruction of the US7-US9 region revealed eight strongly supported clusters within the 55 U.S. HSV-2 strains sampled, which were preserved in a second global phylogeny. Thus, although we have demonstrated evolutionary diversity in the US7-US9 complex, we found no molecular evidence of sequence variation in US7-US9 that distinguishes isolates from subjects with frequently recurrent episodes of disease.

]]>
<![CDATA[Prevalence of infection by the microsporidian Nosema spp. in native bumblebees (Bombus spp.) in northern Thailand]]> https://www.researchpad.co/article/5c8accecd5eed0c48499033b

Bumblebees (tribe Bombini, genus Bombus Latreille) play a pivotal role as pollinators in mountain regions for both native plants and for agricultural systems. In our survey of northern Thailand, four species of bumblebees (Bombus (Megabombus) montivagus Smith, B. (Alpigenobombus) breviceps Smith, B. (Orientalibombus) haemorrhoidalis Smith and B. (Melanobombus) eximius Smith), were present in 11 localities in 4 provinces (Chiang Mai, Mae Hong Son, Chiang Rai and Nan). We collected and screened 280 foraging worker bumblebees for microsporidia (Nosema spp.) and trypanosomes (Crithidia spp.). Our study is the first to demonstrate the parasite infection in bumblebees in northern Thailand. We found N. ceranae in B. montivagus (5.35%), B. haemorrhoidalis (4.76%), and B. breviceps (14.28%) and N. bombi in B. montivagus (14.28%), B. haemorrhoidalis (11.64%), and B. breviceps (28.257%).

]]>
<![CDATA[Genome-wide analysis, expansion and expression of the NAC family under drought and heat stresses in bread wheat (T. aestivum L.)]]> https://www.researchpad.co/article/5c897798d5eed0c4847d30f2

The NAC family is one of the largest plant-specific transcription factor families, and some of its members are known to play major roles in plant development and response to biotic and abiotic stresses. Here, we inventoried 488 NAC members in bread wheat (Triticum aestivum). Using the recent release of the wheat genome (IWGS RefSeq v1.0), we studied duplication events focusing on genomic regions from 4B-4D-5A chromosomes as an example of the family expansion and neofunctionalization of TaNAC members. Differentially expressed TaNAC genes in organs and in response to abiotic stresses were identified using publicly available RNAseq data. Expression profiling of 23 selected candidate TaNAC genes was studied in leaf and grain from two bread wheat genotypes at two developmental stages in field drought conditions and revealed insights into their specific and/or overlapping expression patterns. This study showed that, of the 23 TaNAC genes, seven have a leaf-specific expression and five have a grain-specific expression. In addition, the grain-specific genes profiles in response to drought depend on the genotype. These genes may be considered as potential candidates for further functional validation and could present an interest for crop improvement programs in response to climate change. Globally, the present study provides new insights into evolution, divergence and functional analysis of NAC gene family in bread wheat.

]]>
<![CDATA[PhyloPi: An affordable, purpose built phylogenetic pipeline for the HIV drug resistance testing facility]]> https://www.researchpad.co/article/5c8823b3d5eed0c484638e7d

Introduction

Phylogenetic analysis plays a crucial role in quality control in the HIV drug resistance testing laboratory. If previous patient sequence data is available, sample swaps can be detected and investigated. As antiretroviral treatment coverage is increasing in many developing countries, so is the need for HIV drug resistance testing. In countries with multiple languages, transcription errors are easily made with patient identifiers. Here a self-contained blastn integrated phylogenetic pipeline can be especially useful. Even though our pipeline can run on any Unix-based system, a Raspberry Pi 3 is used here as a very affordable and integrated solution.

Performance benchmarks

The computational capability of this single board computer is demonstrated as well as the utility thereof in the HIV drug resistance laboratory. Benchmarking analysis against a large public database shows excellent time performance with minimal user intervention. This pipeline also contains utilities to find previous sequences as well as phylogenetic analysis and a graphical sequence mapping utility against the pol area of the HIV HXB2 reference genome. Sequence data from the Los Alamos HIV database was analyzed for inter- and intra-patient diversity and logistic regression was conducted on the calculated genetic distances. These findings show that allowable clustering and genetic distance between viral sequences from different patients is very dependent on subtype as well as the area of the viral genome being analyzed.

Availability

The Raspberry Pi image for PhyloPi, source code of the pipeline, sequence data, bash-, python- and R-scripts for the logistic regression, benchmarking as well as helper scripts are available at http://scholar.ufs.ac.za:8080/xmlui/handle/11660/7638 and https://github.com/ArmandBester/phylopi. The PhyloPi image and the source code are published under the GPLv3 license. A demo version of the PhyloPi pipeline is available at http://phylopi.hpc.ufs.ac.za/.

]]>
<![CDATA[16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses]]> https://www.researchpad.co/article/5c7ee7c5d5eed0c4848f4d9c

Advances in high-throughput sequencing have increased the availability of microbiome sequencing data that can be exploited to characterize microbiome community structure in situ. We explore using word and sentence embedding approaches for nucleotide sequences since they may be a suitable numerical representation for downstream machine learning applications (especially deep learning). This work involves first encoding (“embedding”) each sequence into a dense, low-dimensional, numeric vector space. Here, we use Skip-Gram word2vec to embed k-mers, obtained from 16S rRNA amplicon surveys, and then leverage an existing sentence embedding technique to embed all sequences belonging to specific body sites or samples. We demonstrate that these representations are meaningful, and hence the embedding space can be exploited as a form of feature extraction for exploratory analysis. We show that sequence embeddings preserve relevant information about the sequencing data such as k-mer context, sequence taxonomy, and sample class. Specifically, the sequence embedding space resolved differences among phyla, as well as differences among genera within the same family. Distances between sequence embeddings had similar qualities to distances between alignment identities, and embedding multiple sequences can be thought of as generating a consensus sequence. In addition, embeddings are versatile features that can be used for many downstream tasks, such as taxonomic and sample classification. Using sample embeddings for body site classification resulted in negligible performance loss compared to using OTU abundance data, and clustering embeddings yielded high fidelity species clusters. Lastly, the k-mer embedding space captured distinct k-mer profiles that mapped to specific regions of the 16S rRNA gene and corresponded with particular body sites. 
Together, our results show that embedding sequences results in meaningful representations that can be used for exploratory analyses or for downstream machine learning applications that require numeric data. Moreover, because the embeddings are trained in an unsupervised manner, unlabeled data can be embedded and used to bolster supervised machine learning tasks.
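The first step of the embedding approach, turning a nucleotide sequence into a "sentence" of overlapping k-mer "words", can be sketched in a few lines of Python. The function name `kmers` and the default `k=6` are assumptions for this illustration; the abstract does not state which k was used. Training the Skip-Gram model and building sentence embeddings are not shown.

```python
def kmers(sequence, k=6):
    """Split a nucleotide sequence into overlapping k-mers.

    Each k-mer plays the role of a word and the ordered list plays the
    role of a sentence for a word-embedding model. Sequences shorter
    than k yield an empty list.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
```

Lists produced this way, one per read, are the kind of tokenized input a Skip-Gram word2vec implementation would typically consume.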

]]>