ResearchPad - sequence-tagged-site-analysis https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks
<![CDATA[Using case-level context to classify cancer pathology reports]]> https://www.researchpad.co/article/elastic_article_7869

Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information regarding the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that, with minimal modifications, our add-on can be applied to a wide range of other clinical text-based tasks.

]]>
<![CDATA[Identifying functional groups among the diverse, recombining antigenic var genes of the malaria parasite Plasmodium falciparum from a local community in Ghana]]> https://www.researchpad.co/article/5b3099a2463d7e6a0ede32be

A challenge in studying diverse multi-copy gene families is deciphering distinct functional types within immense sequence variation. Functional changes can in some cases be tracked through the evolutionary history of a gene family; however, phylogenetic approaches are not possible in cases where gene families diversify primarily by recombination. We take a network theoretical approach to functionally classify the highly recombining var antigenic gene family of the malaria parasite Plasmodium falciparum. We sample var DBLα sequence types from a local population in Ghana, and classify 9,276 of these variants into just 48 functional types. Our approach is to first decompose each sequence type into its constituent, recombining parts; we then use a stochastic block model to identify functional groups among the parts; finally, we classify the sequence types based on which functional groups they contain. This method for functional classification does not rely on an inferred phylogenetic history, nor does it rely on inferring function based on conserved sequence features. Instead, it infers functional similarity among recombining parts based on the sharing of similar co-occurrence interactions with other parts. This method can therefore group sequences that have undetectable sequence homology or even distinct origins. Describing these 48 var functional types allows us to simplify the antigenic diversity within our dataset by over two orders of magnitude. We consider how the var functional types are distributed in isolates, and find a nonrandom pattern in which common var functional types are non-randomly distinct from one another in their functional composition. The coarse-graining of var gene diversity into biologically meaningful functional groups has important implications for understanding the disease ecology and evolution of this system, as well as for designing effective epidemiological monitoring and intervention.

]]>
<![CDATA[AGIA Tag System Based on a High Affinity Rabbit Monoclonal Antibody against Human Dopamine Receptor D1 for Protein Analysis]]> https://www.researchpad.co/article/5989da7fab0ee8fa60b99e0b

Polypeptide tag technology is widely used for protein detection and affinity purification. It consists of two fundamental elements: a peptide sequence and a binder which specifically binds to the peptide tag. In many tag systems, antibodies have been used as the binder due to their high affinity and specificity. Recently, we obtained clone Ra48, a high-affinity rabbit monoclonal antibody (mAb) against dopamine receptor D1 (DRD1). Here, we report a novel tag system composed of Ra48 antibody and its epitope sequence. Using a deletion assay, we identified EEAAGIARP in the C-terminal region of DRD1 as the minimal epitope of Ra48 mAb, and we named this sequence the “AGIA” tag, based on its central sequence. The tag sequence does not include the four amino acids, Ser, Thr, Tyr, or Lys, which are susceptible to post-translational modification. We demonstrated the performance of this new tag system in biochemical and cell biology applications. SPR analysis demonstrated that the affinity of the Ra48 mAb to the AGIA tag was 4.90 × 10⁻⁹ M. The AGIA tag showed remarkably high sensitivity and specificity in immunoblotting. A number of AGIA-fused proteins overexpressed in animal and plant cells were detected by anti-AGIA antibody in immunoblotting and immunostaining with low background, and were immunoprecipitated efficiently. Furthermore, a single amino acid substitution of the second Glu to Asp (AGIA/E2D) enabled competitive dissociation of AGIA/E2D-tagged protein by adding wild-type AGIA peptide. It enabled one-step purification of AGIA/E2D-tagged recombinant proteins by peptide competition under physiological conditions. The sensitivity and specificity of the AGIA system make it suitable for use in multiple methods for protein analysis.

]]>
<![CDATA[Investigation of Proposed Ladderane Biosynthetic Genes from Anammox Bacteria by Heterologous Expression in E. coli]]> https://www.researchpad.co/article/5989d9fbab0ee8fa60b7218f

Ladderanes are hydrocarbon chains with three or five linearly concatenated cyclobutane rings that are uniquely produced as membrane lipid components by anammox (anaerobic ammonia-oxidizing) bacteria. By virtue of their angle and torsional strain, ladderanes are unusually energetic compounds, and if produced biochemically by engineered microbes, could serve as renewable, high-energy-density jet fuel components. The biochemistry and genetics underlying the ladderane biosynthetic pathway are unknown; however, previous studies have identified a pool of 34 candidate genes from the anammox bacterium Kuenenia stuttgartiensis, some or all of which may be involved in ladderane fatty acid biosynthesis. The goal of the present study was to establish a systematic means of testing the candidate genes from K. stuttgartiensis for involvement in ladderane biosynthesis through heterologous expression in E. coli under anaerobic conditions. This study describes an efficient means of assembling synthesized, codon-optimized candidate ladderane biosynthesis genes in synthetic operons that allows for changes to regulatory element sequences, as well as modular assembly of multiple operons for simultaneous heterologous expression in E. coli (or potentially other microbial hosts). We also describe in vivo functional tests of putative anammox homologs of the phytoene desaturase CrtI, which plays an important role in the hypothesized ladderane pathway, and a method for soluble purification of one of these enzymes. This study is, to our knowledge, the first experimental effort focusing on the role of specific anammox genes in the production of ladderanes, and lays the foundation for future efforts toward determination of the ladderane biosynthetic pathway. Our substantial, but far from comprehensive, efforts at elucidating the ladderane biosynthetic pathway were not successful. We invite the scientific community to take advantage of the considerable synthetic biology resources and experimental results developed in this study to elucidate the biosynthetic pathway that produces unique and intriguing ladderane lipids.

]]>
<![CDATA[A Signature of Genomic Instability Resulting from Deficient Replication Licensing]]> https://www.researchpad.co/article/5989db53ab0ee8fa60bdce28

Insufficient licensing of DNA replication origins has been shown to result in genome instability, stem cell deficiency, and cancers. However, it is unclear whether the DNA damage resulting from deficient replication licensing occurs generally or if specific sites are preferentially affected. To map locations of ongoing DNA damage in vivo, the DNAs present in red blood cell micronuclei were sequenced. Many micronuclei are the product of DNA breaks that leave acentromeric remnants that failed to segregate during mitosis and should reflect the locations of breaks. To validate the approach, we show that micronuclear sequences identify known common fragile sites under conditions that induce breaks at these locations (hydroxyurea). In MCM2-deficient mice, a different set of preferred breakage sites is identified that includes the tumor suppressor gene Tcf3, which is known to contribute to T-lymphocytic leukemias that arise in these mice, and the 45S rRNA gene repeats.

]]>
<![CDATA[First Comparative Analysis of the Community Structures and Carbon Metabolic Pathways of the Bacteria Associated with Alvinocaris longirostris in a Hydrothermal Vent of Okinawa Trough]]> https://www.researchpad.co/article/5989d9f3ab0ee8fa60b6f280

Alvinocaris longirostris is a species of shrimp found in the hydrothermal fields of Okinawa Trough. To date, the structure and function of the microbial community associated with A. longirostris are essentially unknown. In this study, using high-throughput sequencing and clone library construction and analysis, we compared for the first time the community structures and metabolic profiles of microbes associated with the gill and gut of A. longirostris in a hydrothermal field of Okinawa Trough. Fourteen phyla were detected in the gill and gut communities, of which 11 phyla were shared by both tissues. Proteobacteria made up a substantial proportion in both tissues, while Firmicutes was abundant only in gut. Although gill and gut communities were similar in bacterial diversity, the bacterial community structures in these two tissues were significantly different. Further, we discovered for the first time, in the gill and gut communities of A. longirostris, the genes (cbbM and aclB) encoding the key enzymes of the Calvin-Benson-Bassham (CBB) cycle and the reductive tricarboxylic acid (rTCA) cycle, and found that both cbbM and aclB were significantly more abundant in gill than in gut. Taken together, these results provide the first evidence that at least two carbon fixation pathways are present in both the gill and the gut communities of A. longirostris, and that the communities in different tissues likely differ in autotrophic productivity.

]]>