ResearchPad - languages https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[Using case-level context to classify cancer pathology reports]]> https://www.researchpad.co/article/elastic_article_7869 Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information about the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that, with minimal modifications, our add-on can be applied to a wide range of other clinical text-based tasks.

]]>
<![CDATA[Modeling competitive evolution of multiple languages]]> https://www.researchpad.co/article/elastic_article_7854 Increasing evidence demonstrates that in many places language coexistence has become ubiquitous and is essential for supporting linguistic and cultural diversity, along with the associated financial and economic benefits. Competition among multiple languages determines the outcome of their evolution: coexistence, decline, or extinction. Here, we extend the Abrams-Strogatz model of language competition to multiple languages and then validate it by analyzing the behavioral transitions of language usage over recent decades in Singapore and Hong Kong. In each case, we estimate from data the model parameters that measure each language’s utility for its speakers and the strength of two biases: the majority’s preference for its own language and the minority’s aversion to it. The values of these two biases decide which language grows fastest in the competition and what the stable state of the system will be. We also study the system’s convergence time to stable states and discover the existence of tipping points with multiple attractors. Moreover, a critical slowdown of convergence to the stable fractions of language users appears near the tipping points and peaks at them, signaling when the system approaches them. Our analysis furthers our understanding of the evolution of various languages and the role of tipping points in behavioral transitions. These insights may help to protect languages from extinction and retain linguistic and cultural diversity.
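For orientation, the following is a minimal sketch of the original two-language Abrams-Strogatz model that the article extends, not of the multi-language extension itself. It assumes the standard form dx/dt = (1 - x) c x^a s - x c (1 - x)^a (1 - s), where x is the fraction of speakers of one language and s its relative status. The exponent a = 1.31 is the value reported in the original 2003 model; the function names, the rate constant c = 1, and the Euler step size are illustrative assumptions.

```python
def abrams_strogatz_step(x, s, a=1.31, c=1.0, dt=0.01):
    """Advance the fraction x of speakers of language X by one Euler step.

    s is the relative status of X (0..1); a, c and dt are assumed values.
    """
    # Flow of speakers from Y to X: grows with X's share and status.
    gain = (1.0 - x) * c * (x ** a) * s
    # Flow of speakers from X to Y: grows with Y's share and status.
    loss = x * c * ((1.0 - x) ** a) * (1.0 - s)
    x_next = x + dt * (gain - loss)
    # Keep the fraction inside [0, 1] despite discretisation error.
    return min(1.0, max(0.0, x_next))


def simulate(x0, s, steps=20000, **kwargs):
    """Integrate the model from initial fraction x0 for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        x = abrams_strogatz_step(x, s, **kwargs)
    return x
```

With equal status (s = 0.5) and a > 1, a language that starts slightly ahead drifts toward a share of 1 and one that starts slightly behind drifts toward 0, which is the two-language behavior that a multi-language extension has to generalize.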

]]>
<![CDATA[pyKNEEr: An image analysis workflow for open and reproducible research on femoral knee cartilage]]> https://www.researchpad.co/article/N0686bd46-1746-4f66-8610-270f1b75b482

Transparent research in musculoskeletal imaging is fundamental to reliably investigating diseases such as knee osteoarthritis (OA), a chronic disease impairing femoral knee cartilage. To study cartilage degeneration, researchers have developed algorithms to segment femoral knee cartilage from magnetic resonance (MR) images and to measure cartilage morphology and relaxometry. The majority of these algorithms are not publicly available or require advanced programming skills to be compiled and run. However, to accelerate discoveries and findings, it is crucial to have open and reproducible workflows. We present pyKNEEr, a framework for open and reproducible research on femoral knee cartilage from MR images. pyKNEEr is written in Python, uses Jupyter notebooks as a user interface, and is available on GitHub under a GNU GPLv3 license. It is composed of three modules: 1) image preprocessing to standardize spatial and intensity characteristics; 2) femoral knee cartilage segmentation for intersubject, multimodal, and longitudinal acquisitions; and 3) analysis of cartilage morphology and relaxometry. Each module contains one or more Jupyter notebooks with narrative, code, visualizations, and dependencies to reproduce computational environments. pyKNEEr facilitates transparent image-based research of femoral knee cartilage because of its ease of installation and use, and its versatility for publication and sharing among researchers. Finally, due to its modular structure, pyKNEEr favors code extension and algorithm comparison. We tested our reproducible workflows with experiments that also constitute an example of transparent research with pyKNEEr, and we compared pyKNEEr’s performance to that of existing algorithms in literature review visualizations. We provide links to executed notebooks and executable environments for immediate reproducibility of our findings.

]]>
<![CDATA[Parametric CAD modeling for open source scientific hardware: Comparing OpenSCAD and FreeCAD Python scripts]]> https://www.researchpad.co/article/N836ce8d9-e17d-43c0-9509-c554011a4818

Open source hardware for scientific equipment needs to provide source files and enough documentation to allow the study, replication and modification of the design. In addition, parametric modeling is encouraged in order to facilitate customization for other experiments. Parametric design using a solid modeling programming language allows customization and provides a source file for the design. OpenSCAD is the most widely used scripting tool for parametric modeling of open source labware. However, OpenSCAD lacks the ability to export to standard parametric formats; thus, the parametric dimensional information of the model is lost. This is an important deficiency because it is essential to share the design in the most accessible formats with no information loss. In this work we analyze OpenSCAD and compare it with FreeCAD Python scripts. We have created a parametric open source hardware design to compare these tools. Our findings show that although Python for FreeCAD is more arduous to learn, its advantages counterbalance the initial difficulties. The main benefits are the ability to export to standard parametric models, the use of the Python language with its libraries, and the ability to use and integrate the models in FreeCAD’s graphical interface. These benefits make it more appropriate for designing open source hardware for scientific equipment.

]]>
<![CDATA[A season for all things: Phenological imprints in Wikipedia usage and their relevance to conservation]]> https://www.researchpad.co/article/5c882408d5eed0c4846395ce

Phenology plays an important role in many human–nature interactions, but these seasonal patterns are often overlooked in conservation. Here, we provide the first broad exploration of seasonal patterns of interest in nature across many species and cultures. Using data from Wikipedia, a large online encyclopedia, we analyzed 2.33 billion pageviews to articles for 31,751 species across 245 languages. We show that seasonality plays an important role in how and when people interact with plants and animals online. In total, over 25% of species in our data set exhibited a seasonal pattern in at least one of their language-edition pages, and seasonality is significantly more prevalent in pages for plants and animals than it is in a random selection of Wikipedia articles. Pageview seasonality varies across taxonomic clades in ways that reflect observable patterns in phenology, with groups such as insects and flowering plants having higher seasonality than mammals. Differences between Wikipedia language editions are significant; pages in languages spoken at higher latitudes exhibit greater seasonality overall, and species seldom show the same pattern across multiple language editions. These results have relevance to conservation policy formulation and to improving our understanding of what drives human interest in biodiversity.

]]>
<![CDATA[Mandarin Chinese modality exclusivity norms]]> https://www.researchpad.co/article/5c76fe72d5eed0c484e5ba83

Modality exclusivity norms have been developed in different languages for research on the relationship between perceptual and conceptual systems. This paper sets up the first modality exclusivity norms for Chinese, a Sino-Tibetan language with semantics as its orthographically relevant level. The norms are collected through two studies based on Chinese sensory words. The experimental designs take into consideration the morpho-lexical and orthographic structures of Chinese. Study 1 provides a set of norms for Mandarin Chinese single-morpheme words in mean ratings of the extent to which a word is experienced through the five sense modalities. The degrees of modality exclusivity are also provided. The collected norms are further analyzed to examine how sub-lexical orthographic representations of sense modalities in Chinese characters affect speakers’ interpretation of the sensory words. In particular, we found higher modality exclusivity ratings for the sense modality explicitly represented by a semantic radical component, as well as higher auditory dominant modality ratings for characters with transparent phonetic symbol components. Study 2 presents the mean ratings and modality exclusivity of coordinate disyllabic compounds involving multiple sense modalities. These studies open new perspectives in the study of modality exclusivity. First, links between modality exclusivity and writing systems have been established, which strengthens previous accounts of the influence of orthography on the processing of visual information in reading. Second, a new set of modality exclusivity norms for compounds is proposed to show how different linguistic factors compete in their influence on modality exclusivity and potentially to allow such norms to be linked to studies on synesthesia and semantic transparency.
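The abstract does not state how the degree of modality exclusivity is computed. The sketch below assumes the range-over-sum definition common in earlier modality exclusivity norms: the difference between the highest and lowest mean rating divided by the sum of all ratings, giving 0 for a fully multimodal word and 1 for a fully unimodal one. The function names and the example ratings are hypothetical.

```python
def modality_exclusivity(ratings):
    """Return range/sum of the mean modality ratings for one word.

    `ratings` maps a modality name to a mean rating (assumed non-negative).
    """
    values = list(ratings.values())
    total = sum(values)
    if total == 0:
        raise ValueError("at least one rating must be non-zero")
    return (max(values) - min(values)) / total


def dominant_modality(ratings):
    """Return the modality with the highest mean rating."""
    return max(ratings, key=ratings.get)


# Hypothetical mean ratings on a 0-5 scale for a single sensory word.
example = {"visual": 4.5, "auditory": 0.5, "haptic": 1.0,
           "gustatory": 0.0, "olfactory": 0.0}
score = modality_exclusivity(example)
```

For the hypothetical word above, the dominant modality is visual and the score is 4.5 / 6.0 = 0.75, that is, a largely but not exclusively visual word.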

]]>
<![CDATA[RaCaT: An open source and easy to use radiomics calculator tool]]> https://www.researchpad.co/article/5c76fe64d5eed0c484e5b9d0

Purpose

The widely known field of radiomics aims to provide extensive image-based phenotyping of, for example, tumors, using a wide variety of feature values extracted from medical images. It is therefore of utmost importance that feature values calculated by different institutes follow the same feature definitions. For this purpose, the Imaging Biomarker Standardization Initiative (IBSI) provides detailed mathematical feature descriptions, as well as (mathematical) test phantoms and corresponding reference feature values. We present here an easy-to-use radiomic feature calculator, RaCaT, which calculates a large number of radiomic features, in compliance with the standard, for all kinds of medical images.

Methods

The calculator is implemented in C++ and comes as a standalone executable. It can therefore be called easily from any programming language as well as from the command line. No programming skills are required to use the calculator. The software architecture is highly modularized so that it is easily extensible. The user can also download the source code, adapt it if needed and build the calculator from source. The calculated feature values are compliant with the ones provided by the IBSI standard. Source code, example files for the software configuration, and documentation can be found online on GitHub (https://github.com/ellipfaehlerUMCG/RaCat).

Results

The comparison with the standard values shows that all calculated features, as well as the image preprocessing steps, comply with the IBSI standard. The performance is also demonstrated on clinical examples.

Conclusions

The authors successfully implemented an easy to use Radiomics calculator that can be called from any programming language or from the command line. Image preprocessing and feature settings and calculations can be adjusted by the user.

]]>
<![CDATA[The impact of bilingualism on executive functions and working memory in young adults]]> https://www.researchpad.co/article/5c6dc9a6d5eed0c484529f7c

A bilingual advantage, in the form of better performance by bilinguals in tasks tapping into executive function abilities, has been reported repeatedly in the literature. However, recent research argues that this advantage stems not from bilingualism but from uncontrolled factors or imperfectly matched samples. In this study we explored the potential impact of bilingualism on executive functioning abilities by testing large groups of young adult bilinguals and monolinguals on the tasks that were most extensively used when the advantages were reported. Importantly, the recently identified factors that could disrupt between-group comparisons were controlled for, and both groups were matched. We found no differences between groups in their performance. Additional bootstrapping analyses indicated that, when the bilingual advantage appeared, it very often co-occurred with unmatched socio-demographic factors. The evidence presented here indicates that the bilingual advantage might indeed be caused by spurious uncontrolled factors rather than bilingualism per se. Bilingualism has also been argued to affect working memory. Therefore, we tested the same participants in both a forward and a backward version of a visual and an auditory working memory task. We found no differences between groups in either of the forward versions of the tasks, but bilinguals systematically outperformed monolinguals in the backward conditions. The results are analysed and interpreted taking into consideration different perspectives on the domain-specificity of executive functions and working memory.

]]>
<![CDATA[elPrep 4: A multithreaded framework for sequence analysis]]> https://www.researchpad.co/article/5c6dc9a8d5eed0c484529f91

We present elPrep 4, a reimplementation from scratch of the elPrep framework for processing sequence alignment map files in the Go programming language. elPrep 4 includes multiple new features allowing us to process all of the preparation steps defined by the GATK Best Practice pipelines for variant calling. This includes new and improved functionality for sorting, (optical) duplicate marking, base quality score recalibration, BED and VCF parsing, and various filtering options. The implementations of these options in elPrep 4 faithfully reproduce the outcomes of their counterparts in GATK 4, SAMtools, and Picard, even though the underlying algorithms are redesigned to take advantage of elPrep’s parallel execution framework to vastly improve the runtime and resource use compared to these tools. Our benchmarks show that elPrep executes the preparation steps of the GATK Best Practices up to 13x faster on WES data, and up to 7.4x faster for WGS data compared to running the same pipeline with GATK 4, while utilizing fewer compute resources.

]]>
<![CDATA[Bioinformatics calls the school: Use of smartphones to introduce Python for bioinformatics in high schools]]> https://www.researchpad.co/article/5c6f1486d5eed0c48467a231

The dynamic nature of technological developments invites us to rethink the learning spaces. In this context, science education can be enriched by the contribution of new computational resources, making the educational process more up-to-date, challenging, and attractive. Bioinformatics is a key interdisciplinary field, contributing to the understanding of biological processes that is often underrated in secondary schools. As a useful resource in learning activities, bioinformatics could help in engaging students to integrate multiple fields of knowledge (logical-mathematical, biological, computational, etc.) and generate an enriched and long-lasting learning environment. Here, we report our recent project in which high school students learned basic concepts of programming applied to solving biological problems. The students were taught the Python syntax, and they coded simple tools to answer biological questions using resources at hand. Notably, these were built mostly on the students’ own smartphones, which proved to be capable, readily available, and relevant complementary tools for teaching. This project resulted in an empowering and inclusive experience that challenged differences in social background and technological accessibility.

]]>
<![CDATA[Developing a modern data workflow for regularly updated data]]> https://www.researchpad.co/article/5c59fef0d5eed0c4841357ed

Over the past decade, biology has undergone a data revolution in how researchers collect data and the amount of data being collected. An emerging challenge that has received limited attention in biology is managing, working with, and providing access to data under continual active collection. Regularly updated data present unique challenges in quality assurance and control, data publication, archiving, and reproducibility. We developed a workflow for a long-term ecological study that addresses many of the challenges associated with managing this type of data. We do this by leveraging existing tools to 1) perform quality assurance and control; 2) import, restructure, version, and archive data; 3) rapidly publish new data in ways that ensure appropriate credit to all contributors; and 4) automate most steps in the data pipeline to reduce the time and effort required by researchers. The workflow leverages tools from software development, including version control and continuous integration, to create a modern data management system that automates the pipeline.

]]>
<![CDATA[Switching between reading tasks leads to phase-transitions in reading times in L1 and L2 readers]]> https://www.researchpad.co/article/5c63394bd5eed0c484ae6445

Reading research uses different tasks to investigate different levels of the reading process, such as word recognition, syntactic parsing, or semantic integration. It seems to be tacitly assumed that the underlying cognitive processes that constitute reading are stable across those tasks. However, nothing is known about what happens when readers switch from one reading task to another. The stability assumption suggests that the cognitive system resolves this switching between two tasks quickly. Here, we present an alternative language-game hypothesis (LGH) of reading that begins by treating reading as a softly-assembled process and that assumes, instead of stability, context-sensitive flexibility of the reading process. LGH predicts that switching between two reading tasks leads to longer-lasting, phase-transition-like patterns in the reading process. Using the nonlinear-dynamical tool of recurrence quantification analysis, we test these predictions by examining series of individual word reading times in self-paced reading tasks where native (L1) and second language (L2) readers transition between random-word and ordered-text reading tasks. We find consistent evidence for phase-transitions in the reading times when readers switch from ordered-text to random-word reading, but we find mixed evidence when readers transition from random-word to ordered-text reading. In the latter case, L2 readers show moderately stronger signs of phase-transitions than L1 readers, suggesting that familiarity with a language influences whether and how such transitions occur. The results provide evidence for LGH and suggest that the cognitive processes underlying reading are not fully stable across tasks but exhibit soft-assembly in the interaction between task and reader characteristics.

]]>
<![CDATA[Ten simple rules for writing statistical book reviews]]> https://www.researchpad.co/article/5c536c2bd5eed0c484a49b27

Statistical books can provide deep insights into statistics and software. There are, however, many resources available to the practitioner. Book reviews have the capacity to function as a critical mechanism for the learner to assess the merits of engaging in part, in full, or at all with a book. The “ten simple rules” format, pioneered in computational biology, was applied here to writing effective book reviews for statistics because of the wide breadth of offerings in this domain, including topical introductions, computational solutions, and theory. Learning by doing is a popular paradigm in statistics and computation, but there is still a niche for books in the pedagogy of self-taught and instruction-based learning. Primarily, these rules ensure that book reviews function as a form of short syntheses to inform and guide readers in deciding to use a specific book relative to other options for resolving statistical challenges.

]]>
<![CDATA[No evidence for effects of Turkish immigrant children’s bilingualism on executive functions]]> https://www.researchpad.co/article/5c61b7ccd5eed0c484937fd6

Recent research has increasingly questioned the bilingual advantage for executive functions (EF). We used structural equation modeling in a large sample of Turkish immigrant and German monolingual children (N = 337; aged 5–15 years) to test associations between bilingualism and EF. Our data showed no significant group differences between Turkish immigrant and German children’s EF skills while taking into account maternal education, child gender, age, and working memory (i.e., digit span backwards). Moreover, neither Turkish immigrant children’s proficiency in either language nor their home language environment predicted EF. Our findings offer important new evidence in light of the ongoing debate about the existence of a bilingual advantage for EF.

]]>
<![CDATA[Evaluating probabilistic programming languages for simulating quantum correlations]]> https://www.researchpad.co/article/5c390bbad5eed0c48491e06f

This article explores how probabilistic programming can be used to simulate quantum correlations in an EPR experimental setting. Probabilistic programs are based on standard probability theory, which cannot produce quantum correlations. To address this limitation, a hypergraph formalism was programmed which expresses both the measurement contexts of the EPR experimental design and the associated constraints. Four contemporary open source probabilistic programming frameworks were used to simulate an EPR experiment in order to shed light on their relative effectiveness along both qualitative and quantitative dimensions. We found that all four probabilistic languages successfully simulated quantum correlations. Detailed analysis revealed that no language was clearly superior across all dimensions; however, the comparison does highlight aspects that can be considered when using probabilistic programs to simulate experiments in quantum physics.

]]>
<![CDATA[Dragonfly Hunter CZ: Mobile application for biological species recognition in citizen science]]> https://www.researchpad.co/article/5c3fa56cd5eed0c484ca42a4

Citizen science and data collected by volunteers have considerable potential to aid the understanding of many biological and ecological processes. We describe a mobile application that allows the public to map and report occurrences of the Odonata species (dragonflies and damselflies) found in the Czech Republic. The application also helps in species classification based on observation details such as date, GPS coordinates, altitude, biotope, suborder, and colour. Dragonfly Hunter CZ is a free Android application, now fully available on Google Play, built on the open-source NativeScript framework using the JavaScript programming language. The server side is powered by Apache Server with PHP and a MariaDB SQL database. A mobile application is a fast and accurate way to obtain data on Odonata species, which can be used after expert verification for ecological studies and as a basis for conservation instruments such as Red Lists and policy instruments. We expect it to be effective in encouraging citizen science and in promoting the proactive reporting of odonates. It can also be extended to the reporting and monitoring of other plant and animal species.

]]>
<![CDATA[QTM: Computational package using MPI protocol for Quantum Trajectories Method]]> https://www.researchpad.co/article/5c1813c5d5eed0c484775c4a

The Quantum Trajectories Method (QTM) is one of the frequently used methods for studying open quantum systems. The main idea of this method is to evolve, as functions of time, the wave functions that describe the system and to apply so-called quantum jumps at randomly selected points in time. The system state obtained in this way is called a trajectory. After averaging many single trajectories, we obtain an approximation of the behavior of the quantum system. Because single trajectories can be computed independently, this also allows us to use parallel computation methods. In this article, we discuss the QTM package, which is built on MPI technology. Using MPI allows the trajectories to be calculated and averaged in parallel, which shortens the computation time. Despite being written in the C++ programming language, the presented solution is easy to use and does not require any advanced programming techniques. At the same time, it offers higher performance than other packages implementing the QTM. This is especially important for harder computational tasks, and the use of MPI improves performance on particular problems in the field of open quantum systems.
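The idea of evolving a state, applying random quantum jumps, and averaging many trajectories can be illustrated with a small serial sketch. It is not the QTM package or its API: it is written in Python rather than C++, uses no MPI, and treats only an undriven two-level system that decays from its excited state at an assumed rate gamma, for which the averaged excited-state population should approach exp(-gamma t). All names and parameter values are illustrative assumptions.

```python
import math
import random


def excited_population_trajectory(gamma, t_end, dt, rng):
    """Simulate one quantum-jump trajectory of a decaying two-level system.

    Returns the excited-state population at time t_end. Amplitudes are kept
    real here, which is sufficient only for this undriven toy case.
    """
    c_g, c_e = 0.0, 1.0  # ground and excited amplitudes; start excited
    steps = int(round(t_end / dt))
    for _ in range(steps):
        jump_prob = gamma * dt * c_e * c_e
        if rng.random() < jump_prob:
            # Quantum jump: the system emits a photon and collapses to ground.
            c_g, c_e = 1.0, 0.0
        else:
            # No-jump step under the non-Hermitian effective Hamiltonian,
            # followed by renormalisation of the state.
            c_e *= 1.0 - 0.5 * gamma * dt
            norm = math.sqrt(c_g * c_g + c_e * c_e)
            c_g, c_e = c_g / norm, c_e / norm
    return c_e * c_e


def average_population(gamma=1.0, t_end=1.0, dt=0.01, n_traj=4000, seed=0):
    """Average the excited-state population over independent trajectories."""
    rng = random.Random(seed)
    total = sum(excited_population_trajectory(gamma, t_end, dt, rng)
                for _ in range(n_traj))
    return total / n_traj
```

Each call to `excited_population_trajectory` is independent of the others, so the loop in `average_population` is the part that a parallel implementation would distribute across processes before combining the partial sums.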

]]>
<![CDATA[Effect of language proficiency on proactive occulo-motor control among bilinguals]]> https://www.researchpad.co/article/5c1ab870d5eed0c4840280de

We examined the effect of language proficiency on the status and dynamics of proactive inhibitory control in an oculomotor cued go-no-go task. The first experiment was designed to demonstrate the effect of second language proficiency on proactive inhibitory cost and adjustments in control by evaluating previous trial effects. This was achieved by introducing uncertainty about the upcoming event (go or no-go stimulus). High- and low-proficiency Hindi-English bilingual adults participated in the study. Saccadic latencies and errors were taken as the measures of performance. The results demonstrate a significantly lower proactive inhibitory cost and better up-regulation of proactive control under uncertainty among high-proficiency bilinguals. An analysis based on previous trial effects suggests that high-proficiency bilinguals were better at releasing inhibition and adjusting control during an ongoing response activity under uncertainty. To further understand the dynamics of proactive inhibitory control as a function of proficiency, the second experiment was designed to test the default versus temporary state hypothesis of proactive inhibitory control. Certain manipulations were introduced in the cued go-no-go task in order to make the upcoming go or no-go trial difficult to predict, which increased the demands on the implementation and maintenance of proactive control. High-proficiency bilinguals were found to rely on a default state of proactive inhibitory control, whereas low-proficiency bilinguals were found to rely on temporary/transient proactive inhibition. Language proficiency, as one of the measures of bilingualism, was found to influence proactive inhibitory control and appears to modulate its dynamics.

]]>
<![CDATA[The relation of culture, socio-economics, and friendship to music preferences: A large-scale, cross-country study]]> https://www.researchpad.co/article/5c1d5b4ed5eed0c4846eb4ed

Music listening is an inherently cultural behavior, which may be shaped by users’ backgrounds and contextual characteristics. Due to geographical, socio-economic, linguistic, and cultural factors as well as friendship networks, users in different countries may have different music preferences. Investigating cultural and socio-economic factors that might be associated with between-country differences in music preferences can facilitate music information retrieval, contribute to the prediction of users’ music preferences, and improve music recommendation in cross-country contexts. However, previous literature provides limited empirical evidence of the relationships between cross-country differences in a wide range of socio-economic aspects and those in music preferences. To bridge this research gap, and drawing on a large-scale dataset, LFM-1b, this study uses the Quadratic Assignment Procedure to examine the possible relationship between cross-country differences in artist, album, and genre listening frequencies and the cross-country distance in geographical, socio-economic, linguistic, cultural, and friendship connections. Results indicate: (1) there is no significant relationship between geographical or economic distance and the album, artist, and genre preference distances at the country level; (2) the cross-country distance on three cultural dimensions (masculinity, long-term orientation, and indulgence) is positively associated with both the album and artist preference distances; (3) the between-country distance in main languages has a positive relationship with the album, artist, and genre preference distances across countries; (4) the density of friendship connections among countries correlates negatively with the cross-country preference distances in terms of artist and genre. Findings from this study not only expand knowledge of factors related to music preferences at the country level, but can also be integrated into real-world music recommendation systems that consider country-level music preferences.
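The Quadratic Assignment Procedure tests whether two distance matrices over the same set of units (here, countries) are associated, while respecting the dependence among entries that share a row or column. A minimal sketch of the simple correlation variant follows, assuming symmetric matrices and a one-sided test for positive association; it is not the study’s analysis code, and the function names, permutation count, and toy data are hypothetical.

```python
import math
import random


def _upper_triangle(matrix, order):
    """Collect the off-diagonal upper-triangle entries after relabelling units."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(xs, ys):
    """Pearson correlation of two equal-length lists (assumes non-zero variance)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def qap_correlation(a, b, n_perm=999, seed=0):
    """Return the observed matrix correlation and a permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    b_values = _upper_triangle(b, identity)
    observed = _pearson(_upper_triangle(a, identity), b_values)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)
        # Permute rows and columns of `a` together so its structure is kept.
        if _pearson(_upper_triangle(a, order), b_values) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: distances between six hypothetical units placed on a line.
points = [0, 1, 3, 7, 12, 20]
toy = [[abs(p - q) for q in points] for p in points]
r, p = qap_correlation(toy, toy)
```

Permuting rows and columns jointly, rather than shuffling individual cells, is the design choice that distinguishes this procedure from an ordinary permutation test on the flattened entries.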

]]>
<![CDATA[Are tones in the expressive lexicon iconic? Evidence from three Chinese languages]]> https://www.researchpad.co/article/5c1028a4d5eed0c484247b18

Recent advances in the literature have focused on sketching phonosemantic mappings of imitative or iconic utterances by relying on vowels and consonants, leaving the suprasegmental information unexplored. To begin bridging this gap, this study looks at the interaction of lexical tone and iconicity by comparing sound symbolic (i.e., mimetic, expressive, ideophonic) strata and general (i.e., arbitrary, prosaic, non-iconic) strata from three Chinese languages (Mandarin, Taiwanese Southern Min, Hong Kong Cantonese) using corpus-based means. For all three languages, the distribution of tones in the sound symbolic strata is skewed so that the majority of syllables are confined to two tonal categories per language, one of which is high level, while the general strata exhibit no such tonal bias. These results indicate that phonological systematicity at the prosodic level might play an important role in demarcating an iconic class of words. This cross-linguistic tendency towards high tone mappings may derive from phonotactic strategies to facilitate prosodic foregrounding of iconic utterances, as well as from an embodiment of expressive voice and marked pitch use like that of Infant Directed Speech.

]]>