ResearchPad - information-technology https://www.researchpad.co Default RSS Feed en-us © 2020 Newgen KnowledgeWorks <![CDATA[The Language of Innovation]]> https://www.researchpad.co/article/elastic_article_10245 Predicting innovation is a peculiar problem in data science. Following its definition, an innovation is always a never-seen-before event, leaving no room for traditional supervised learning approaches. Here we propose a strategy to address the problem in the context of innovative patents, by defining innovations as never-seen-before associations of technologies and exploiting self-supervised learning techniques. We think of technological codes present in patents as a vocabulary and the whole technological corpus as written in a specific, evolving language. We leverage such structure with techniques borrowed from Natural Language Processing by embedding technologies in a high-dimensional Euclidean space where relative positions are representative of learned semantics. Proximity in this space is an effective predictor of specific innovation events and outperforms a wide range of standard link-prediction metrics. The success of patented innovations follows complex dynamics characterized by different patterns, which we analyze in detail with specific examples. The methods proposed in this paper provide a completely new way of understanding and forecasting innovation, by tackling it from a revealing perspective and opening interesting scenarios for a number of applications and further analytic approaches.
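
The idea of using proximity in an embedding space as a link predictor can be illustrated with a minimal sketch. The code names, the two-dimensional toy vectors and the choice of cosine similarity as the proximity measure are all assumptions made for illustration; the abstract does not state which measure or dimensionality the authors use.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (assumed proximity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_unseen_pairs(embeddings, seen_pairs):
    """Rank pairs of codes never observed together, most similar first."""
    candidates = []
    for a, b in combinations(sorted(embeddings), 2):
        if frozenset((a, b)) in seen_pairs:
            continue  # already co-occurred, so not a candidate innovation
        score = cosine_similarity(embeddings[a], embeddings[b])
        candidates.append((score, a, b))
    return sorted(candidates, reverse=True)


# Hypothetical technology codes with made-up embeddings.
embeddings = {
    "A01": [1.0, 0.0],
    "B02": [0.9, 0.1],
    "C03": [0.0, 1.0],
}
seen_pairs = {frozenset(("A01", "C03"))}
ranking = rank_unseen_pairs(embeddings, seen_pairs)
```

In this toy setting the pair at the top of `ranking` is the never-seen combination whose vectors point in the most similar direction, which is the kind of candidate the abstract describes as a likely future association.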

]]>
<![CDATA[Using case-level context to classify cancer pathology reports]]> https://www.researchpad.co/article/elastic_article_7869 Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence—for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information regarding the entire cancer case based off case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks—site, subsite, laterality, histology, behavior, and grade. We expect that with minimal modifications, our add-on can be applied towards a wide range of other clinical text-based tasks.

]]>
<![CDATA[Adherence to antiretroviral therapy and associated factors among Human immunodeficiency virus positive patients accessing treatment at Nekemte referral hospital, west Ethiopia, 2019]]> https://www.researchpad.co/article/elastic_article_7637 Antiretroviral therapy has a remarkable clinical effect in reducing the progress of Acquired Immune Deficiency Syndrome. The clinical outcome of antiretroviral therapy depends on strict adherence. Poor adherence reduces the effectiveness of antiretroviral therapy and increases viral replication. With changes in service delivery over time and differences in socio-demographic status from region to region, it is essential to measure adherence. Therefore, this study aimed to assess adherence to antiretroviral therapy and its associated factors among HIV/AIDS patients accessing treatment at Nekemte referral hospital, West Ethiopia.

Methods

An institution-based cross-sectional study was conducted on 311 HIV/AIDS patients from March 01 to March 30, 2019. The study participants were selected by a simple random sampling method and interviewed using structured questionnaires. Bivariable logistic regression was conducted to find an association between each independent variable and adherence to antiretroviral medication. Multivariable logistic regression was used to find the independent variables which best predict adherence. The statistical significance was measured using odds ratio at a 95% confidence interval with a p-value of less than 0.05.

Results

Out of a total of 311 patients sampled, 305 participated in the study, making a response rate of 98.07%. Of these 305 study participants, 73.1% (95% CI = 68.2, 78.0) were adherent to their medication. Having knowledge about HIV and its treatment (AOR = 8.24, 95% CI: 3.10, 21.92), having strong family/social support (AOR = 6.21, 95% CI: 1.39, 27.62), absence of adverse drug reaction (AOR = 5.33, 95% CI: 1.95, 14.57), absence of comorbidity of other chronic diseases (AOR = 5.72, 95% CI: 1.91, 17.16) and disclosing HIV status to the family (AOR = 5.08, 95% CI: 2.09, 12.34) were significantly associated with an increased likelihood of adherence to antiretroviral medication.

Conclusion

The level of adherence to antiretroviral therapy was found to be low compared to the WHO recommendation. Clinicians should emphasize reducing adverse drug reactions, detecting and treating co-morbidities early, improving knowledge through health education, and encouraging patients to disclose their HIV status to their families.

]]>
<![CDATA[Applicability of personal laser scanning in forestry inventory]]> https://www.researchpad.co/article/5c803c6ad5eed0c484ad8913

Light Detection and Ranging (LiDAR) technology has been widely used in forestry surveys in the form of airborne laser scanning (ALS), terrestrial laser scanning (TLS), and mobile laser scanning (MLS). However, acquiring important basic tree parameters (e.g., diameter at breast height and tree position) in forest inventory with these methods has not solved the problems of low measurement efficiency and weak GNSS signal under the canopy. A personal laser scanning (PLS) device combined with SLAM technology provides an effective solution for forest inventory under complex conditions with its light weight and flexible mobility. This study proposes a new method for calculating the volume of a cylinder using point cloud data obtained by a PLS device by fitting to a polygonal cylinder to calculate the diameter of the trunk. The point cloud data of tree trunks of different thickness were modeled using different fitting methods. The rate of correct tree trunk detection was 93.3% and the total deviation of the estimations of tree diameter at breast height (DBH) was -1.26 cm. The root mean square errors (RMSEs) of the estimations of the extracted DBH and the tree position were 1.58 cm and 26 cm, respectively. The survey efficiency of the PLS device was 30 m²/min for each investigator, compared with 0.91 m²/min for the field survey. The test demonstrated that the PLS device combined with the SLAM algorithm provides an efficient and convenient solution for forest inventory.
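
The two accuracy figures reported above, a signed deviation and an RMSE, can be computed as in the following minimal sketch. It assumes that "total deviation" means the mean signed error (bias) of the estimates against field-measured reference values; the function name and the sample diameters are made up for illustration.

```python
import math


def bias_and_rmse(estimated, reference):
    """Return (mean signed error, root mean square error) for paired values."""
    errors = [e - r for e, r in zip(estimated, reference)]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err * err for err in errors) / len(errors))
    return bias, rmse


# Hypothetical DBH values in centimetres: scanner estimates vs. field reference.
estimated_dbh_cm = [20.0, 31.0, 15.0]
reference_dbh_cm = [21.0, 33.0, 15.0]
bias, rmse = bias_and_rmse(estimated_dbh_cm, reference_dbh_cm)
```

A negative bias, as in the reported -1.26 cm, would indicate that the estimates are on average smaller than the reference measurements, while the RMSE summarizes the typical size of the error regardless of sign.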

]]>
<![CDATA[Factors influencing performance of community-based health volunteers’ activities in the Kassena-Nankana Districts of Northern Ghana]]> https://www.researchpad.co/article/5c76fe2ad5eed0c484e5b61f

Background

An increasing demand for health care services and the need to bring health care closer to the doorsteps of communities have led health managers to use trained community-based health volunteers to support the provision of health services to people in rural communities. Community volunteerism in Ghana has been identified as an effective strategy in the implementation of Primary Health Care activities since the 1970s. However, little is known about the performance of these volunteers engaged in health intervention activities at the community level. This study assessed the level of performance and factors that affect the performance of health volunteers’ activities in Northern Ghana.

Methods

This was a cross-sectional study using a quantitative method of data collection. Two hundred structured interviews were conducted with health volunteers. Data collectors visited health volunteers at home and conducted the interviews after informed consent was obtained. STATA Version 11.2 was used to analyze the data. Descriptive statistics were used to assess the level of performance of the health volunteers. Multiple logistic regression models were then used to assess factors that influence the performance of health volunteers.

Results

About 45% of volunteers scored high on performance. In the multivariate analysis, educational status [OR = 4.64, 95% CI (1.22–17.45)] and ethnicity [OR = 1.85, 95% CI (1.00–3.41)] were the factors that influenced the performance of health volunteers. Other intermediary factors such as incentives and means of transport also affected the performance of health volunteers engaged in health intervention activities at the community level.

Conclusion

The results suggest that higher educational status of health volunteers is more likely to increase their performance. In addition, providing non-monetary incentives and logistics such as bicycles, raincoats, torch lights and wellington boots will enhance the performance of health volunteers and also motivate them to continue to provide health services to their own people at the community level.

]]>
<![CDATA[Applications of artificial neural networks in health care organizational decision-making: A scoping review]]> https://www.researchpad.co/article/5c75ac5bd5eed0c484d08619

Health care organizations are leveraging machine-learning techniques, such as artificial neural networks (ANN), to improve delivery of care at a reduced cost. Applications of ANN to diagnosis are well-known; however, ANN are increasingly used to inform health care management decisions. We provide a seminal review of the applications of ANN to health care organizational decision-making. We screened 3,397 articles from six databases with coverage of Health Administration, Computer Science and Business Administration. We extracted study characteristics, aim, methodology and context (including level of analysis) from 80 articles meeting inclusion criteria. Articles were published from 1997–2018 and originated from 24 countries, with a plurality of papers (26 articles) published by authors from the United States. Types of ANN used included ANN (36 articles), feed-forward networks (25 articles), or hybrid models (23 articles); reported accuracy varied from 50% to 100%. The majority of ANN informed decision-making at the micro level (61 articles), between patients and health care providers. Fewer ANN were deployed for intra-organizational (meso-level, 29 articles) and system, policy or inter-organizational (macro-level, 10 articles) decision-making. Our review identifies key characteristics and drivers for market uptake of ANN for health care organizational decision-making to guide further adoption of this technique.

]]>
<![CDATA[The impact of individual differences on jurors’ note taking during trials and recall of trial evidence, and the association between the type of evidence recalled and verdicts]]> https://www.researchpad.co/article/5c75ac61d5eed0c484d08683

Although note taking during trials is known to enhance jurors’ recall of trial evidence, little is known about whether individual differences in note taking underpin this effect. Individual differences in handwriting speed, working memory, and attention may influence jurors’ note taking. This, in turn, may influence their recall. It may also be the case that if jurors note down and recall more incriminating than non-incriminating evidence (or vice versa), then this may predict their verdict. Three studies examined the associations between the aforementioned individual differences, the amount of critical evidence jurors noted down during a trial, the amount of critical evidence they recalled, and the verdicts they reached. Participants had their handwriting speed, short-term memory, working memory, and attention assessed. They then watched a trial video (some took notes), reached a verdict, and recalled as much trial information as possible. We found that jurors with faster handwriting speed (Study 1), higher short-term memory capacity (Study 2), and higher sustained attention capacity (Study 3) noted down, and later recalled, the most critical trial evidence. However, working memory storage capacity, information processing ability (Study 2) and divided attention (Study 3) were not associated with note taking or recall. Further, the type of critical evidence jurors predominantly recalled predicted their verdicts, such that jurors who recalled more incriminating evidence were more likely to reach a guilty verdict, and jurors who recalled more non-incriminating evidence were less likely to do so. The implications of these findings are discussed.

]]>
<![CDATA[A metamorphic testing approach for event sequences]]> https://www.researchpad.co/article/5c75ac5fd5eed0c484d0865d

Test oracles are commonly used in software testing to determine the correctness of the execution results of test cases. However, the testing of many software systems faces the test oracle problem: a test oracle may not always be available, or it may be available but too expensive to apply. One such software system is a system involving abundant business processes. This paper focuses on the testing of business-process-based software systems and proposes a metamorphic testing approach for event sequences, called MTES, to alleviate the oracle problem. We utilized event sequences to represent business processes and then applied the technique of metamorphic testing to test the system without using test oracles. To apply metamorphic testing, we studied the general rules for identifying metamorphic relations for business processes and further demonstrated specific metamorphic relations for individual case studies. Three case studies were conducted to evaluate the effectiveness of our approach. The experimental results show that our approach is feasible and effective in testing the applications with rich business processes. In addition, this paper summarizes the experimental findings and proposes guidelines for selecting good metamorphic relations for business processes.
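
As a rough illustration of metamorphic testing on event sequences, the sketch below checks two metamorphic relations against a toy account system. The system, the event names and both relations are hypothetical and are not taken from the paper's case studies; the point is only that outputs of related event sequences are compared with each other, so no oracle for the "correct" balance is needed.

```python
import random


def run_account(events):
    """Hypothetical system under test: replay events, return final balance."""
    balance = 0
    for kind, amount in events:
        if kind == "deposit":
            balance += amount
        elif kind == "withdraw" and amount <= balance:
            balance -= amount  # overdrafts are silently rejected
    return balance


def check_permutation_relation(events, trials=20, seed=0):
    """Relation 1: reordering a deposit-only sequence keeps the balance."""
    rng = random.Random(seed)
    source_output = run_account(events)
    for _ in range(trials):
        follow_up = events[:]
        rng.shuffle(follow_up)
        if run_account(follow_up) != source_output:
            return False
    return True


def check_cancel_relation(events, amount=5):
    """Relation 2: appending a deposit and an equal withdrawal changes nothing."""
    follow_up = events + [("deposit", amount), ("withdraw", amount)]
    return run_account(follow_up) == run_account(events)


deposits = [("deposit", 10), ("deposit", 25), ("deposit", 7)]
relations_hold = (check_permutation_relation(deposits)
                  and check_cancel_relation(deposits))
```

A violation of either relation would signal a fault in `run_account` even though the test never states what the final balance ought to be.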

]]>
<![CDATA[An open source algorithm to detect natural gas leaks from mobile methane survey data]]> https://www.researchpad.co/article/5c6dc9e7d5eed0c48452a459

The data collected by mobile methane (CH4) sensors can be used to find natural gas (NG) leaks in urban distribution systems. Extracting actionable insights from the large volumes of data collected by these sensors requires several data processing steps. While these survey platforms are commercially available, the associated data processing software largely constitutes a black box due to its proprietary nature. In this paper we describe a step-by-step algorithm for developing leak indications using data from mobile CH4 surveys, providing an under-the-hood look at the choices and challenges associated with data analysis. We also describe how our algorithm has evolved over time, and the data-driven insights that have prompted these changes. Applying our algorithm to data collected in 15 cities produced more than 6100 leak indications and estimates of the leaks’ size. We use these results to characterize the distribution of leak sizes in local NG distribution systems. Mobile surveys are already an effective and necessary tool for managing NG distribution systems, but improvements in the technology and software will continue to increase their value.

]]>
<![CDATA[Developing a modern data workflow for regularly updated data]]> https://www.researchpad.co/article/5c59fef0d5eed0c4841357ed

Over the past decade, biology has undergone a data revolution in how researchers collect data and the amount of data being collected. An emerging challenge that has received limited attention in biology is managing, working with, and providing access to data under continual active collection. Regularly updated data present unique challenges in quality assurance and control, data publication, archiving, and reproducibility. We developed a workflow for a long-term ecological study that addresses many of the challenges associated with managing this type of data. We do this by leveraging existing tools to 1) perform quality assurance and control; 2) import, restructure, version, and archive data; 3) rapidly publish new data in ways that ensure appropriate credit to all contributors; and 4) automate most steps in the data pipeline to reduce the time and effort required by researchers. The workflow leverages tools from software development, including version control and continuous integration, to create a modern data management system that automates the pipeline.

]]>
<![CDATA[University-industry-government relations of the Ministry of Industry and Information Technology (MIIT) universities: The perspective of the mutual information]]> https://www.researchpad.co/article/5c67306ed5eed0c484f37b1a

The Ministry of Industry and Information Technology (MIIT) universities are important bases for science and technology research and play a critical role in China’s National Innovation System. Based on the Web of Science (WoS), this article analyzes the statistics of papers published by MIIT universities and by universities across China, including MIIT universities. The results are as follows: (1) Both the MIIT universities and universities nationwide in China have increased their international academic publications, and MIIT universities have shown a greater increase over the past decade. (2) In terms of U-I-G interaction, for UG relations, the Tug value of MIIT universities has remained stable, while that of universities in China has declined. For UI relations, the Tui value of both MIIT universities and universities in China has shown steady growth. For UIG relations, MIIT universities have a greater synergistic effect of the Triple Helix relationship than universities in China. (3) Among the seven MIIT universities, those selected for “Project 985”, namely HIT, BUAA, BIT and NPU, have published more papers and have been more synergistic with government and industry (UIG relations) than the other three universities, namely NUAA, NUST and HEU. Based on the empirical results, we discuss our findings and make suggestions regarding policy incentives, a reasonable administrative system and the U-I-G interaction mode, which are significant not only for Chinese universities but also for universities in other developing countries.
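
The Tui, Tug and three-way values mentioned above are mutual-information quantities. The sketch below shows how such values could be computed, assuming the definitions commonly used in the Triple Helix literature: T_ui = H_u + H_i - H_ui, T_ug = H_u + H_g - H_ug, and T_uig = H_u + H_i + H_g - H_ui - H_ug - H_ig + H_uig, where H is Shannon entropy in bits. Whether the article uses exactly these forms is an assumption, and the paper counts are made up.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits, skipping zero-probability cells."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def marginal(joint, axes):
    """Sum a joint distribution {(u, i, g): p} down to the chosen axes."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[a] for a in axes)
        out[sub] = out.get(sub, 0.0) + p
    return out


def triple_helix_transmissions(counts):
    """counts maps (u, i, g) flags (0 or 1) to a number of papers."""
    total = sum(counts.values())
    joint = {key: c / total for key, c in counts.items()}
    subsets = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
    h = {axes: entropy(marginal(joint, axes).values()) for axes in subsets}
    t_ui = h[(0,)] + h[(1,)] - h[(0, 1)]
    t_ug = h[(0,)] + h[(2,)] - h[(0, 2)]
    t_uig = (h[(0,)] + h[(1,)] + h[(2,)]
             - h[(0, 1)] - h[(0, 2)] - h[(1, 2)] + h[(0, 1, 2)])
    return t_ui, t_ug, t_uig


# Hypothetical counts: university and industry addresses always co-occur,
# government addresses are independent of both.
counts = {(1, 1, 0): 25, (1, 1, 1): 25, (0, 0, 0): 25, (0, 0, 1): 25}
t_ui, t_ug, t_uig = triple_helix_transmissions(counts)
```

The two-way values are never negative, whereas the three-way value can be; in that literature a more negative three-way value is usually read as stronger synergy among the three spheres.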

]]>
<![CDATA[Effort-aware and just-in-time defect prediction with neural network]]> https://www.researchpad.co/article/5c5df316d5eed0c484580cec

Effort-aware just-in-time (JIT) defect prediction is to rank source code changes based on the likelihood of defects as well as the effort to inspect such changes. Accurate defect prediction algorithms help to find more defects with limited effort. To improve the accuracy of defect prediction, in this paper, we propose a deep learning based approach for effort-aware just-in-time defect prediction. The key idea of the proposed approach is that neural networks and deep learning could be exploited to select useful features for defect prediction because they have proven excellent at selecting useful features for classification and regression. First, we preprocess ten numerical metrics of code changes, and then feed them to a neural network whose output indicates how likely the code change under test contains bugs. Second, we compute the benefit cost ratio for each code change by dividing the likelihood by its size. Finally, we rank code changes according to their benefit cost ratio. Evaluation results on a well-known data set suggest that the proposed approach outperforms the state-of-the-art approaches on each of the subject projects. It improves the average recall and P_opt by 15.6% and 8.1%, respectively.
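
The second and third steps described above, dividing the predicted likelihood by the change size and ranking by that ratio, can be sketched as follows. The field names, the sample values and the guard against zero-size changes are assumptions for illustration; the probabilities stand in for the neural network's output, which is not modeled here.

```python
def rank_by_benefit_cost(changes):
    """Order change ids by predicted defect likelihood per line changed."""
    scored = []
    for change in changes:
        size = max(change["size"], 1)  # assumed guard against empty changes
        ratio = change["defect_probability"] / size
        scored.append((ratio, change["id"]))
    scored.sort(reverse=True)  # highest benefit cost ratio first
    return [change_id for _, change_id in scored]


# Hypothetical code changes with made-up model outputs and sizes in lines.
changes = [
    {"id": "c1", "defect_probability": 0.9, "size": 300},
    {"id": "c2", "defect_probability": 0.6, "size": 20},
    {"id": "c3", "defect_probability": 0.2, "size": 5},
]
order = rank_by_benefit_cost(changes)
```

In this toy data the large change `c1` is inspected last despite having the highest defect probability, because reviewing it costs far more effort per unit of expected benefit.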

]]>
<![CDATA[A tactical comparison of the 4-2-3-1 and 3-5-2 formation in soccer: A theory-oriented, experimental approach based on positional data in an 11 vs. 11 game set-up]]> https://www.researchpad.co/article/5c5b52cdd5eed0c4842bd050

The presented field experiment in an 11 vs. 11 soccer game set-up is the first to examine the impact of different formations (e.g. 4-2-3-1 vs. 3-5-2) on tactical key performance indicators (KPIs) using positional data in a controlled experiment. The data were gathered using player tracking systems (1 Hz) in a standardized 11 vs. 11 soccer game. The KPIs were measured using dynamical positioning variables like Effective Playing Space, Player Length per Width ratio, Team Separateness, Space Control Gain, and Pressure Passing Efficiency. Within the experimental positional data analysis paradigm, the two team formations showed no differences in Effective Playing Space, Team Separateness, or Space Control Gain. However, as a theory-based approach predicted, the 3-5-2 formation exceeded the 4-2-3-1 formation in Player Length per Width ratio and Pressure Passing Efficiency. Practice task designs which manipulate team formations therefore significantly influence the emergent behavioral dynamics and need to be considered when planning and monitoring performance. Accordingly, an experimental positional data analysis paradigm is a useful approach to enable the development and validation of theory-oriented models in the area of performance analysis in sports games.

]]>
<![CDATA[Community knowledge, attitude, and perceived stigma of leprosy amongst community members living in Dhanusha and Parsa districts of Southern Central Nepal]]> https://www.researchpad.co/article/5c424369d5eed0c4845e00dc

Background

Though Nepal declared leprosy elimination in 2010, its burden has been rising steadily in Terai communities over the past 2 years, with 3000 new leprosy cases diagnosed annually. The community’s perception is important for the prevention and control of leprosy and for enhancing the quality of life of leprosy patients. Poor knowledge, unfavorable attitude and stigma create a hindrance to leprosy control. The main objective of this study was to assess the knowledge, attitude and stigma of leprosy amongst the community members living in Dhanusha and Parsa districts of Southern Central Nepal.

Methods

A total of 423 individuals were interviewed using a structured questionnaire in Dhanusha and Parsa districts. Data were analyzed using both descriptive statistics (frequency, percentage, median) and inferential statistics (Chi-square test, Kruskal Wallis H test, Mann Whitney U test, binary logistic regression) in SPSS version 20.

Results

All respondents had heard about leprosy. The main source of information on leprosy was health workers/hospitals (33.1%). Only 62.6% reported bacteria as its cause, followed by myths such as bad blood/curse/heredity/bad deeds (36%). Only 43.8% responded that leprosy is transmitted by prolonged close contact with leprosy patients and 25.7% reported religious rituals as the treatment. Only 42.1% had good knowledge and 40.9% had a favorable attitude. Good knowledge of leprosy was highly associated with a favorable attitude towards leprosy (P<0.001). The outcome variables (knowledge, attitude and EMIC score) were found to have a highly significant association with age, sex, ethnicity, religion, education and occupation of the respondents (P<0.001). Having knowledge of leprosy transmission was positively associated with a favorable attitude towards leprosy (P<0.001).

Conclusions

Tailoring awareness programmes to socio-demographic characteristics, so as to enhance knowledge of the cause, symptoms, transmission, prevention and treatment of leprosy, can foster a positive community attitude towards leprosy affected persons. Enhancing a positive attitude towards leprosy affected persons can reduce community stigma and thus may increase their participation in the community. A positive attitude may further improve their early health seeking behaviour and their quality of life.

]]>
<![CDATA[PAIRUP-MS: Pathway analysis and imputation to relate unknowns in profiles from mass spectrometry-based metabolite data]]> https://www.researchpad.co/article/5c466521d5eed0c48451791d

Metabolomics is a powerful approach for discovering biomarkers and for characterizing the biochemical consequences of genetic variation. While untargeted metabolite profiling can measure thousands of signals in a single experiment, many biologically meaningful signals cannot be readily identified as known metabolites nor compared across datasets, making it difficult to infer biology and to conduct well-powered meta-analyses across studies. To overcome these challenges, we developed a suite of computational methods, PAIRUP-MS, to match metabolite signals across mass spectrometry-based profiling datasets and to generate metabolic pathway annotations for these signals. To pair up signals measured in different datasets, where retention times (RT) are often not comparable or even available, we implemented an imputation-based approach that only requires mass-to-charge ratios (m/z). As validation, we treated each shared known metabolite as an unmatched signal and showed that PAIRUP-MS correctly matched 70–88% of these metabolites from among thousands of signals, equaling or outperforming a standard m/z- and RT-based approach. We performed further validation using genetic data: the most stringent set of matched signals and shared knowns showed comparable consistency of genetic associations across datasets. Next, we developed a pathway reconstitution method to annotate unknown signals using curated metabolic pathways containing known metabolites. We performed genetic validation for the generated annotations, showing that annotated signals associated with gene variants were more likely to be enriched for pathways functionally related to the genes compared to random expectation. 
Finally, we applied PAIRUP-MS to study associations between metabolites and genetic variants or body mass index (BMI) across multiple datasets, identifying up to ~6 times more significant signals and many more BMI-associated pathways compared to the standard practice of only analyzing known metabolites. These results demonstrate that PAIRUP-MS enables analysis of unknown signals in a robust, biologically meaningful manner and provides a path to more comprehensive, well-powered studies of untargeted metabolomics data.

]]>
<![CDATA[EMBL2checklists: A Python package to facilitate the user-friendly submission of plant and fungal DNA barcoding sequences to ENA]]> https://www.researchpad.co/article/5c40f7a5d5eed0c48438651a

Background

The submission of DNA sequences to public sequence databases is an essential, but insufficiently automated step in the process of generating and disseminating novel DNA sequence data. Despite the centrality of database submissions to biological research, the range of available software tools that facilitate the preparation of sequence data for database submissions is limited, especially for sequences generated via plant and fungal DNA barcoding. Current submission procedures can be complex and prohibitively time-consuming for any but a small number of input sequences. A user-friendly software tool is needed that streamlines the file preparation for database submissions of DNA sequences that are commonly generated in plant and fungal DNA barcoding.

Methods

A Python package was developed that converts DNA sequences from the common EMBL and GenBank flat file formats to submission-ready, tab-delimited spreadsheets (so-called ‘checklists’) for a subsequent upload to the annotated sequence section of the European Nucleotide Archive (ENA). The software tool, titled ‘EMBL2checklists’, automatically converts DNA sequences, their annotation features, and associated metadata into the idiosyncratic format of marker-specific ENA checklists and, thus, generates files that can be uploaded via the interactive Webin submission system of ENA.

Results

EMBL2checklists provides a simple, platform-independent tool that automates the conversion of common DNA barcoding sequences into easily editable spreadsheets that require no further processing but their upload to ENA via the interactive Webin submission system. The software is equipped with an intuitive graphical as well as an efficient command-line interface for its operation. The utility of the software is illustrated by its application in four recent investigations, including plant phylogenetic and fungal metagenomic studies.

Discussion

EMBL2checklists bridges the gap between common software suites for DNA sequence assembly and annotation and the interactive data submission process of ENA. It represents an easy-to-use solution for plant and fungal biologists without bioinformatics expertise to generate submission-ready checklists from common DNA sequence data. It allows the post-processing of checklists as well as work-sharing during the submission process and solves a critical bottleneck in the effort to increase participation in public data sharing.

]]>
<![CDATA[Susceptibility of consolidated procedural memory to interference is independent of its active task-based retrieval]]> https://www.researchpad.co/article/5c6059ecd5eed0c4847cc45c

Reconsolidation theory posits that upon retrieval, consolidated memories are destabilized and need to be restabilized in order to persist. It has been suggested that experience with a competitive task immediately after memory retrieval may interrupt these restabilization processes leading to memory loss. Indeed, using a motor sequence learning paradigm, we have recently shown that, in humans, interference training immediately after active task-based retrieval of the consolidated motor sequence knowledge may negatively affect its performance levels. Assessing changes in tapping pattern before and after interference training, we also demonstrated that this performance deficit more likely indicates a genuine memory loss rather than an initial failure of memory retrieval. Here, applying a similar approach, we tested the necessity of the hypothetical retrieval-induced destabilization of motor memory to allow its impairment. The impact of memory retrieval on performance of new motor sequence knowledge acquired during the interference training was also evaluated. Similar to the immediate post-retrieval interference, interference training alone without the preceding active task-based memory retrieval was also associated with impairment of the pre-established motor sequence memory. Performance levels of the sequence trained during the interference training, on the other hand, were impaired only if this training was given immediately after memory retrieval. Notably, with an 8-hour interval between memory retrieval and interference, performance levels remained intact for both sequences. The current results suggest that susceptibility of the consolidated motor memory to behavioral interference is independent of its active task-based retrieval. 
Differential effects of memory retrieval on performance levels of the new motor sequence encoded during the interference training further suggest that memory retrieval may influence the way new information is stored by facilitating its integration within the retrieved memory trace. Thus, impairment of the pre-established motor memory may reflect interference from a competing memory trace rather than involve interruption of reconsolidation.

]]>
<![CDATA[A method for the detection and characterization of technology fronts: Analysis of the dynamics of technological change in 3D printing technology]]> https://www.researchpad.co/article/5c3d0112d5eed0c4840380e9

This paper presents a method for the identification of the “technology fronts” (core technological solutions) underlying a certain broad technology, and the characterization of their change dynamics. We propose an approach based on the Latent Dirichlet Allocation (LDA) model combined with patent data analysis and text mining techniques for the identification and dynamic characterization of the main fronts where actual technological solutions are put into practice. 3D printing technology has been selected to put our method into practice for its market emergence and multidisciplinarity. The results show two highly relevant and specialized fronts strongly related with mechanical design that evolve gradually, in our opinion acting as enabling technologies. On the other hand, we detected three fronts undergoing significant changes, namely layer-by-layer multimaterial manufacturing, data processing and stereolithography techniques. Laser and electron-beam based technologies take shape in the latter years and show signs of becoming enabling technologies in the future. The technology fronts and data revealed by our method have been convincing to experts and coincident with many technology trends already pointed out in technical reports and scientific literature.

]]>
<![CDATA[Beyond opinion classification: Extracting facts, opinions and experiences from health forums]]> https://www.researchpad.co/article/5c3fa56ad5eed0c484ca4115

Introduction

Surveys indicate that patients, particularly those suffering from chronic conditions, strongly benefit from the information found in social networks and online forums. One challenge in accessing online health information is to differentiate between factual and more subjective information. In this work, we evaluate the feasibility of exploiting lexical, syntactic, semantic, network-based and emotional properties of texts to automatically classify patient-generated content into three types: “experiences”, “facts” and “opinions”, using machine learning algorithms. In this context, our goal is to develop automatic methods that will make online health information more easily accessible and useful for patients, professionals and researchers.

Material and methods

We work with a set of 3000 posts to online health forums on breast cancer, Crohn’s disease and various allergies. Each sentence in a post is manually labeled as “experience”, “fact” or “opinion”. Using these data, we train a support vector machine to perform the classification. The results are evaluated with 10-fold cross-validation.
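
The evaluation procedure can be illustrated with a minimal sketch, under several assumptions. To keep it runnable on the Python standard library alone, a simple nearest-centroid rule over bag-of-words counts stands in for the support vector machine. The sentences, labels and function names are invented for illustration and are not taken from the study.

```python
import random
from collections import Counter

# Hypothetical labeled sentences; the real corpus is not reproduced here.
examples = [
    ("i had surgery last year and felt tired", "experience"),
    ("my doctor gave me a new drug last week", "experience"),
    ("i felt sick after my first treatment", "experience"),
    ("i had a rash after eating nuts", "experience"),
    ("the drug is approved for breast cancer", "fact"),
    ("the disease is a chronic condition", "fact"),
    ("the treatment is given every three weeks", "fact"),
    ("pollen is a common allergen", "fact"),
    ("i think the diet is useless", "opinion"),
    ("i believe this drug is the best option", "opinion"),
    ("i think you should see a specialist", "opinion"),
    ("in my view the test is unreliable", "opinion"),
]

def bag_of_words(sentence):
    """Represent a sentence as word counts."""
    return Counter(sentence.lower().split())

def train_centroids(training):
    """Sum the word counts per label (stand-in for SVM training)."""
    centroids = {}
    for text, label in training:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids

def predict(centroids, text):
    """Return the label whose normalized word counts best match the text."""
    words = bag_of_words(text)

    def score(label):
        counts = centroids[label]
        total = sum(counts.values()) or 1
        return sum(n * counts[w] / total for w, n in words.items())

    # Sorting the labels makes tie-breaking deterministic.
    return max(sorted(centroids), key=score)

def make_folds(items, k=10, seed=0):
    """Shuffle the items and deal them into k disjoint folds."""
    data = list(items)
    random.Random(seed).shuffle(data)
    return [data[i::k] for i in range(k)]

def k_fold_accuracy(data, k=10, seed=0):
    """Average held-out accuracy over the k folds."""
    if not data:
        raise ValueError("no examples to evaluate")
    folds = make_folds(data, k, seed)
    scores = []
    for i, held_out in enumerate(folds):
        if not held_out:
            continue  # possible only when k exceeds the number of examples
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_centroids(training)
        correct = sum(predict(model, text) == label for text, label in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

accuracy = k_fold_accuracy(examples, k=10)
print(f"mean 10-fold accuracy on the toy data: {accuracy:.2f}")
```

Each sentence is held out exactly once, so the reported figure is an average over predictions made on data the model did not see during training. The number printed for this toy set says nothing about the accuracy obtained in the study.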

Results

Overall, we find that it is possible to predict the type of information contained in a forum post with an accuracy of over 80 percent using simple text representations such as word embeddings and bags of words. We also analyze more complex features, such as those based on network properties, the polarity of words and the verbal tense of the sentences, and show that, when combined with the simpler representations, they can further improve the results.

]]>
<![CDATA[Validation of modified radio-frequency identification tag firmware, using an equine population case study]]> https://www.researchpad.co/article/5c3fa5fcd5eed0c484caad7f

Background

Contact networks can be used to assess disease spread potential within a population. However, the data required to generate the networks can be challenging to collect. One method of collecting this type of data is by using radio-frequency identification (RFID) tags. The OpenBeacon RFID system generally consists of tags and readers. Communicating tags should be within 10 m of the readers, which are powered by an external power source. The readers are challenging to implement in agricultural settings due to the lack of a power source and the large area that needs to be covered.

Methods

OpenBeacon firmware was modified to use the tag’s onboard flash memory for data storage. The tags were deployed within an equine facility for a 7-day period. Tags were attached to the horses’ halters, worn by facility staff, and placed in strategic locations around the facility to monitor which participants had contact with the specified locations during the study period. When the tags came within 2 m of each other, they recorded the participant IDs and the start and end times of the contact event. At the end of the study period, the data were downloaded to a computer and analyzed using network analysis methods.
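
As a rough illustration of how such contact records could be turned into a network, the sketch below aggregates events into an undirected graph weighted by total contact time and computes degree centrality. The record layout, tag IDs, timestamps and function names are hypothetical; the actual OpenBeacon data format and the network measures used in the study are not specified here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical contact records: (tag_a, tag_b, start, end). All values are made up.
contacts = [
    ("horse_1", "horse_2", "2020-01-01 08:00:00", "2020-01-01 08:05:00"),
    ("horse_1", "staff_1", "2020-01-01 09:00:00", "2020-01-01 09:02:00"),
    ("horse_2", "horse_1", "2020-01-01 10:00:00", "2020-01-01 10:10:00"),
    ("staff_1", "gate", "2020-01-01 11:00:00", "2020-01-01 11:01:00"),
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout

def build_network(records):
    """Map each undirected pair of tags to its total contact time in seconds."""
    weights = defaultdict(float)
    for tag_a, tag_b, start, end in records:
        began = datetime.strptime(start, TIME_FORMAT)
        ended = datetime.strptime(end, TIME_FORMAT)
        edge = tuple(sorted((tag_a, tag_b)))  # (a, b) and (b, a) are the same edge
        weights[edge] += (ended - began).total_seconds()
    return dict(weights)

def degree_centrality(weights):
    """Fraction of the other nodes that each node had at least one contact with."""
    neighbours = defaultdict(set)
    for tag_a, tag_b in weights:
        neighbours[tag_a].add(tag_b)
        neighbours[tag_b].add(tag_a)
    denominator = max(len(neighbours) - 1, 1)
    return {node: len(adjacent) / denominator for node, adjacent in neighbours.items()}

network = build_network(contacts)
centrality = degree_centrality(network)

for node in sorted(centrality):
    print(node, round(centrality[node], 2))
```

Weighting edges by contact duration is only one option. Counting contact events, or building a separate network per day, would be equally reasonable depending on the question being asked.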

Results

The resulting networks were plausible given the facility schedule as described in a survey completed by the facility manager. Furthermore, changes in the daily facility operations as described in the survey were reflected in the tag-collected data. In terms of the battery life, 88% of batteries maintained a charge for at least 6 days. Lastly, no consistent trends were evident in the horses’ centrality metrics.

Discussion

This study demonstrates the utility of RFID tags for the collection of equine contact data. Future work should include the collection of contact data from multiple equine facilities to better characterize equine disease spread potential in Ontario.

]]>