ResearchPad - operator-theory

<![CDATA[Association test using Copy Number Profile Curves (CONCUR) enhances power in rare copy number variant analysis]]> https://www.researchpad.co/article/elastic_article_14642 Copy number variants (CNVs) comprise a large proportion of the variation in human genomes. Large rare CNVs, especially those disrupting genes or changing the dosages of genes, can carry relatively strong risks for neurodevelopmental and neuropsychiatric disorders. Kernel-based association methods have been developed for the analysis of rare CNVs and have proven to be a valuable tool. Kernel methods model the collective effect of rare CNVs using flexible kernel functions that capture the characteristics of CNVs and measure the CNV similarity of individual pairs. Typically, kernels are created by summarizing similarity within an artificially defined “CNV locus” and then collapsing across all loci. In this work, we propose a new kernel-based test, CONCUR, that is based on the CNV location information contained in standard processing of the variants and thus obviates the need for arbitrarily defined CNV loci. CONCUR quantifies similarity between individual pairs as the common area under their copy number profile curves and is designed to detect CNV dosage, length, and dosage-length interaction effects. In simulation studies and real data analysis, we demonstrate the ability of the CONCUR test to detect CNV effects under diverse CNV architectures with greater power and robustness than existing methods.
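
As a rough illustration of the similarity measure (not the authors' implementation), the following Python sketch computes the common area under two piecewise-constant copy number profiles defined on a shared set of segment boundaries; the breakpoints, copy numbers, and units are hypothetical.

```python
import numpy as np

def common_area(breakpoints, profile_a, profile_b):
    """Common area under two piecewise-constant copy number profiles.

    breakpoints: sorted array of n+1 genomic positions delimiting n segments.
    profile_a, profile_b: copy number on each of the n segments.
    On each segment the shared area is the smaller of the two copy
    numbers times the segment length.
    """
    widths = np.diff(breakpoints)
    return float(np.sum(np.minimum(profile_a, profile_b) * widths))

# Hypothetical example: two individuals' profiles over four segments.
bp = np.array([0, 100, 250, 400, 600])   # genomic positions (kb)
a = np.array([2, 3, 2, 1])               # copy numbers, individual A
b = np.array([2, 2, 4, 1])               # copy numbers, individual B
print(common_area(bp, a, b))             # kernel similarity K(A, B)
```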

]]>
<![CDATA[Resolution invariant wavelet features of melanoma studied by SVM classifiers]]> https://www.researchpad.co/article/5c648cd2d5eed0c484c81893

This article addresses computer-aided diagnosis of melanoma skin cancer. We derive wavelet-based features of melanoma from dermoscopic images of pigmented skin lesions and apply binary C-SVM classifiers to discriminate malignant melanoma from dysplastic nevus. The aim of this research is to select the most efficient model of the SVM classifier for various image resolutions and to search for the best resolution-invariant wavelet bases. We report AUC as a function of the wavelet number and of SVM kernels optimized by Bayesian search for two independent data sets. Our results are consistent with previous experiments discriminating melanoma in dermoscopy images using ensembles and feed-forward neural networks.
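
The pipeline can be sketched as follows, under assumed choices (a db2 wavelet, three decomposition levels, subband energies as features, an RBF C-SVM, and synthetic stand-in images); the paper's exact wavelet bases and Bayesian-search setup are not reproduced here.

```python
import numpy as np
import pywt
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def wavelet_energy_features(img, wavelet="db2", level=3):
    """Normalized energies of the wavelet subbands of a grayscale image."""
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    bands = [coeffs[0]] + [b for triple in coeffs[1:] for b in triple]
    energies = np.array([np.sum(np.square(b)) for b in bands])
    return energies / energies.sum()

# Hypothetical data: 40 lesion images (64x64) with binary labels
# (1 = melanoma, 0 = dysplastic nevus).
rng = np.random.default_rng(0)
images = rng.random((40, 64, 64))
labels = rng.integers(0, 2, size=40)

X = np.array([wavelet_energy_features(im) for im in images])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, labels)
```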

]]>
<![CDATA[Improving the calling of non-invasive prenatal testing on 13-/18-/21-trisomy by support vector machine discrimination]]> https://www.researchpad.co/article/5c117b86d5eed0c48469989c

With the advance of next-generation sequencing (NGS) technologies, non-invasive prenatal testing (NIPT) has been developed and employed in fetal aneuploidy screening for trisomies 13, 18, and 21 by detecting cell-free fetal DNA (cffDNA) in maternal blood. Although the Z-test is widely used in NIPT NGS data analysis, its accuracy still needs improvement in order to reduce (a) false negatives and false positives and (b) the proportion of unclassified data, so as to lower both the potential harm to patients and the cost of retests. Combining multiple Z-tests with indexes of clinical signs and quality control, we collected features from known samples and scaled them for model training using a support vector machine (SVM). We trained SVM models on the qualified NIPT NGS data that the Z-test can discriminate and tested their performance on the data that the Z-test cannot discriminate. In screening for trisomies 13, 18, and 21, the trained SVM models achieved 100% accuracy in both internal validations and predictions on unknown samples. Other machine learning (ML) models can achieve similarly high accuracy, but the SVM model was the most robust in this study. Moreover, four false positives and four false negatives produced by the Z-test were corrected by the SVM models. To our knowledge, this is one of the earliest studies to employ SVMs in NIPT NGS data analysis, and the approach is expected to replace the Z-test in clinical practice.
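
A minimal sketch of the idea, with hypothetical feature names (per-chromosome z-scores plus two quality-control indexes) and synthetic toy data in place of real NIPT samples:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical features per sample: z-scores for chr13/18/21 plus two
# quality-control indexes (e.g., fetal fraction and a GC-bias index).
rng = np.random.default_rng(1)
X = np.column_stack([
    rng.normal(0, 1, 200),        # z-score, chr13
    rng.normal(0, 1, 200),        # z-score, chr18
    rng.normal(0, 1, 200),        # z-score, chr21
    rng.uniform(0.04, 0.2, 200),  # fetal fraction (QC)
    rng.normal(0, 0.05, 200),     # GC-bias index (QC)
])
y = (X[:, 2] > 1.5).astype(int)   # toy labels tied to the chr21 z-score

# Scale the mixed-unit features, then train an SVM on known samples.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X, y)
print(model.predict(X[:5]))
```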

]]>
<![CDATA[An Enhanced Region Proposal Network for object detection using deep learning method]]> https://www.researchpad.co/article/5c0e9891d5eed0c484eaadaf

Faster Region-based Convolutional Network (Faster R-CNN) is a state-of-the-art object detection method. However, its detection performance is limited by the quality of the proposals generated by its Region Proposal Network (RPN). Inspired by the RPN of Faster R-CNN, we propose a novel proposal generation method called the Enhanced Region Proposal Network (ERPN). ERPN introduces four improvements. Firstly, our proposed deconvolutional feature pyramid network (DFPN) improves the quality of region proposals. Secondly, novel anchor boxes are designed with interspersed scales and adaptive aspect ratios, which strengthens object localization. Thirdly, a particle swarm optimization (PSO) based support vector machine, termed PSO-SVM, is developed to distinguish positive from negative anchor boxes. Fourthly, the classification part of the multi-task loss function in the RPN is improved, strengthening the effect of the classification loss. In this study, ERPN is compared with five object detection methods on both the PASCAL VOC and COCO data sets. With the VGG-16 model, ERPN obtains 78.6% mAP on the VOC 2007 data set, 74.4% mAP on the VOC 2012 data set, and 31.7% mAP on the COCO data set, the best results among the compared detection methods, at a detection speed of 5.8 fps. ERPN also performs well on small-object detection.
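
For orientation, here is the standard anchor-generation mechanism that ERPN builds on; the scales and ratios below are common defaults, not ERPN's interspersed scales or adaptive aspect ratios, which are defined in the paper.

```python
import numpy as np

def make_anchors(center_x, center_y, scales, ratios):
    """Generate (x1, y1, x2, y2) anchor boxes centered at one feature-map
    location, one box per combination of scale and aspect ratio."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)   # width grows with the aspect ratio
            h = s / np.sqrt(r)   # height shrinks accordingly
            anchors.append([center_x - w / 2, center_y - h / 2,
                            center_x + w / 2, center_y + h / 2])
    return np.array(anchors)

# Hypothetical scales/ratios; this plain grid does not capture ERPN's
# data-adaptive design.
print(make_anchors(64, 64, scales=[128, 256, 512], ratios=[0.5, 1.0, 2.0]))
```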

]]>
<![CDATA[Introducing chaos behavior to kernel relevance vector machine (RVM) for four-class EEG classification]]> https://www.researchpad.co/article/5b4a0345463d7e3e7a97116d

This paper proposes a chaos kernel function for the relevance vector machine (RVM) in EEG signal classification, an important component of Brain-Computer Interfaces (BCIs). The novel kernel function is derived from a chaotic system, inspired by the fact that human brain signals exhibit chaotic characteristics and behaviors. By introducing chaotic dynamics into the kernel function, the RVM gains higher classification capacity. The proposed method is validated within the framework of a one-versus-one common spatial pattern (OVO-CSP) classifier to classify motor imagery (MI) of four movements in a publicly accessible dataset. To assess the proposed kernel function, Gaussian and polynomial kernel functions are considered for comparison. Experimental results show that the proposed kernel function achieves higher accuracy than the Gaussian and polynomial kernels, indicating that accounting for chaotic behavior is helpful in EEG signal classification.
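
The closed form of the chaotic kernel is given in the paper; as a hedged stand-in, the sketch below shows how a kernel built from an explicit chaotic (logistic-map) feature transform, positive semidefinite by construction, plugs into a kernel machine (scikit-learn's SVC here, since a standard RVM is not part of scikit-learn).

```python
import numpy as np
from sklearn.svm import SVC

def logistic_map_features(X, r=3.9, n_iter=5):
    """Squash each coordinate into (0, 1), then iterate the chaotic
    logistic map x <- r * x * (1 - x) and stack the iterates."""
    Z = 1.0 / (1.0 + np.exp(-X))
    feats = []
    for _ in range(n_iter):
        Z = r * Z * (1 - Z)
        feats.append(Z)
    return np.hstack(feats)

def chaos_like_kernel(A, B):
    """PSD by construction: inner product of the chaotic feature maps.
    A stand-in, not the paper's chaos kernel."""
    return logistic_map_features(A) @ logistic_map_features(B).T

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))     # stand-in for OVO-CSP features
y = rng.integers(0, 4, size=100)  # four MI classes

clf = SVC(kernel=chaos_like_kernel).fit(X, y)
print(clf.predict(X[:5]))
```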

]]>
<![CDATA[Instance-based generalization for human judgments about uncertainty]]> https://www.researchpad.co/article/5b28b5e2463d7e1340e24748

While previous studies have shown that human behavior adjusts in response to uncertainty, it is still not well understood how uncertainty is estimated and represented. As probability distributions are high-dimensional objects, only constrained families of distributions with a small number of parameters can be specified from finite data. However, the structural assumptions the brain uses to estimate them are unknown. We introduce a novel paradigm that requires human participants of either sex to explicitly estimate the dispersion of a distribution over future observations. Judgments are based on a very small sample from a centered, normally distributed random variable that was suggested by the framing of the task. This probability density estimation task could optimally be solved by inferring the dispersion parameter of a normal distribution. We find that although behavior closely tracks uncertainty on a trial-by-trial basis and resists explanation by simple heuristics, it is hardly consistent with parametric inference of a normal distribution. Despite the transparency of the simple generating process, participants estimate a distribution biased towards the observed instances while still generalizing strongly beyond the sample. The inferred internal distributions can be well approximated by a nonparametric mixture of spatially extended basis distributions. Our results thus suggest that fluctuations have an excessive effect on human uncertainty judgments because the underlying representations can adapt overly flexibly to the sample. This may be of greater utility in more general, structurally uncertain environments.
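
The contrast between the two inference strategies can be made concrete with a toy sample; the bandwidth and basis family below are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np
from scipy import stats

sample = np.array([-0.8, 0.1, 1.3])   # a very small sample, as in the task

# Parametric route: maximum-likelihood dispersion of a zero-mean normal
# (sigma^2 is the mean of squared observations, since the mean is 0).
sigma_hat = np.sqrt(np.mean(sample ** 2))
parametric = stats.norm(0, sigma_hat)

# Instance-based route: a mixture of spatially extended basis
# distributions, one centered on each observed instance.
bandwidth = 0.5   # free parameter controlling generalization
def mixture_pdf(x):
    return np.mean([stats.norm(xi, bandwidth).pdf(x) for xi in sample],
                   axis=0)

grid = np.linspace(-3, 3, 7)
print(parametric.pdf(grid))   # smooth, sample-independent shape
print(mixture_pdf(grid))      # bumps biased towards the instances
```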

]]>
<![CDATA[Lawsuit lead time prediction: Comparison of data mining techniques based on categorical response variable]]> https://www.researchpad.co/article/5b28b406463d7e1292999390

The quality of a country's judicial system can be assessed by the overall duration of lawsuits, or lead time. Excessive lead times can affect a country's economy, prompting measures such as the creation of the Saturn Center in Europe. Although performance indicators exist to measure the lead time of lawsuits, the analysis and fitting of prediction models remain underdeveloped in the literature. To contribute to this subject, this article compares different prediction models according to their accuracy, sensitivity, specificity, precision, and F1 measure. The database was provided by TRF4 (the Tribunal Regional Federal da 4a Região), a federal court in southern Brazil, and comprises second-instance civil lawsuits completed in 2016. The models were fitted using support vector machine, naive Bayes, random forest, and neural network approaches with categorical predictor variables. The lead time of the second-instance judgment, measured in days and categorized into bands, was selected as the response variable. The comparison showed that the support vector machine and random forest approaches outperformed the other models. The models were evaluated using k-fold cross-validation.
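
A schematic of such a comparison with scikit-learn, using synthetic stand-in data and default hyperparameters rather than the TRF4 database:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Hypothetical one-hot-encoded categorical predictors and lead-time bands.
rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(300, 12)).astype(float)
y = rng.integers(0, 3, size=300)   # three lead-time bands

models = {
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # k-fold cross-validation
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```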

]]>
<![CDATA[Firing-rate based network modeling of the dLGN circuit: Effects of cortical feedback on spatiotemporal response properties of relay cells]]> https://www.researchpad.co/article/5b07d0de463d7e0d4a37a6e7

Visually evoked signals in the retina pass through the dorsal lateral geniculate nucleus (dLGN) on the way to the visual cortex. This is, however, not a simple feedforward flow of information: there is significant feedback from cortical cells back to both relay cells and interneurons in the dLGN. Despite four decades of experimental and theoretical studies, the functional role of this feedback is still debated. Here we use a firing-rate model, the extended difference-of-Gaussians (eDOG) model, to explore the effects of cortical feedback on the visual responses of dLGN relay cells. In this model the responses are found by direct evaluation of two- or three-dimensional integrals, allowing fast and comprehensive studies of the putative effects of different candidate organizations of the cortical feedback. Our analysis identifies a special mixed configuration of excitatory and inhibitory cortical feedback that seems to best account for the available experimental data. This configuration consists of (i) slow (long-delay), spatially widespread inhibitory feedback, combined with (ii) fast (short-delay), spatially narrow excitatory feedback, where (iii) the excitatory/inhibitory ON-ON connections are accompanied respectively by inhibitory/excitatory OFF-ON connections, i.e. following a phase-reversed arrangement. The recent development of optogenetic and pharmacogenetic methods has provided new tools for more precise manipulation and investigation of the thalamocortical circuit, in particular in mice. Such data are expected to allow the eDOG model to be constrained by data from specific animal model systems more tightly than has so far been possible for cat. We have therefore made the Python tool pyLGN, which allows for easy adaptation of the eDOG model to new situations.
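
For reference, the classical difference-of-Gaussians spatial kernel that the eDOG model extends can be written in a few lines; the amplitudes and widths below are illustrative, and this is not the pyLGN API.

```python
import numpy as np

def dog_kernel(r, A=1.0, a=0.3, B=0.85, b=0.9):
    """Classical difference-of-Gaussians (DOG) spatial kernel: a narrow
    excitatory center minus a broad inhibitory surround.

    r: retinal distance (degrees); A, B: amplitudes; a, b: widths.
    The 1/(pi*width^2) factors normalize each Gaussian's 2D integral.
    """
    center = (A / (np.pi * a ** 2)) * np.exp(-(r ** 2) / a ** 2)
    surround = (B / (np.pi * b ** 2)) * np.exp(-(r ** 2) / b ** 2)
    return center - surround

r = np.linspace(0, 3, 7)
print(dog_kernel(r))   # positive near the center, negative in the surround
```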

]]>
<![CDATA[Estimation of the dispersal distances of an aphid-borne virus in a patchy landscape]]> https://www.researchpad.co/article/5af106d0463d7e336df9e544

Characterising the spatio-temporal dynamics of pathogens in natura is key to ensuring their efficient prevention and control. However, it is notoriously difficult to estimate dispersal parameters at scales that are relevant to real epidemics. Epidemiological surveys can provide informative data, but parameter estimation can be hampered when the timing of the epidemiological events is uncertain and in the presence of interactions between disease spread, surveillance, and control. Further complications arise from imperfect detection of disease and from the huge amount of data on individual hosts arising from landscape-level surveys. Here, we present a Bayesian framework that overcomes these barriers by integrating over the associated uncertainties in a model that explicitly combines the processes of disease dispersal, surveillance, and control. Using a novel, computationally efficient approach to account for patch geometry, we demonstrate that disease dispersal distances can be estimated accurately in a patchy (i.e. fragmented) landscape while disease control is ongoing. Applying this model to data for an aphid-borne virus (Plum pox virus) surveyed for 15 years in 605 orchards, we obtain the first estimate of the distribution of flight distances of infectious aphids at the landscape scale. About 50% of aphid flights terminate beyond 90 m, which implies that most infectious aphids leaving a tree land outside the bounds of a 1-ha orchard. Moreover, long-distance flights are not rare: 10% of flights exceed 1 km. By improving our quantitative understanding of winged aphid dispersal, these results can inform the design of management strategies for plant viruses, which are mainly aphid-borne.
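
The two reported quantiles pin down any two-parameter dispersal kernel; as an illustration (the authors' fitted kernel family may differ), here is a lognormal matched to a 90 m median and a 1 km 90th percentile:

```python
import numpy as np
from scipy import stats

# Reported quantiles: median ~ 90 m, 90th percentile ~ 1000 m.
median, q90 = 90.0, 1000.0
mu = np.log(median)                               # lognormal location
sigma = np.log(q90 / median) / stats.norm.ppf(0.9)  # solves P(X > q90) = 0.1

dist = stats.lognorm(s=sigma, scale=np.exp(mu))
print(dist.median())     # ~90 m, by construction
print(dist.ppf(0.9))     # ~1000 m, by construction
print(1 - dist.cdf(50))  # fraction of flights terminating beyond 50 m
```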

]]>
<![CDATA[Model Based Predictive Control of Multivariable Hammerstein Processes with Fuzzy Logic Hypercube Interpolated Models]]> https://www.researchpad.co/article/5989daadab0ee8fa60baa1c0

This paper introduces the Fuzzy Logic Hypercube Interpolator (FLHI) and demonstrates applications to the control of multiple-input single-output (MISO) and multiple-input multiple-output (MIMO) processes with Hammerstein nonlinearities. FLHI consists of a Takagi-Sugeno fuzzy inference system in which membership functions act as the kernel functions of an interpolator. The conjunction of membership functions in a unitary hypercube space enables multivariable interpolation in N dimensions. Since the membership functions act as interpolation kernels, their choice determines the interpolation characteristics, allowing FLHI to behave as a nearest-neighbor, linear, cubic, spline, or Lanczos interpolator, among others. The proposed interpolator is presented as a solution to the problem of modeling static nonlinearities, since it is capable of modeling both a function and its inverse. Three case studies from the literature are presented: a single-input single-output (SISO) system, a MISO system, and a MIMO system. Good results are obtained on performance metrics such as set-point tracking, control variation, and robustness. The results demonstrate the applicability of the proposed method to modeling Hammerstein nonlinearities and their inverse functions for the implementation of an output compensator with Model Based Predictive Control (MBPC), in particular Dynamic Matrix Control (DMC).
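
A one-dimensional sketch of the kernel-interpolation idea: with triangular membership functions on a uniform grid the scheme reproduces linear interpolation, while a boxcar membership gives nearest-neighbor behavior. The grid and test function are hypothetical.

```python
import numpy as np

def kernel_interpolate(x, nodes, values, kernel):
    """Interpolate by weighting node values with membership/kernel
    functions of the normalized distance to each node."""
    d = np.abs(x[:, None] - nodes[None, :]) / (nodes[1] - nodes[0])
    w = kernel(d)
    return (w * values).sum(axis=1) / w.sum(axis=1)

triangular = lambda d: np.clip(1 - d, 0, 1)    # -> linear interpolation
nearest = lambda d: (d <= 0.5).astype(float)   # -> nearest neighbor

nodes = np.linspace(0, 1, 5)   # uniform grid on the unit interval
values = nodes ** 2            # samples of a static nonlinearity
x = np.array([0.1, 0.33, 0.8])
print(kernel_interpolate(x, nodes, values, triangular))
print(kernel_interpolate(x, nodes, values, nearest))
```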

]]>
<![CDATA[Explaining Support Vector Machines: A Color Based Nomogram]]> https://www.researchpad.co/article/5989daa2ab0ee8fa60ba6257

Problem setting

Support vector machines (SVMs) are very popular tools for classification, regression, and other problems. Thanks to the large choice of kernels they can be applied with, a great variety of data can be analysed using these tools. Machine learning owes its popularity to the good performance of the resulting models. However, interpreting the models is far from obvious, especially when non-linear kernels are used. Hence, these methods tend to be used as black boxes. As a consequence, SVMs are less adopted in areas where interpretability is important and where people are held responsible for the decisions made by models.

Objective

In this work, we investigate whether SVMs using linear, polynomial, and RBF kernels can be explained such that interpretations for model-based decisions can be provided. We further indicate when SVMs can be explained and in which situations interpretation of SVMs is (hitherto) not possible. Here, explainability is defined as the ability to produce the final decision as a sum of contributions, each of which depends on a single or at most two input variables.
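
For the linear kernel this definition is satisfied exactly, since the decision value decomposes into one additive term per input variable; a minimal illustration with synthetic data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 3))
y = (X[:, 0] - 2 * X[:, 1] > 0).astype(int)

svm = SVC(kernel="linear").fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]

x_new = X[0]
contributions = w * x_new            # one additive term per input variable
decision = contributions.sum() + b   # equals the SVM's decision value
print(contributions, decision, svm.decision_function([x_new])[0])
```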

Results

Our experiments on simulated and real-life data show that explainability of an SVM depends on the chosen parameter values (degree of polynomial kernel, width of RBF kernel and regularization constant). When several combinations of parameter values yield the same cross-validation performance, combinations with a lower polynomial degree or a larger kernel width have a higher chance of being explainable.

Conclusions

This work summarizes SVM classifiers obtained with linear, polynomial and RBF kernels in a single plot. Linear and polynomial kernels up to the second degree are represented exactly. For other kernels an indication of the reliability of the approximation is presented. The complete methodology is available as an R package, and two apps and a movie are provided to illustrate the possibilities offered by the method.

]]>
<![CDATA[Unsupervised Retinal Vessel Segmentation Using Combined Filters]]> https://www.researchpad.co/article/5989dacfab0ee8fa60bb5a61

Image segmentation of retinal blood vessels can help predict and diagnose cardiovascular-related diseases, such as hypertension and diabetes, which are known to affect the appearance of the retinal blood vessels. This work proposes an unsupervised method for the segmentation of retinal vessel images that enhances the images using a combination of the matched filter, Frangi's filter, and the Gabor wavelet filter. Combining these three filters to improve segmentation is the main contribution of this work. We investigate two approaches to the filter combination: weighted mean and median ranking. Segmentation is performed after the vessel enhancement. Images enhanced by median ranking are segmented using a simple threshold criterion. For images enhanced by the weighted-mean approach, two segmentation procedures are applied: the first is based on deformable models, and the second uses fuzzy C-means. The procedure is evaluated on two public image databases, DRIVE and STARE. The experimental results demonstrate that the proposed methods perform well for vessel segmentation in comparison with state-of-the-art methods.
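
A schematic of the two combination rules on stand-in filter responses (random arrays here; the weights and threshold are hypothetical):

```python
import numpy as np

# Stand-ins for the three vessel-enhancement responses (matched filter,
# Frangi, Gabor wavelet), each normalized to [0, 1] over the same image.
rng = np.random.default_rng(5)
matched, frangi, gabor = rng.random((3, 128, 128))

# Weighted-mean combination (hypothetical weights summing to 1).
weights = np.array([0.4, 0.3, 0.3])
weighted = weights[0] * matched + weights[1] * frangi + weights[2] * gabor

# Median-ranking combination: pixel-wise median of the three responses.
median = np.median(np.stack([matched, frangi, gabor]), axis=0)

# Simple threshold criterion on the median-enhanced image.
vessels = median > 0.5
print(weighted.shape, vessels.mean())
```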

]]>
<![CDATA[Advanced Online Survival Analysis Tool for Predictive Modelling in Clinical Data Science]]> https://www.researchpad.co/article/5989db4dab0ee8fa60bdaf75

One of the prevailing applications of machine learning is predictive modelling in clinical survival analysis. In this work, we present our view of the current state of computer tools for survival analysis, stressing the need to transfer the latest results from the field of machine learning to biomedical researchers. We propose a web-based software for survival analysis called OSA (Online Survival Analysis), developed as an open-access and user-friendly option for obtaining discrete-time predictive survival models at the individual level using machine learning techniques, and for performing standard survival analysis. OSA employs an Artificial Neural Network (ANN) based method to produce the predictive survival models. Additionally, the software can generate survival and hazard curves with multiple options to personalise the plots, obtain contingency tables from the uploaded data to perform different tests, and fit a Cox regression model from a number of predictor variables. In the Materials and Methods section, we depict the general architecture of the application and introduce the mathematical background of each of the implemented methods. The study concludes with examples of use showing the results obtained with public datasets.
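
As an illustration of the standard analyses such a tool automates, here is a hedged sketch using the lifelines library (not OSA itself), with a hypothetical toy dataset:

```python
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter

# Hypothetical clinical dataset: follow-up time in days, event indicator
# (1 = event observed, 0 = censored), and two predictor variables.
df = pd.DataFrame({
    "time": [5, 8, 12, 20, 22, 30, 41, 46, 60, 75],
    "event": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
    "age": [63, 71, 55, 68, 49, 77, 60, 52, 70, 58],
    "treated": [0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
})

# Nonparametric survival curve, then a Cox regression on the predictors.
km = KaplanMeierFitter().fit(df["time"], df["event"])
print(km.median_survival_time_)

cox = CoxPHFitter().fit(df, duration_col="time", event_col="event")
cox.print_summary()
```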

]]>
<![CDATA[Some Muirhead Mean Operators for Intuitionistic Fuzzy Numbers and Their Applications to Group Decision Making]]> https://www.researchpad.co/article/5989da3aab0ee8fa60b87b01

The Muirhead mean (MM) is a well-known aggregation operator that can capture interrelationships among any number of arguments through a variable parameter vector. Moreover, it is a universal operator, since it reduces to other common operators for particular parameter values. However, the MM can only process crisp numbers. Inspired by the MM's advantages, the aim of this paper is to extend the MM to intuitionistic fuzzy numbers (IFNs) and then to solve multi-attribute group decision making (MAGDM) problems. Firstly, we develop some intuitionistic fuzzy Muirhead mean (IFMM) operators by extending the MM to intuitionistic fuzzy information. Then, we prove some of their properties and discuss special cases with respect to the parameter vector. Moreover, we present two new methods for MAGDM problems with intuitionistic fuzzy information based on the proposed MM operators. Finally, we verify the validity and reliability of our methods with an application example and analyze their advantages by comparison with other existing methods.
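
For reference, the classical (crisp) Muirhead mean with parameter vector P = (p_1, ..., p_n) is MM^P(a) = ((1/n!) Σ_{σ∈S_n} Π_j a_{σ(j)}^{p_j})^{1/(p_1+...+p_n)}; the sketch below implements it directly and checks the special parameter values that recover the arithmetic and geometric means.

```python
import math
from itertools import permutations

def muirhead_mean(a, p):
    """Crisp Muirhead mean with parameter vector p:
    MM(a) = ((1/n!) * sum over permutations s of prod_j a[s(j)]**p[j])
            ** (1 / sum(p))."""
    n = len(a)
    total = sum(
        math.prod(a[idx] ** pj for idx, pj in zip(perm, p))
        for perm in permutations(range(n))
    )
    return (total / math.factorial(n)) ** (1 / sum(p))

a = [2.0, 3.0, 4.0]
print(muirhead_mean(a, [1, 0, 0]))        # arithmetic mean: 3.0
print(muirhead_mean(a, [1/3, 1/3, 1/3]))  # geometric mean: ~2.884
print(muirhead_mean(a, [1, 1, 0]))        # pairwise-interaction case: ~2.944
```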

]]>
<![CDATA[A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification]]> https://www.researchpad.co/article/5989da07ab0ee8fa60b76622

Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named the Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), a binary classifier that optimizes the margin distribution by simultaneously maximizing the margin mean and minimizing the margin variance. We modify the LDM into the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10,000 music symbol images obtained from handwritten and printed music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs).
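
The DAG evaluation scheme itself is independent of the binary learner; below is a sketch of the decision path (each pairwise test eliminates one candidate class), with a toy stand-in for the trained LDM classifiers:

```python
def dag_predict(x, classes, pairwise_vote):
    """Decide among k classes with k*(k-1)/2 binary classifiers arranged
    in a decision DAG: each test eliminates one candidate class.

    pairwise_vote(x, ci, cj) must return the winning class of the
    binary classifier trained on classes ci vs. cj.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        ci, cj = remaining[0], remaining[-1]
        winner = pairwise_vote(x, ci, cj)
        # Eliminate the losing class and continue down the DAG.
        remaining.remove(ci if winner == cj else cj)
    return remaining[0]

# Toy stand-in: the "classifier" for (ci, cj) prefers the class whose
# index is closer to the 1-D feature value x.
vote = lambda x, ci, cj: ci if abs(x - ci) < abs(x - cj) else cj
print(dag_predict(2.2, classes=[0, 1, 2, 3], pairwise_vote=vote))  # -> 2
```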

]]>
<![CDATA[Visual Perception-Based Statistical Modeling of Complex Grain Image for Product Quality Monitoring and Supervision on Assembly Production Line]]> https://www.researchpad.co/article/5989da15ab0ee8fa60b7adae

Computer vision, as a fast, low-cost, noncontact, and online monitoring technology, has become an important tool for inspecting product quality, particularly on large-scale assembly production lines. However, current industrial vision systems fall far short in the intelligent perception of complex grain images, which comprise a large number of locally homogeneous fragmentations or patches without a distinct foreground and background. We attempt to solve this problem through statistical modeling of the spatial structures of grain images. We first present a physical explanation indicating that the spatial structures of complex grain images follow a representative Weibull distribution, in accordance with the theory of sequential fragmentation, which is well known in the continued comminution of ore grinding. To delineate the spatial structure of a grain image, we present a method of multiscale and omnidirectional Gaussian derivative filtering. A product quality classifier based on a sparse multikernel least squares support vector machine is then proposed to solve the low-confidence classification problem caused by imbalanced data distributions. The proposed method is applied on the assembly line of a food-processing enterprise to automatically classify the production quality of rice. Experiments on this real application case, compared with commonly used methods, illustrate the validity of our method.
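
A hedged sketch of the distribution-checking step: fit a Weibull to filter-response magnitudes with SciPy (synthetic stand-in responses here, since the real Gaussian-derivative filter bank and rice images are not reproduced):

```python
import numpy as np
from scipy import stats

# Stand-in for Gaussian-derivative filter response magnitudes of a grain
# image (the paper argues these follow a representative Weibull law).
rng = np.random.default_rng(6)
responses = rng.weibull(1.8, size=5000) * 0.7

# Fit a two-parameter Weibull (location fixed at 0) and inspect the fit.
shape, loc, scale = stats.weibull_min.fit(responses, floc=0)
print(f"shape={shape:.3f}, scale={scale:.3f}")

# Goodness of fit via a Kolmogorov-Smirnov test.
ks = stats.kstest(responses, "weibull_min", args=(shape, loc, scale))
print(ks.statistic, ks.pvalue)
```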

]]>
<![CDATA[Permissible Home Range Estimation (PHRE) in Restricted Habitats: A New Algorithm and an Evaluation for Sea Otters]]> https://www.researchpad.co/article/5989da41ab0ee8fa60b89e80

Parametric and nonparametric kernel methods dominate studies of animal home ranges and space use. Most existing methods are unable to incorporate information about the underlying physical environment, leading to poor performance in excluding areas that are not used. Using radio-telemetry data from sea otters, we developed and evaluated a new algorithm for estimating home ranges (hereafter Permissible Home Range Estimation, or “PHRE”) that reflects habitat suitability. We began by transforming sighting locations into relevant landscape features (for sea otters, coastal position and distance from shore). Then, we generated a bivariate kernel probability density function in landscape space and back-transformed this to geographic space in order to define a permissible home range. Compared to two commonly used home range estimation methods, kernel densities and local convex hulls, PHRE better excluded unused areas and required a smaller sample size. Our PHRE method is applicable to species whose ranges are restricted by complex physical boundaries or environmental gradients and will improve understanding of habitat-use requirements and, ultimately, aid in conservation efforts.
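
The core computational step can be sketched as a bivariate kernel density in the transformed landscape coordinates; the coordinates and data below are hypothetical, and the geographic back-transformation is omitted:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical sightings already transformed to landscape coordinates:
# (position along the coastline in km, distance from shore in m).
rng = np.random.default_rng(7)
coastal_pos = rng.normal(12.0, 3.0, 200)
dist_shore = np.abs(rng.normal(150.0, 80.0, 200))

# Bivariate kernel density in landscape space (the PHRE step before
# back-transforming the density into geographic space).
kde = gaussian_kde(np.vstack([coastal_pos, dist_shore]))

# Density at candidate landscape points: rows are (coastal position,
# distance from shore); impermissible locations (e.g., on land) would
# receive zero density after back-transformation.
query = np.array([[10.0, 14.0], [120.0, 400.0]])
print(kde(query))
```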

]]>
<![CDATA[Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric]]> https://www.researchpad.co/article/5989db5cab0ee8fa60be0289

Data imbalance is frequently encountered in biomedical applications. Resampling techniques can be used in binary classification to tackle this issue; however, such solutions are not desirable when the number of samples in the small class is limited. Moreover, the use of inadequate performance metrics, such as accuracy, leads to poor generalization because classifiers tend to predict the majority class. A good approach to dealing with this issue is to optimize performance metrics designed to handle data imbalance. The Matthews Correlation Coefficient (MCC) is widely used in bioinformatics as a performance metric. We are interested in developing a new classifier based on the MCC metric to handle imbalanced data. We derive an optimal Bayes classifier for the MCC metric using an approach based on the Fréchet derivative. We show that the proposed algorithm has the desirable theoretical property of consistency. Using simulated data, we verify the correctness of our optimality result by searching the space of all possible binary classifiers. The proposed classifier is evaluated on 64 datasets spanning a wide range of class imbalance. We compare both classification performance and CPU efficiency for three classifiers: 1) the proposed algorithm (MCC-classifier), 2) the Bayes classifier with a default threshold (MCC-base), and 3) an imbalanced SVM (SVM-imba). The experimental evaluation shows that MCC-classifier performs close to SVM-imba while being simpler and more efficient.
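
Recall MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A simple empirical surrogate for the paper's analytically derived rule is to sweep the decision threshold on predicted probabilities and keep the MCC-maximizing one; the data and model below are toy stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef

# Imbalanced toy data: roughly 10% positives.
rng = np.random.default_rng(8)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 1.3).astype(int)

probs = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# Choose the probability threshold that maximizes MCC instead of the
# default 0.5 (the paper derives the optimal rule analytically).
thresholds = np.linspace(0.05, 0.95, 19)
mccs = [matthews_corrcoef(y, (probs >= t).astype(int)) for t in thresholds]
best = thresholds[int(np.argmax(mccs))]
print(f"best threshold={best:.2f}, MCC={max(mccs):.3f}")
```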

]]>
<![CDATA[Evidence conflict measure based on OWA operator in open world]]> https://www.researchpad.co/article/5989db5cab0ee8fa60bdfee3

Dempster-Shafer evidence theory has been extensively used in many information fusion systems since it was proposed by Dempster and extended by Shafer. Much research has been conducted on conflict management in Dempster-Shafer evidence theory over the past decades. However, how to determine an effective parameter for measuring evidence conflict when the environment is an open world, namely when the frame of discernment is incomplete, is still an open issue. In this paper, we present a new method for measuring the conflict of evidence that combines the generalized conflict coefficient, the generalized evidence distance, and the generalized interval correlation coefficient by means of the ordered weighted averaging (OWA) operator. Through the ordered weighted average of these three parameters, the combined coefficient can still measure conflict effectively when one or two of the parameters are not valid. Several numerical examples demonstrate the effectiveness of the proposed method.
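
The OWA operator itself is standard: sort the arguments in descending order and apply the weights to ranks rather than to sources. A minimal sketch, with hypothetical parameter values for one pair of evidence bodies:

```python
import numpy as np

def owa(values, weights):
    """Ordered weighted averaging: sort the arguments in descending order
    and take the weighted sum, so weights attach to ranks, not sources."""
    ordered = np.sort(values)[::-1]          # descending order
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0)
    return float(ordered @ weights)

# Hypothetical conflict parameters for one pair of evidence bodies:
# generalized conflict coefficient, generalized evidence distance, and
# generalized interval correlation coefficient, each scaled to [0, 1].
params = [0.72, 0.40, 0.55]
print(owa(params, [1/3, 1/3, 1/3]))  # plain average
print(owa(params, [0.5, 0.3, 0.2]))  # emphasize the largest indicator
```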

]]>
<![CDATA[An improved chaotic fruit fly optimization based on a mutation strategy for simultaneous feature selection and parameter optimization for SVM and its applications]]> https://www.researchpad.co/article/5989db51ab0ee8fa60bdc2a4

This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fruit fly optimization algorithm (FOA) with a mutation strategy that simultaneously performs parameter tuning for the SVM and feature selection. In the improved FOA, a chaotic sequence initializes the fruit fly swarm location and replaces the distance expression used by the fruit flies to find the food source. The proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm to search for the optimal solution both in the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation on a group of ten benchmark problems, the proposed algorithm's performance is compared with that of other well-known algorithms, and the results support its superiority. Moreover, the algorithm is successfully applied within an SVM to perform both parameter tuning and feature selection on real-world classification problems. This method, called the chaotic fruit fly optimization algorithm SVM (CIFOA-SVM), has been shown to be a more robust and effective optimization method than other well-known methods, particularly for the medical diagnosis problem and the credit card problem.
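
A hedged sketch of one ingredient, chaotic (logistic-map) initialization of the swarm, in a hypothetical (log10 C, log10 gamma) SVM search space; the paper's full CIFOA procedure (mutation strategy, osphresis/vision phases) is not reproduced:

```python
import numpy as np

def chaotic_init(pop_size, dim, lower, upper, r=4.0, x0=0.37):
    """Initialize a swarm with logistic-map sequences instead of uniform
    random numbers: x <- r * x * (1 - x) fills (0, 1) chaotically."""
    seq = np.empty(pop_size * dim)
    x = x0  # seed away from the map's fixed points (0 and 0.75)
    for i in range(seq.size):
        x = r * x * (1 - x)
        seq[i] = x
    chaos = seq.reshape(pop_size, dim)
    return lower + chaos * (upper - lower)  # rescale to the search bounds

# Hypothetical SVM hyperparameter search space: (C, gamma) in log10 units.
swarm = chaotic_init(pop_size=20, dim=2, lower=-3.0, upper=3.0)
print(swarm[:3])
```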

]]>