ResearchPad - audio-signal-processing https://www.researchpad.co

<![CDATA[Adaptation to unstable coordination patterns in individual and joint actions]]> https://www.researchpad.co/article/elastic_article_7665

Previous research on interlimb coordination has shown that some coordination patterns are more stable than others, and function as attractors in the space of possible phase relations between different rhythmic movements. The canonical coordination patterns, i.e. the two most stable phase relations, are in-phase (0 degrees) and anti-phase (180 degrees). Yet, musicians are able to perform other coordination patterns in intrapersonal as well as in interpersonal coordination with remarkable precision. This raises the question of how music experts manage to produce these unstable patterns of movement coordination. In the current study, we invited participants with at least five years of training on a musical instrument. We used an adaptation paradigm to address two factors that may facilitate producing unstable coordination patterns. First, we investigated adaptation in different coordination settings, to test the hypothesis that the lower coupling strength between individuals during joint performance makes it easier to achieve stability outside of the canonical patterns than the stronger coupling during individual bimanual performance. Second, we investigated whether adding structure to the action effects may support achieving unstable coordination patterns, both intra- and inter-individually. The structure of the action effects was strengthened by adding a melodic contour, a measure that has been shown to improve the acquisition of bimanual coordination skills. Adaptation performance was measured in terms of both asynchrony and its variability. As predicted, we found that producing unstable patterns benefitted from the weaker coupling during joint performance. Surprisingly, the structure of the action effects did not help with achieving unstable coordination patterns.
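For concreteness, here is a minimal sketch of how asynchrony and its variability could be computed from produced and target onset times. The variable names and example data are illustrative assumptions, not the authors' analysis code.

```python
import numpy as np

def asynchrony_stats(produced_onsets, target_onsets):
    """Signed asynchronies (s) between produced and target onsets,
    summarized as their mean and standard deviation (variability)."""
    produced = np.asarray(produced_onsets, dtype=float)
    target = np.asarray(target_onsets, dtype=float)
    asynchronies = produced - target        # negative = tap before target
    return asynchronies.mean(), asynchronies.std(ddof=1)

# Hypothetical example: taps slightly ahead of an isochronous 500 ms target
rng = np.random.default_rng(0)
target = np.arange(0.0, 5.0, 0.5)
produced = target - 0.02 + rng.normal(0.0, 0.01, target.size)
mean_async, sd_async = asynchrony_stats(produced, target)
print(f"mean asynchrony = {mean_async * 1000:.1f} ms, SD = {sd_async * 1000:.1f} ms")
```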

]]>
<![CDATA[Speech perception in noise: Impact of directional microphones in users of combined electric-acoustic stimulation]]> https://www.researchpad.co/article/5c8977a1d5eed0c4847d31f3

Objectives

Combined electric-acoustic stimulation (EAS) is a well-accepted therapeutic treatment for cochlear implant (CI) users with residual hearing in the low frequencies but severe to profound hearing loss in the high frequencies. The recently introduced SONNET EAS audio processor offers different microphone directionality (MD) settings and wind noise reduction (WNR) as front-end processing. The aim of this study was to compare speech perception in quiet and in noise between two EAS audio processors, the DUET 2 and the SONNET EAS, and to assess the impact of MD and WNR on speech perception in EAS users in the absence of wind. Furthermore, subjective ratings of hearing performance were obtained.

Method

Speech perception and subjective ratings with the SONNET EAS or DUET 2 audio processor were assessed in 10 experienced EAS users. Speech perception was measured in quiet and in a diffuse noise setup (multi-source noise field, MSNF). The SONNET EAS processor was tested with three MD settings (omnidirectional, natural, and adaptive) and with different intensities of WNR. Subjective auditory benefit and sound quality were rated using two questionnaires.

Results

There was no significant difference between the DUET 2 and SONNET EAS processors with the omnidirectional microphone, either in quiet or in noise. There was a significant improvement in speech reception threshold (SRT) with the natural (2.2 dB) and adaptive (3.6 dB) MD settings. No detrimental effect of the WNR algorithm on speech perception was found in the absence of wind. Sound quality was rated as “moderate” for both audio processors.

Conclusions

The different MD settings of the SONNET EAS can provide EAS users with better speech perception than an omnidirectional microphone. Concerning speech perception in quiet and quality of life, the performance of the DUET 2 and SONNET EAS audio processors was comparable.

]]>
<![CDATA[Evolutionary relationships of courtship songs in the parasitic wasp genus, Cotesia (Hymenoptera: Braconidae)]]> https://www.researchpad.co/article/5c390ba6d5eed0c48491db08

Acoustic signals play an important role in premating isolation based on sexual selection within many taxa. Many male parasitic wasps produce characteristic courtship songs used by females in mate selection. In Cotesia (Hymenoptera: Braconidae: Microgastrinae), courtship songs are generated by wing fanning with repetitive pulses in stereotypical patterns. Our objectives were to sample the diversity of courtship songs within Cotesia and to identify the underlying patterns of differentiation. We compared songs among 12 of the ca. 80 Cotesia species in North America, including ten species whose songs had not been recorded previously. For Cotesia congregata, we compared songs of wasps originating from six different host-foodplant sources, two of which are considered incipient species. Songs of males emerging from wild caterpillar hosts in five different families were recorded, and the pattern, frequency, and duration of song elements were analyzed. Principal component analysis converted the seven characterized elements into four uncorrelated components, which were used in a hierarchical cluster analysis that grouped species by similarity of song structure. Species' songs varied significantly in the duration of repeating pulse and buzz elements and/or in fundamental frequency. The cluster analysis resolved species groupings in agreement with the most recent molecular phylogeny for Cotesia spp., indicating the potential for using courtship songs as a predictor of genetic relatedness. Courtship song analysis may aid in identifying closely related cryptic species that overlap spatially, and provide insight into the evolution of this highly diverse and agriculturally important taxon.
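A minimal sketch of the analysis pipeline described above, i.e. principal component analysis of song features followed by hierarchical clustering. The feature matrix is a random placeholder, not the study's measurements.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

# Placeholder matrix: rows = species, columns = 7 characterized song elements
# (e.g. pulse duration, buzz duration, fundamental frequency, ...)
rng = np.random.default_rng(0)
features = rng.normal(size=(12, 7))

# Standardize, then reduce the 7 correlated elements to 4 components
z = (features - features.mean(axis=0)) / features.std(axis=0)
components = PCA(n_components=4).fit_transform(z)

# Hierarchical cluster analysis on the uncorrelated components
tree = linkage(components, method="average")
groups = fcluster(tree, t=4, criterion="maxclust")
print(groups)  # one cluster label per species
```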

]]>
<![CDATA[Optimizing beat synchronized running to music]]> https://www.researchpad.co/article/5c12cf88d5eed0c4849148c2

The use of music, and specifically tempo-matched music, has been shown to affect running performance. But can we maximize the synchronization of movements to music, and does maximum synchronization influence kinematics and motivation? In this study, we explore the effect of different types of music-to-movement alignment strategies on phase coherence, cadence, and motivation. These strategies were compared to a control condition in which the music tempo was deliberately not aligned to the running cadence. Results show that without relative phase alignment, a negative mean asynchrony (NMA) of footfall timings with respect to the beats is obtained. This means that footfalls occurred slightly before the beat and that beats were anticipated. Convergence towards this NMA, or preferred relative phase angle, was facilitated when the first music beat of a new song started close to the step, which means that entrainment occurred. The results also show that, using tempo and phase alignment, the relative phase can be manipulated or forced to a certain angle with a high degree of accuracy. Ensuring negative angles larger than the NMA (step before beat) results in increased motivation and decreased cadence. Running at the NMA or preferred relative phase angle has no effect on cadence. Ensuring a positive phase angle with respect to the NMA results in higher motivation and higher cadence. None of the manipulations resulted in a change in perceived exhaustion or velocity. Results also indicate that gender plays an important role when using forced-phase algorithms: effects were more pronounced in the female population than in the male population. The implementation of the proposed alignment strategies and control of beat timing while running opens up possibilities for optimizing individual running cadence and motivation.
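One way to quantify the step-to-beat relationship described above is the relative phase angle of each footfall within its enclosing beat interval; negative angles correspond to steps that precede the beat. The sketch below assumes onset times in seconds and is a generic illustration, not the authors' alignment algorithm.

```python
import numpy as np

def relative_phase_deg(footfalls, beats):
    """Phase angle (degrees) of each footfall relative to the enclosing
    beat interval; negative values mean the step precedes the beat."""
    footfalls = np.asarray(footfalls, dtype=float)
    beats = np.asarray(beats, dtype=float)
    idx = np.searchsorted(beats, footfalls) - 1
    valid = (idx >= 0) & (idx < len(beats) - 1)
    f, i = footfalls[valid], idx[valid]
    phase = (f - beats[i]) / (beats[i + 1] - beats[i])   # in [0, 1)
    return np.where(phase > 0.5, phase - 1.0, phase) * 360.0

beats = np.arange(0.0, 10.0, 0.5)               # 120 BPM metronome
steps = beats[1:-1] - 0.03                      # steps roughly 22 degrees early
print(relative_phase_deg(steps, beats).mean())  # negative mean asynchrony
```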

]]>
<![CDATA[Scalable preprocessing of high volume environmental acoustic data for bioacoustic monitoring]]> https://www.researchpad.co/article/5b6dda12463d7e7491b405eb

In this work, we examine the problem of efficiently preprocessing and denoising high volume environmental acoustic data, which is a necessary step in many bird monitoring tasks. Preprocessing is typically made up of multiple steps which are considered separately from each other. These are often resource intensive, particularly because the volume of data involved is high. We focus on addressing two challenges within this problem: how to combine existing preprocessing tasks while maximising the effectiveness of each step, and how to process this pipeline quickly and efficiently, so that it can be used to process high volumes of acoustic data. We describe a distributed system designed specifically for this problem, utilising a master-slave model with data parallelisation. By investigating the impact of individual preprocessing tasks on each other, and their execution times, we determine an efficient and accurate order for preprocessing tasks within the distributed system. We find that, using a single core, our pipeline executes 1.40 times faster compared to manually executing all preprocessing tasks. We then apply our pipeline in the distributed system and evaluate its performance. We find that our system is capable of preprocessing bird acoustic recordings at a rate of 174.8 seconds of audio per second of real time with 32 cores over 8 virtual machines, which is 21.76 times faster than a serial process.
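As a hedged illustration of the data-parallel idea, i.e. a pool of workers each running the full preprocessing chain on one recording, consider the sketch below. The step functions and file names are placeholders; this is not the distributed system described in the paper.

```python
from multiprocessing import Pool

def denoise(audio):            # placeholder preprocessing steps: in practice
    return audio               # these would perform real signal processing
def resample(audio):
    return audio
def segment(audio):
    return [audio]

def preprocess(recording_path):
    """Run the whole preprocessing chain on a single recording."""
    audio = recording_path     # stand-in for actually loading the file
    for step in (denoise, resample):
        audio = step(audio)
    return segment(audio)

if __name__ == "__main__":
    recordings = [f"rec_{i:03d}.wav" for i in range(8)]   # hypothetical files
    with Pool(processes=4) as pool:                       # data parallelism
        results = pool.map(preprocess, recordings)
    print(len(results), "recordings preprocessed")
```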

]]>
<![CDATA[Upper nasal hemifield location and nonspatial auditory tones accelerate visual detection during dichoptic viewing]]> https://www.researchpad.co/article/5b603631463d7e4090b7ce20

Visual performance is asymmetric across the visual field, but locational biases that occur during dichoptic viewing are not well understood. In this study, we characterized horizontal, vertical and naso-temporal biases in visual target detection during dichoptic stimulation and explored whether the detection was facilitated by non-spatial auditory tones associated with the target’s location.

Detection times for single monocular targets, suppressed from view by a 10 Hz dynamic noise mask presented to the other eye, were measured at 4° intercardinal locations in each eye using the breaking Continuous Flash Suppression (b-CFS) technique. Each target was either combined with a sound (i.e., a high- or low-pitch tone) that was congruent or incongruent with its vertical location (i.e., upper or lower visual field) or presented without a sound. The results indicated faster detection of targets in the upper rather than the lower visual field and faster detection of targets in the nasal than in the temporal hemifield of each eye. Sounds generally accelerated target detection, but tone pitch-elevation congruency did not further enhance performance. These findings suggest that visual detection during dichoptic viewing differs from standard viewing conditions with respect to location-related perceptual biases and crossmodal modulation of visual perception. These differences should be carefully considered in experimental designs employing dichoptic stimulation techniques and in display applications that utilize dichoptic viewing.

]]>
<![CDATA[Fidelity of Automatic Speech Processing for Adult and Child Talker Classifications]]> https://www.researchpad.co/article/5989d9d4ab0ee8fa60b65296

Automatic speech processing (ASP) has recently been applied to very large datasets of naturalistically collected, daylong recordings of child speech via an audio recorder worn by young children. The system developed by the LENA Research Foundation analyzes children's speech for research and clinical purposes, with a special focus on identifying and tagging family speech dynamics and the at-home acoustic environment from the auditory perspective of the child. A primary issue for researchers, clinicians, and families using the Language ENvironment Analysis (LENA) system is to what degree the segment labels are valid. This classification study evaluates the performance of the computer ASP output against 23 trained human judges who made approximately 53,000 classification judgements of segments tagged by the LENA ASP. Results indicate performance consistent with modern ASP systems, such as those using HMM methods, with the acoustic characteristics of fundamental frequency and segment duration most important for both human and machine classifications. Results are likely to be important for interpreting and improving ASP output.

]]>
<![CDATA[Improvement of Source Number Estimation Method for Single Channel Signal]]> https://www.researchpad.co/article/5989daffab0ee8fa60bc61e5

Source number estimation methods for single-channel signals are investigated in this work, and improvements to each method are suggested. Firstly, the single-channel data are converted to multi-channel form by a delay process. Then, algorithms used in array signal processing, such as Gerschgorin’s disk estimation (GDE) and minimum description length (MDL), are introduced to estimate the number of sources in the received signal. Previous results have shown that MDL, which is based on information-theoretic criteria (ITC), performs better than GDE at low SNR; however, it cannot handle signals containing colored noise. Conversely, the GDE method can eliminate the influence of colored noise, but its performance at low SNR is not satisfactory. To address these complementary shortcomings, this work makes improvements to both methods. A diagonal loading technique is employed to ameliorate the MDL method, and a jackknife technique is used to optimize the data covariance matrix in order to improve the performance of the GDE method. Simulation results show that the performance of both original methods is substantially improved.
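For orientation, here is a minimal sketch of textbook MDL source-number estimation applied to a delay-embedded single-channel signal, with simple diagonal loading of the covariance matrix. This follows the standard MDL criterion and a generic loading scheme, not necessarily the exact variants proposed in the paper.

```python
import numpy as np

def delay_embed(x, m):
    """Convert a single-channel signal into pseudo multi-channel snapshots
    using m delayed copies (rows = channels, columns = snapshots)."""
    n = len(x) - m + 1
    return np.stack([x[i:i + n] for i in range(m)])

def mdl_source_number(X, loading=1e-3):
    """Textbook MDL estimate of the number of sources from snapshots X
    (p channels x N snapshots), with diagonal loading of the covariance."""
    p, N = X.shape
    R = X @ X.conj().T / N
    R += loading * np.trace(R).real / p * np.eye(p)   # diagonal loading
    lam = np.sort(np.linalg.eigvalsh(R))[::-1]        # descending eigenvalues
    mdl = []
    for k in range(p):
        tail = lam[k:]
        geo, ari = np.exp(np.mean(np.log(tail))), np.mean(tail)
        mdl.append(-N * (p - k) * np.log(geo / ari)
                   + 0.5 * k * (2 * p - k) * np.log(N))
    return int(np.argmin(mdl))

# Hypothetical test: two real sinusoids in white noise on a single channel
t = np.arange(2000) / 1000.0
x = np.sin(2 * np.pi * 50 * t) + 0.7 * np.sin(2 * np.pi * 120 * t)
x += 0.1 * np.random.default_rng(0).standard_normal(t.size)
print(mdl_source_number(delay_embed(x, m=8)))  # typically 4: each real sinusoid spans a 2-D subspace
```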

]]>
<![CDATA[The Sound Sensation of Apical Electric Stimulation in Cochlear Implant Recipients with Contralateral Residual Hearing]]> https://www.researchpad.co/article/5989dabbab0ee8fa60baea66

Background

Studies using vocoders as acoustic simulators of cochlear implants have generally focused on simulation of speech understanding, gender recognition, or music appreciation. The aim of the present experiment was to study the auditory sensation perceived by cochlear implant (CI) recipients with steady electrical stimulation on the most-apical electrode.

Methodology/Principal Findings

Five unilateral CI users with contralateral residual hearing were asked to vary the parameters of an acoustic signal played to the non-implanted ear, in order to match its sensation to that of the electric stimulus. They also provided a rating of similarity between each acoustic sound they selected and the electric stimulus. On average across subjects, the sound rated as most similar was a complex signal with a concentration of energy around 523 Hz. This sound was inharmonic in 3 out of 5 subjects with a moderate, progressive increase in the spacing between the frequency components.
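To make the matched percept concrete, here is a hedged sketch of a complex tone centred near 523 Hz whose component spacing increases progressively (an inharmonic "stretch"). The stretch factor, number of partials, and amplitude roll-off are illustrative assumptions, not the values the participants actually selected.

```python
import numpy as np

fs = 44100                       # sample rate (Hz)
t = np.arange(int(fs * 1.0)) / fs

f0 = 523.0                       # energy concentrated around ~523 Hz
stretch = 1.05                   # >1 widens the spacing between partials
n_partials = 6

freqs, f, step = [], f0, f0
for _ in range(n_partials):
    freqs.append(f)
    step *= stretch              # progressively increasing spacing
    f += step

amps = 1.0 / (1.0 + np.arange(n_partials))     # gentle spectral roll-off
signal = sum(a * np.sin(2 * np.pi * fr * t) for a, fr in zip(amps, freqs))
signal /= np.max(np.abs(signal))               # normalize to +/-1
print([round(fr, 1) for fr in freqs])          # inharmonic partial frequencies
```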

Conclusions/Significance

For these subjects, the sound sensation created by steady electric stimulation on the most-apical electrode was neither a white noise nor a pure tone, but a complex signal with a progressive increase in the spacing between the frequency components in 3 out of 5 subjects. Knowing whether the inharmonic nature of the sound was related to the fact that the non-implanted ear was impaired has to be explored in single-sided deafened patients with a contralateral CI. These results may be used in the future to better understand peripheral and central auditory processing in relation to cochlear implants.

]]>
<![CDATA[Calibration Method of an Ultrasonic System for Temperature Measurement]]> https://www.researchpad.co/article/5989d9e4ab0ee8fa60b6a923

System calibration is fundamental to the overall accuracy of ultrasonic temperature measurement, and it essentially involves accurately measuring the path length and the system latency of the ultrasonic system. This paper proposes a high-accuracy system calibration method. By estimating the time delay between the transmitted and received signals at several different temperatures, the calibration equations are constructed, and the calibrated results are determined using the least-squares algorithm. Formulas are derived for calculating the calibration uncertainties, and the possible influencing factors are analyzed. Experimental results in distilled water show that the calibrated path length and system latency achieve uncertainties of 0.058 mm and 0.038 μs, respectively, and that temperature accuracy is significantly improved by using the calibrated results. The temperature error remains consistently within ±0.04°C, and the percentage error is less than 0.15%.
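The calibration can be written as a linear least-squares problem: each measured delay is modelled as t_i = L / c(T_i) + τ, where c(T_i) is the known speed of sound in distilled water at temperature T_i, L is the path length, and τ is the system latency. A minimal sketch follows; the sound-speed values and delays are made-up numbers, not the paper's data.

```python
import numpy as np

# Hypothetical calibration data: tabulated sound speeds c(T) in distilled
# water (m/s) at the calibration temperatures, and measured delays t (s).
c = np.array([1482.3, 1495.1, 1507.0, 1519.9])
true_L, true_tau = 0.100, 35e-6                    # 100 mm path, 35 us latency
rng = np.random.default_rng(0)
t = true_L / c + true_tau + rng.normal(0.0, 5e-9, c.size)

# t = L * (1/c) + tau  ->  solve [1/c, 1] @ [L, tau] = t by least squares
A = np.column_stack([1.0 / c, np.ones_like(c)])
(L_hat, tau_hat), *_ = np.linalg.lstsq(A, t, rcond=None)
print(f"path length ~ {L_hat * 1000:.3f} mm, latency ~ {tau_hat * 1e6:.3f} us")
```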

]]>
<![CDATA[Behavioral Quantification of Audiomotor Transformations in Improvising and Score-Dependent Musicians]]> https://www.researchpad.co/article/5989da65ab0ee8fa60b91bb4

The historically developed practice of learning to play a music instrument from notes, instead of by imitation or improvisation, makes it possible to contrast two types of skilled musicians characterized not only by dissimilar performance practices but also by disparate methods of audiomotor learning. In a recent fMRI study comparing these two groups of musicians while they either imagined playing along with a recording or covertly assessed the quality of the performance, we observed activation of a right-hemisphere network of posterior superior parietal and dorsal premotor cortices in improvising musicians, indicating more efficient audiomotor transformation. In the present study, we investigated the detailed performance characteristics underlying the ability of both groups of musicians to replicate music on the basis of aural perception alone. Twenty-two classically trained improvising and score-dependent musicians listened to short, unfamiliar two-part excerpts presented over headphones. They played along or replicated the excerpts by ear on a digital piano, either with or without aural feedback. In addition, they were asked to harmonize or transpose some of the excerpts, either to a different key or to the relative minor. MIDI recordings of their performances were compared with recordings of the aural model. Concordance was expressed as an audiomotor alignment score computed using music information retrieval algorithms. Significant differences in alignment scores were found when contrasting groups, voices, and tasks. The present study demonstrates the superior ability of improvising musicians to replicate both the pitch and rhythm of aurally perceived music at the keyboard, not only in the original key but also in other tonalities. Taken together with the enhanced activation of the right dorsal frontoparietal network found in our previous fMRI study, these results underscore the conclusion that the practice of improvising music can be associated with enhanced audiomotor transformation in response to aurally perceived music.
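As a rough illustration of what an alignment score between a performed MIDI stream and the aural model could look like, the sketch below matches notes by pitch within a timing tolerance and reports an F-measure. This is a generic stand-in, not the music information retrieval algorithms actually used in the study.

```python
def alignment_score(model_notes, performed_notes, tol=0.25):
    """model_notes / performed_notes: lists of (onset_s, midi_pitch).
    A performed note matches an unused model note if the pitch is equal and
    the onsets differ by less than `tol` seconds. Returns an F-measure."""
    used, hits = set(), 0
    for onset, pitch in performed_notes:
        for i, (m_onset, m_pitch) in enumerate(model_notes):
            if i not in used and m_pitch == pitch and abs(m_onset - onset) < tol:
                used.add(i)
                hits += 1
                break
    precision = hits / len(performed_notes) if performed_notes else 0.0
    recall = hits / len(model_notes) if model_notes else 0.0
    return 2 * precision * recall / (precision + recall) if hits else 0.0

model = [(0.0, 60), (0.5, 64), (1.0, 67), (1.5, 72)]
performance = [(0.05, 60), (0.55, 64), (1.10, 67), (1.60, 71)]  # last note wrong
print(round(alignment_score(model, performance), 2))            # 0.75
```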

]]>
<![CDATA[A Technical Comparison of Digital Frequency-Lowering Algorithms Available in Two Current Hearing Aids]]> https://www.researchpad.co/article/5989da91ab0ee8fa60b9ffd6

Background

Recently, two major manufacturers of hearing aids introduced two distinct frequency-lowering techniques that were designed to compensate in part for the perceptual effects of high-frequency hearing impairments. The Widex “Audibility Extender” is a linear frequency transposition scheme, whereas the Phonak “SoundRecover” scheme employs nonlinear frequency compression. Although these schemes process sound signals in very different ways, studies investigating their use by both adults and children with hearing impairment have reported significant perceptual benefits. However, the modifications that these innovative schemes apply to sound signals have not previously been described or compared in detail.
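To illustrate the contrast in the simplest possible terms: linear transposition shifts a high-frequency source band down by a fixed offset, whereas nonlinear frequency compression maps frequencies above a cutoff with a compressive exponent. The mapping below is a generic textbook form with made-up parameters, not the proprietary Widex or Phonak implementations.

```python
def linear_transposition(f, source_lo=4000.0, shift=2000.0):
    """Shift frequencies in the source region down by a fixed offset (Hz)."""
    return f - shift if f >= source_lo else f

def nonlinear_compression(f, cutoff=2000.0, ratio=2.0):
    """Above the cutoff, compress frequency with a power law so that the
    mapping is continuous at the cutoff and compressive above it."""
    if f <= cutoff:
        return f
    return cutoff * (f / cutoff) ** (1.0 / ratio)

for f_in in (1000.0, 3000.0, 6000.0):
    print(f"{f_in:.0f} Hz -> {nonlinear_compression(f_in):.1f} Hz (compressed)")
```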

Methods

The main aim of the present study was to analyze these schemes' technical performance by measuring outputs from each type of hearing aid with the frequency-lowering functions enabled and disabled. The input signals included sinusoids, flute sounds, and speech material. Spectral analyses were carried out on the output signals produced by the hearing aids in each condition.

Conclusions

The results of the analyses confirmed that each scheme was effective at lowering certain high-frequency acoustic signals, although both techniques also distorted some signals. Most importantly, the application of either frequency-lowering scheme would be expected to improve the audibility of many sounds having salient high-frequency components. Nevertheless, considerably different perceptual effects would be expected from these schemes, even when each hearing aid is fitted in accordance with the same audiometric configuration of hearing impairment. In general, these findings reinforce the need for appropriate selection and fitting of sound-processing schemes in modern hearing aids to suit the characteristics and preferences of individual listeners.

]]>
<![CDATA[Predicting the Perceived Sound Quality of Frequency-Compressed Speech]]> https://www.researchpad.co/article/5989da27ab0ee8fa60b80f20

The performance of objective speech and audio quality measures for the prediction of the perceived quality of frequency-compressed speech in hearing aids is investigated in this paper. A number of existing quality measures have been applied to speech signals processed by a hearing aid, which compresses speech spectra along frequency in order to make information contained in higher frequencies audible for listeners with severe high-frequency hearing loss. Quality measures were compared with subjective ratings obtained from normal hearing and hearing impaired children and adults in an earlier study. High correlations were achieved with quality measures computed by quality models that are based on the auditory model of Dau et al., namely, the measure PSM, computed by the quality model PEMO-Q; the measure qc, computed by the quality model proposed by Hansen and Kollmeier; and the linear subcomponent of the HASQI. For the prediction of quality ratings by hearing impaired listeners, extensions of some models incorporating hearing loss were implemented and shown to achieve improved prediction accuracy. Results indicate that these objective quality measures can potentially serve as tools for assisting in initial setting of frequency compression parameters.

]]>
<![CDATA[Transmission Characteristics of Primate Vocalizations: Implications for Acoustic Analyses]]> https://www.researchpad.co/article/5989daedab0ee8fa60bc0001

Acoustic analyses have become a staple method in field studies of animal vocal communication, with nearly all investigations using computer-based approaches to extract specific features from sounds. Various algorithms can be used to extract acoustic variables that may then be related to variables such as individual identity, context or reproductive state. Habitat structure and recording conditions, however, have strong effects on the acoustic structure of sound signals. The purpose of this study was to identify which acoustic parameters reliably describe features of propagated sounds. We conducted broadcast experiments and examined the influence of habitat type, transmission height, and re-recording distance on the validity (deviation from the original sound) and reliability (variation within identical recording conditions) of acoustic features of different primate call types. Validity and reliability varied independently of each other in relation to habitat, transmission height, and re-recording distance, and depended strongly on the call type. The smallest deviations from the original sounds were obtained by a visually-controlled calculation of the fundamental frequency. Start- and end parameters of a sound were most susceptible to degradation in the environment. Because the recording conditions can have appreciable effects on acoustic parameters, it is advisable to validate the extraction method of acoustic variables from recordings over longer distances before using them in acoustic analyses.

]]>
<![CDATA[Robust Real-Time Music Transcription with a Compositional Hierarchical Model]]> https://www.researchpad.co/article/5989da81ab0ee8fa60b9ac67

The paper presents a new compositional hierarchical model for robust music transcription. Its main features are unsupervised learning of a hierarchical representation of the input data; transparency, which enables insights into the learned representation; and robustness and speed, which make it suitable for real-world and real-time use. The model consists of multiple layers, each composed of a number of parts. The hierarchical nature of the model corresponds well to hierarchical structures in music. The parts in lower layers correspond to low-level concepts (e.g. tone partials), while the parts in higher layers combine lower-level representations into more complex concepts (tones, chords). The layers are learned in an unsupervised manner from music signals. Parts in each layer are compositions of parts from previous layers, with statistical co-occurrence as the driving force of the learning process. In the paper, we present the model’s structure and compare it to other hierarchical approaches in the field of music information retrieval. We evaluate the model’s performance on multiple fundamental frequency estimation. Finally, we elaborate on extensions of the model towards other music information retrieval tasks.

]]>
<![CDATA[3D Tune-In Toolkit: An open-source library for real-time binaural spatialisation]]> https://www.researchpad.co/article/5c900d1fd5eed0c48407e0c0

The 3D Tune-In Toolkit (3DTI Toolkit) is an open-source standard C++ library which includes a binaural spatialiser. This paper presents the technical details of this renderer, outlining its architecture and describing the processes implemented in each of its components. In order to put this description into context, the basic concepts behind binaural spatialisation are reviewed through a chronology of research milestones in the field over the last 40 years. The 3DTI Toolkit renders the anechoic signal path by convolving sound sources with Head Related Impulse Responses (HRIRs), obtained by interpolating those extracted from a set that can be loaded from any file in a standard audio format. Interaural time differences are managed separately, in order to be able to customise the rendering according to the head size of the listener and to reduce comb-filtering when interpolating between different HRIRs. In addition, geometrical and frequency-dependent corrections for simulating near-field sources are included. Reverberation is computed separately using a virtual-loudspeaker Ambisonic approach and convolution with Binaural Room Impulse Responses (BRIRs). In all these processes, special care has been taken to avoid audible artefacts produced by changes in gains and audio filters due to the movements of sources and of the listener. The 3DTI Toolkit's performance, as well as some other relevant metrics such as non-linear distortion, is assessed and presented, followed by a comparison between the features offered by the 3DTI Toolkit and those found in other currently available open- and closed-source binaural renderers.
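The core of the anechoic path described above is convolution of the source signal with the left- and right-ear HRIRs for the source direction. Below is a minimal sketch of that single operation; the HRIR arrays are crude placeholders, and the real library additionally interpolates HRIRs and handles interaural time differences separately.

```python
import numpy as np
from scipy.signal import fftconvolve

def binaural_render(mono, hrir_left, hrir_right):
    """Convolve a mono source with left/right HRIRs -> (N, 2) stereo array."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

fs = 44100
mono = np.random.default_rng(0).standard_normal(fs)  # 1 s test source
hrir_l = np.zeros(256)                                # placeholder impulse responses:
hrir_l[10] = 1.0                                      # a crude delay/level difference
hrir_r = np.zeros(256)
hrir_r[25] = 0.8
stereo = binaural_render(mono, hrir_l, hrir_r)
print(stereo.shape)                                   # (44100 + 255, 2)
```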

]]>
<![CDATA[Speech Perception and Localisation with SCORE Bimodal: A Loudness Normalisation Strategy for Combined Cochlear Implant and Hearing Aid Stimulation]]> https://www.researchpad.co/article/5989da4bab0ee8fa60b8ccb8

A significant fraction of newly implanted cochlear implant recipients use a hearing aid in their non-implanted ear. SCORE bimodal is a sound processing strategy developed for this configuration, aimed at normalising loudness perception and improving binaural loudness balance. Speech perception performance in quiet and noise and sound localisation ability of six bimodal listeners were measured with and without application of SCORE. Speech perception in quiet was measured either with only acoustic, only electric, or bimodal stimulation, at soft and normal conversational levels. For speech in quiet there was a significant improvement with application of SCORE. Speech perception in noise was measured for either steady-state noise, fluctuating noise, or a competing talker, at conversational levels with bimodal stimulation. For speech in noise there was no significant effect of application of SCORE. Modelling of interaural loudness differences in a long-term-average-speech-spectrum-weighted click train indicated that left-right discrimination of sound sources can improve with application of SCORE. As SCORE was found to leave speech perception unaffected or to improve it, it seems suitable for implementation in clinical devices.

]]>
<![CDATA[A Cough-Based Algorithm for Automatic Diagnosis of Pertussis]]> https://www.researchpad.co/article/5989db09ab0ee8fa60bc9894

Pertussis is a contagious respiratory disease which mainly affects young children and can be fatal if left untreated. The World Health Organization estimates 16 million pertussis cases annually worldwide, resulting in over 200,000 deaths. It is prevalent mainly in developing countries, where it is difficult to diagnose due to the lack of healthcare facilities and medical professionals. Hence, a low-cost, quick, and easily accessible solution is needed to provide pertussis diagnosis in such areas to contain an outbreak. In this paper we present an algorithm for automated diagnosis of pertussis using audio signals, by analyzing cough and whoop sounds. The algorithm consists of three main blocks that perform automatic cough detection, cough classification, and whooping sound detection. Each block extracts relevant features from the audio signal and classifies them using a logistic regression model. The outputs of these blocks are collated to provide a pertussis likelihood diagnosis. The performance of the proposed algorithm is evaluated using audio recordings from 38 patients. The algorithm successfully diagnosed pertussis from all audio recordings without any false diagnoses. It can also automatically detect individual cough sounds with 92% accuracy and a positive predictive value (PPV) of 97%. The low complexity of the proposed algorithm, coupled with its high accuracy, demonstrates that it can be readily deployed using smartphones and can be extremely useful for quick identification or early screening of pertussis and for controlling infection outbreaks.
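A hedged sketch of the per-block pattern the abstract describes, i.e. extract features from a detected sound event and classify it with logistic regression. The synthetic features and labels below are illustrative assumptions, not the study's feature set or patient data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder "feature extraction": in practice each detected sound event
# would yield e.g. spectral, cepstral, and duration features.
n_events, n_features = 200, 13
X = rng.normal(size=(n_events, n_features))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, n_events) > 0).astype(int)
# 1 = pertussis-like (whooping) cough, 0 = other cough (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

pred = clf.predict(X_te)
tp = np.sum((pred == 1) & (y_te == 1))
ppv = tp / max(pred.sum(), 1)                  # positive predictive value
print(f"accuracy = {clf.score(X_te, y_te):.2f}, PPV = {ppv:.2f}")
```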

]]>
<![CDATA[Effect of Simultaneous Bilingualism on Speech Intelligibility across Different Masker Types, Modalities, and Signal-to-Noise Ratios in School-Age Children]]> https://www.researchpad.co/article/5989db3dab0ee8fa60bd58b7

Recognizing speech in adverse listening conditions is a significant cognitive, perceptual, and linguistic challenge, especially for children. Prior studies have yielded mixed results on the impact of bilingualism on speech perception in noise. Methodological variations across studies make it difficult to converge on a conclusion regarding the effect of bilingualism on speech-in-noise performance. Moreover, there is a dearth of speech-in-noise evidence for bilingual children who learn two languages simultaneously. The aim of the present study was to examine the extent to which various adverse listening conditions modulate differences in speech-in-noise performance between monolingual and simultaneous bilingual children. To that end, sentence recognition was assessed in twenty-four school-aged children (12 monolinguals; 12 simultaneous bilinguals, age of English acquisition ≤ 3 yrs.). We implemented a comprehensive speech-in-noise battery to examine recognition of English sentences across different modalities (audio-only, audiovisual), masker types (steady-state pink noise, two-talker babble), and a range of signal-to-noise ratios (SNRs; 0 to -16 dB). Results revealed no difference in performance between monolingual and simultaneous bilingual children across each combination of modality, masker, and SNR. Our findings suggest that when English age of acquisition and socioeconomic status are similar between groups, monolingual and bilingual children exhibit comparable speech-in-noise performance across a range of conditions analogous to everyday listening environments.

]]>
<![CDATA[Accuracy and Reliability of the Kinect Version 2 for Clinical Measurement of Motor Function]]> https://www.researchpad.co/article/5989daddab0ee8fa60bbab3b

Background

The introduction of low cost optical 3D motion tracking sensors provides new options for effective quantification of motor dysfunction.

Objective

The present study aimed to evaluate the Kinect V2 sensor against a gold standard motion capture system with respect to accuracy of tracked landmark movements and accuracy and repeatability of derived clinical parameters.

Methods

Nineteen healthy subjects were concurrently recorded with a Kinect V2 sensor and an optical motion tracking system (Vicon). Six different movement tasks were recorded with 3D full-body kinematics from both systems. Tasks included walking in different conditions, balance, and adaptive postural control. After temporal and spatial alignment, agreement of the movement signals was described by Pearson’s correlation coefficient and signal-to-noise ratios per dimension. From these movement signals, 45 clinical parameters were calculated, including ranges of motion, torso sway, movement velocities, and cadence. Accuracy of the parameters was described as absolute agreement, consistency agreement, and limits of agreement. Intra-session reliability over 3 to 5 measurement repetitions was described by the repeatability coefficient and the standard error of measurement for each system.
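For reference, here is a minimal sketch of two of the agreement statistics mentioned above: Pearson's r between paired measurements and ICC(3,1) computed from two-way ANOVA mean squares (following Shrout and Fleiss). The arrays are synthetic placeholders, not study data.

```python
import numpy as np

def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single measurement
    (Shrout & Fleiss). ratings: (n_subjects, k_systems), e.g. one clinical
    parameter measured by both Kinect and Vicon for each subject."""
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)
    ss_total = np.sum((ratings - grand) ** 2)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Synthetic example: 19 subjects, one parameter measured by two systems
rng = np.random.default_rng(1)
vicon = rng.normal(30.0, 5.0, 19)              # e.g. range of motion (degrees)
kinect = vicon + rng.normal(0.0, 1.5, 19)      # Kinect tracks Vicon with noise
print("Pearson r =", round(np.corrcoef(vicon, kinect)[0, 1], 2))
print("ICC(3,1)  =", round(icc_3_1(np.column_stack([vicon, kinect])), 2))
```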

Results

Accuracy of Kinect V2 landmark movements was moderate to excellent and depended on movement dimension, landmark location, and the performed task. The signal-to-noise ratio provided information about Kinect V2 landmark stability and indicated larger noise in the feet and ankles. Most of the derived clinical parameters showed good to excellent absolute agreement (30 parameters with ICC(3,1) > 0.7) and consistency (38 parameters with r > 0.7) between the two systems.

Conclusion

Given that this system is low-cost, portable, and does not require any sensors to be attached to the body, it could provide numerous advantages compared to established marker-based or wearable-sensor-based systems. The Kinect V2 has the potential to be used as a reliable and valid clinical measurement tool.

]]>