A key problem in the study of the senses is to describe how sense organs extract perceptual information from the physics of the environment. We previously observed that dynamic touch elicits mechanical waves that propagate throughout the hand. Here, we show that these waves produce an efficient encoding of tactile information. The computation of an optimal encoding of thousands of naturally occurring tactile stimuli yielded a compact lexicon of primitive wave patterns that sparsely represented the entire dataset, enabling touch interactions to be classified with an accuracy exceeding 95%. The primitive tactile patterns reflected the interplay of hand anatomy with wave physics. Notably, similar patterns emerged when we applied efficient encoding criteria to spiking data from populations of simulated tactile afferents. This finding suggests that the biomechanics of the hand enables efficient perceptual processing by effecting a preneuronal compression of tactile information.
The sense of touch, which is essential for skilled manipulation and object perception, relies on the encoding of mechanical signals collected by the skin and subcutaneous tissues into neural representations. While neural responses to tactile stimuli are often associated with mechanical inputs arising from small skin regions, we recently observed that dynamic touch elicits mechanical waves in the tactile frequency range that spread throughout the whole hand, with transient excitations decaying within 30 ms (1). Dynamic tactile inputs can thus drive widespread tactile afferent populations (2, 3). These touch-elicited waves have been found to facilitate fine perceptual discriminations (4, 5) and can be used to infer actions, the attributes of touched objects, and locations of contact with the hand (1, 6–8). Receptive fields of neurons in somatosensory cortical areas were observed to span large hand areas and multiple digits (9, 10). The large spatial scale integration at the early stages of processing (11, 12) induces cortical neurons to exhibit integrative responses to tactile inputs delivered to widespread limb regions (13–15). Thus, somatosensory processing could depend on information transported by mechanical waves that propagate in tissues to remote locations, distant from the loci of mechanical contact.
An analogy could be drawn to the cochlea, where the transport of dispersive mechanical waves via the basilar membrane imparts preneuronal filtering to auditory stimuli (16), supporting a frequency-place transformation (17–20). Similar processes have been observed for mechanical waves propagating in the hand (3). In the rodent vibrissal system, whisker mechanics also impart preneuronal processing to tactile stimuli (21, 22).
If the transport of mechanical waves in the hand facilitates efficient somatosensory information encoding, then it should be possible to describe tactile stimuli in terms of a smaller space of informative parameters. This would allow stimuli to be represented as combinations of a small number of primitive features, or tactile patterns. These representations are commonly observed in sensory systems. They correspond to an efficient sensory coding hypothesis that proposes neural circuitry to have evolved to capture relevant sensory information with the fewest physical and metabolic resources (23, 24). Studies of commonly encountered visual and auditory stimuli show that representations in the neural pathways for perceptual processing can emerge from the need to efficiently encode information in natural scenes (25–28).
Here, we show how mechanical waves in the hand produce an efficient encoding of tactile inputs. By optimally encoding a dataset of thousands of naturally occurring whole-hand tactile stimuli, we obtained a compact lexicon of primitive spatiotemporal patterns that sparsely represented information in the entire dataset, enabling it to be classified with an accuracy exceeding 95%. These primitive patterns reflected the interplay of the anatomy of the hand and the physics of tactile wave propagation and were evocative of hand sensory function, including the individuation of digits and the denser innervation of the distal ends of the fingers. We obtained notably similar patterns when we applied the efficient encoding criteria to spiking data from populations of simulated tactile afferents. These results reveal a possible important contribution of the hand biomechanics to early somatosensory processing, which may be compared to the role of cochlear mechanics in early auditory encoding. This new knowledge revises existing views of touch sensing and may aid the understanding of hand sensory function and deficits affecting the sense of touch. It also furnishes new principles that may guide the design of electronic tactile sensors that could leverage the ability of propagating waves to communicate touch information. These devices may yield important applications in robotics, prosthetics, and medicine.
We formulated the efficient encoding of tactile information as an optimal matrix factorization problem and evaluated its predictions using a database of whole-hand tactile stimuli, comprising spatiotemporal skin accelerations, a(x, t), that were captured at 30 different locations, x, via a sensor array worn on the hand during performances of 13 manual gestures and 4600 interactions with objects (see Fig. 1 and Materials and Methods). Each of the 4600 captured stimuli was represented in the dataset by 18,000 samples.
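These stimuli were encoded via a compact lexicon of M primitive spatiotemporal patterns, or “bases,” wi(x, t), weighted by time-dependent activations, hi(t), that were unique to each stimulus:

$$a(x, t) = \hat{a}(x, t) + \eta(x, t), \qquad \hat{a}(x, t) = \sum_{i=1}^{M} \sum_{\tau=0}^{T-1} w_i(x, \tau)\, h_i(t - \tau) \tag{1}$$

Here, â(x, t) is the model estimate of the stimulus, τ is the delay within a basis of duration T, and η is the residual error.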
We computed an optimal encoding (Fig. 2A) by maximizing the information about every element of the dataset, a(x, t), that was gained by observing the estimate, â(x, t), as determined by Eq. 1. The simultaneous optimization of the model with respect to wi(x, t) and hi(t) (see Materials and Methods) yielded a set of “tactile basis patterns,” wi(x, t), that together produced an efficient encoding, revealing the latent structure hidden in the ensemble of stimuli. These basis patterns optimally represented the dataset in the sense of maximum likelihood (see Supplementary Text).
The bases may also be interpreted as an array of analysis filters that extracted information from the stimuli via different, complementary patterns of spatiotemporal integration of mechanical signals in the hand. These filters may be compared to spectrotemporal tuning functions in auditory processing (29) or spatiotemporal receptive field filters in retinal processing (30). In a minimal functional model of neural population coding, these filter outputs may be passed through nonlinear transfer functions to predict neural firing.
Model (1) included a non-negativity constraint to match the rectifying property of mechanotransduction (31). This encouraged a sparse encoding (32, 33), as observed in mammalian visual (34) and auditory (35) cortices and in rodent barrel and somatosensory cortices (36, 37).
Although the analysis was blind to the conditions that gave rise to the signals, the tactile bases were evocative of hand sensory function (Fig. 2, A and B). Most were initially localized at the distal ends of single digits (the most densely innervated regions of the hand). They traveled proximally at rates of 1 to 10 m/s, while decaying over 10 to 30 ms, matching the causal physics of waves in the hand (see the Supplementary Materials). Other bases evolved from the distal region of individual digits to diffuse regions of the hand surface (Fig. 2A). In the frequency domain, pairs of bases exhibited similar spatial patterns but distinct frequency characteristics. For example, the encoding yielded pairs of bases that were both spatially localized within one digit but had different filtering properties: low pass, from about 20 to 80 Hz (Fig. 2B, basis 2), or high pass, from 80 to 160 Hz (Fig. 2B, basis 6). Similar patterns emerged when the encoding rank M, or number of bases, was adjusted (Fig. 3B) or when optimizing with different initial conditions, data subsets, or optimization objectives (figs. S1 to S6).
It could be hypothesized that the structure of our dataset favored such an encoding, in which several spatiotemporal basis patterns were associated with individual digits. For example, 45% of the 4600 analyzed tactile stimuli were elicited by gestures that produced contact at only one digit. To investigate this possibility, we applied the same analysis to a subset of the data that excluded tactile stimuli produced by single-digit gestures. We repeated the same analysis using an additional subset that only included stimuli produced via gestures involving contact with all five digits. In each case, the results were highly similar to those we obtained from encoding the entire dataset (fig. S9), including distinct basis patterns that were primarily localized in single digits.
The space of possible tactile stimuli is constrained by contact and continuum mechanics (Fig. 3A and fig. S8). To assess the number of bases, wi(x, t), that were needed to capture information about the causal origin of the stimuli, we varied the encoding rank and trained support vector machine (SVM) classifiers to predict the gestures from the activation patterns. The encoding was not selected to optimize classification accuracy. Nonetheless, the classification accuracy increased with the number of bases and was greater than 90% if at least 7 bases were used, or greater than 95% if 12 bases were used. A high classification accuracy is not necessarily expected to be achieved when the number of input dimensions exceeds the number of classes. For example, there are binary prediction tasks, such as cancer prognosis prediction from images, where the consideration of multitudes of features is required to achieve moderately accurate classification (38).
The bases encoded the stimuli via a small number of time-dependent activation weights (Fig. 2, C and D). Stimuli elicited by multifinger gestures were encoded by several bases, while simpler gestures activated one or two. Tactile stimuli produced via similar gestures yielded similar activation patterns (fig. S5), while dissimilar gestures resulted in dissimilar activations, even when the same combinations of digits were involved.
The encoding residual decreased with the number of bases. Five bases were sufficient to maximize the accuracy (80%) with which stimuli from one participant could be classified using only data from the other participants (Fig. 3C). These five bases were highly conserved between individuals and were associated with individual digits (Fig. 3B). The activations of the bases (Fig. 2, C and D) exhibited a high degree of sparsity preserved across many trials (see table S1). Those that were associated with multiple finger contact were less sparse and more diverse than those involving just one finger. Information independence among the basis activations decreased with the number of digits engaged.
The observed encoding efficiency of the mechanical signals was a consequence of spatiotemporal integration supplied by the tactile basis patterns. Prevailing physiological models leave little doubt that the spatial and temporal properties of touch-elicited mechanical signals are reflected in the volleys of afferent activations during natural hand interactions. However, extant methods preclude the simultaneous capture of neural signals from populations of peripheral afferents in the behaving hand. We instead computationally predicted the spiking responses of a population of 773 vibration-sensitive afferents excited by the raw mechanical signals in the entire dataset (fig. S7). The neural simulation yielded 773 spike trains for each of the 4600 trials in the dataset.
We optimized the encoding of the predicted neural responses with the same method that was used for mechanical signals. The results were notably similar (Fig. 3, D and E). The spiking bases exhibited similar patterns of spatial integration to the bases that we obtained using the mechanical data, including individuation of digit representations and denser activation of the fingertips. The results of the classification tasks were qualitatively similar to those that we obtained from the mechanical data, despite the higher dimensionality of the input data (see Materials and Methods). This suggests that the encoding revealed organizational principles that would be preserved by neurotransduction and that went beyond the mere properties of skin vibrations.
The size and the diversity of our corpus of data were limited by experimental constraints, as in many other studies using corpora of motor or sensory data (39–41). Although we selected the gestures on the basis of the most reasonable assumptions available, a larger dataset could be captured during spontaneous manual activities outside of the laboratory or specified on the basis of analyses of conditions in which our species evolved.
A useful comparison can be drawn to research on hand movements and grasping. Research in this area has shown how a relatively small number of coordination patterns (“synergies”) can explain most of the variability in hand movement data. Similar coordination patterns have been observed in studies based on different laboratory datasets, or on spontaneous manual activities outside the laboratory, with some task dependency (41). The dimensionality of the analysis presented here is much larger than typically arises in hand kinematic studies. Nonetheless, analogous considerations may apply to our findings. Our analyses of subsets of the mechanical data yielded basis patterns that were very similar to those that we obtained from the combined dataset.
The tactile basis patterns were also invariably organized along a gradient from higher to lower finger individuation. This is opposite to the trend that is observed in grasping studies and may evince an important difference in organizational principles between the tactile and motor systems. The larger degree of individuation that our findings suggest appears to be a consequence of the physics of vibration transmission in the skin, which causes propagating vibrations to attenuate with increasing distance from their source. In contrast, hand movement studies reveal a higher degree of multidigit coordination, which is facilitated by the biomechanics of the limb.
While the behavioral relevance of propagating vibrations in the limb is not fully understood, previous research shows how these vibrations can mediate tactile perception (4, 5). Further research is needed to clarify the relevance of the predictions from efficient tactile encoding to hand function and somatosensory processing.
Our findings suggest that the biomechanics of the hand can facilitate tactile perception by effecting the preneuronal compression of tactile information in the whole hand. This compression was produced by a compact lexicon of primitive tactile wave patterns. Spatiotemporal integration supplied by these basis patterns optimally encoded the tactile stimuli. Recent studies of neural correlates of somatosensory processing reveal that, at the earliest stages of cortical processing, individual neurons exhibit complex responses to tactile stimuli distributed throughout the extremities (15, 42, 43). These studies, together with the new findings presented here, show how traditional depictions of receptive fields do not reflect the extent of early somatosensory integration (44), including effects of mechanical transmission in the body.
In our previous study (1), we developed a custom array of 30 three-channel miniature accelerometers (model ADXL335; Analog Devices) attached to the skin to record the stimuli (Fig. 1C). The sensors were attached to the dorsal hand region, because collecting measurements from the volar region of the hand during natural activity remains technically prohibitive, due to the necessity to expose the glabrous skin to contact. However, we observed that patterns of mechanical wave propagation generated in the volar and dorsal regions are quite similar (fig. S8), indicating that similar results would be associated with the volar hand region.
Each accelerometer had a mass of 40 mg. We used noncontact laser vibrometry to verify that the small mass of the sensors did not significantly affect the measurements. The accelerometers were soldered to miniature two-sided printed circuit boards (dimensions, 6 mm × 8 mm), had a wide bandwidth (0 to 1600 Hz in X and Y; 0 to 550 Hz in Z), and had a dynamic range overlapping that of the vibrotactile system (±35.3 m/s²). They were affixed to the skin using an elastic, skin-compatible prosthetic adhesive (Pros-Aide, FX Warehouse, Philadelphia, PA). The 90 analog signals measured via this apparatus were sampled at a frequency of 2.0 kHz and quantized with a resolution of 12 bits by a data acquisition system (model PCIE-6321, National Instruments, Austin, TX).
The data consisted of touch-elicited vibrations of the skin that were captured from four individuals (one female and three males, aged 19 to 23 years old). Experiments were conducted consistent with institutional ethics guidelines, and all participants gave their informed consent. No participant reported or exhibited abnormalities of the hands. All were right hand dominant and wore the accelerometer array as indicated in Fig. 1A. The accelerometers were positioned on the hand’s dorsal surface so that they would not interfere with touch interactions or introduce artifacts in the data. Because of the properties of vibration transmission in the hand, the captured signals are quite similar to those occurring in the volar surface (see fig. S8). The positions were anatomically standardized. Tactile signals in the hand were measured as individuals performed one of 13 different prescribed manual gestures in each of the 4600 trials. The gestures were selected to be similar to those used when interacting with the environment in everyday life. The majority involved coupled movement of multiple digits and contact between different parts of the hand and objects: tapping a steel plate with individual digits or combinations of digits, feeling the surface via sliding contact, two-finger grasping of a small or large plastic cylinder (diameter d = 40 or 56 mm, masses m = 31 g) with digits I and II, grasping a plastic ball (d = 63 mm, m = 26 g) with all fingers, and indirectly tapping a surface via a stylus (d = 6 mm, length L = 155 mm, m = 30 g) held in digits I and II. Participants were instructed to use forces of approximately 1 N. The gestures were otherwise unconstrained. Measurements were captured in successive blocks of identical activities. Each block of measurement trials lasted 45 s and was composed of 20 trials (tapping gestures) or 10 trials (other gestures).
Visual cueing helped participants to maintain a pace of 2 or 4 s per trial in respective cases. The tactile signals elicited by each gesture spanned 1 to 2 s of data. For analysis, we extracted the time-varying acceleration magnitude, ‖ak(t)‖, from each kth accelerometer (see below), truncated each trial to 600 ms, and downsampled the data to 1.0 kHz. The analyzed data from each trial thus consisted of 30 time-varying signals of 600-ms duration sampled at 1.0 kHz, yielding 18,000 data samples per trial. Thus, the nominal dimensionality of each of the 4600 spatiotemporal stimuli in the dataset was 18,000. The total storage of the dataset required 165 MB.
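The preprocessing described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the pipeline used in the study: the array layout (30 sensors × 3 axes × samples), the function name `preprocess_trial`, the choice to keep the first 600 ms of each trial, and the pairwise averaging used as a crude anti-aliasing step before halving the sample rate are all hypothetical.

```python
import numpy as np

FS_RAW = 2000    # Hz, acquisition rate stated in the text
FS_OUT = 1000    # Hz, analysis rate stated in the text
TRIAL_MS = 600   # ms, trial duration after truncation


def preprocess_trial(raw):
    """Convert one raw trial to acceleration magnitudes at 1 kHz.

    raw: array of shape (30, 3, n_samples) sampled at 2 kHz (assumed layout),
         with n_samples >= 1200.
    Returns an array of shape (30, 600).
    """
    # Euclidean norm across the three axes of each accelerometer.
    mag = np.linalg.norm(raw, axis=1)
    # Keep 600 ms of data (1200 samples at 2 kHz); the window start is assumed.
    n_keep = FS_RAW * TRIAL_MS // 1000
    mag = mag[:, :n_keep]
    # Halve the sample rate by averaging adjacent sample pairs (assumed method).
    factor = FS_RAW // FS_OUT
    return mag.reshape(mag.shape[0], -1, factor).mean(axis=2)


rng = np.random.default_rng(0)
raw = rng.normal(size=(30, 3, 3000))  # synthetic 1.5-s trial, not real data
trial = preprocess_trial(raw)
print(trial.shape, trial.size)
```

The resulting array has 30 × 600 = 18,000 entries, which matches the nominal dimensionality per stimulus quoted above.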
Although the skin acceleration measurements were sampled at a discrete array of points, they provided a sufficient representation of information in the fields of tactile waves in the hand, because the wavelength, λ, was at least twice as large as the accelerometer spacing, thus satisfying a Nyquist criterion. From wave mechanics, summarized further on in Supplementary Text, λ satisfies a dispersion relation λ = c(f)/f, where c(f) < 10 m/s was the frequency-dependent speed of surface wave propagation in the analyzed range of frequencies, 10 < f < 1000 Hz. We determined that λ = c(f)/f was larger than 10 mm for all frequencies. Further discussion is provided in Supplementary Text.
To match the rectifying properties of tactile afferents, we first computed signal magnitudes. The acceleration magnitude from the kth accelerometer at time frame t was computed as the Euclidean norm $a_k(t) = \lVert \mathbf{a}_k(t) \rVert_2$, where $\mathbf{a}_k(t)$ is the three-axis vector signal from the kth accelerometer. We translated the accelerometer signals into spatiotemporal skin motion by interpolating the acceleration magnitudes among nearby measurement locations, as shown in Fig. 1B, using an inverse-distance filter, informed by biomechanical measurements (1, 3). The acceleration amplitude a(x, t) at each location x = (x1, x2, x3) on the surface of the model hand was computed as a weighted sum of the accelerations ak(t) at nearby sensors.
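An inverse-distance interpolation of this kind might look like the sketch below. It is an illustration only: the weighting exponent `power`, the use of straight-line rather than along-surface distances, the inclusion of all sensors rather than only nearby ones, and the sensor coordinates are assumptions introduced here, and the biomechanically informed filter used in the study may differ.

```python
import numpy as np


def inverse_distance_interpolate(sensor_xyz, sensor_vals, query_xyz,
                                 power=2.0, eps=1e-9):
    """Interpolate sensor magnitudes to arbitrary surface points.

    sensor_xyz:  (K, 3) sensor positions (hypothetical coordinates).
    sensor_vals: (K, n_t) acceleration magnitudes a_k(t).
    query_xyz:   (Q, 3) points x on the hand surface.
    Returns a(x, t) with shape (Q, n_t).
    """
    # Distances between every query point and every sensor, shape (Q, K).
    diff = query_xyz[:, None, :] - sensor_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # Weights fall off with distance; eps avoids division by zero at a sensor.
    weights = 1.0 / (dist + eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    # Weighted sum of sensor signals at each query point.
    return weights @ sensor_vals


sensors = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
vals = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
queries = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.0]])
field = inverse_distance_interpolate(sensors, vals, queries)
print(field)
```

Because the weights are normalized to sum to one, the interpolated value at a sensor location reproduces that sensor's signal, and a spatially uniform input yields a uniform field.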
We analyzed frequency content in the stimuli by filtering them to extract content in separate frequency bands: 10 to 20 Hz, 20 to 40 Hz, 40 to 80 Hz, 80 to 160 Hz, 160 to 320 Hz, and 320 to 500 Hz (Fig. 2B). To avoid artifacts, filtering was performed using zero-phase finite impulse response filters.
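A zero-phase band decomposition of this kind can be illustrated with a symmetric windowed-sinc FIR kernel applied by centered convolution. The sketch below is one possible realization, not the filter design used in the study: the Hamming window, the 401-tap length, and the function names are assumptions.

```python
import numpy as np

FS = 1000.0  # Hz, analysis sample rate stated in the text
BANDS_HZ = [(10, 20), (20, 40), (40, 80), (80, 160), (160, 320), (320, 500)]


def lowpass_kernel(cutoff_hz, fs, numtaps):
    """Symmetric windowed-sinc low-pass kernel (Hamming window, assumed)."""
    n = np.arange(numtaps) - (numtaps - 1) / 2
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    return 2 * fc * np.sinc(2 * fc * n) * np.hamming(numtaps)


def bandpass_zero_phase(x, lo_hz, hi_hz, fs=FS, numtaps=401):
    """Band-pass filter x without phase shift.

    The kernel is symmetric with odd length, so centered convolution
    (mode="same") introduces no delay. x should be at least numtaps long.
    """
    if numtaps % 2 == 0:
        raise ValueError("numtaps must be odd for a centered kernel")
    kernel = lowpass_kernel(hi_hz, fs, numtaps) - lowpass_kernel(lo_hz, fs, numtaps)
    return np.convolve(x, kernel, mode="same")


t = np.arange(2000) / FS
x = np.sin(2 * np.pi * 100.0 * t)        # synthetic 100-Hz test tone
y_pass = bandpass_zero_phase(x, 80, 160)  # band that contains the tone
y_stop = bandpass_zero_phase(x, 10, 20)   # band that excludes the tone
center = slice(500, 1500)                 # ignore edge transients
print(np.abs(y_pass[center] - x[center]).max(), np.abs(y_stop[center]).max())
```

Away from the signal edges, the tone passes through its own band essentially unchanged in amplitude and timing and is strongly attenuated in the others, which is the property that makes per-band comparisons of the bases meaningful.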
The model of efficient spatiotemporal encoding is based on convolutive non-negative matrix factorization (45). This model may be compared to that used to represent stimulus encoding in the auditory system (40). The model is mathematically simple; requires few arbitrary choices, the effects of which are readily analyzed; can accommodate physiologically motivated assumptions; and can be compared with models of sensory encoding in other modalities. It encoded the tactile stimuli, a(x, t), by determining the values of hi and wi that provided the best statistical estimate, â(x, t), of a(x, t) as determined by model (1), where η is a residual error.
The same tactile basis patterns, wi(x, t), encoded all 18,000 sample values of all 4600 stimuli in the dataset. The activation weights, hi(t), associated with each basis differed for each stimulus. Both factors were jointly estimated from the data. Each basis could assume arbitrary non-negative values for each position and time. No other statistical assumption was made about the data. The model was causal (Eq. 1); hence, the bases described responses that ensued with delays, τ. We set their duration, T, to the time required for mechanical waves in the hand to decay, about 30 ms, although our findings were robust to variations in duration (see fig. S4). This duration spanned 30 time samples at the sample rate of 1 kHz. Each basis pattern was therefore represented by 900 values (30 locations × 30 time samples). We computed optimal encodings with ranks M = 2 to 12, corresponding to 2 to 12 basis patterns. For each value of M, we determined the optimal basis set and per-stimulus activation weights via simultaneous iterative optimization over wi and hi, beginning from random initializations of each (32). This optimization maximized the statistical information about the stimulus, a(x, t), that is gained by observing the estimate, â(x, t) (see Eq. 1), as measured by the Kullback-Leibler divergence between a and â.
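For reference, the form of this divergence that is commonly used as the objective in non-negative matrix factorization is written out below. It is the standard generalized Kullback-Leibler divergence and is stated here as an assumption; the normalization used in the present analysis may differ.

```latex
D_{\mathrm{KL}}\!\left(a \,\middle\|\, \hat{a}\right)
  = \sum_{x,\,t} \left[ a(x,t)\,\log\frac{a(x,t)}{\hat{a}(x,t)} - a(x,t) + \hat{a}(x,t) \right]
```

The last two terms cancel when a and â are each normalized to unit sum, in which case the expression reduces to the divergence between two probability distributions.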
This measure quantified the dissimilarity between a and â, regarding them as statistical distributions that encoded information. Under mild assumptions, minimizing the Kullback-Leibler divergence is equivalent to maximizing, with respect to hi(t) and wi(x, t), the likelihood of model (1) to represent the data.
Solving this minimization problem involved the determination of parameters hi(t) and wi(x, t) that best captured information in the ensemble of data.
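The structure of such an optimization can be illustrated with multiplicative updates for convolutive non-negative matrix factorization under a generalized Kullback-Leibler objective. The sketch below is a simplified stand-in rather than the implementation used in the study: it factorizes a single synthetic stimulus instead of 4600 stimuli sharing one basis set, and the function names, iteration count, and initialization are assumptions.

```python
import numpy as np


def shift(A, tau):
    """Shift columns right by tau (left if tau < 0), filling with zeros."""
    out = np.zeros_like(A)
    if tau == 0:
        out[:] = A
    elif tau > 0:
        out[:, tau:] = A[:, :-tau]
    else:
        out[:, :tau] = A[:, -tau:]
    return out


def reconstruct(W, H):
    """Estimate a_hat(x, t) = sum_i sum_tau w_i(x, tau) h_i(t - tau)."""
    return sum(W[tau] @ shift(H, tau) for tau in range(W.shape[0]))


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence between V and V_hat."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


def conv_nmf(V, M, T, n_iter=100, seed=0, eps=1e-12):
    """Factorize non-negative V (locations x time) into W (T, X, M) and H (M, N)."""
    rng = np.random.default_rng(seed)
    X, N = V.shape
    W = rng.random((T, X, M)) + 0.1  # random non-negative initialization
    H = rng.random((M, N)) + 0.1
    ones = np.ones_like(V)
    history = []
    for _ in range(n_iter):
        # Update every delay slice of the bases using one shared ratio matrix.
        R = V / (reconstruct(W, H) + eps)
        for tau in range(T):
            Hs = shift(H, tau)
            W[tau] *= (R @ Hs.T) / (ones @ Hs.T + eps)
        # Update the activations, pooling evidence across all delays.
        R = V / (reconstruct(W, H) + eps)
        num = sum(W[tau].T @ shift(R, -tau) for tau in range(T))
        den = sum(W[tau].T @ shift(ones, -tau) for tau in range(T))
        H *= num / (den + eps)
        history.append(kl_divergence(V, reconstruct(W, H)))
    return W, H, history


# Synthetic non-negative stimulus built from known sparse activations.
rng = np.random.default_rng(1)
W_true = rng.random((5, 30, 3))
H_true = rng.random((3, 200)) * (rng.random((3, 200)) < 0.1)
V = reconstruct(W_true, H_true)
W, H, history = conv_nmf(V, M=3, T=5)
print(history[0], history[-1])
```

Multiplicative updates preserve non-negativity automatically, because each factor is only ever rescaled by a non-negative ratio. With the values used in the study (30 locations, T = 30 samples), each basis would occupy a 30 × 30 slice of `W`.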
The optimization yielding the tactile codes was performed in an unsupervised manner, without using knowledge about the manual interaction that produced the tactile signals, the touched objects, or any other factors.
To assess the number of bases, wi(x, t), that were necessary to capture information about the causal origin of the stimuli, we varied the encoding rank (number of bases) and designed a classification task whose objective was to use the activation weight pattern to identify the gesture that elicited the stimulus. We integrated the weights over time, ∫ hi(t) dt, to eliminate the adverse effects arising from timing differences across trials. The task involved multiclass classification, which we implemented as thirteen 1-versus-12 classification tasks. We avoided classification methods, such as convolutional neural networks, that would require extensive model tuning. We instead opted for SVM classifiers, which require few choices and are theoretically sound, involving a convex optimization. All classifiers used a radial basis function SVM kernel (width, 5.0; selected using an independent validation set). We evaluated classification performance using a standard (10-fold) cross-validation method, with a 90% training and 10% testing data split. To assess the between-individuals generalizability of these inferences, we performed a cross-individual validation, in which we trained a classifier on data from three participants and tested it on data from the fourth and averaged the results across each left-out participant.
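The two evaluation schemes can be sketched with scikit-learn as follows. The data are synthetic stand-ins for time-integrated activations, and several choices are assumptions made for illustration: the conversion of the kernel width σ = 5.0 to `gamma = 1 / (2σ²)`, the default regularization strength, and the stratified shuffling of folds.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes, n_bases, n_participants, per_cell = 13, 12, 4, 10

# Synthetic features: one row of time-integrated activations per trial.
centers = rng.random((n_classes, n_bases)) * 10.0
y = np.repeat(np.arange(n_classes), n_participants * per_cell)
groups = np.tile(np.repeat(np.arange(n_participants), per_cell), n_classes)
X = centers[y] + rng.normal(scale=0.5, size=(y.size, n_bases))

# Thirteen one-versus-rest RBF classifiers; the width-to-gamma mapping is assumed.
width = 5.0
clf = OneVsRestClassifier(SVC(kernel="rbf", gamma=1.0 / (2.0 * width**2)))

# Ten-fold cross-validation: 90% of trials train, 10% test in each fold.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
within = cross_val_score(clf, X, y, cv=folds)

# Cross-individual validation: train on three participants, test on the fourth.
across = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())

print(within.mean(), across.mean())
```

Grouping the folds by participant is what distinguishes the second estimate: no trial from the held-out individual is seen during training, so the score reflects how well the encoding generalizes between hands.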
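The Hoyer sparseness measure, a normalized ratio of ℓ1 and ℓ2 norms, is often preferred, on the basis of criteria discussed in the literature (46). For a vector of activations h with n entries, it is given by

$$S(h) = \frac{\sqrt{n} - \lVert h \rVert_1 / \lVert h \rVert_2}{\sqrt{n} - 1}$$

which ranges from 0, when all entries are equal, to 1, when a single entry is nonzero.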
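A short NumPy sketch of this measure is given below, using the standard definition (√n − ℓ1/ℓ2)/(√n − 1). The function name and the handling of degenerate inputs are choices made here for illustration.

```python
import numpy as np


def hoyer_sparseness(h):
    """Hoyer sparseness of a vector: 0 for uniform entries, 1 for a single spike."""
    h = np.abs(np.asarray(h, dtype=float).ravel())
    n = h.size
    l2 = np.linalg.norm(h)
    if n < 2 or l2 == 0.0:
        raise ValueError("need at least two entries and a nonzero vector")
    # Ratio of l1 to l2 norm, rescaled so the result lies in [0, 1].
    return (np.sqrt(n) - h.sum() / l2) / (np.sqrt(n) - 1.0)


spike = hoyer_sparseness([0.0, 0.0, 1.0, 0.0])    # one active sample
uniform = hoyer_sparseness([1.0, 1.0, 1.0, 1.0])  # evenly spread activation
print(spike, uniform)
```

The measure is invariant to rescaling of the activations, so it compares the shape of activation patterns across gestures rather than their overall magnitude.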
We assessed the diversity of the encoding by computing the empirical Shannon information entropy of the activation values across the entire dataset, as follows. We discretized the activation values hi(t) and let pk be the probability that a randomly drawn activation from any stimulus, channel, or time has a value lying in histogram bin k. The joint entropy of activations in all channels was

$$H_J = -\sum_{k} p_k \log p_k$$
The distribution, pk, was computed from all values of hi(t) in the entire dataset, for all i. The entropy HJ was maximized when all weight values were equally likely and decreased as the sparseness of the code increased. This measure revealed differences between encoded stimuli produced by different actions (table S1). The joint activation entropy HJ was highest for gestures involving multifinger contact and lowest for contacts of single fingers, suggesting that the model was most efficient at encoding gestures involving individual digits. For each basis, we also computed the individual entropies, H(hi), in the same manner.
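The histogram-based entropy estimate can be sketched in a few lines of NumPy. The number of bins, the use of equal-width bins, and the base-2 logarithm are assumptions made for illustration, since they are not specified above.

```python
import numpy as np


def activation_entropy(values, n_bins=32):
    """Empirical Shannon entropy (bits) of activation values from a histogram."""
    values = np.asarray(values, dtype=float).ravel()
    counts, _ = np.histogram(values, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute nothing to the sum
    return float(-(p * np.log2(p)).sum())


# Evenly spread activations fill every bin equally and maximize the entropy.
spread = activation_entropy(np.repeat(np.arange(8), 10), n_bins=8)
# A sparse code, mostly zeros with one active sample, has low entropy.
sparse = activation_entropy(np.r_[np.zeros(99), 1.0], n_bins=8)
print(spread, sparse)
```

Passing the activations of all bases at once gives the joint quantity, while passing those of a single basis gives its individual entropy.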
Because no method is known to record neural activity in multiple peripheral afferents during natural manual interactions, we used a biologically justified neuron spiking simulation software [TouchSim (47)]. This software predicted, in silico, the firing patterns of 773 vibration-sensitive afferents distributed throughout a simulated hand in response to the touch-elicited vibrations of the skin that we captured in vivo.
We computed skin displacements from the skin acceleration data. The skin displacement signals were used to drive the TouchSim model. For each of the 4600 trials of the entire database, the model produced output spike trains for each of the 773 simulated Pacinian corpuscle afferents [PC afferents form a class of afferent fibers terminating in Pacinian corpuscles thought to play a major role in the encoding of skin vibrations (48–50)]. We also computed mean firing rates for each. The output of the simulation was thus the spike train data and mean firing rate of the afferents for each trial. Representative trials, and mean firing rates for several gesture classes, are shown in fig. S7.
We analyzed the mean firing rate data for all 773 PCs using a non-negative matrix factorization procedure, similar to the one used in our analysis of the acceleration data. Informed by our analysis of the acceleration data, we performed the non-negative matrix factorization analysis of the simulated neural data for 2 to 12 bases, yielding 11 different encodings of increasing dimensionality. The eight-basis solution is shown in Fig. 3E. Each basis describes a distribution of mean firing rates used in the encoding. The bases bore a notable resemblance to those that we obtained by analyzing the acceleration data. We evaluated the quality of the encodings using a classification task, residual measure, sparseness measure, and a cross-participant classification task. The classification and evaluation procedures were exactly the same as those used for the accelerometer data and were not optimized for these data. Nonetheless, classification rates reached 90% with eight bases (Fig. 3D).
Funding: This work was supported by the U.S. National Science Foundation (NSF-1628831, NSF-1623459, and NSF-1751348). Additional support was from a Leverhulme Trust Visiting Professorship to V.H. Author contributions: Y.S., V.H., and Y.V. planned and designed the study and wrote the paper. Y.S. and Y.V. performed research. Y.S. and Y.V. contributed new reagents/analytic tools. Y.S. and Y.V. analyzed data. Competing interests: The authors declare that they have no competing interests. Data and materials availability: The full set of mechanical data used to produce all of the results in this paper is available at https://rtlab.s3.amazonaws.com/Publish_data.zip.
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/16/eaaz1158/DC1