People can use abstract rules to flexibly configure and select actions for specific situations, yet how exactly rules shape actions toward specific sensory and/or motor requirements remains unclear. Both research from animal models and human-level theories of action control point to the role of highly integrated, conjunctive representations, sometimes referred to as event files. These representations are thought to combine rules with other, goal-relevant sensory and motor features in a nonlinear manner and represent a necessary condition for action selection. However, so far, no methods exist to track such representations in humans during action selection with adequate temporal resolution. Here, we applied time-resolved representational similarity analysis to the spectral-temporal profiles of electroencephalography signals while participants performed a cued, rule-based action selection task. In two experiments, we found that conjunctive representations were active throughout the entire selection period and were functionally dissociable from the representation of constituent features. Specifically, the strength of conjunctions was a highly robust predictor of trial-by-trial variability in response times and was selectively related to an important behavioral indicator of conjunctive representations, the so-called partial-overlap priming pattern. These results provide direct evidence for conjunctive representations as critical precursors of action selection in humans.
Flexible, goal-directed action requires the use of abstract rules that can be applied to a range of specific situations. However, we know little about how such rules connect with lower-level sensory or response representations as a specific action is planned and executed. In traditional stage-based processing models, rules or task sets regulate the flow of information from stimulus to response in the form of a cascade of relatively independent processing steps (1–5). In contrast, recent results from research in nonhuman primates suggest a critical role of neurons with nonlinear, mixed selectivity response properties that integrate various aspects (i.e., rules, stimuli, and responses) in a conjunctive manner (6, 7). Similarly, some cognitive psychologists have proposed, mostly on the basis of behavioral results, that as a necessary step for action selection, relevant features, including rules, need to be combined into highly integrated conjunctive representations, referred to as event files (8–11). However, no direct, neural-level indicator of event files exists, making it difficult to bridge the gap between theories about integrated representations in human action selection and the literature on mixed selectivity neurons in animal models.
Currently, the main signature of event files is an indirect, behavioral aftereffect known as the partial-overlap priming cost (Fig. 1A): When either all or none of the action-relevant features repeat across consecutive trials (e.g., both rule and response either repeat or change), performance is relatively fast. In contrast, when only some but not all features overlap across trials (e.g., response repeats, but rule changes), response times (RTs) and/or errors increase. According to event file theory, entire event files can be easily repeated or replaced; however, when an overlapping feature needs to be extracted from a recently activated event file, RT and/or error costs arise, leading to the partial-overlap priming pattern.
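To make the transition coding concrete, here is a minimal Python sketch, assuming a two-feature design (rule and response) as in Experiment 1. The function names and the simple difference-of-means cost are illustrative assumptions; the statistical analyses reported here use multilevel models rather than raw means.

```python
from statistics import mean

def transition_type(prev_rule, prev_resp, rule, resp):
    """Label a trial-to-trial transition by feature overlap (2 x 2 design).

    'full_repeat': rule and response both repeat
    'full_change': rule and response both change
    'partial':     exactly one of the two features repeats
    """
    rule_rep = prev_rule == rule
    resp_rep = prev_resp == resp
    if rule_rep and resp_rep:
        return "full_repeat"
    if not rule_rep and not resp_rep:
        return "full_change"
    return "partial"

def partial_overlap_cost(rules, responses, rts):
    """Mean RT on partial-overlap transitions minus mean RT on full-repeat
    and full-change transitions (a positive value is a cost)."""
    partial, full = [], []
    for n in range(1, len(rts)):
        kind = transition_type(rules[n - 1], responses[n - 1], rules[n], responses[n])
        (partial if kind == "partial" else full).append(rts[n])
    return mean(partial) - mean(full)
```

For example, with rules `["vertical", "vertical", "diagonal", "diagonal", "horizontal"]`, responses `["BL", "BL", "BL", "TR", "TL"]`, and RTs `[600, 520, 640, 610, 580]` ms, the two partial transitions average 625 ms and the full repeat/full change transitions average 550 ms, giving a 75-ms partial-overlap cost.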
There is also neuroimaging evidence on how the partial-overlap cost pattern is expressed neuroanatomically (12, 13) and in evoked electroencephalography (EEG) components (14). However, given that partial-overlap costs are an aftereffect of event file formation, this pattern provides no information about how conjunctive representations versus their constituent features behave during response selection and whether they are indeed a critical precursor of successful action. Moreover, partial-overlap costs can also be explained by alternative models, such as in terms of interactions between distinct hierarchical levels of control (15) or as an indirect consequence of response inhibition (16).
To evaluate the hypothesized role of conjunctive representations, we used the EEG signal to decode information about action-relevant representations in a time-resolved manner (17–19) while participants selected responses on the basis of randomly cued action rules (11) (Fig. 1 A and B). By definition, conjunctive representations are correlated with the representations of constituent features. To tease apart these correlated representations, we used representational similarity analysis (RSA) (20, 21). Standard RSA requires information about the similarity of the multivariate neural signals across conditions (e.g., based on correlations); however, this cannot be computed on the level of individual trials. Therefore, we performed RSAs using confusion profiles that resulted from an initial step of decoding each of the possible action-relevant constellations on the level of individual trials and time points (Fig. 1 C and D; Methods). This two-step procedure allowed us to examine the roles of conjunctive and constituent representations in predicting trial-by-trial variability in performance.
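The first, decoding step of this procedure can be sketched as follows. This is a minimal NumPy illustration, not the authors' decoder: it assumes a shrinkage linear discriminant classifier with equal class priors, a softmax over discriminant scores to obtain classification probabilities, and a random 4-fold split. `X` holds one time point's features (trials × 100 band-by-electrode power values) and `y` the integer index of each trial's action constellation; all names are hypothetical.

```python
import numpy as np

def lda_fit(X, y, n_classes, shrinkage=0.1):
    """Fit a shrinkage LDA with a pooled covariance and equal class priors."""
    means = np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])
    resid = X - means[y]                      # within-class residuals
    cov = resid.T @ resid / (len(X) - n_classes)
    # Shrink toward a scaled identity so the covariance stays well conditioned.
    target = np.trace(cov) / cov.shape[0] * np.eye(cov.shape[0])
    cov = (1 - shrinkage) * cov + shrinkage * target
    W = np.linalg.solve(cov, means.T)         # (features, classes)
    b = -0.5 * np.sum(means.T * W, axis=0)    # -0.5 * mu_k' Sigma^-1 mu_k
    return W, b

def lda_proba(X, W, b):
    """Softmax over linear discriminant scores -> class probabilities."""
    scores = X @ W + b
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)

def confusion_profiles(X, y, n_classes, n_folds=4, seed=0):
    """Cross-validated probability profile (one row per trial) at one time point."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(X)) % n_folds
    P = np.zeros((len(X), n_classes))
    for f in range(n_folds):
        test = folds == f
        W, b = lda_fit(X[~test], y[~test], n_classes)
        P[test] = lda_proba(X[test], W, b)
    return P
```

Each row of `P` is one trial's confusion profile: the probability assigned to its true constellation together with the probabilities of all incorrect ones. Repeating this for every time point yields the trial-by-time profiles that enter the subsequent RSA step.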
In two experiments in humans, our results provide temporally precise evidence for the activation of conjunctive representations during action selection. Consistent with findings from mixed selectivity neurons and event file theory, conjunctions were robust and unique predictors of variability in performance and were specifically related to the pattern of partial-overlap priming effects.
We conducted two experiments that we report together. In Experiment 1, we used the spatial rules task with three different rules (Fig. 1 A and B), which did not allow us to differentiate between different types of conjunctions (i.e., S-R conjunctions vs. rule-S-R conjunctions). In Experiment 2, we used an expanded task space with four different rules (Fig. 1 E and F), which included conjunctions that share the same S-R pairs but have different abstract rules (11). This allowed us to dissociate conjunctions that integrate rules (i.e., rule-S-R conjunctions) from rule-independent conjunctions (i.e., S-R conjunctions).
For all analyses, error trials, posterror trials, and trials with RTs exceeding the 99.5th percentile of each individual's RT distribution were excluded. In both experiments, and consistent with previous work (11), we observed partial-overlap costs in RTs and errors as a function of the different trial-to-trial transitions (Fig. 2): In Experiment 1, when both rules and responses repeated or when both changed, responses were fast and accurate, whereas costs emerged in the case of partial updates of either rules or stimuli/responses. In Experiment 2, the repetition of rule-S-R settings produced RT and error benefits, whereas any partial updates (including S-R repetitions with rule changes) generated costs. Results of our statistical analyses are provided in SI Appendix, Tables S1 and S2.
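A minimal sketch of these exclusion rules for one participant's ordered trial sequence, assuming NumPy arrays of accuracy and RT. Two details are assumptions not stated in the text: the 99.5th percentile is computed over all of the participant's trials, and posterror status is taken from the immediately preceding trial without regard to block boundaries. The function name is hypothetical.

```python
import numpy as np

def keep_mask(correct, rt, pct=99.5):
    """Boolean mask of trials to keep for one participant (trials in presentation order)."""
    correct = np.asarray(correct, dtype=bool)
    rt = np.asarray(rt, dtype=float)
    post_error = np.zeros_like(correct)
    post_error[1:] = ~correct[:-1]             # previous trial was an error
    too_slow = rt > np.percentile(rt, pct)     # beyond the individual's 99.5th percentile
    return correct & ~post_error & ~too_slow
```

With real data, roughly 0.5% of trials fall above the percentile cutoff; in very short sequences only the single slowest trial is removed.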
To directly assess the role of conjunctive representations during action selection, we used time-resolved RSAs on the level of single trials (Figs. 3A and 4A). In agreement with previously reported results (17, 19), the cascade of decoded representations unfolded consistently with the expected flow of information. The rule was activated during the prestimulus phase, followed by strong expression of the stimulus and finally by the response. Critically, over and above these constituent features, the conjunctive representations were active during the entire poststimulus period (Figs. 3A and 4A) in both experiments. In Experiment 2, rule-S-R conjunctions were more strongly expressed than rule-independent S-R conjunctions. In addition, in both experiments, conjunctions emerged in tandem with (in Experiment 1) or clearly before (in Experiment 2) response activation (SI Appendix, Fig. S8). Consistent with event-file theory, this temporal pattern suggests that conjunctions arise during response selection and not just as a response-selection aftereffect.
Note that the expression of conjunctions was statistically robust even though we accounted for subject-specific differences in RTs between action constellations. Thus, decoding results cannot be explained in terms of unspecific difficulty differences between action constellations (SI Appendix, Figs. S1 and S2). In Experiment 2, we observed that the rule representation diminished after stimulus onset (Fig. 4A). Excluding conjunction models restored the poststimulus rule representation, suggesting that the rule-S-R conjunction model captures the same variance as explained by the rule model in this phase of action selection (Fig. 4A, Inset).
To test the prediction from event-file theory that conjunctive representations are critical for action selection, we regressed trial-to-trial variation in RTs onto the strength of each expressed representation. Using multilevel modeling, we performed these analyses for each time point and with all predictors entered simultaneously. The resulting “impact trajectories” are shown in Figs. 3B and 4B. Statistical results for a priori selected time intervals are summarized in SI Appendix, Tables S3 and S4; SI Appendix, Figs. S3 and S4 provide corresponding results from standard decoding analyses. Note that negative t values indicate that stronger representations lead to faster responses. Consistent with the prediction from event file theory, conjunctive representations were the dominant predictors of performance in both experiments. In Experiment 2, both rule-S-R conjunctions and S-R conjunctions explained substantial independent variability in trial-to-trial RTs (Fig. 4B), with a slight edge for the rule-specific conjunctions. Taken together, these results indicate that conjunctive representations emerge during response selection and predict upcoming behavior over and above the influence of the constituent features.
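The impact trajectories can be approximated with a simpler two-stage (summary-statistics) procedure, shown below as a NumPy sketch: fit an ordinary least-squares regression of log RT on all standardized representation strengths separately for each participant and time point, then test the slopes against zero across participants. This is a swapped-in approximation, not the multilevel model used for the reported results, and the array layout and function name are assumptions.

```python
import numpy as np

def impact_trajectories(strengths, log_rt):
    """Two-stage approximation of RT 'impact trajectories'.

    strengths : list (one entry per participant) of arrays (trials, time, predictors)
    log_rt    : list of matching arrays (trials,) of log RTs
    Returns group-level t values of shape (time, predictors);
    negative t means stronger representations go with faster responses.
    """
    slopes = []
    for S, y in zip(strengths, log_rt):
        n_trials, n_time, n_pred = S.shape
        b = np.empty((n_time, n_pred))
        for t in range(n_time):
            Z = (S[:, t] - S[:, t].mean(axis=0)) / S[:, t].std(axis=0)
            X = np.column_stack([np.ones(n_trials), Z])   # all predictors entered at once
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            b[t] = coef[1:]
        slopes.append(b)
    slopes = np.stack(slopes)                             # (participants, time, predictors)
    se = slopes.std(axis=0, ddof=1) / np.sqrt(slopes.shape[0])
    return slopes.mean(axis=0) / se
```

Because all predictors share one design matrix, each slope reflects a representation's contribution over and above the others, which mirrors the simultaneous-entry logic described in the text.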
To directly connect the EEG-decoded conjunctive representations with the theoretical event file construct, we examined how these representations relate to the partial-overlap priming pattern. As shown in Figs. 3C and 4C, the strength of decoded conjunctions exhibits the partial-overlap pattern in both Experiments 1 and 2: conjunctive representations were particularly strong in those transitions in which RTs were fast (i.e., when either everything repeated or everything changed; see Fig. 2). In Experiment 1, conjunctive representations showed the partial-overlap pattern in the expected direction during the early poststimulus phase (b = 0.024, SE = 0.010, t(20) = 2.58) but not during the late poststimulus phase (b = 0.004, SE = 0.010, t(20) = 0.39), and none of the constituent features showed the critical interaction pattern (all t(20) < 0.21). In Experiment 2, only the strength of rule-S-R conjunctions showed the partial-overlap pattern, both in the early selection phase (b = 0.021, SE = 0.009, t(21) = 2.22) and in the late selection phase (b = 0.021, SE = 0.009, t(21) = 2.24) (Fig. 4C). Neither the constituent features (all t(21) < 0.72) nor the S-R conjunctions showed such an effect (b = 0.012, SE = 0.009, t(21) = 1.27 for the early selection phase; b = 0.007, SE = 0.010, t(21) = 0.72 for the late selection phase).
Another important prediction that can be derived from the event file model is that strong conjunctions should be particularly difficult to “unbind” on the subsequent trial. Thus, the stronger the conjunction on trial n-1, the larger the partial-overlap costs on trial n should be. Our results, shown in Figs. 3D and 4D, confirm this prediction for both experiments. In Experiment 1, a stronger conjunctive representation on trial n-1, late in the selection period, led to greater RT partial-overlap costs on trial n (b = 0.025, SE = 0.011, t(20) = 2.25). Importantly, this pattern was unique to conjunctive representations and was not found for any of the constituent representations (all t(20) < 0.05). In Experiment 2, only rule-S-R conjunctions significantly modulated RT partial-overlap costs on the next trial (b = 0.031, SE = 0.011, t(20) = 2.81) (Fig. 4D). Again, this pattern was absent for S-R conjunctions and for all constituent representations (all t(21) < 0.38). Thus, it is specifically the highest-order, rule-S-R conjunctions that relate to the partial-overlap cost pattern. Overall, the behavior of decoded conjunctive representations was highly consistent with predictions from the event file model.
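A single-participant sketch of this lagged test, assuming NumPy arrays of trial-wise conjunction strength (from the late poststimulus window), a 0/1 indicator of whether trial n is a partial-overlap transition, and log RTs. The reported analysis is multilevel across participants; here an ordinary least-squares fit stands in for it, and the key quantity is the coefficient on the interaction between partial overlap on trial n and standardized strength on trial n-1. The function name is hypothetical.

```python
import numpy as np

def lagged_interaction(strength, partial, log_rt):
    """OLS coefficients [intercept, partial_n, strength_{n-1}, partial_n x strength_{n-1}]
    for trials 2..N of one participant."""
    s_prev = np.asarray(strength, dtype=float)[:-1]
    s_prev = (s_prev - s_prev.mean()) / s_prev.std()   # standardized trial n-1 strength
    p = np.asarray(partial, dtype=float)[1:]           # partial overlap on trial n
    y = np.asarray(log_rt, dtype=float)[1:]
    X = np.column_stack([np.ones_like(y), p, s_prev, p * s_prev])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

A positive interaction coefficient (the last element) corresponds to the reported pattern: the stronger the conjunction on trial n-1, the larger the partial-overlap cost on trial n.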
We tested whether integrated, conjunctive representations between task-relevant features emerge during action selection, as predicted from results with mixed selectivity neurons (6) and by event file theory (8, 9). In our paradigm, action settings had to be updated flexibly for each trial, creating unique constellations among rules, stimuli, and responses. We combined a standard linear decoding approach with a subsequent time-resolved RSA to track the emergence of conjunctive representations and their constituent features over time and for each individual trial.
The time course of decoded information showed a highly plausible cascade of action representations (rule, stimulus, and then response). Most critically, we found robust evidence for conjunctive representations, emerging shortly after stimulus onset and persisting until response execution. Analyses with response-locked EEG data fully confirmed this pattern of results (SI Appendix, Fig. S8). The fact that conjunctive representations were continuously present from stimulus processing to response execution is consistent with their role in translating sensory codes into response codes based on the current task rules.
Even though conjunctive representations were on average less strongly expressed than those of constituent features, they were statistically highly robust (Figs. 3B and 4B and SI Appendix, Fig. S10). Moreover, conjunctive representations were strong and unique predictors of trial-by-trial variability in RTs, over and above other constituent features. These results are difficult to reconcile with traditional stage theories (1–5) and hierarchical control models (15, 22), in which information flows in a strict feed-forward manner and thus allows no integrated representations to emerge. This is all the more remarkable given that our task design, with explicitly cued rules that appear before each stimulus, should have been clearly compatible with a hierarchical selection architecture (e.g., first selection of rule, then of rule-specific S-R link). Instead, our results indicate that action selection is established by tying together the disparate task-relevant features from the entire selection event into a common representation.
In Experiment 1, conjunctions could entail any pairwise or complete combination of rule, stimulus, or response features; in Experiment 2, we were further able to dissociate between rule-specific rule-S-R conjunctions and rule-independent S-R conjunctions. The fact that in Experiment 2, both rule-S-R and S-R conjunctions emerged is an important finding in its own right, suggesting that integrated representations that match the contingencies in the environment develop in parallel on different levels of specificity. This combination of both rule-specific and rule-independent representations can account for previous findings showing that S-R associations learned within one rule can transfer to another rule, albeit in a limited manner (11, 23).
A key behavioral indicator of event files is the partial-overlap priming pattern (Fig. 1A) (11, 24). In both experiments, this pattern was apparent not only in RTs and errors (Fig. 2) but also in the strength of conjunctions (Figs. 3C and 4C). More importantly, the strength of conjunctions on trial n-1 predicted the size of partial-overlap costs on trial n (Figs. 3D and 4D), suggesting that the tighter the integration between action features, the harder it is to “unbind” them and integrate them into a new conjunction. Importantly, only conjunctions, and not the basic features, showed such a relationship with the partial-overlap pattern, thereby functionally dissociating conjunctions from their constituent codes. The results of Experiment 2 further indicated that rule-S-R conjunctions in particular were related to the partial-overlap cost, whereas S-R conjunctions were not. It is noteworthy that the conjunctions in Experiment 1 (where we were not able to distinguish between S-R and rule-S-R conjunctions) showed a priming pattern similar to that of the rule-S-R conjunctions in Experiment 2, suggesting integration not just of stimuli and responses but also of rules in both experiments.
While our results provide temporal and functional information about specific representations, they are relatively silent about the underlying neural mechanisms or their neuroanatomic location. We had no strong a priori predictions about frequency bands that might contain conjunction-specific information and used a broad spectrum of frequencies for decoding. In post hoc analyses, we found that the pattern of EEG responses underlying conjunctive representations is idiosyncratic (SI Appendix, Fig. S5) but is most strongly expressed in the delta-band frequency signal (SI Appendix, Figs. S6 and S7). This latter result is generally consistent with previous evidence showing that decision-relevant representations can be decoded from oscillations in the delta band (25).
Regarding the question of neuroanatomic location, research with animal models points to the hippocampus as being particularly critical for representing highly contextualized, conjunctive information (6, 26–28). There is also some evidence from human neuroimaging work that implicates the hippocampus in retrieving incidentally learned associations between actions and their consequences, albeit using paradigms that involve learning across longer time frames (29, 30). In addition, single-neuron electrophysiological work with nonhuman animals indicates that neurons coding task-relevant features are distributed across the frontal and parietal cortices (31, 32). A large proportion of recorded neurons in these areas integrate multiple features in a nonlinear manner (6, 33). Such heterogeneous neural responses allow efficient linear readout of information to downstream neurons and can also code conjunctive information in a high-dimensional format (34). In human neuroimaging work, attempts to decode high-level, task-relevant representations in frontal areas have proven more challenging (35).
An important finding from the research on mixed selectivity neurons is that the degree of nonlinear information coded in these neurons is functionally distinct from the representation of linear features. For example, nonlinear responses were found to be highly robust in correct trials but were largely missing in error trials, whereas simple linear information was equally present on correct and error trials (6, 7, 36). This pattern is consistent with our finding that the strength of conjunctive representations uniquely predicts trial-by-trial performance beyond the predictive strength of constituent, simple features (Figs. 3B and 4B). As mentioned earlier, further evidence for a functional dissociation comes from our finding that only conjunctive representations, and not the representations of constituent simple features, express the partial-overlap priming pattern (Figs. 3 C and D and 4 C and D).
Our results regarding the relevance of conjunctions for both efficient action selection and the partial-overlap priming pattern directly confirm the predictions derived from event file theory. Therefore, they provide an important missing link between two distinct lines of research: the relatively abstract event file conceptualization, designed to explain the architecture of human action selection, and the recent progress in characterizing the format of representations arising from mixed-selectivity neurons. For example, from the herein-established relationship between the partial-overlap pattern and the neural signature of event files, we can derive the testable prediction that the partial-overlap pattern should also be selectively expressed in the activity of mixed-selectivity neurons in animal models. In addition, our present results raise an array of new questions about the functional properties of conjunctive representations; we do not know how these representations are constrained by capacity limitations (37), how they respond to distracting information (7), to what degree they allow integration of action outcomes or goals (38), or how they change through experience (11) (SI Appendix, Fig. S9). The EEG decoding approach used here provides the tools to address these and related questions in human participants.
Additional information on the study methodology is provided in SI Appendix.
Forty-four individuals participated in exchange for remuneration of $10/h plus performance-based incentives, after providing written informed consent under a protocol approved by the University of Oregon’s Human Subjects Committee. Participants exceeding a predefined criterion of >35% of trials with EEG artifacts were excluded from further analysis, leaving 20 of 22 participants in Experiment 1 and 21 of 22 participants in Experiment 2.
Participants performed a cued rule-selection task in which one of the preinstructed action rules was randomly selected on each trial to determine the possible S-R mappings (11) (Fig. 1B). Based on the cued rule, participants responded to the location of a circle (1.32° radius) that randomly appeared in one of the four corners of a white frame (6.6° off-center) by pressing one of four response keys arranged in a 2 × 2 matrix. Each action rule specified four S-R links using a simple spatial transformation; for instance, the “vertical” rule mapped the top-left circle to the bottom-left response. To ensure that decoding of rule information was not driven by superficial, perceptual aspects, we used two cues for each rule, a pair of verbal cues in Experiment 1 and a symbol/word pair in Experiment 2 (Fig. 1 C and E), which appeared in either even or odd trials to prevent immediate cue repetitions. In Experiment 1, “vertical,” “horizontal,” and “diagonal” rules were used (i.e., a 66.6% switch rate). In Experiment 2, four rules were used: “vertical,” “horizontal,” “clockwise,” and “counterclockwise” (i.e., a 75% switch rate). This specific set of rules ensured that each S-R link occurred in two different rules (e.g., a top-left circle leads to a bottom-left response in both the vertical and the clockwise rule), allowing us to attempt to decode both rule-S-R conjunctions and rule-unspecific S-R conjunctions in Experiment 2 (Fig. 1 E and F).
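The rule structure can be checked with a short Python sketch. The corner codes (TL, TR, BL, BR) and the dictionaries are illustrative. The vertical mapping and the clockwise example (top-left to bottom-left) follow the text; the remaining clockwise and counterclockwise entries assume that each rotational rule moves every corner one step in a consistent direction, and the diagonal rule is assumed to map each corner to the opposite one. The switch rate assumes that a rule is drawn uniformly at random, with replacement, on every trial.

```python
# Corner codes: TL, TR, BL, BR (stimulus location -> response key).
VERTICAL = {"TL": "BL", "BL": "TL", "TR": "BR", "BR": "TR"}
HORIZONTAL = {"TL": "TR", "TR": "TL", "BL": "BR", "BR": "BL"}
DIAGONAL = {"TL": "BR", "BR": "TL", "TR": "BL", "BL": "TR"}          # assumed: opposite corner
CLOCKWISE = {"TL": "BL", "BL": "BR", "BR": "TR", "TR": "TL"}         # TL -> BL as in the text
COUNTERCLOCKWISE = {"TL": "TR", "TR": "BR", "BR": "BL", "BL": "TL"}  # inverse of CLOCKWISE

EXP1 = {"vertical": VERTICAL, "horizontal": HORIZONTAL, "diagonal": DIAGONAL}
EXP2 = {"vertical": VERTICAL, "horizontal": HORIZONTAL,
        "clockwise": CLOCKWISE, "counterclockwise": COUNTERCLOCKWISE}

def constellations(rules):
    """All (rule, stimulus, response) triples; the response is fixed by rule and stimulus."""
    return [(name, s, r) for name, mapping in rules.items() for s, r in mapping.items()]

def rules_per_link(rules):
    """Map each S-R link to the set of rules that produce it."""
    links = {}
    for name, s, r in constellations(rules):
        links.setdefault((s, r), set()).add(name)
    return links

def switch_rate(n_rules):
    """Expected rule-switch rate if a rule is drawn uniformly at random on every trial."""
    return (n_rules - 1) / n_rules
```

Under these assumptions, Experiment 1 yields 12 constellations in which every S-R link is unique to one rule, whereas Experiment 2 yields 16 constellations in which each of the 8 S-R links belongs to exactly two rules (e.g., top-left to bottom-left under both vertical and clockwise); `switch_rate(3)` and `switch_rate(4)` give about 0.667 and 0.75.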
We presented two practice blocks and 200 experimental blocks per experiment. Participants were instructed to complete as many correct trials as possible within each 16-s block. Trials that began within the 16 s were allowed to be completed.
EEG activity was recorded from 20 tin electrodes placed according to the International 10/20 system and preprocessed to remove artifacts (SI Appendix, EEG Recording and Processing). Temporal-spectral profiles of single-trial EEG data were then obtained by applying a complex wavelet time-frequency analysis (39) from 1 to 35 Hz to the preprocessed data (SI Appendix, Time-Frequency Analysis), yielding a frequency-specific power estimate at each sample point. As in our previous work (17), to prepare training data for the decoding analyses, we averaged power within five frequency bands: 1 to 3 Hz (delta), 4 to 7 Hz (theta), 8 to 12 Hz (alpha), 13 to 30 Hz (beta), and 31 to 35 Hz (gamma). Within individuals, band-specific power values were z-transformed across electrodes at each sample point to remove effects that uniformly influenced all electrodes. While we had no a priori predictions about the role of specific frequency bands in representing different action-relevant representations, we present post hoc analyses probing the relevance of each frequency band in SI Appendix, Figs. S6 and S7.
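A NumPy sketch of the band averaging and the across-electrode z-transform, assuming wavelet power has already been computed as an array of shape (trials, electrodes, frequencies, time) on a 1- to 35-Hz grid. The array layout, the band dictionary, and the function name are assumptions; the output is the 100-dimensional feature vector (5 bands × 20 electrodes) per trial and sample point that the decoders are trained on.

```python
import numpy as np

BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 12),
         "beta": (13, 30), "gamma": (31, 35)}

def band_features(power, freqs):
    """power: (trials, electrodes, freqs, time) -> features (trials, time, bands * electrodes)."""
    freqs = np.asarray(freqs)
    bands = []
    for lo, hi in BANDS.values():
        sel = (freqs >= lo) & (freqs <= hi)
        bands.append(power[:, :, sel, :].mean(axis=2))   # band-averaged power
    bp = np.stack(bands, axis=1)                          # (trials, bands, electrodes, time)
    # z-transform across electrodes for each trial, band, and sample point,
    # removing effects that shift all electrodes uniformly.
    bp = (bp - bp.mean(axis=2, keepdims=True)) / bp.std(axis=2, keepdims=True)
    n_tr, n_b, n_el, n_t = bp.shape
    return bp.transpose(0, 3, 1, 2).reshape(n_tr, n_t, n_b * n_el)
```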
To obtain information about the strength of each feature and conjunction on the level of individual trials and time points, we used a two-step procedure. First, we performed a linear decoding analysis to discriminate between all 12 different action constellations in Experiment 1 or all 16 constellations in Experiment 2. This analysis was conducted for each time point and used the average power of rhythmic EEG activity within the predefined frequency bands (delta, theta, alpha, beta, and gamma), generating 100 features (5 frequency bands × 20 electrodes) to train decoders. Following cross-validation, this decoding step yielded a vector of “confusion profiles” of classification probabilities for both the correct and all possible incorrect classifications and for each time point and trial (Fig. 1D). As a second step, we applied RSA (20) to each profile of classification probabilities to determine their underlying similarity structure for each time point and trial. Specifically, we regressed the classification probability vector onto model vectors as simultaneously entered predictors, which were derived from a set of RSA model matrices (Fig. 1D).
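The second, RSA step can be sketched as a per-trial least-squares regression in NumPy. For a trial whose true constellation is `c`, each model's predictor is row `c` of that model matrix; all models are entered simultaneously with an intercept, and the fitted weights serve as trial-wise representation strengths. Whether the original analysis standardized predictors or included additional covariates is not specified here, so this sketch uses raw binary model vectors and raw probabilities; the names and array shapes are assumptions.

```python
import numpy as np

def rsa_strengths(P, labels, models):
    """Regress each trial's probability profile onto RSA model vectors.

    P      : (trials, classes) classification probabilities at one time point
    labels : (trials,) index of each trial's true constellation
    models : (n_models, classes, classes) binary model matrices
    Returns (trials, n_models) weights, one strength per model and trial.
    """
    n_trials, n_classes = P.shape
    strengths = np.empty((n_trials, models.shape[0]))
    for i, c in enumerate(labels):
        # Row c of each model marks which constellations share that feature with the true one.
        X = np.column_stack([np.ones(n_classes), models[:, c, :].T])
        coef, *_ = np.linalg.lstsq(X, P[i], rcond=None)
        strengths[i] = coef[1:]  # drop the intercept
    return strengths
```

Because the models compete for variance in the same regression, a conjunction weight reflects confusions that are not already explained by shared rules, stimuli, or responses.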
Each model matrix represented a potential underlying representation. In Experiment 1, we constructed RSA models for the rules, stimuli, responses, and conjunctions (Fig. 1D). In Experiment 2, we used separate matrices for the rule-specific S-R conjunction model (rule-S-R conjunction) and the rule-independent S-R conjunction model (S-R conjunction) (Fig. 1F). Complete orthogonalization of basic features could be established within each of two equal-sized subspaces but not across the entire space of action constellations. Specifically, one subspace (G1 in Fig. 1E) contained constellations with stimuli at the top-left or bottom-right corner (leading to a bottom-left or top-right response for all rules), whereas the second subspace (G2 in Fig. 1E) contained trials with stimuli at the bottom-left or top-right corner (leading to a top-left or bottom-right response). Within each subspace, conjunctions were defined by the combination of four rules (vertical, horizontal, clockwise, and counterclockwise), two stimulus positions, and two responses, ensuring that each S-R link could occur in the context of two different action rules.
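A Python sketch of how binary RSA model matrices could be built for the 16 constellations of Experiment 2, where cell (i, j) is 1 when constellations i and j share the modeled feature. The vertical and horizontal mappings follow the spatial transformations described above; the clockwise and counterclockwise entries follow the text's example (top-left to bottom-left under the clockwise rule) and otherwise assume a consistent one-step rotation. This sketch builds the matrices over the full 16-constellation space rather than splitting it into the G1 and G2 subspaces, and all names are hypothetical.

```python
import numpy as np

RULES = {
    "vertical": {"TL": "BL", "BL": "TL", "TR": "BR", "BR": "TR"},
    "horizontal": {"TL": "TR", "TR": "TL", "BL": "BR", "BR": "BL"},
    "clockwise": {"TL": "BL", "BL": "BR", "BR": "TR", "TR": "TL"},
    "counterclockwise": {"TL": "TR", "TR": "BR", "BR": "BL", "BL": "TL"},
}

def build_models(rules):
    """Return the constellation list and a dict of binary similarity matrices."""
    cons = [(rule, s, r) for rule, mapping in rules.items() for s, r in mapping.items()]

    def same(key):
        # 1 where two constellations agree on the feature extracted by `key`.
        v = [key(c) for c in cons]
        return np.array([[float(a == b) for b in v] for a in v])

    models = {
        "rule": same(lambda c: c[0]),
        "stimulus": same(lambda c: c[1]),
        "response": same(lambda c: c[2]),
        "S-R conjunction": same(lambda c: (c[1], c[2])),
        "rule-S-R conjunction": same(lambda c: c),
    }
    return cons, models
```

Under these mappings, the rule-S-R conjunction model is the identity matrix, whereas each row of the S-R conjunction model has two entries because every S-R link occurs under two rules.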
For the results shown in Figs. 3C and 4C and in Figs. 3D and 4D, we used multilevel linear modeling to analyze within-subject variability in RSA scores as a function of trial-to-trial transition variables (Figs. 3C and 4C), or in RTs as a function of trial n-1 RSA scores and trial-to-trial transition variables (Figs. 3D and 4D). In each case, subject-specific intercepts and slopes were included as random effects. Log-transformed RTs, used as dependent variables, were prewhitened by removing linear and quadratic trends across experimental trials and blocks. We performed statistical tests for a priori selected time intervals: the cue-to-stimulus period, from cue onset to stimulus onset (−300 to 0 ms for Experiment 1 and −500 to 0 ms for Experiment 2); the early poststimulus period (0 to 300 ms after stimulus onset in both experiments); and the late poststimulus period (300 to 600 ms after stimulus onset in both experiments). We predicted RTs and RSA scores on the current trial with EEG signals from the prestimulus and early poststimulus periods to capture processing before response execution (SI Appendix, Fig. S8 presents results using signals aligned to response onset). The late poststimulus interval was used to assess how partial-overlap costs are modulated by the strength of action representations developed during selection on trial n-1 (Figs. 3D and 4D). In addition, to visualize the impact of different decoded features on RTs across time, we fit models with fixed effects and random intercepts, but without random slopes, at each sample point (Figs. 3B and 4B).
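A NumPy sketch of the prewhitening step, assuming it amounts to regressing log RT on linear and quadratic trends of trial and block number and keeping the residuals. Whether trial position was counted within blocks or across the session is not stated, so the `trial` argument is an assumption, and the function name is hypothetical.

```python
import numpy as np

def prewhiten_log_rt(rt, trial, block):
    """Residual log RTs after removing linear and quadratic trial and block trends."""
    y = np.log(np.asarray(rt, dtype=float))

    def trend(v):
        v = np.asarray(v, dtype=float)
        v = (v - v.mean()) / v.std()   # standardize to keep the design well conditioned
        return [v, v ** 2]

    X = np.column_stack([np.ones_like(y), *trend(trial), *trend(block)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef
```

Standardizing before squaring leaves the spanned trend space unchanged (a quadratic in the standardized index is still a quadratic in the raw index) while avoiding poorly scaled columns.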
All data and analysis scripts related to this paper are available on the Open Science Framework (40).
This research was supported by National Institute on Aging Grant R01 AG037564-01A1 and by NSF Grant 1734264.