To maximize future rewards in this ever-changing world, animals must be able to discover the temporal structure of stimuli and then anticipate or act correctly at the right time. How do animals perceive, maintain, and use time intervals ranging from hundreds of milliseconds to multiseconds in working memory? How is temporal information processed concurrently with spatial information and decision making? Why are there strong neuronal temporal signals in tasks in which temporal information is not required? A systematic understanding of the underlying neural mechanisms is still lacking. Here, we addressed these problems using supervised training of recurrent neural network models. We revealed that neural networks perceive elapsed time through state evolution along stereotypical trajectory, maintain time intervals in working memory in the monotonic increase or decrease of the firing rates of interval-tuned neurons, and compare or produce time intervals by scaling state evolution speed. Temporal and nontemporal information is coded in subspaces orthogonal with each other, and the state trajectories with time at different nontemporal information are quasiparallel and isomorphic. Such coding geometry facilitates the decoding generalizability of temporal and nontemporal information across each other. The network structure exhibits multiple feedforward sequences that mutually excite or inhibit depending on whether their preferences of nontemporal information are similar or not. We identified four factors that facilitate strong temporal signals in nontiming tasks, including the anticipation of coming events. Our work discloses fundamental computational principles of temporal processing, and it is supported by and gives predictions to a number of experimental phenomena.
Much information that the brain processes and stores is temporal in nature. Therefore, understanding the processing of time in the brain is of fundamental importance in neuroscience (123–4). To predict and maximize future rewards in this ever-changing world, animals must be able to discover the temporal structure of stimuli and then flexibly anticipate or act correctly at the right time. To this end, animals must be able to perceive, maintain, and then use time intervals in working memory, appropriately combining the processing of time with spatial information and decision making (DM). Based on behavioral data and the diversity of neuronal response profiles, it has been proposed (5, 6) that time intervals in the range of hundreds of milliseconds to multiseconds can be decoded through neuronal population states evolving along transient trajectories. The neural mechanisms may be accumulating firing (7, 8), synfire chains (9, 10), the beating of a range of oscillation frequencies (11), etc. However, these mechanisms are challenged by recent finding that animals can flexibly adjust the evolution speed of population activity along an invariant trajectory to produce different intervals (12). Through behavioral experiments, it was found that humans can store time intervals as distinct items in working memory in a resource allocation strategy (13), but an electrophysiological study on the neuronal coding of time intervals maintained in working memory is still lacking. Moreover, increasing evidence indicates that timing does not rely on dedicated circuits in the brain but instead, is an intrinsic computation that emerges from the inherent dynamics of neural circuits (3, 14). Spatial working memory and DM are believed to rely mostly on a prefrontoparietal circuit (15, 16). The dynamics and the network structure that enable this circuit to combine spatial working memory and DM with flexible timing remain unclear. Overall, our understanding of the processing of time intervals in the brain is fragmentary and incomplete. It is, therefore, essential to develop a systematic understanding of the fundamental principle of temporal processing and its combination with spatial information processing and DM.
The formation of temporal signals in the brain is another unexplored question. Strong temporal signals were found in the brain even when monkeys performed working memory tasks where temporal information was not needed (17181920–21). In a vibrotactile working memory task (17), monkeys were trained to report which of the two vibrotactile stimuli separated by a fixed delay period had higher frequency (Fig. 1D). Surprisingly, although the duration of the delay period was not needed to perform this task, temporal information was still coded in the neuronal population state during the delay period, with the time-dependent variance explaining more than 75% of the total variance (18, 19). A similar scenario was also found in other nontiming working memory tasks (1920–21). It is unclear why such strong temporal signals arose in nontiming tasks.
Previous works showed that, after being trained to perform tasks such as categorization, working memory, DM, and motion generation, artificial neural networks (ANNs) exhibited coding or dynamic properties surprisingly similar to experimental observations (222324–25). Compared with animal experiments, ANN can cheaply and easily implement a series of tasks, greatly facilitating the test of various hypotheses and the capture of common underlying computational principles (26, 27). In this paper, we trained recurrent neural networks (RNNs) (Fig. 1A) to study the processing of temporal information. First, by training networks on basic timing tasks, which require only temporal information to perform (Fig. 1B), we studied how time intervals are perceived, maintained, and used in working memory. Second, by training networks on combined timing tasks, which require both temporal and nontemporal information to perform (Fig. 1C), we studied how the processing of time is combined with spatial information processing and DM, the influence of this combination on decoding generalizability, and the network structure that this combination is based on. Third, by training networks on nontiming tasks (Fig. 1D), we studied why such a large time-dependent variance arises in nontiming tasks, thereby understanding the factors that facilitate the formation of temporal signals in the brain. Our work presents a thorough understanding of the neural computation of time.
We trained an RNN of 256 soft plus units supervisedly using back propagation through time (28). Self-connections of the RNN were initialized to one, and off-diagonal connections were initialized as independent Gaussian variables with mean of zero (27), with different training configurations initialized using different random seeds. The strong self-connections supported self-sustained activity after training (SI Appendix, Fig. S1B), and the nonzero initialization of the off-diagonal connections induced sequential activity comparable with experimental observations (27). We stopped training as soon as the performance of the network reached criterion (23, 25) (performance examples are in SI Appendix, Fig. S1).
In the interval production (IP) task (the first task of Fig. 1B), the network was to perceive the interval between the first two pulses, maintain the interval during the delay epoch with variable duration, and then produce an action at time after the go cue. Neuronal activities after training exhibited strong fluctuations (Fig. 2A). In the following, we report on the dynamics of the network in the perception, delay, and production epochs of IP (Fig. 1 has an illustration of these epochs).
The first epoch is the perception epoch. In response to the first stimulus pulse, the network started to evolve from almost the same state along an almost identical trajectory in different simulation trials with different values until another pulse came (Fig. 2B); the activities of individual neurons before the second pulse in different trials highly overlapped (Fig. 2 C and D). Therefore, the network state evolved along a stereotypical trajectory starting from the first pulse, and the time interval between the first two pulses can be read out using the position in this trajectory when the second pulse came. Behaviorally, a human’s perception of the time interval between two acoustic pulses is impaired if a distractor pulse appears shortly before the first pulse (29). A modeling work (29) explained that this is because successful perception requires the network state to start to evolve from near a state in response to the first pulse, whereas the distractor pulse kicks the network state far away from . This explanation is consistent with our results that interval perception requires a stereotypical trajectory.
We then studied how the information of timing interval between the first two pulses was maintained during the delay epoch. We have the following findings. 1) The speeds of the trajectories decreased with time in the delay epoch (Fig. 2 E and F). 2) The states at the end of the delay epoch at different values were aligned in a manifold in which the first principal component (PC) explained 90% of its variance (Fig. 2G). 3) For a specific simulation trial, the position of in manifold linearly encoded the value of the trial (Fig. 2H). 4) The distance between two adjacent trajectories kept almost unchanged with time during the delay neither decayed to zero nor exploded (Fig. 2I): this stable dynamics supported the information of encoded by the position in the stereotypical trajectory at the end of the perception epoch in being maintained during the delay. Collectively, approximated a line attractor (24, 30) with slow dynamics, and was encoded as the position in . To better understand the scheme of coding in , we classified neuronal activity in manifold as a function of into three types (Fig. 2 J and K): monotonically decreasing (MoD), monotonically increasing (MoI), and nonmonotonic (non-M) (Materials and Methods). We found that most neurons were MoD or MoI, whereas only a small portion were non-M neurons (Fig. 2K). This implies that the network mainly used a complementary (i.e., concurrently increasing and decreasing) monotonic scheme to code time intervals in the delay epoch, similar to the scheme revealed in refs. 17 and 31. This dominance of monotonic neurons may be the reason why the first PC of explained so much variance (Fig. 2G); SI Appendix, section S2 and Fig. S2 G and H has a simple explanation.
In the production epoch, the trajectories of the different values tended to be isomorphic (Fig. 2L). The neuronal activity profiles were self-similar when stretched or compressed in accordance with the produced interval (Fig. 2M), suggesting temporal scaling with (12). To quantify this temporal scaling, we defined the scaling index (SI) of a subspace as the portion of variance of the projections of trajectories into that can be explained by temporal scaling (12). We found that the distribution of SI of individual neurons aggregated toward one (SI Appendix, Fig. S2B), and the first two PCs that explained most variance have the highest SI (SI Appendix, Fig. S2C). We then used a dimensionality reduction technique that furnished a set of orthogonal directions (called scaling components [SCs]) in the network state space that were ordered according to their SI (Materials and Methods). We found that a subspace (spanned by the first three SCs) that had high SI (=0.98) occupied about 40% of the total variance of trajectories (Fig. 2N), in contrast with the low SI of the perception epoch (SI Appendix, Fig. S2F). The average speed of the trajectory in the subspace of the first three SCs was inversely proportional to (Fig. 2O). Collectively, the network adjusted its dynamic speed to produce different time intervals in the production epoch, similar to observations of the medial frontal cortex of monkeys (12, 32). Additionally, we found a nonscaling subspace in which mean activity during the production epoch changed linearly with (SI Appendix, Fig. S2 D and E), also similar to the experimental observations in refs. 12 and 32.
In the interval comparison (IC) task (the second task of Fig. 1B), the network was successively presented two intervals; it was then required to judge which interval was longer. IC required the network to perceive the time interval of the stimulus 1 epoch, to maintain the interval in the delay epoch, and to use it in the stimulus 2 epoch in which duration is . Similar to IP, the network perceived time interval with a stereotypical trajectory in the stimulus 1 epoch (SI Appendix, Fig. S3 A–C) and maintained time interval using attractor dynamics with a complementary monotonic coding scheme in the delay epoch (SI Appendix, Fig. S3 D–H). The trajectory in the stimulus 2 epoch had a critical point at time after the start of stimulus 2. The network was to give different comparison (COMP) outputs at the go epoch depending on whether or not the trajectory had passed at the end of stimulus 2. To make a correct COMP choice, only the period from the start of stimulus 2 to (or to the end of stimulus 2 if ) needs to be timed: as long as the trajectory had passed , the network could readily make the decision that , with no more timing required. After training, we studied the trajectories from the start of stimulus 2 to in the cases that and found temporal scaling (SI Appendix, Fig. S3 J–N) similar to the production epoch of IP consistently with animal experiments (33, 34). These similarities between IP and IC on how to perceive, maintain, and use time intervals imply universal computational schemes for neural networks to process temporal information.
The average speed of the trajectory after increased with (SI Appendix, Fig. S3O), whereas the speed before decreased with : this implies that the dynamics after was indeed different from that before.
It is a ubiquitous phenomenon that neural networks encode more than one quantity simultaneously (3536–37). In this subsection, we will discuss how neural networks encode temporal and spatial information (or decision choice) simultaneously, which enables the brain to take the right action at the right time.
In the timed spatial reproduction (t-SR) task (the first task in Fig. 1C), the network was to not only take action at the desired time but also, act at the spatial location indicated by the first pulse. Similar to IP and IC, the network used stereotypical trajectories, attractors, and speed scaling to perceive, maintain, and produce time intervals (SI Appendix, Fig. S4). In the following, we will focus on the coding combination of temporal and spatial information.
In the perception epoch, under the two cases when the first pulse was at two locations and separately, the activities and of the th neuron exhibited similar profiles with time (Fig. 3B), especially when and had close values. In our simulation, the location of the first pulse was represented by a Gaussian bump with standard deviation (SD) of 2., which is much smaller than the smallest spatial distance 6 between two different colors in Fig. 3 A and B; thus, the similarity of the temporal profiles in Fig. 3B should not result from the overlap of the sensory inputs from the first pulse but rather, should emerge during training.
To quantitatively investigated the coding combination of temporal and spatial information, we studied the first temporal flow PC (F-PC1) of the neuronal population, namely the first PC of , and the first spatial PC (S-PC1), namely the first PC of , with indicating averaging over parameter . By temporal flow, we mean the time elapsed from the beginning of a specific epoch. We found that the angle between F-PC1 and S-PC1 distributed around , significantly larger than (Fig. 3D). This indicates that temporal flow and spatial information were coded in almost orthogonal subspaces (Fig. 3C). We then studied the mixed variance (19). Specifically, the variance explained by temporal (or spatial) information is [or ], and the mixed variance is , where is the total variance. We found that the mixed variance took a small portion of the total variance, smaller than the variance of either temporal or spatial information (Fig. 3E). To understand the implication of this result, we noted that a sufficient condition for is that different isospace (or isotemporal flow) lines are related to each other through translational movement (Fig. 3 F, Upper Left), where an isospace (or isotemporal flow) line is a manifold in the state space with different temporal flow (or space) values but a fixed space (or temporal flow) value. The opposite extreme case implies that different isospace (or isotemporal flow) lines are strongly intertwined; SI Appendix, section S3 has details. Together, orthogonality and small mixed variance suggest that isospace and isotime lines interweave into rectangle-like grids (Fig. 3 F, Lower); SI Appendix, Fig. S6 has illustrations of the simulation results.
In the delay epoch, the population states were attracted toward a manifold of slow dynamics at the end of the delay epoch (Fig. 2 E–I and SI Appendix, Fig. S4 B–D), maintaining both the duration of the perception epoch and the spatial information . We studied the coding combination of and in in a similar way to above. We found that the first time interval PC (I-PC1), namely the first PC to code , was largely orthogonal with S-PC1 (Fig. 3G), and the mixed variance between and was small (Fig. 3H).
In the production epoch, the network needed to maintain three pieces of information: temporal flow , time interval , and spatial location . We studied the angle between the first PCs of any two of them (i.e., F-PC1, I-PC1, and S-PC1). We found that S-PC1 was orthogonal with F-PC1 and I-PC1 but that F-PC1 and I-PC1 was not orthogonal (Fig. 3I). For any two parameters, their mixed variance was smaller than the variance of their own (Fig. 3J); Materials and Methods has details.
Collectively, in all of the three epochs, the coding subspaces of temporal and spatial information were largely orthogonal with small mixed variance, suggesting rectangle-like grids of isospace and isotime lines; SI Appendix, Fig. S6 has illustrations.
In the timed decision-making (t-DM) task (the second task in Fig. 1C), the network was to make a decision choice at the desired time to indicate which of the two presented stimuli was stronger. Similar to IP, IC, and t-SR, the network used stereotypical trajectories, attractors, and speed scaling to separately perceive, maintain, and produce time intervals (SI Appendix, Fig. S5 A–M). In all of the three epochs of t-DM, the first PC to code decision choice was orthogonal with F-PC1 or I-PC1, and the mixed variance between any two parameters was small (SI Appendix, Fig. S5 N–S); however, F-PC1 and I-PC1 in the production epoch were not orthogonal (SI Appendix, Fig. S5R). These results are all similar to those of the t-SR task.
We then studied how the above geometry of coding space influences decoding generalizability: suppose that the population state space is parameterized by and ; we want to know the error of decoding from a state in an iso- line after training the decoder using another iso- line (Fig. 4A). We considered two types of nearest-centroid decoders (20). Decoder 1 projects both and into the first PC of and reads the value of to be the value that minimized the distance between and , where indicates the projection operation. Decoder 2 first translationally moves the whole iso- line so that the mass center of coincides with that of , where indicates the translation operation, and then reads according to (Fig. 4A). Apparently, zero error of Decoder 1 requires and to perfectly overlap. If the grids woven by iso- and iso- lines are tilted (Fig. 3 F, Upper Left ) or nonparallelogram like (Fig. 3 F, Upper Right), which can be quantified by the orthogonality or mixed variance ratio, respectively, introduced in the above section, the projections and may have nonoverlapping mass centers (Fig. 4 B, Upper ) or different lengths (Fig. 4 B, Lower), causing decoding error. Decoder 2 translationally moves the mass center of to the position of that of , and therefore, its decoding error only depends on the nonparallelogram likeness of grids. Biologically, the projection onto the first PC of can be realized by Hebbian learning of decoding weights (38), the nearest-centroid scheme can be realized by winner-take-all DM (20), and the overlap of the mass centers in Decoder 2 can be realized by homeostatic mechanisms (39) to keep the mean neuronal activity over different iso- lines unchanged (SI Appendix, Eq. S19).
Consistent with the decoding scenario above, when decoding temporal flow generalizing across spatial information in the production epoch of t-SR, the error of Decoder 1 negatively correlated with the angle between F-PC1 and S-PC1 and positively correlated with the portion of mixed variance (Fig. 4 C and D); the error of Decoder 2 depended weakly on and positively correlated with (Fig. 4 E and F). Materials and Methods has details. Thanks to the angle orthogonality and small mixed variance (Fig. 3), both decoders have above-chance performance (Fig. 4G). Additionally for both t-SR and t-DM tasks, we studied the decoding generalization of temporal (nontemporal) information across nontemporal (temporal) information in all of the perception, delay, and production epochs. In all cases, we found how the decoding error depended on the angle between the first PCs of the decoded and generalized variables, and the mixed variance followed a similar scenario to that above (SI Appendix, Figs. S7 and S8).
A common feature of the network dynamics in all of the epochs of the four timing tasks above was neuronal sequential firing (Fig. 5A and SI Appendix, Fig. S9 A–C). We ordered the peak firing time of the neurons and then measured the recurrent weight as a function of the order difference between two neurons. We found, on average, stronger connections from earlier- to later-peaking neurons than from later- to earlier-peaking neurons (Fig. 5B and SI Appendix, Fig. S9 D–F) (27, 40, 41). To study the network structure that supported the coding orthogonality of temporal flow and nontemporal information in the perception and production epochs of t-SR (or t-DM), we classified the neurons into groups according to their preferred spatial location (or decision choice). Given a neuron and a group of neurons ( may or may not belong to ), we ordered their peak times and investigated the recurrent weight from to each neuron of (except itself if ); Materials and Methods has details. In this way, we studied the recurrent weight as a function of the difference between the peak orders of post- and presynaptic neurons and the difference of their preferred nontemporal information (Fig. 5 C and D). In t-SR, first, exhibited similar asymmetry as that in IP (Fig. 5B) (positive if and negative if ), which drove sequential activity. Second, decreased with and became negative when was large enough (Fig. 5C). Together, the network of t-SR can be regarded as of several feedforward sequences, with two sequences exciting or inhibiting each other depending on whether their spatial preferences are similar or far different. The sequential activity coded the flow of time, and the short-range excitation and long-range inhibition maintained the spatial information (42). A similar scenario also existed in the network of t-DM (Fig. 5D), where the sequential activity coded the flow of time, and the inhibition between the sequences of different decision preferences provided the mutual inhibition necessary for making decisions (43).
The scenario that feedforward structure hidden in recurrent connections drives sequential firing has been observed in a number of modeling works (27, 40, 41). Our work extends this scenario to the interaction of multiple feedforward sequences, which can code temporal flow and nontemporal information simultaneously.
We have shown that in the perception and production epochs of t-SR and t-DM, when the network is required to record the temporal flow and maintain the nontemporal information simultaneously, neuronal temporal profiles exhibit similarity across nontemporal information (Fig. 3B and SI Appendix, Fig. S5A) and that the subspaces coding temporal flow and nontemporal information are orthogonal with small mixed variance (Fig. 3 D, E, I, and J). Interestingly, in tasks that do not require temporal information to perform, such profile similarity, orthogonality, and small mixed variance were also experimentally observed (18, 19). Moreover, the time-dependent variance explained more than 75% of the total variance in some nontiming tasks (18, 19). It would be interesting to ask why nontiming tasks developed such strong temporal signals, thereby understanding the factors that facilitate the formation of time sense of animals.
Before we studied the reasons for the strong temporal signals observed in nontiming tasks, we studied how the requirement of temporal processing influences the temporal signal strength. To this end, we studied the spatial reproduction (SR) task, where the network was to reproduce the spatial location immediately after a fixed delay (Fig. 6 A, Left , first row), and the DM task, where the network was to decide which stimulus was stronger immediately after the presentation of two stimuli (Fig. 6 A, Right , first row). Unlike t-SR (or t-DM) (Fig. 1C), SR (or DM) did not require the network to record time between the two pulses (or during the presentation of the two stimuli). We used to be the portion of time-dependent variance in the total variance , with being the firing rate of the th neuron at time and nontemporal information . We compared the portions and in the perception epochs of t-SR and t-DM with the portion in the delay epoch of SR and the during the presentation of stimuli in DM. We found that and (Fig. 6B). Therefore, temporal signals are stronger in timing tasks. However, even in the two timing tasks t-SR and t-DM that we studied, the portion of time-dependent variance was smaller than the portion (75%) experimentally observed in nontiming tasks (18, 19) (Fig. 6B). Therefore, there should exist other factors than the timing requirement that are important to the formation of temporal signals.
Specifically, we studied the following four factors: 1) temporal complexity of task, 2) overlap of sensory input, 3) multitasking, and 4) timing anticipation.
Temporal complexity measures the complexity of spatiotemporal patterns that the network receives or outputs in a task (27). To test the influence of temporal complexity on the strength of temporal signals, we designed COMP and change detection (CD) tasks that enhanced the temporal complexity of SR and a cue-dependent decision-making (cue-DM) task that enhanced the temporal complexity of DM (Fig. 6A). In COMP, the network was to report whether the spatial coordinate of the stimulus presented before the delay was smaller or larger than that of the stimulus presented after the delay, consistent with its vibrotactile version (18) (Fig. 1D). In CD, the network was to report whether the two stimuli presented before and after the delay were the same (21). In cue-DM, the network was to report the index of the stronger or weaker stimulus depending on the cue flashed at the end of the presented stimuli. COMP and CD have higher temporal complexity than SR because the output not only depends on the stimulus before the delay but also, the stimulus after the delay. Similarly, cue-DM has higher temporal complexity than DM because the output also depends on the cue. We found that , , and , suggesting that temporal complexity increases the portion of time-dependent variance (Fig. 6C). It has been empirically found that the task temporal complexity increases the temporal fluctuations in neuronal sequential firing (27). Here, we showed that the temporal fluctuation of the average neuronal activity over nontiming information also increases with the temporal complexity of the task. The result is consistent with the experimental observation that the population state varied more with time in COMP than in SR (20).
Suppose that the population states of the sensory neurons in response to two stimuli and are and , respectively. If and have high overlap, then the evolution trajectories of the recurrent network in response to and should be close to each other. In this case, the variance of the trajectories induced by the stimulus difference is small, and the time-dependent variance explains a large portion of the total variance. To test this idea, we broadened the Gaussian tuning curves of the sensory neurons in t-SR, SR, COMP, and CD tasks and found an increased portion of time-dependent variance (Fig. 6D).
The brain has been well trained on various timing tasks in everyday life, and therefore, the animal may also have a sense of time when performing nontiming tasks, which increases the time-dependent variance. To test this hypothesis, we trained networks on t-SR and SR concurrently so that the network could perform either t-SR or SR indicated by an input signal (26). We also trained t-DM and DM concurrently. We only considered these two task pairs because the two tasks in each pair share the same number and type of inputs and outputs (except for a scalar go cue input in t-SR and t-DM); hence, they do not require any changes in the network architecture. We found that both and were larger in networks that were also trained on timing tasks than in networks trained solely on nontiming tasks (Fig. 6E).
In the working memory experiments that observed strong time-dependent variance (18, 19), the delay period had fixed duration. This enabled the animals to learn this duration after long-term training and predict the end of the delay, thereby getting ready to take actions or receive new stimuli toward the end of the delay. If the delay period is variable, then the end of the delay will no longer be predictable. We found that the temporal signals in fixed delay tasks were stronger than those in variable delay tasks (Fig. 6F), which suggests that timing anticipation is a reason for strong temporal signals. A possible functional role of timing anticipation is anticipatory attention: a monkey might pay more attention to its finger or a visual location when a vibrotactile or visual cue was about to come toward the end of the delay to increase its sensitivity to the stimulus. To study the influence of this anticipatory attention to the formation of temporal signals, we supposed feedback connections from the recurrent network to the sensory neurons in our model (Fig. 6G). Feedback currents could reduce the firing thresholds of sensory neurons through disinhibition mechanism (44). We also added L2 regularization on the feedback current (Fig. 6G) to reduce the energy cost of the brain (Materials and Methods ). After training, the feedback current stayed at a low level to reduce the energy cost but became high when the cue was about to come to increase the sensitivity of the network (Fig. 6I). We found that adding this feedback mechanism increased the portion of time-dependent variance (Fig. 6H) because the feedback current, which increased with time toward the end of the delay (Fig. 6I), provided a time-dependent component of the population activity.
Collectively, other than the timing requirement in timing tasks, we identified four possible factors that facilitate the formation of strong temporal signals: 1) high temporal complexity of tasks, 2) large sensory overlap under different stimuli, 3) transfer of timing sense due to multitasking, and 4) timing anticipation.
In summary, neural networks perceive time intervals through stereotypical dynamic trajectories, maintain time intervals by attractor dynamics in a complementary monotonic coding scheme, and perform IP or IC by scaling evolution speed. Temporal and nontemporal information is coded in orthogonal subspaces with small mixed variance, which facilitates decoding generalization. The network structure after training exhibits multiple feedforward sequences that mutually excite or inhibit depending on whether their preferences of nontemporal information are similar or not. We identified four possible factors that facilitate the formation of strong temporal signals in nontiming tasks: temporal complexity of task, overlap of sensory input, multitasking, and timing anticipation.
In the perception epoch, the network evolved along a stereotypical trajectory after the first pulse (Fig. 2 B and C). Consistently, some neurons in the prefrontal cortex and striatum prefer to peak their activities around specific time points after an event (45). In the brain, such a stereotypical trajectory may be formed by not only neuronal activity state but also, synaptic state, such as slow synaptic current or short-term plasticity. The temporal information coded by the evolution of synaptic state can be read out by the network activity in response to a stimulus (29, 46, 47).
We also studied the trajectory speed with time during the perception epoch. We found that, after a transient period, the speed in the IP task stayed around a constant value (SI Appendix, Fig. S2A), the speed in the IC and t-DM tasks increased with time (SI Appendix, Figs. S3C and S5C), and the speed in the t-SR task decreased with time (SI Appendix, Fig. S4B). Therefore, we did not make any general conclusion on the trajectory speed when the network was perceiving time intervals.
The temporal scaling when producing or comparing intervals has been observed in animal experiments (12, 32, 34). A possible reason why temporal scaling exhibits in both the production epoch of IP and the stimulus 2 epoch of IC (Fig. 2 L–O and SI Appendix, Fig. S3 I–N) is that both epochs require the network to compare the currently elapsed time with the time interval maintained in working memory. In IC, the decision choice is switched as soon as the trajectory has passed the critical point at which the elapsed time equals the maintained interval ; in IP, the network is required to output a movement as soon as : both tasks share a DM process around the time point. This temporal scaling enables generalizable decoding of the portion of the elapsed time (SI Appendix, Figs. S7G and S8G), which enables people to identify the same speech or music played at different speeds (48, 49).
When the to be produced interval gets changed, the trajectory in the perception epoch is truncated or lengthened (Fig. 2 B–D), whereas the trajectory in the production epoch is temporally scaled (Fig. 2 L–O). This difference helps us to infer the psychological activity of the animal. For example, in the fixed delay working memory task when the delay period was changed, the neuronal activity during the delay was temporally scaled (18). This implies that the animals had already learned the duration of the delay and were actively using this knowledge to anticipate the coming stimulus instead of passively perceiving time. However, this anticipation was not feasible before the animal had learned the delay duration. Therefore, we predict that, at the beginning of training, the animal perceived time using stereotypical trajectory, and the scaling phenomenon gradually emerged during training.
Temporal and nontemporal information is coded orthogonally with small mixed variance (Fig. 3). Physically, time is consistently flowing regardless of the nontemporal information, and much information is also invariant with time. The decoding generalizability that resulted from this coding geometry (Fig. 4) helps the brain to develop a shared representation of time across nontemporal information or a shared representation of nontemporal information across time using a fixed set of readout weights. Decoding generalizability of nontemporal information across time has been studied in working memory tasks (20, 50) and has been considered as an advantage of working memory models in which information is maintained in stable or a stable subspace of neuronal activity (20, 51). Here, we showed that, with this geometry, such advantage also exists for reading out temporal information.
Interestingly, the orthogonality and small mixed variance have been experimentally observed in nontiming tasks (18, 19), and therefore, their formation does not seem to depend on the timing task requirement. Consistently, in the delay epoch of t-SR and t-DM, although the network needed not to record the temporal flow to perform the tasks, temporal flow and nontemporal information were still coded orthogonally with small mixed variance (SI Appendix, Fig. S10 A–D). By comparing the orthogonality and mixed variance in the perception epoch of t-SR and t-DM with those in the nontiming tasks (Fig. 6A), we found that timing task requirement did not influence the orthogonality but generally reduced the mixed variance (SI Appendix, Fig. S10 E and F). The network structure in nontiming tasks also exhibited interacting feedforward sequences (SI Appendix, Fig. S10 G–K).
Our results concerning the various factors that affect the strength of temporal signals in nontiming tasks lead to testable experimental predictions. The result of sensory overlap (Fig. 6D) implies that sensory neurons with large receptive fields are essential to the strong temporal signals. The result of multitasking (Fig. 6E) implies that animals better trained on timing or music have stronger temporal signals when performing nontiming tasks. The result of temporal complexity (Fig. 6C) implies that animals have stronger temporal signals when performing tasks with higher temporal complexity, which is consistent with some experimental clues (20). The result of timing anticipation (Fig. 6F), consistent with ref. 27, implies that, if the appearance of an event is unpredictable, then the temporal signals should be weakened. Other than anticipatory attention (Fig. 6G), anticipation may influence the temporal signals through other mechanisms. In the fixed delay COMP task (Fig. 1D), suppose that a stimulus appeared before the delay; then, both the population firing rate and the information about in the population state increase toward the end of the delay period (52). It is believed that this is because the information about was stored in short-term potentiated synapses in the middle of the delay to save the energetic cost of neuronal activity, while they got retrieved into the population state near the end of the delay period to facilitate information manipulation (53). This storing and retrieving process may also be a source of temporal signals.
We have discussed the processing of single time intervals using our model. However, recent evidence implies that the brain may use different neural substrates and mechanisms to process regular beats from single time intervals (54, 55). Dynamically, in the medial premotor cortices, different regular tapping tempos are coded by different radii of circular trajectories that travel at a constant speed (55), which is different from the stereotypical trajectory or speed scaling scenario revealed in our model (Fig. 2 B–D and L–O). The distribution of the preferred intervals of the tapping interval-tuned neurons is wide, peaking around 850 ms (56, 57), which is also different from the complementary monotonic tuning scenario in our model (Fig. 2 J and K). Additionally, humans tend to use a counting scheme to estimate single time intervals when the interval duration is longer than 1,200 ms (58), which implies that the beat-based scheme is mentally used to reduce the estimation error of single long intervals even without external regular beats. However, after we trained our network model to produce intervals up to 2,400 ms, it processed intervals between 1,200 and 2,400 ms in similar schemes (SI Appendix, Fig. S11) to that illustrated in Fig. 2 for intervals below 1,200 ms. All of these results suggest the limitation of our model to explain beat-based timing. Modeling work on beat-based timing is the task of future research.
Methods are provided in SI Appendix as well as SI Appendix, Figs. S1–S11. In SI Appendix, section S1, we present the details of our computational model, including the network structure, the tasks to be performed, and the methods that we used to train the network model. We also present the details to analyze the interval coding scheme in the delay and production epochs (Fig. 2), the coding combination of temporal and nontemporal information (Figs. 3 and 6B), and decoding generalization (Fig. 4) as well as the firing and structural sequences (Fig. 5). We also explain the relationship between the monotonic coding at the end of the delay epoch and the low dimensionality of the attractor (Fig. 2 G and K) as well as the geometric meaning of mixed variance (Fig. 3) in SI Appendix. Computer code is available from https://github.com/zedongbi/IntervalTiming.
This work was supported by the Hong Kong Baptist University (HKBU) Strategic Development Fund and HKBU Research Committee and Interdisciplinary Research Clusters Matching Scheme 2018/19 (RC-IRCMs/18-19/SCI01). This research was conducted using the resources of the High-Performance Computing Cluster Center at HKBU, which receives funding from the Hong Kong Research Grant Council and HKBU. Z.B. is supported by the start-up fund of the Institute for Future, Qingdao University.