Nucleosomes are biomolecular complexes formed by DNA wrapped around histone proteins. They represent the basic units of Eukaryotic chromosomes, compacting the genome so that it fits into the small nucleus, and regulating important biological processes such as gene expression. Nucleosomes are disassembled during disruptive events such as DNA replication, and re-assembled afterwards to preserve the correct organization of chromatin. However, the molecular details of nucleosome assembly are still not well understood. In particular, experiments found that histones and DNA may associate into a variety of non-canonical complexes, but their precise conformation and role during assembly remain unclear. In this study, we addressed these problems by performing extensive molecular dynamics simulations of nucleosomes undergoing assembly and disassembly. The simulations reveal many insights into the kinetics of assembly, the structure of non-canonical nucleosome intermediates, and the influence of salt concentration and DNA sequence on the assembly process.
Nucleosomes are the structural units of chromatin, each consisting of ~147 base pairs of DNA wrapped around a protein histone octamer made of one H3/H4 tetramer and two H2A/H2B dimers [1,2]. Nucleosomes not only enable the efficient compaction of the long Eukaryotic genomic DNA into the nucleus, but also influence key biological process by controlling the access to DNA by other proteins , for example during transcription . Such important functions rely on the establishment of precise epigenetic patterns of nucleosome positions, histone variants and histone modifications, which mark the distinction between active and repressed chromatin, and between different genomic regions, e.g., promoters and gene bodies . For example, the electrostatic interactions between histone tails and DNA  can be tuned through post-translational modifications such as acetylation , controlling nucleosome unwrapping and therefore access to the genome. In order to preserve the identity of each cell, essential in multi-cellular organisms, this genomic organization has to be maintained  despite the continuous disassembly of nucleosomes over the course of DNA transcription  and replication . However, the molecular details of nucleosome assembly and disassembly have not been yet fully elucidated.
In vitro experiments showed that the salt-induced disassembly of nucleosomes starts with DNA unwrapping and it is then followed by the progressive loss of the two H2A/H2B dimers , therefore forming firstly an asymmetric hexasome, and finally a tetrasome. Assembly proceeds in the opposite direction, starting with the binding of a H3/H4 tetramer, then followed by the binding of a H2A/H2B dimer on each side . This order of events is due to the specific geometry of the nucleosome (with the H2A/H2B dimers surrounding the central H3/H4 tetramer), and to the more extensive network of protein-DNA hydrogen bonds on H3/H4 relative to H2A/H2B [1,13]. A similar assembly pathway was suggested to occur also in vivo , except that in this case the process is facilitated by histone chaperones limiting non-nucleosomal histone-DNA interactions , or by chromatin remodelers . However, many experiments paint a picture of assembly (or disassembly) more complicated than a simple 2-step process involving tetrasomes, hexasomes and complete nucleosomes, highlighting the existence of various partially-assembled or non-canonical nucleosome structures, such as pre-nucleosomes , right-handed nucleosomes , remodeler-associated fragile nucleosomes , and partially opened nucleosomes . Sub-nucleosomal conformations are also widely distributed across the genome in vivo , and their formation has been shown to depend in part on the underlying DNA sequence . How such structures form and interconvert between each other remains unclear.
Characterizing the precise molecular details of nucleosome intermediates and the factors controlling their formation, e.g., salt concentration and DNA sequence, would greatly aid our understanding of how chromatin organization is established and maintained. In order to address these questions, we ran extensive coarse-grained molecular dynamics (MD) simulations of nucleosome assembly under different conditions. We consider a simple system made of the strong nucleosome-positioning 601 DNA sequence , a H3/H4 tetramer and two H2A/H2B dimers. By analyzing our MD trajectories with Markov state modeling , we reveal a complex kinetic landscape characterized by many metastable structures, some of which correspond to previously proposed non-canonical nucleosomes. Our simulations show that A/T nucleosome positioning signals on the 601 DNA sequence direct the binding of the H2A/H2B dimer on the optimal DNA region, so that the dimer is then poised to easily bind the H3/H4 tetramer interface. This effect, which is observed only on the A/T-rich side of the 601 DNA, reveals a highly sequence-dependent assembly process, suggesting a potential mechanism allowing genomic sequences to directly control nucleosome organization in vivo . We also investigated the influence of salt on the kinetics, finding that successful nucleosome assembly from a tetrasome is faster at low salt concentration, but that the system is also more likely to remain stuck in off-pathway kinetic traps. This explains the necessity to assemble nucleosomes via salt dialysis , or via the addition of chromatin remodelers . The assembly/disassembly mechanism itself is also a function of salt concentration: while at low or intermediate salt the H2A/H2B dimers bind/unbind to/from the H3/H4 tetramer co-operatively with DNA wrapping, unbinding occurs only after DNA unwrapping at high salt. Overall, our results establish clear mechanistic relations between environmental conditions, DNA sequence, and the dynamics of nucleosome assembly.
In our MD simulations, the DNA is modeled using the 3SPN2.C coarse grained model , where each nucleotide is represented by three beads centered on the base, sugar and phosphate groups. The histones are modeled according to the AICG2+ structure-based model , where each residue is represented by one bead centered on the Cα atom. This and similar coarse-grained models have been successfully applied to study complex processes such as nucleosome sliding [28–31], unwrapping [7,32,33], and assembly , at a fraction of the computational cost required for all-atom simulations [35–39]. Notably, the 3SPN2.C model can capture the sequence-dependent elasticity of DNA , and it has been shown to correctly predict the affinity of nucleosome formation for different sequences , making this model suitable to investigate the effect of sequence on the assembly kinetics. Further details on the coarse-grained model and its parametrization are provided in the Methods section.
Our simulation system consists of the 147-bp nucleosome-positioning 601 DNA , one histone H3/H4 tetramer, and two copies of histone H2A/H2B dimers. The 601 sequence was chosen for its widespread use in experiments and for their characteristic positioning motifs asymmetrically distributed around the nucleosome, allowing us to study their role during assembly. Specifically, 10-bp periodic A/T base steps are more frequent on the left side of 601 (called the beta side in Ref. , see Fig 1A), leading to enhanced nucleosome unwrapping on the right side relative to the left (i.e. asymmetric unwrapping) [11,40].
We performed nucleosome MD simulations for 108 MD steps at 400 mM mono-valent ion concentration starting from the tetrasome conformation, in which the H3/H4 tetramer is bound on the 601 DNA at the optimal position and both of H2A/H2B dimers are unbound from the tetramer but bound on DNA at different positions depending on the simulation run (see the left snapshot in Fig 1B as an example). Out of 215 runs, we found only one trajectory that reached to the complete nucleosome, which is shown in Fig 1B and S1 Movie. As highlighted by the timeline of the sum of the radii of gyration of the left and right DNA sections (RgL+RgR ), in this successful trajectory assembly occurs in two steps: firstly, the right half of DNA bends together with the binding of one H2A/H2B dimer to the tetramer (central cartoon in Fig 1B), then, the left half of DNA folds concomitantly with the assembly of the other H2A/H2B dimer to form the complete nucleosome (right cartoon). In Fig 1C, the same trajectory is projected on the 2-dimensional free energy landscape (obtained by the Markov state model as described below) along the left and right radii of gyration of DNA. The free energy landscape also shows that the left side of the 601 DNA bends more easily (i.e., with a lower free energy cost) than the right side; this asymmetry is discussed later in the results. Other trajectories stopped at various apparently metastable states, due to the limitation of the simulation time.
Starting from the same tetrasome conformations used at 400 mM, we also performed 215 simulations at 200 mM and 300 mM, finding respectively 20 and 7 cases of complete nucleosome assembly (a representative trajectory at 200 mM is shown in Fig 1C). We will discuss the salt concentration dependence in more details later in the results.
In order to characterize the complete landscape and kinetics of nucleosome assembly and disassembly, we generated a Markov state model (MSM) of nucleosome dynamics using 1000 independent simulation runs of 108 MD steps each at 400 mM salt, starting from a diverse set of partially or fully assembled nucleosomes (215 of these are the same described in the previous section, see Methods for more details). The salt concentration of 400 mM was chosen so that neither the fully disassembled nor the fully assembled nucleosome conformations are too much preferred over the others, facilitating the characterization of the dynamics. To build an MSM, we first divide the conformational space into a set of M discrete microstates, each corresponding to a group of similar conformations, and then we learn from the unbiased MD trajectories the M x M transition probabilities to go from each state to another after a certain time interval, the so-called lag-time. Observing either complete assembly or disassembly events in a single trajectory is extremely rare, but the Markov state modeling approach allows us to combine many trajectories together to fully characterize the kinetics of the system, identifying the main long-lived conformations and the dominant pathways of nucleosome assembly. We generated an MSM with M = 400 microstates and a lag-time of 4x106 MD steps (see Methods section for more details, and S1 Fig for the convergence of the MSM timescales).
In order to aid the interpretation of the MSM, we can use the PCCA clustering method  to further group the 400 microstates into a much smaller number of metastable basins. PCCA provides a hierarchical representation of the model by clustering the conformations into more and more basins by considering progressively faster relaxation processes within the system (associated to the eigenvalues and eigenvectors of the MSM). The simplest, yet meaningful, representation of our system can be obtained using just 4 basins. With this choice, the PCCA method automatically identifies the 4 states that are commonly observed in experiments: the tetrasome, two hexasomes, and the complete nucleosome (representative conformation in S2 Fig). S2 Fig shows that the 4 basins are well separated onto a 2-dimensional projection defined by the two slowest coordinates derived from time-lagged independent component analysis  (TICA, see Methods for their definition), further indicating the metastability of these 4 states.
However, we can extract more information from our MSM by characterizing a higher number of PCCA basins and the transition probabilities between them. We decided to consider 11 PCCA basins, since this is high enough to provide interesting insights into nucleosome dynamics and low enough to still make the model interpretable. We obtained the transition probabilities between these 11 basins from a coarse-grained hidden Markov state model derived from the full MSM . Fig 2 summarizes the resulting kinetic landscape of nucleosome assembly and disassembly at 400 mM, displaying the identified 11 metastable basins and transition probabilities between these basins (in S3 Fig, we visualize the basins along various combinations of TICA projections). Among the 11 basins, we find one state corresponding to the fully assembled canonical nucleosome (N), seven hexasome states (HX) and three tetrasome states (T, TL and TR). The three tetrasome states are distinguished by the positions of the unbound H2A/H2B dimers. In the T state one dimer is on the left side of the unwrapped DNA while the other dimer is on the right side, remaining close to the target H3/H4 interface. On the other hand, in the TL and TR states both unbound dimers interact solely with either the left or right side of the unwrapped DNA, respectively. The seven hexasome structures are divided into two types, depending on whether one H2A/H2B dimer is bound to the left (HL0, HL1 and HL2) or the right (HR0, HR1, HR2 and HR3) binding interface of the H3/H4 tetramer (respectively wrapping the left or the right sides of the 601 DNA in the canonical nucleosome). Each of these two hexamer types can adopt different conformations depending on the location of the unbound H2A/H2B dimer. In the left hexasome HL0, the unbound dimer is either far away or it interacts only weakly with nucleosomal DNA. In HL1, the unbound dimer is located on the right side of the DNA relatively close to its optimal location in the complete nucleosome. In HL2, the unbound dimer bridges two DNA gyres while facing outside of the nucleosome, in a configuration that cannot easily lead to assembly. HR0, HR1 and HR2 are the respective right hexasome versions of those just described. Finally, in hexasome HR3 a H2A/H2B dimer binds to the right interface of the H3/H4 tetramer, but it is stabilized by the wrapping of the left side of the 601 sequence. This metastable configuration is not found on the opposite side (there is no HL3 basin), suggesting that it originates from the marked propensity of the left side of 601 to bend. Notably, in the HR3 hexasome the DNA follows a right-handed super-helix, contrary to the canonical form that follows a left-handed one. For each of the 11 identified metastable states, we deposited as S1 Data 4 representative coarse-grained nucleosome configurations in PDB format (in S1 Data, which also contains the equilibrium probabilities and the transition probabilities between the basins). Researchers interested in investigating sub-nucleosome conformations in all-atom MD may use these as starting configurations following a previously-developed back mapping procedure .
Our MSM also reveals a clear asymmetry between the left and right side of the 601 sequence. For example, the left hexasome state HL1 has an equilibrium probability about 20 times higher than its right counterpart HR1 , consistent to what observed in past experiments on hexasomes . Kinetics itself is also highly asymmetric: binding always occurs with a higher rate on the left side of 601, whereas unbinding occurs with a higher rate on the right (Fig 2), as observed in past experimental studies . For instance, a tetrasome T transitions into a left hexasome HL1 after 4x106 MD steps with probability of ~3x10-3, one order of magnitude greater than the probability to observe a T to HR1 transition (~2x10-4). Similarly, the rate of nucleosome formation is ten times faster when this last assembly step involves H2A/H2B binding on the left side (i.e., from HR1 to N).
What is the molecular origin of this kinetic asymmetry? As shown in Fig 1A, the left side of 601 contains periodically spaced A/T base steps at most 5+10n positions from the dyad, whereas on the right side this periodic pattern is much weaker. The importance of the periodic motifs in nucleosome folding stems from the intrinsic bending of the DNA helix, which lowers the energy cost of wrapping around histones [33,40,46]. Indeed, when the histone octamer is assembled (state N), the left side of the 601 DNA has a much higher probability to be fully wrapped than the right side: Fig 3A shows how unwrapping DNA from the left side of 601 (the A/T-rich one) costs ~4 additional kB T of energy compared to unwrapping from the right side (the A/T-poor one), similar with what found in another coarse-grained study . Furthermore, even the distributions of H2A/H2B dimer positions along DNA when they are unbound from the tetramer display strong differences (Fig 3B): the left distribution has two sharp peaks separated by 10 base pairs, one of which corresponds to the location found in the left hexasome and complete nucleosome states, while H2A/H2B positioning on the right side of 601 is much more uniform. This shows that the sequence motifs on the left side of 601 (highlighted again in Fig 3B) facilitate the positioning of the H2A/H2B dimer so that it is poised for a successful binding with the H3/H4 tetramer.
The analysis of transition pathways  (S4 Fig) in the coarse-grained MSM of the 11 long-lived states shows that two pathways make up most of the successful assembly transitions between the tetrasome T and the full nucleosome N: 66% of the pathways go through the more stable left hexasome HL1 (T➔HL1➔N), while 19% go through the less stable right hexasome HR1 (T➔HR1➔N). Therefore, all other hexasome conformations (HL0, HL2, HR0, HR2 and HR3) do not significantly contribute to successful assembly and may be therefore considered off-pathway kinetic traps.
The above landscape of nucleosome assembly and disassembly was obtained at an intermediate ionic strength (400 mM). However, an interesting question is how the assembly and disassembly processes are affected by changes in salt concentration. As described before, our assembly simulations from the tetrasome configurations at 200 mM, 300 mM, and 400 mM salt resulted in a complete nucleosome assembly in 20, 7 and 1 cases out of 215 runs, suggesting that low salt favors assembly. This is reasonable, since nucleosome are stabilized by protein-DNA electrostatic interactions. However, it does not explain the observation that the optimal procedure to assemble nucleosomes in vitro is by slowly decreasing the salt concentration from high to low values , and otherwise chromatin remodelers are required to successfully complete assembly . One possibility is that at low ionic strength the kinetic landscape becomes more rugged, and the system struggles to escape from off-pathway traps. Indeed, Fig 4A indicates that although lower salt concentration increases the probability to reach the complete nucleosome, it also significantly increases the probability to reach the kinetic traps identified from the MSM (HL0, HL2, HR0, HR2 and HR3).
To further study this effect, we run for each ionic strength 100 107-steps simulations starting from a HR2 hexasome conformation, one of the traps that do not participate in the assembly, where the unbound H2A/H2B dimer bridges both left and right sides of DNA. Consistently with our hypothesis, the time required to escape from the initial state increases as we lower the salt concentration (Fig 4B). This analysis was performed by plotting the MD trajectories along the slowest TICA coordinate (see Methods for more details).
During the successful assembly trajectories at 200 mM and 300 mM salt, we also found evidence of new metastable states that are not clearly captured by our MSM at 400 mM. In these conformations the nucleosome is already very compact but partially opened: due to the absence of DNA bending at super-helical location +/-3 (i.e., 3 DNA helical turns away from the dyad), one or both H2A/H2B dimers are unbound from the tetramer and oriented at a wider angle relative to the canonical conformation (see snapshot in Fig 4C). During our trajectories, we found this angle to vary between ~10 and ~50 degrees (the angle was computed based on the orientation of the longest helices in histones H2A and H4). PDB coordinates of representative partially opened conformations (part of the HL, HR or T states) have been deposited in S1 Data together with the other metastable states defined from the MSM. To further investigate their properties, we run 100 107 -steps simulations starting from the partially opened right hexasome at 200, 300, and 400 mM salt. Fig 4C shows that, similarly to the HR2 state, the escape from the partially opened state becomes slower as we decrease the salt concentration, presumably because this conformation is stabilized by electrostatic interactions of the H2A C-terminal and H2B N-terminal histone tails bridging the two DNA gyres. Furthermore, at 300 mM we find that starting from a partially opened hexasome conformation the probability to transition either into the complete nucleosome or into a more open hexasome is about 50%, similarly to a transition state (within 5x106 MD steps, we observe complete assembly events in 269 runs out of 400 starting from the left partially opened state, and in 156 out of 400 from the right one). Finally, we note that all the evidence points to these partially opened nucleosomes being the same as those identified in a recent FRET study . In both experiments and in our simulations, 1. the angle between the open H2A/H2B dimer and the tetramer very similar (a value of ~20 degrees was suggested from experiments), 2. the stability of these states decreases with increasing ionic strength, and 3. they are intermediates along the nucleosome assembly pathways. In order to test the accuracy of our predictions, it is also useful to compare the transition rates between complete nucleosomes and opened states, which are also available from FRET experiments . In principle, one MD timestep in our simulations equals 15 fs; however, coarse-graining smoothens the free energy landscape, speeding up the dynamics, so we map 1 MD timestep to 30 ps, based on the speedup observed in our previous work on nucleosome sliding using the same model  (a similar speedup was also observed in coarse-grained simulations of protein folding ). Experimentally, at low salt concentration, the characteristic times for the transitions (inverse rates) from the complete nucleosome to the partially opened state and back have been estimated to be respectively >50 ms and >3 ms, with an equilibrium constant of 0.06. From the simulations at 300 mM, we estimated the inverse rates to be respectively ~5x108 MD steps = 15 s and ~106 MD steps = 30 ms (see Methods), consistent with the lower bounds from experiments, and an equilibrium constant of ~0.002, somewhat underestimating the stability of the partially opened state.
In order to explore whether the disassembly mechanism is also affected by the salt concentration, we run additional 108 -step simulations starting from the complete nucleosome at 500, 600, 700, and 800 mM salt (30 trajectories each). As expected, nucleosomes become unstable at high salt, with frequent DNA unwrapping and dimer unbinding. Interestingly, disassembly pathways appear qualitatively different at intermediate (400 mM) and high (800 mM) salt: while in the former case H2A/H2B unbinding from the tetramer proceeds cooperatively with the unwrapping of DNA, in the latter case DNA unwrapping anticipates histone unbinding (Fig 5). This trend is in qualitative agreement with the experimental literature: the cooperative transition is consistent with the recent results obtained by FRET at intermediate salt , while the non-cooperative behavior was suggested by a combination of SAXS and FRET at very high 1.9 M salt concentration .
In conclusion, our coarse-grained MD simulations revealed the complex landscape of nucleosome assembly and its modulation by salt and DNA sequence. Firstly, we used Markov state modeling to summarize the assembly dynamics on 601 sequences at 400 mM salt concentration (at which the assembly is reversible). Our MSM highlighted, apart from the expected canonical nucleosome, tetrasome and hexasome states, several long-lived configurations characterized by alternative interactions between histones and DNA. Interestingly, in one of the right hexasome states (HR3 ), a canonical histone hexamer wraps the DNA in a right-handed way. Right-handed nucleosomes can form in experiments after the application of positive DNA supercoiling [48,49] and are also found in vivo within centromeres . However, while past works investigated how such change in DNA handedness could originate from an equivalent change in the arrangement of the histone folds [48–50], our simulations predict that these right-handed structures can also form via an alternative wrapping of the DNA around a canonical histone hexamer, without requiring a complex rearrangement of histones . Since this state is found only on one side of 601, the simulations further suggest that right-handed nucleosomes are stabilized by specific DNA sequences. Torsional stresses, such as those generated by the passage of RNA polymerase, are also expected to affect the stability of various nucleosome configurations, as found in experiments on nucleosomes with super-coiled DNA , and in a recent MD study that investigated the effect of torsion on nucleosomal DNA unwrapping .
Our simulations also provide fresh insights into asymmetric nucleosome dynamics on 601 DNA. This sequence is characterized by several nucleosome positioning motifs, namely A/T base steps periodically spaced every 10 bp, which favor the wrapping of DNA around histones. However, these motifs are asymmetrically distributed along the sequence, favoring H2A/H2B dimer assembly on the left side of 601 by about one order of magnitude. Asymmetric disassembly of nucleosomes along 601 was observed in many experiments [11,19,40]; our MD simulations revealed that DNA sequence motifs promote assembly by directing the binding of H2A/H2B dimers towards the H3/H4 target, an effect originating from the co-operativity between histone binding and DNA bending at low salt concentration. We suggest that by promoting the assembly of nucleosomes directly at highly flexible genomic regions, while avoiding stiff DNA regions such as poly-A tracts , the sequence dependence in the rates of assembly may also favor the maintenance of nucleosome positioning in vivo after disruptive events such as replication. The importance of the sequence-dependent shape and flexibility of DNA for protein-DNA recognition  and chromatin organization  has been well investigated in the literature, but our simulations further reveal previously unappreciated dynamical effects. In the future, we plan to investigate the dynamics of nucleosomes on real genomic sequences to further explore these ideas. For example, yeast promoters could be an ideal target, since yeast nucleosome positions have been mapped with base-pair accuracy , and the relationship between nucleosome organization and DNA flexibility along the genome has been recently investigated experimentally . Studying a large number of sequences at the same time should also allow us to derive general features of the sequence-dependent nucleosome dynamics that are not evident from the analysis of only one or few sequences.
Finally, our simulations highlighted how salt dependence can have a large effect on the kinetics of the system. The MSM generated at 400 mM salt shows that successful nucleosome assembly pathways are relatively simple: the dominant pathway from the tetrasome (T) to the complete nucleosome conformation (N) goes through a single intermediate hexasome state with a H2A/H2B dimer bound to the H3/H4 tetramer on the left side of 601 (HL1), while the second dominant pathway goes through the opposite intermediate hexasome (HR1 ) where the dimer is bound to the tetramer on the right side of 601. All other metastable states do not significantly participate into the assembly process, effectively acting as kinetic traps slowing down nucleosome formation. While low salt favors assembly, we find that it also stabilizes off-pathway kinetic traps, explaining the importance of salt dialysis for the successful assembly process . Under some conditions, experiments found that histones and DNA form non-canonical pre-nucleosome structures that require chromatin remodelers to be converted into the canonical forms . We note that the experimental observation that pre-nucleosomes may form on just ~80 bp of DNA  mirrors our simulation results showing that metastable non-canonical nucleosomes generally display less interactions with DNA than those found in the canonical form. Based on our findings, we predict that pre-nucleosomes should resemble the haxasome states HL2 and HR2, where the unbound H2A/H2B dimer bridges the two opposite ends of nucleosomal DNA to prevent successful assembly unless some large rearrangement first occurs.
Simulations at 200 and 300 mM also revealed critical intermediate states along the successful pathways of nucleosome assembly: here the nucleosome is almost completely formed, but one H2A/H2B dimer does not yet make direct contacts with the H3/H4 tetramer, with the two interfaces at a wider angle relative to the canonical nucleosome. These partially opened nucleosomes where also recently identified in a series of FRET experiments . Simulations indicate that these partially opened states are roughly halfway along the nucleosome pathways at low salt (i.e., they are transition states); in virtue of this, modulating the stability of this state, for instance by histone chaperones or epigenetic modifications, should have a large effect on assembly and disassembly rates, suggesting its potential role in vivo . Again, we found that the formation of these structures does not require any conformational change within histones (which may nevertheless play a role in nucleosome dynamics ), but are instead stabilized by non-specific electrostatic interactions between histone tails and DNA. In addition to conformational stability, salt concentration also affects the assembly/disassembly pathway itself: while at low and intermediate salt the H2A/H2B dimer binds the H3/H4 tetramer co-operatively with DNA bending (so that DNA sequence can directly affect kinetics), at high salt these two processes are uncoupled, with DNA unwrapping anticipating dimer-tetramer dissociation. These results recapitulate past experimental observations under various conditions [11,19].
Overall, using MD simulations in combination with the MSM analysis, we made a series of important predictions on nucleosome dynamics: some of them already verified in experiments (the intermediate partially opened nucleosome state, the asymmetric dynamics on 601, and the salt-concentration-dependent assembly pathways), and others waiting to be tested in the future (the detailed conformations of off-pathway nucleosome states such as pre-nucleosomes and right-handed nucleosomes, and the role of sequence-dependent nucleosome kinetics for chromatin organization). The position coordinates of the metastable nucleosome conformations deposited in S1 Data will also represent a useful resource for researchers interested in further investigating nucleosome assembly, for instance by aiding the design of FRET experiments , or by providing starting configurations for all-atom MD simulations through a back-mapping procedure . Finally, we envision that our coarse-grained model will be of great use to characterize nucleosome assembly in more realistic scenarios typical of in vivo experiments, for example in the presence of RNA polymerase , histone chaperones , and remodelers .
To enable the complete study of nucleosome assembly and disassembly, we utilize coarse-grained molecular modeling: we map each amino-acid to a single bead , and each nucleotide to three beads corresponding to base, sugar, and phosphate groups . The functional potentials and force-field parameters are chosen according to the AICG2+ structure-based model for histones  and the 3SPN2.C sequence-dependent model for DNA . The flexible histone tails (residue ids 1–32 for H3, 1–23 for H4, 1–14 and 121–128 for H2A, 1–26 for H2B) are modeled according to a sequence-dependent statistical local potential specifically designed for disordered regions, which was derived by Boltzmann inversion from the distribution of angles and dihedral angles observed in the non-structured subset of the Protein Data Bank . The potential of the remaining folded regions is based on the native structures of the first copies of each H3/H4 and H2A/H2B dimers present in the nuclesome crystal structure with PDB id 1KX5 (using a single reference for each pair of dimers ensures the symmetry of the histone octamer, which is important to study the asymmetry induced by DNA sequence). Whenever two protein residues i and j are within a certain cutoff distance in the native structure, the coarse-grained potential includes a contact interaction that favors the binding of the two residues :
Where εij is a contact-dependent energy determined from the all-atom reference structure and r0ij is the native distance between the residues. In addition, the protein potential also includes bond, angle and dihedral interactions with equilibrium values set according to the native structure, and non-native excluded volume interactions proportional to the 12th power of the distance.
Experimentally, in the absence of DNA or at high salt the histone octamer breaks into a H3/H4 tetramer and two H2A/H2B dimers [11,19,60], but using the default AICG2+ potential we do not observe histone octamer breakage under these conditions. This happens because the default AICG2+ scaling factor defining the overall strength of the residue-residue interactions was optimized for studying the folding of single-domain proteins, but in our case the attraction between the H3/H4 and H2A/H2B dimers is over-estimated. In order to find a suitable scaling factor for the native interactions between H3/H4 and H2A/H2B residues, we scanned different values performing MD simulation at either low salt (200 mM), at which we expect our nucleosomes to be stable, or high salt (800 mM), at which we expect disassembly. We find that scaling factors lower than 0.2 lead to unstable nucleosomes even at low salt, whereas for values higher than 0.4 we do not observe disassembly at even at high salt. Rescaling the dimer-tetramer interactions by a factor of 0.3 reproduces the expected behavior of nucleosomes: out 30 independent simulation runs of 108 MD steps starting from a fully assembled nucleosome, we observed no disassembly events at 200 mM salt, whereas we observed disassembly in 4 out of 30 runs at 800 mM; as shown by the MSM reported in the Results section, nucleosome assembly is reversible at intermediate 400 mM salt, with comparable equilibrium populations of tetrasomes and complete nucleosomes. This critical concentration at which nucleosomes disassemble is somewhat lower than the value observed in experiments, which is around 600 mM ; however, taking into account the high computational cost of the parameterization, we regard the agreement between experiments and simulations satisfactory, and sufficient to provide reliable insights into the nucleosome assembly dynamics. We use the dimer-tetramer scaling factor of 0.3 in all simulations reported in this study.
Histones and DNA interact via excluded volume, Debye–Hückel electrostatics, and hydrogen bond interactions. The hydrogen bonds are modeled according to a distance- and angle-dependent potential based on nucleosome crystal structures, and was calibrated based on nucleosome structural stability and DNA unwrapping . The size of the beads to define excluded volume interactions, and the parameter settings for the hydrogen bonds are the same used in our previous works [28–30]. The charges on the histone residues are determined via the RESPAC method , via which we optimized the coarse-grained Debye–Hückel electrostatic potential against the one obtained by solving the full Poisson-Boltzmann equation around the all-atom histone core structures, similarly to what done in other coarse-grained models of the nucleosome  (the optimization was performed at 100 mM salt, but the resulting charges are very robust this choice). The RESPAC procedure was applied to the globular part of the H3/H4 tetramer and the two H2A/H2B dimers, whereas for the flexible tails we used the standard integer charges on the Lysine, Arginine, Aspartic and Glutamic acids residues only.
All reported molecular dynamics (MD) simulations have been performed with the software CafeMol  by integrating the equations of motion using Langevin dynamics at 300 K with a timestep of 0.3 CafeMol time units (corresponding to 15 fs). During the MD simulations, we apply harmonic restraints on the structured region of the two H3 histones (0.001 kcal/mol/Å2) and on the phosphate beads of the DNA dyad residues (bp id 73, 0.1 kcal/mol/Å2 ), limiting large-scale sliding of DNA relative to the H3/H4 tetramer (but not the unbound H2A/H2B dimers). The nucleosome sliding mechanism was already analyzed in previous studies [28,29], and especially for the 601 positioning sequence considered here can be extremely slow. In our simulations, the H3/H4 is always placed at its optimal experimental position on the DNA sequence; preventing its sliding allows to avoid exploring less favorable positions and to focus solely on the assembly dynamics, which is our main focus. Finally, the system is enclosed into a spherical volume with a radius of 20 nm by a harmonic potential. CafeMol input files to run nucleosome assembly simulations are provided in the S2 Data.
The starting conformations for the 1000 MD simulation runs at 400 mM used to generate the MSM have been obtained during high-salt (800 mM) simulations starting from complete nucleosomes, and then selecting 1000 timeframes that include tetrasomes (215 conformations), left and right hexasomes (105 and 118 conformations, respectively), and nucleosomes (562 conformations). To quantify the salt-dependence of the nucleosome kinetics, we also performed, at both 200 and 300 mM salt, 215 108-steps simulation runs starting from the 215 tetrasome conformations. To analyze the escape from 2 metastable states (either the HR2 hexasome or the partially opened state) as a function of salt concentration, we run 100 independent 107-steps trajectories starting from the same conformation (either HR2 or partially opened HR1 ) at 200, 300, and 400 mM salt. In order to characterize the time necessary to escape from these states (analysis in Fig 4), we projected the molecular positions onto the slowest TICA motions  within each basin. The TICA projection was performed separately for the HR2 and partially opened metastable basins based on the positions of H2A residues 46 to 73 of the unbound dimer and the left side of DNA (the one interacting with the unbound H2A/H2B dimer) during the simulations at 300 mM salt. The inverse transition rate from the partially opened state to the complete nucleosome at 300 mM was estimated from the 100 simulations starting from the partially opened state by fitting an exponential curve to the survival probability to remain in the initial state before transitioning into the complete nucleosome, while the inverse transition rate for the opposite process was obtained from 100 107-MD-steps simulations starting from the complete nucleosome, during which we observed two transitions into the partially opened state.
To generate the Markov state model of nucleosome assembly at 400 mM salt from our 1000 MD simulations, we first have to cluster the configurations into discrete states based on a distance between configurations defined on a suitable space. To this aim, we project the particle positions onto a series of collective variables that capture all the relevant modes of nucleosome dynamics: H2A/H2B dimer binding to DNA and to H3/H4, DNA wrapping, and DNA handedness. Since the two H2A/H2B dimers are identical, we generated variables that are symmetric under an exchange of these two dimers. In principle, the two H3/H4 binding interfaces are also identical, but the 601 DNA that is wrapped around the tetramer is not symmetric, so we do want to distinguish between the left and right H3/H4 interfaces, corresponding respectively to the dimer binding the left or the right side of the 601 DNA. We defined 5 types of collective variables (for a total of 4+4+2+2+1 = 13 collective variables):
Starting from these 13 collective variables, we aggregate all 1000 MD trajectories and perform a further dimensionality reduction using time-lagged independent component analysis (TICA)  as implemented in PyEMMA 2  using a lag-time of 106 MD steps. We then select the top 8 slowest coordinates (which explain 95% of the kinetic variance in the data) and cluster the conformations into 400 discrete states using k-means . The MSM was then generated by PyEMMA 2  using a lag-time of 4x106 MD steps. Further increasing the lag-time does not lead to significant changes in the implied time scales of the MSM, indicating the robustness of the results (S1 Fig). The TICA co-ordinates are also ideally suited for visualization purposes: in S2 Fig we show the first two coordinates are sufficient to separate the system into 4 regions corresponding to nucleosomes, left and right hexasomes and tetrasomes (more TICA coordinates can further separate into more metastable basins with faster transitions among them, S3 Fig).