Protein folding is crucial for normal physiology including development and healthy aging, and failure of this process is related to the pathology of diseases including neurodegeneration and cancer. Early thermodynamic and kinetic studies based on the unfolding and refolding equilibrium of individual proteins in the test tube have provided insight into the fundamental principles of protein folding, although the problem of predicting how any given protein will fold remains unsolved. Protein folding within cells is more complex than folding of purified protein in isolation, owing to the many interactions within the cellular environment, including post-translational modifications of proteins, macromolecular crowding, and variations in the cellular environment, for example in cancer versus normal cells. Development of biophysical approaches including fluorescence resonance energy transfer (FRET) and nuclear magnetic resonance (NMR) techniques, together with cellular manipulations including microinjection and insertion of noncanonical amino acids, has allowed the study of protein folding in living cells. Furthermore, biophysical techniques such as single-molecule fluorescence spectroscopy and optical tweezers allow studies of simplified systems at the single-molecule level. Combining in-cell techniques with the powerful detail that can be achieved from single-molecule studies allows the effects of different cellular components including molecular chaperones to be monitored, providing a comprehensive understanding of the protein folding process. The application of biophysical techniques to the study of protein folding is arming us with knowledge that is fundamental to the battle against cancer and other diseases related to protein conformation or protein–protein interactions.
Newly synthesized polypeptide chains need to fold correctly to form the native protein structure in order to carry out their cellular function. Under normal circumstances, the native protein structure is usually the thermodynamically most stable state, although some proteins maintain an unfolded or partially unfolded state to exert their function in cells [2,3]. The collapse of the polypeptide chain is driven by the hydrophobic effect, whereby ordered water molecules are excluded, and new interactions can then form between individual amino acid side chains. In vitro folding studies indicate that most cytosolic proteins are only marginally stable and are in dynamic equilibrium with unfolded or partially folded states, meaning that a protein may unfold and refold many times during its life cycle. Folding and conformational changes of proteins due to interaction with other cellular components, including other proteins, cofactors, metabolites, or nucleic acids, are very common in cells. Anfinsen demonstrated that a small protein can fold spontaneously in the test tube. However, complex multi-domain proteins often cannot fold spontaneously to the correct native conformation. Many factors can affect the folding process, including solution conditions such as pH and ionic strength, as well as post-translational modifications and macromolecular crowding in cells [5,8]. To prevent aggregation of folding intermediates and to avoid energetic traps during folding, molecular chaperones and folding catalysts assist protein folding through different mechanisms [9,10]. The two common types of assistance provided by molecular chaperones are an ATP-independent holdase function and an ATP-dependent remodeling function.
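The marginal stability described above can be made concrete with a simple two-state model (native N in equilibrium with unfolded U). The following is a minimal sketch, not a method taken from the studies cited here: the function name `fraction_unfolded` and the example free energies are illustrative assumptions, and real proteins often populate intermediates that a two-state model ignores.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def fraction_unfolded(delta_g_unfold_kcal, temperature_k=298.15):
    """Two-state model N <-> U.

    delta_g_unfold_kcal is the free energy of unfolding (positive when the
    native state is more stable). K_U = [U]/[N] = exp(-dG_unfold / RT), and
    the unfolded fraction is K_U / (1 + K_U).
    """
    k_unfold = math.exp(-delta_g_unfold_kcal / (R_KCAL * temperature_k))
    return k_unfold / (1.0 + k_unfold)


if __name__ == "__main__":
    # Hypothetical stabilities chosen for illustration, not measured values.
    for dg in (0.0, 2.0, 5.0, 10.0):
        print(f"dG_unfold = {dg:4.1f} kcal/mol -> "
              f"fraction unfolded = {fraction_unfolded(dg):.2e}")
```

Under this assumed model, a stability of only a few kcal/mol leaves a small but non-zero unfolded population at any instant, which is one way to picture a protein unfolding and refolding many times during its life cycle.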
Cancer is an example where disease initiation and progression are often related to the failure of protein folding and quality control [12–14]. Cancer cells are characterized by abnormally high proliferation and migration, supported by increased metabolism and an enhanced protein quality control system [13,14]. There is often an increase in production of nascent polypeptides, more frequent mutation, and raised levels of reactive oxygen species (ROS), as well as other changes to the cellular environment that challenge the protein folding process in cancer cells [13,15]. Dependence of protein folding in cancer cells on chaperones is indicated by higher expression levels of some chaperones, such as Hsp70 and Hsp90, which have been identified as drug targets for cancer therapy [16,17]. The study of protein folding in cancer is revealing further details of cancer mechanisms and provides new avenues for cancer therapy, and it is valuable to study protein folding in different types of cancer cells and tissues to identify both common characteristics and notable differences. However, non-invasive in-cell study is still challenging, so studies of the mechanisms by which chaperones and other cellular components affect protein folding, carried out using purified protein in simplified environments, also provide important clues for the development of anti-cancer drugs. Ligands including drugs often alter the folding, stability and conformation of a protein. Thus, changes in protein folding resulting from the introduction of a drug could indicate an interaction between the drug and the protein, and identifying drug-binding targets on a proteomic scale based on protein folding status could provide critical information for understanding both the mechanism of drug efficacy and possible side effects [20,21]. Protein folding studies provide a very important foundation for understanding and overcoming cancer.
Over the past 20 years, biophysical approaches including fluorescence resonance energy transfer (FRET), optical tweezers, nuclear magnetic resonance (NMR) and magnetic resonance imaging (MRI) have significantly extended the scope and depth of protein folding studies based on classical spectroscopic tools such as absorption, circular dichroism (CD) and intrinsic Trp fluorescence of proteins (Figure 1).
A recent trend is to study protein folding directly in living cells and tissues. In-cell study of protein folding draws on the accumulated knowledge of living cells as well as the development of new biophysical approaches. To date, most in-cell studies of protein folding have been performed in E. coli or cancer cell lines, paving the way for more precise study of protein folding in cancer and other diseases.
Clinical magnetic resonance imaging (MRI) is based on imaging of signal contrast of protons from different living components. Chemical exchange saturation transfer (CEST)-MRI has been applied to detect protein states in living tissue. CEST-MRI exploits the spontaneous chemical exchange of protons to detect solutes indirectly via the abundant water proton signal. CEST-MRI has been extended to a radiofrequency (RF) irradiation scheme at two different frequency offsets (dual CEST-MRI) to sense changes in the state of protein folding in solution, and it shows potential for detecting the mobile fraction of the proteome in diverse pathologies. The global status of protein folding in the cancer cell line HepG2 was demonstrated using CEST-MRI of the relayed nuclear Overhauser effect (rNOE) signal, which is linked to protein conformation. Signal changes during a 20-min non-lethal heat shock at 42°C coincided with the protein denaturation and aggregation processes induced by the heat shock, and the signal recovery after heat shock was consistent with chaperone-induced refolding. This demonstrates the potential of CEST-MRI for monitoring pathological changes to protein folding in cancer and other diseases. While CEST-MRI provides a powerful tool to gauge the global status of protein folding in the cell, it is not suited to the precise study of the folding of individual proteins.
Individual protein studies can provide detailed and precise information about protein folding in cells. The main current biophysical approaches to reveal individual protein folding in situ are in-cell FRET and in-cell NMR. In-cell FRET relies on fluorescent labeling of the target protein to achieve a high signal-to-noise ratio. There are two basic strategies to deliver exogenous fluorescent probes into living cells: (1) expression of proteins with noncanonical amino acids or fusion expression with fluorescent proteins in cells [24,25]; (2) microinjection into living cells of protein labeled with a fluorescent probe (by fusion with fluorescent proteins or by cysteine labeling with a maleimide-conjugated fluorescent dye). FRET efficiency depends on the distance between the two fluorescent labels, enabling FRET methods to explore protein conformational changes, dynamics, folding kinetics and protein interactions in living cells at the bulk and single-molecule level while mitigating noise from the complex cellular background [26,27].
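The distance dependence of FRET mentioned above is conventionally described by the Förster relation, E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the Förster radius of the dye pair. The sketch below is illustrative only: the function names and the 5 nm Förster radius are assumptions for the example, not values from the cited studies, and real measurements also require corrections (for example for dye orientation and detection efficiency) that are omitted here.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Förster relation: E = 1 / (1 + (r / R0)**6)."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and Förster radius must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


def distance_from_efficiency(efficiency, forster_radius_nm):
    """Invert the Förster relation: r = R0 * (1/E - 1)**(1/6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return forster_radius_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    r0 = 5.0  # hypothetical Förster radius in nm, chosen for illustration
    for r in (3.0, 5.0, 7.0):
        print(f"r = {r:.1f} nm -> E = {fret_efficiency(r, r0):.3f}")
```

Because of the sixth-power dependence, the efficiency changes most steeply when r is close to R0, which is why FRET is most sensitive to conformational changes on the scale of a few nanometres.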
The fluorescent fusion protein GFP-RAF-YFP has been expressed in HeLa cells for single-molecule FRET (smFRET) study of conformational changes in RAF. Alternating laser excitation (ALEX), which enables rapid switching between a donor (D)-excitation and an acceptor (A)-excitation laser to identify distinct emission signatures for all diffusing species, was used to measure the native state of RAF while it undergoes native interactions with other intrinsic proteins and the reaction network of the signal transduction pathway in live cells [25,28]. It was found that cytosolic RAF has at least three conformational states: the inactive closed form, the active open form and the inactive fully-open form. Spontaneous transitions between the conformational states were detected with epidermal growth factor (EGF) stimulation. The S621A mutation shifts the distribution of conformational states to the inactive fully-open form. Thus, smFRET should be applicable to other cytosolic proteins for detecting changes in conformational distribution that result from intracellular interactions in live cells (Figure 2).
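In ALEX measurements, the three photon streams (donor emission and acceptor emission under donor excitation, and acceptor emission under acceptor excitation) are commonly combined into an apparent FRET efficiency E and a stoichiometry ratio S, which together separate donor-only, acceptor-only and doubly labeled species. The following is a minimal sketch of those uncorrected ratios; the function names, photon counts and the S thresholds of 0.2 and 0.8 are illustrative assumptions, not parameters from the RAF study, and leakage, direct-excitation and detection-efficiency corrections are deliberately left out.

```python
def alex_ratios(f_dd, f_da, f_aa):
    """Return (apparent E, stoichiometry S) for one burst.

    f_dd: donor emission under donor excitation
    f_da: acceptor emission under donor excitation (FRET channel)
    f_aa: acceptor emission under acceptor excitation
    """
    donor_exc_total = f_dd + f_da
    total = donor_exc_total + f_aa
    if donor_exc_total <= 0 or total <= 0:
        raise ValueError("burst contains no donor-excitation photons")
    e_apparent = f_da / donor_exc_total
    stoichiometry = donor_exc_total / total
    return e_apparent, stoichiometry


def classify_species(stoichiometry, low=0.2, high=0.8):
    """Sort a burst by S; the thresholds are hypothetical example values."""
    if stoichiometry > high:
        return "donor-only"
    if stoichiometry < low:
        return "acceptor-only"
    return "donor+acceptor"


if __name__ == "__main__":
    # Made-up photon counts for three bursts, for illustration only.
    for burst in [(50, 50, 100), (100, 5, 0), (2, 3, 120)]:
        e, s = alex_ratios(*burst)
        print(burst, f"E={e:.2f}", f"S={s:.2f}", classify_species(s))
```

Only bursts classified as doubly labeled would normally be carried forward into an E histogram, from which populations such as closed and open conformations can then be distinguished.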
Membrane proteins undertake important tasks in cells and are major drug targets implicated in many diseases. However, studying the folding of membrane proteins has remained very challenging because classical methods including absorption, CD and Trp fluorescence are more suitable for water-soluble proteins. In recent years, the combination of new technology and the development of new membrane-mimicking systems, such as nanodiscs, and cell unroofing methods has promoted the study of membrane protein folding. Using a noncanonical amino acid insertion strategy together with ACCuRET (Anap Cyclen-Cu2+ resonance energy transfer), maltose binding protein (MBP) was labeled in HEK293T/17 cells and used as a benchmark. Using cell unroofing, it was shown that ACCuRET can accurately monitor rearrangements of proteins in native membranes based on measurement of absolute distances and distance changes. Thus, ACCuRET is applicable for measuring conformational dynamics of both soluble proteins and membrane proteins.
Protein NMR measurements rely on isotope labeling, and classical solution NMR methods are more suitable for small proteins (below 30 kDa) and also require a high protein concentration. One challenge for in-cell NMR is to label target proteins in living cells with isotopes, typically 2H, 13C and 15N. Using isotope-enriched culture medium to express an isotope-labeled protein has become routine in E. coli, but is still expensive and difficult to achieve in eukaryotic cells. As the endogenous expression level is generally low, an alternative strategy in eukaryotic cells is to microinject or translocate an isotope-labeled protein that has been expressed in E. coli into the cytosol or nuclei of target cells [31,32]. Another labeling strategy is to express 19F-labeled protein using noncanonical 19F-amino acids in cells. A further challenge is the limited sensitivity of in-cell NMR, caused by the low concentration of target protein in cells, the background noise, and the reduced quality of spectra recorded within the in-cell environment. To improve the signal quality, 19F labeling is a good choice as 19F is a high-sensitivity isotope, and 19F NMR spectra are virtually background-free as biological molecules do not contain fluorine atoms. Another solution is to optimize NMR detection protocols. Transverse relaxation optimized spectroscopy (TROSY) NMR methods enhance the sensitivity and allow the dynamics and interactions of large (up to 1 MDa) proteins or assemblies to be probed, extending the scope of NMR study. Fast pulsing methods also help maximize the signal-to-noise ratio. The improvement in sensitivity enables in-cell NMR to measure protein folding in cells at near-physiological expression levels (nanomolar to micromolar range).
Relaxation dispersion (RD) and saturation transfer (ST) methods can provide detail about the pathways of biomolecular processes, including transiently populated intermediates of protein folding, enzyme catalysis and binding. Rapid progress in solid-state NMR (ssNMR) and paramagnetic relaxation enhancement (PRE) techniques has also extended the scope of protein folding study in cells [36–38]. A typical in-cell NMR experiment is shown in Figure 3, where NMR spectra of different human superoxide dismutase 1 (SOD1) maturation states in human cells were measured to describe the complete post-translational maturation process of SOD1 [36,39].
Intrinsically disordered proteins (IDPs) have strong NMR signals and are suitable for in-cell NMR study. Preliminary in-cell NMR study of protein folding in human cell lines has involved IDPs, such as α-synuclein and Tau, which are implicated in neurodegenerative diseases. After NMR characterization of α-synuclein in E. coli cells, 15N-labeled α-synuclein was delivered into mammalian cells, and an electroporation protocol was used to tightly control the cellular α-synuclein concentration, resulting in a uniform distribution of the protein in the cytoplasm with minimal perturbation of cell viability. The conformation of α-synuclein in different cell lines, including neuronal B65 and SK-N-SH cells and RCSN-3 cells, was compared by 2D 1H-15N correlation NMR spectra. This study revealed some common features of α-synuclein in different cell lines: (1) the major conformation of the monomeric disordered α-synuclein was similar in different intracellular environments; (2) the N-terminus of α-synuclein undergoes acetylation in cells; (3) α-synuclein does not form stable interactions with cellular membranes; however, the N- and C-termini of α-synuclein transiently interact with cytoplasmic components and/or membranes. Further elucidation of how oxidative stress alters the fate of α-synuclein inside living cells was carried out by delivering 15N-labeled, N-terminally acetylated, methionine-(Met1, Met5, Met116 and Met127)-oxidized α-synuclein into non-neuronal and neuronal cells. Time-resolved in-cell NMR demonstrated that Tyr125 phosphorylation was selectively impaired by the C-terminal methionine oxidation, suggesting that alteration of the cellular environment can selectively affect post-translational modifications of α-synuclein and, thus, potentially control its conformation, conformational landscape, and aggregation propensity. Isotope-enriched Tau was delivered into HEK-293T cells by electroporation and the NMR spectrum of Tau in living cells was acquired.
It was found that Tau predominantly binds to microtubules (MT) at its MT-binding repeats in HEK-293T cells. It was also found that disease-associated phosphorylation of Tau was immediately eliminated once phosphorylated Tau was delivered into HEK-293T cells, implying a potential cellular protection mechanism under stressful conditions.
In-cell NMR is also applicable for in situ study of membrane proteins. An ssNMR-based approach supported by dynamic nuclear polarization (DNP) has been developed to directly examine the structural and dynamic properties of epidermal growth factor receptor (EGFR) activation by EGF in native membranes. A431 cells were cultured in labeled medium to produce [13C, 15N]-labeled A431 membrane vesicles. The results show that the ligand-free state of the extracellular domain (ECD) is highly dynamic, while the intracellular kinase domain (KD) is rigid. Ligand binding restricts the overall and local motion of EGFR domains, including the ECD and the C-terminal region. It is suggested that the reduction in conformational entropy of the ECD by ligand binding favors the cooperative binding required for receptor dimerization, causing allosteric activation of the intracellular tyrosine kinase. The accumulation of studies using in-cell NMR highlights its potential to study protein folding, structural dynamics and interactions at the residue level in neurodegenerative diseases, cancer and other diseases related to protein conformation.
It is currently difficult to obtain complete and detailed information about protein folding purely from in-cell study. In vitro study of purified proteins in solution still provides a useful approach to simplify the system of study allowing powerful deductions to be made, and increasingly complex in vitro systems have been developed to imitate the actual cellular environment. There are two attractive trends for in vitro protein folding study: (1) using cell extracts to imitate the actual cellular environment to extrapolate to protein folding in cells; (2) mechanistic study of chaperone-assisted protein folding.
Real-time NMR can provide high-resolution structural information alongside kinetic details of protein folding in a virtually continuous manner, not only for the characterization of folding intermediates but also for the investigation of the molecular mechanisms of assisted protein folding [45,46]. Cell-free protein production is still the best method to achieve high-throughput protein production and selective incorporation of isotope-labeled amino acids into a target protein. When the purified isotope-labeled protein is placed in the complex environment of native cells, membrane extracts or tissue homogenates, it is visible by real-time NMR detection, providing residue-level resolution of structural and kinetic changes in conformation during folding, interaction, modification and bioreactions in an environment mimicking that of the cell. The kinase inhibitory domain of the cell cycle regulatory protein p27Kip1 (p27) is intrinsically disordered in isolation. Nuclear spin hyperpolarization using dissolution dynamic nuclear polarization (D-DNP) enabled the real-time observation of 13C NMR signals during p27 folding upon binding to Cdk2/cyclin A on a time scale of several seconds. Time-dependent intensity changes depend on the extent of folding and binding, as manifested by differential spin relaxation. This study also followed a partially folded p27 intermediate by analysis of signal decay rates.
Mechanisms of chaperone-assisted protein folding have attracted particular attention in recent years, as it is possible to manipulate protein folding in cells by adjusting molecular chaperone activity via addition of small molecules. This also opens a new avenue for disease therapy, especially for cancer [48–50]. Hsp70s are ATP-dependent foldases with complex and enigmatic mechanisms, and their dynamics and allostery make them challenging to study. Technique development in FRET, NMR and optical tweezers is revealing increasing detail of the precise mechanism by which Hsp70s promote protein folding. Both the conformational changes and folding mechanisms of Hsp70s themselves and Hsp70-assisted substrate protein folding have been explored in depth.
Using smFRET, the interdomain conformational heterogeneity and the kinetics of conformational changes of Hsp70 induced by ATP or the cochaperone Hsp40 were revealed. NMR studies have provided high-resolution insight into the conformation of Hsp70s in different allosteric states [53,54]. Using optical tweezers, unfolding and folding of the bacterial Hsp70 homolog (DnaK) were monitored: a folding nucleus and minimal ATP binding domain of Hsp70 have been identified, and bifurcating unfolding pathways for the substrate binding domain (SBD) of Hsp70 and mechanical hinge regions have been revealed (Figure 4).
Combining single-pair FRET and hydrogen/deuterium exchange, the mechanism by which DnaK/DnaJ/GrpE accelerates folding of the multi-domain protein firefly luciferase (FLuc) has been explained. Inter-domain misfolding was identified as the cause of slow folding. DnaK binding causes expansion of the misfolded region and thereby resolves the kinetically trapped intermediates, with folding occurring upon GrpE-mediated release from DnaK, which commits a fraction of FLuc to fast folding, circumventing misfolding. NMR studies revealed Hsp70-induced stabilization of the unfolded states of its substrates SRC homology 3 domain (SH3) and hTRF1, and showed that DnaK binding directly affects the unfolded protein ensemble and fine-tunes the folding landscape of the substrate proteins [58,59].
In a typical optical tweezers assay, the protein of interest is attached to beads for stretching experiments to monitor the unfolding and folding of an immobilized protein (Figure 4) [56,60]. By monitoring unfolding and folding of MBP using optical tweezers, it was shown that DnaK binds and stabilizes not only extended peptide segments, but also partially folded and near-native protein structures. Integrating optical tweezers with fluorescent-particle tracking to monitor unfolding and folding of MBP showed that ClpB translocates both arms of the loop simultaneously and switches to single-arm translocation when encountering obstacles, and that substrates refold while exiting the pore, analogous to co-translational folding. Using optical tweezers to monitor unfolding and folding of the glucocorticoid receptor (GR), the steps of folding and hormone binding of GR were observed in detail and a structural element that opens and closes upon hormone binding was identified, forming the basis for understanding GR activation and its regulation by chaperone proteins. A current limitation of optical tweezers is that the technique is only suited to in vitro study of protein folding.
The above studies demonstrate that biophysical techniques can provide detailed information about the structural and dynamic aspects of protein folding. The study of in-cell protein folding presents challenges, such as short half-lives and degradation of proteins in cells. Each of the techniques described above has strengths and limitations, and a full picture can only be achieved by combining multiple approaches. In-cell FRET and NMR require labeling of target proteins and optimization of the signal-to-noise ratio in a cellular background. To date, only a small fraction of the proteome has been probed by such techniques. In-cell FRET has been more widely applied to membrane proteins than cytosolic proteins. In-cell NMR is still most applicable to small monomeric proteins or intrinsically disordered polypeptide regions, as the relaxation time of the NMR signal is also affected by molecular crowding in the cell, leading to NMR signal attenuation. Careful interpretation of results is required when the techniques employed require overexpression of proteins at levels much higher than their natural cellular concentration. The delivery of exogenously labeled proteins can also perturb the natural cell environment, and this must be taken into account. Of the techniques discussed here, smFRET and optical tweezers are particularly powerful for the study of simplified systems at the single-molecule level, while NMR studies provide high-resolution, residue-specific information on the ensemble of protein molecules. In future, the combination of these different biophysical approaches, together with further breakthroughs in technique development, will be increasingly required in order to unravel the mysteries of the protein folding process.
| ACCuRET | Anap Cyclen-Cu2+ resonance energy transfer |
| ALEX | alternating laser excitation |
| CEST | chemical exchange saturation transfer |
| D-DNP | dissolution dynamic nuclear polarization |
| EGF | epidermal growth factor |
| EGFR | epidermal growth factor receptor |
| FRET | fluorescence (or Förster) resonance energy transfer |
| IDPs | intrinsically disordered proteins |
| MBP | maltose binding protein |
| MRI | magnetic resonance imaging |
| NMR | nuclear magnetic resonance |
| PRE | paramagnetic relaxation enhancement |
| rNOE | relayed nuclear Overhauser effect |
| ROS | reactive oxygen species |
| SBD | substrate binding domain |
| SH3 | SRC homology 3 domain |
| smFRET | single-molecule FRET |
| SOD1 | superoxide dismutase 1 |
| TROSY | transverse relaxation optimized spectroscopy |
The authors declare that there are no competing interests associated with the manuscript.
The authors acknowledge support from the Chinese Ministry of Science and Technology [2017YFA0504000] and the National Natural Science Foundation of China [31920103011, 31770829, 21673278].
H.Z. and S.P. conceived and wrote the review; all authors revised the review and approved the final version.