
REVIEW

A decade and a half of protein intrinsic disorder: Biology still waits for physics

Abstract

The abundant existence of proteins and regions that possess specific functions without being folded into unique 3D structures has become accepted by a significant number of protein scientists. Sequences of these intrinsically disordered proteins (IDPs) and IDP regions (IDPRs) are characterized by a number of specific features, such as low overall hydrophobicity and high net charge, which make these proteins predictable. IDPs/IDPRs possess large hydrodynamic volumes and low contents of ordered secondary structure, and are characterized by high structural heterogeneity. They are very flexible, but some may undergo disorder-to-order transitions in the presence of natural ligands. The degree of these structural rearrangements varies over a very wide range. IDPs/IDPRs are tightly controlled under normal conditions and have numerous specific functions that complement the functions of ordered proteins and domains. When lacking proper control, they have multiple roles in the pathogenesis of various human diseases. Gaining structural and functional information about these proteins is a challenge, since they do not typically "freeze" while their "pictures are taken." However, despite or perhaps because of the experimental challenges, these fuzzy objects with fuzzy structures and fuzzy functions are among the most interesting targets for modern protein research. This review briefly summarizes some of the recent advances in this exciting field and considers some of the basic lessons learned from the analysis of the physics, chemistry, and biology of IDPs.

A bit more than ten years ago, Protein Science published a review entitled "Natively unfolded proteins: a point where biology waits for physics" (Protein Sci 2002 11(4):739-756). 
1 The major goal of that article was to bring an intriguing family of natively unfolded proteins (now recognized as a subset of a very broad class of intrinsically disordered proteins, IDPs) out of the shadows, to emphasize their lack of ordered structure under physiological conditions (at least ordered structure that could be detected by traditional low-resolution techniques), to systematize their major structural properties, and to highlight their biological significance. The introduction of such biologically active but essentially unstructured proteins was used to challenge the hitherto dominant structure-centric viewpoint (the structure-function paradigm), according to which a specific function of a protein is determined by its unique and rigid three-dimensional (3D) structure. The title of the review ("a point where biology waits for physics") was inspired by the observation that many of the "structure-less" proteins analyzed by that time acted as "binders" that underwent at least partial folding upon interaction with their binding partners. These observations provoked the idea that these biologically important proteins with little or no ordered structure have to wait to become more folded (and functional) as a result of binding to their specific partners. In other words, for these proteins, "biology," that is, the ability to have biological functions, seemed to wait for "physics," which is manifested in their ability to undergo binding-induced folding (at least partial) that is necessary to bring the functional state of these proteins to life. 1 At the beginning, the idea that structure-less proteins can be biologically active was taken as complete heresy by many researchers, since it was absolutely alien to the then-dominant structure-function paradigm, which represented the foundation of the long-standing belief that the specific functionality of a given protein is determined by its unique 3D structure. 
This structure-function paradigm, which describes the catalytic behavior of enzymes reasonably well, was based on the "lock-and-key" hypothesis formulated in 1894 by Emil Fischer. 2 This viewpoint was solidified by the successful solution of X-ray crystallographic structures of many proteins (as of February 26, 2013, there were 81,922 protein structures in the Protein Data Bank, 3 with 72,761 of these structures being determined by X-ray crystallography). These many crystal structures reinforced a static view of a functional protein, where a rigid active site of an enzyme can be viewed as a sturdy lock that provides an exact fit to only one key, a specific and unique substrate. 4 Despite numerous limitations, this lock-and-key model was an extremely fruitful concept that was responsible for the creation of modern protein science. 1 Figure 1(A) shows some of the most obvious scientific consequences of the application of the structure-function paradigm, which is deservedly placed at the center of the "Big Bang" model that gives rise to the protein science universe. 1 Obviously, the consideration of a protein as a rigid crystal-like entity is an oversimplification, since even the most stable and well-folded proteins are dynamic systems that possess different degrees of conformational flexibility. This is because the so-called conformational forces, that is, the forces stabilizing the secondary structure of a protein and its tertiary fold, are weak and can be broken even at ambient temperatures by thermal fluctuations. 4 The breaking of these weak interactions releases the groups that were involved in them and allows these groups to form new weak interactions of comparable energy. 
4 Since these structural rearrangements are of relatively small scale and since they typically occur on a time scale that is faster than the time required for structure determination by X-ray crystallography and many other physical techniques, the 3D structures of proteins determined by these techniques represent averaged pictures. 6 Furthermore, one should keep in mind that not all protein structures deposited in the PDB are structured throughout their entire lengths. Instead, many PDB proteins have portions of their sequences missing from the determined structures (so-called regions of missing electron density) 7, 8 due to the failure of the unobserved atom, side chain, residue, or region to scatter X-rays coherently, which is caused by their flexible or disordered nature. Such flexible/disordered regions are rather common in the PDB, since only about 30% of protein structures deposited in the PDB do not have such regions of missing electron density. 9 In addition to ordered proteins possessing disordered regions of varying length, the literature contains numerous examples of biologically active proteins with flexible structures. 4 Therefore, there is another class of functional proteins and protein regions that contain smaller or larger highly dynamic fragments, and some proteins are even characterized by a complete or almost complete lack of ordered structure under physiological conditions (at least in vitro), which appears to be a critical aspect of these proteins' function in vivo. 4, 10-15 These proteins and protein regions (which are now known as IDPs and IDP regions (IDPRs)) have no single, well-defined equilibrium structure and exist as heterogeneous ensembles of conformers such that no single set of coordinates or backbone Ramachandran angles is sufficient to describe their conformational properties. 
These proteins were independently discovered one by one over a long period of time, and therefore they were considered rare exceptions to the general rule. Although the phenomenon of biological functionality without stable structure was repeatedly observed, for a long time it went unnoticed by a wide audience because the authors frequently invented new terms to describe their protein of interest. 16 In fact, an incomplete list of terms coined in the literature to describe these proteins includes floppy, pliable, rheomorphic, 17 flexible, 18 mobile, 19 partially folded, 20 natively denatured, 21 natively unfolded, 12, 22 natively disordered, 15 intrinsically unstructured, 11, 14 intrinsically denatured, 21 intrinsically unfolded, 22 intrinsically disordered, 13 vulnerable, 23 chameleon, 24 malleable, 25 4D, 26 protein clouds, 27 dancing proteins, 28 proteins waiting for partners, 29 and several other names often representing different combinations of "natively/naturally/inherently/intrinsically" with "unfolded/unstructured/disordered/denatured." Therefore, the majority of the names used in the early literature express that the "unfolded, unstructured, disordered, and denatured" state is a "native, natural, inherent, and intrinsic" property of these proteins. 16 Although protein intrinsic disorder is now considered an established concept and PubMed contains hundreds and hundreds of papers on different aspects of IDPs/IDPRs, the route to recognizing these proteins as a novel functional entity was complex and lengthy. As is often the case for new scientific concepts, the idea of structure-less functionality went through the stages of passive ignorance and active denial to scrupulous examination and enthusiastic acceptance. 
For example, it took me more than a year to publish my first paper dedicated to the systematic analysis of such proteins, and the manuscript was successively rejected by 14 journals before it was finally accepted by Proteins. 12 However, time showed that the concept of protein intrinsic disorder was a useful invention and could be considered as a universal lock-pick that helps in solving many of the seemingly unsolvable problems in protein science. One could say that this idea gave a new boost to the development of protein science, generating a wide array of fundamentally novel research directions [see Fig. 1(B)].

Figure 1. A: Protein structure-function paradigm is the "Big Bang" created universe of the modern protein science. Some major directions based on the consideration of protein function as lock-and-key mechanism are shown. Modified from Ref. 1. B: Paradigm shift caused by the introduction of the protein intrinsic disorder concept opened a wide array of new directions in protein science. In essence, introduction of this concept can be considered as a scientific revolution that, according to Kuhn, 5 "occurs when scientists encounter anomalies that cannot be explained by the universally accepted paradigm within which scientific progress has thereto been made" (http://en.wikipedia.org/wiki/Paradigm_shift).

The goals of this review are: (i) to outline some recent advances in the field of IDPs/IDPRs; (ii) to illustrate the usefulness of intrinsic disorder for protein function; (iii) to show that intrinsic disorder can affect different levels of protein structural organization; (iv) to indicate the intimate involvement of intrinsic disorder in the pathogenesis of various maladies; (v) to emphasize the exceptional structural heterogeneity of IDPs/IDPRs and to show that IDPs are definitely much more structurally complex than random coil-like polypeptides; (vi) to accentuate that although this structural heterogeneity is very important for protein functionality, it represents a crucial hurdle for the structural characterization of IDPs; (vii) to stress that new experimental and computational approaches and new theories and models are crucially needed for the future progress of this field and of protein science in general. These and other points highlight the current state of the field, where further advances in understanding the "biology" of IDPs still wait for "physics," with "physics" now being new theories, instrumentation, and analytical approaches.

Identification of IDPs as unique entities belonging to a new protein tribe is directly related to the recognition that their amino acid sequences are dramatically different from those of ordered proteins. 10, 12, 13, 30-32 For example, it has been pointed out that a low content of hydrophobic residues combined with a high load of charged residues, which often gives rise to a high net charge of the polypeptide chain, is a characteristic feature of some IDPs (so-called extended IDPs or natively unfolded proteins with coil-like or close to coil-like structures, see below). 
12 Therefore, compact proteins and extended IDPs can be distinguished based only on their net charges and hydropathies using a simple charge-hydropathy (CH) plot, where the IDPs are localized within a specific region of CH phase space and are reliably separated from compact ordered proteins. 12 A more detailed comparison of amino acid sequences revealed that, in comparison with ordered proteins and domains, the IDPs/IDPRs are significantly depleted in order-promoting amino acids (Trp, Tyr, Phe, Ile, Leu, Val, Cys, and Asn), 10, 33 being instead enriched in disorder-promoting residues, such as Ala, Arg, Gly, Gln, Ser, Glu, Lys, and Pro. 13, 31, 32, 34, 35 The difference between ordered and disordered proteins goes far beyond these differences in their amino acid compositions. In fact, based on the comparison of 265 amino acid physico-chemical property-based scales (such as hydropathy, net charge, flexibility index, helix propensities, strand propensities, aromaticity, etc.) 34 and more than 6000 composition-based attributes (e.g., all possible combinations having one to four amino acids in the group) 36 it has been concluded that ordered and disordered proteins and regions can be discriminated using many of these attributes. 13 Based on the analysis of 517 amino acid scales, a novel amino acid scale, Top-IDP (Trp, Phe, Tyr, Ile, Met, Leu, Val, Asn, Cys, Thr, Ala, Gly, Arg, Asp, His, Gln, Lys, Ser, Glu, and Pro), was built to rank the tendencies of the amino acid residues to promote order or disorder. 
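The charge-hydropathy classification described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in this review: hydropathy is taken from the Kyte-Doolittle scale rescaled to the 0 to 1 range, net charge counts Lys and Arg as +1 and Asp and Glu as -1 (His is treated as neutral), and the boundary line, mean net charge = 2.785 x mean hydropathy - 1.151, uses the coefficients usually quoted for the original CH-plot analysis, which should be checked against that publication before any serious use. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not given in this review).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy, rescaled so each residue lies in [0, 1]."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq: str) -> float:
    """Absolute mean net charge per residue (K, R = +1; D, E = -1; His neutral)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def is_extended_idp(seq: str) -> bool:
    """True if the sequence lies on the disordered side of the assumed CH boundary."""
    boundary = 2.785 * mean_hydropathy(seq) - 1.151  # assumed coefficients
    return mean_net_charge(seq) > boundary


if __name__ == "__main__":
    for name, seq in [("poly-Glu", "E" * 20), ("hydrophobic", "ILVF" * 5)]:
        print(name, round(mean_hydropathy(seq), 3),
              round(mean_net_charge(seq), 3), is_extended_idp(seq))
```

A highly charged, hydrophilic sequence falls above the line and is flagged as an extended IDP, whereas an uncharged hydrophobic sequence falls below it. The sketch only separates extended disorder from compact proteins; it says nothing about collapsed (molten globule-like) IDPs.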
30 The fact that the sequences of ordered and disordered proteins and regions are noticeably different suggested that IDPs clearly constitute a separate entity inside the protein kingdom, that these proteins can be reliably predicted using various computational tools, 37-42 and that, structurally, IDPs should be very different from ordered globular proteins, since peculiarities of amino acid sequence determine protein structure.

Natural Abundance of IDPs: Touching the Tip of the Iceberg

Initial systematic analyses revealed that intrinsic disorder in proteins is a rather common phenomenon. In fact, as of 2002, the list of experimentally validated natively unfolded proteins with chain length greater than 50 amino acid residues contained more than 100 entries. 1 It was also pointed out that this list would probably be doubled if shorter polypeptides 30-50 residues long were included, 1 and that these 100 experimentally validated natively unfolded proteins have at least 250 homologues, which are also expected to be natively unfolded. 1, 12 It turned out that these "large" numbers (which actually were large enough to make the crucial point that biologically active structure-less proteins represent a new rule and not mere rare exceptions) constitute just the tip of an iceberg. In fact, using computational tools developed for sequence-based intrinsic disorder prediction, the widespread occurrence of IDPs and hybrid proteins containing IDPRs was convincingly shown. 43-46 For example, more than 15,000 out of 91,000 proteins in the then-current Swiss Protein database were identified as having long IDPRs. 47 An analysis of 31 whole genomes spanning the 3 kingdoms of life, published in 2000, revealed that many proteins contained segments predicted to have 40 consecutive disordered residues and that the eukaryotes exhibited more disorder by these measures than either the prokaryotes or the archaea. 
43 Other studies on the abundance of intrinsic disorder in various evolutionarily distant species supported these findings and consistently showed that the eukaryotic proteomes had a higher fraction of intrinsic disorder than prokaryotic proteomes. 44, 48-52 This conclusion is in line with the results of a comprehensive bioinformatics investigation of the disorder distribution in almost 3500 proteomes from viruses and three kingdoms of life, the results of which are shown in Figure 2 as the correlation between the intrinsic disorder content and proteome size for 3484 species from viruses, archaea, bacteria, and eukaryotes. 46 Surprisingly, Figure 2 shows that there is a well-defined gap between the prokaryotes and eukaryotes in the plot of the fraction of disordered residues versus proteome size, where almost all eukaryotes have 32% or more disordered residues, whereas the majority of the prokaryotic species have 27% or fewer disordered residues. 46 Therefore, it looks like a fraction of 30% disordered residues serves as a boundary between the prokaryotes and eukaryotes and reflects the existence of a complex step-wise correlation between the increase in organism complexity and the increase in the amount of intrinsic disorder. The gap in the plot of the fraction of disordered residues versus proteome size parallels a morphological gap between prokaryotic and eukaryotic cells, the latter of which contain many complex innovations that seemingly arose all at once. In other words, this sharp jump in the disorder content of proteomes associated with the transition from prokaryotic to eukaryotic cells suggests that the increase in the morphological complexity of the cell paralleled the increased usage of intrinsic disorder. 
46 The variability of disorder content in unicellular eukaryotes and the rather weak correlation between disorder status and organism complexity (measured as the number of different cell types) are likely related to the wide variability of their habitats, with especially high levels of disorder being found in parasitic host-changing protozoa, the environment of which changes dramatically during their life-span. 53 Further support for this hypothesis came from the fact that the intrinsic disorder content in multicellular eukaryotes (which are characterized by a more stable and less variable environment of individual cells) was noticeably less variable than that in the unicellular eukaryotes. 46 It was pointed out that IDPs possess noticeable amino acid biases, and many IDPs/IDPRs are characterized by sequence redundancy and low sequence complexity, containing long stretches of various repeats and being completely devoid of some (often many) types of amino acid residues. These observations seem to indicate that the sequence space of IDPs/IDPRs should be simpler than that of ordered proteins. However, the reality is more complex than conventional wisdom might suggest, and the sequence space attainable by simple IDPs/IDPRs is more diversified than that of the structurally more sophisticated ordered proteins. In fact, a 100-residue-long protein in which any of the normally occurring 20 amino acids can be found has a sequence space of 20^100 (about 10^130) sequences. 54 Obviously, not all random amino acid sequences can fold into unique structures. In other words, the sequence space of a foldable protein (or "foldable" sequence space) is noticeably smaller than the entire sequence space available for a random polypeptide chain. 
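The size of the full sequence space quoted above follows directly from the chain length and the alphabet size. The short Python sketch below reproduces the order of magnitude; the function name is hypothetical, and the alphabet size is left as a parameter only for illustration.

```python
import math


def log10_sequence_space(length: int, alphabet_size: int = 20) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet_size)


if __name__ == "__main__":
    exponent = log10_sequence_space(100)
    # 100 * log10(20) is roughly 130.1, so 20^100 is on the order of 10^130.
    print(f"20^100 is about 10^{exponent:.1f}")
```

Working with the logarithm avoids handling the 131-digit integer directly, although Python integers could also represent 20**100 exactly.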
For decades, the actual size of the "foldable" sequence space has remained an unsolved mystery despite a large body of theoretical, biochemical, and computational work that aims to unravel the relationship between a protein's primary sequence and its resulting 3D structure. 55 However, the actual number of different amino acid residues in a given foldable sequence can be dramatically reduced, 54 since not all twenty residues are necessary for protein folding and the actual physicochemical identity of most of the amino acids in a protein is irrelevant. 56-63 In other words, the folding alphabet can be noticeably reduced, 55, 64 and amino acids can be clustered based on some shared features such as homolog substitution frequency, 65 local structural environments, 66 or peculiarities of the tertiary structural environments. 67 This simplified folding code further reduces the available "foldable" sequence space. 68

Figure 2. Correlation between the intrinsic disorder content and proteome size for 3484 species from viruses, archaea, bacteria, and eukaryotes. Each symbol indicates a species. There are six groups of species in total: viruses expressing one polyprotein precursor (small red circles filled with blue), other viruses (small red circles), bacteria (small green circles), archaea (blue circles), unicellular eukaryotes (brown squares), and multicellular eukaryotes (pink triangles). Each viral polyprotein was analyzed as a single polypeptide chain, without parsing it into the individual proteins before predictions. The proteome size is the number of proteins in the proteome of that species and is shown on a logarithmic scale. The average fraction of disordered residues is calculated by averaging the fraction of disordered residues of each sequence over all sequences of that species. Disorder prediction is evaluated by PONDR-VSL2B. Modified from Ref. 46.

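The averaging procedure described in the caption of Figure 2 can be sketched as follows. This is only an illustration under stated assumptions: the per-residue disorder scores are taken as already computed by some predictor, a residue is counted as disordered when its score is at least 0.5 (a commonly used cutoff that is assumed here, not specified in this review), and the function names are hypothetical.

```python
from statistics import mean


def fraction_disordered(scores, threshold=0.5):
    """Fraction of residues in one protein whose score reaches the cutoff."""
    return sum(score >= threshold for score in scores) / len(scores)


def proteome_disorder_content(per_protein_scores, threshold=0.5):
    """Average of the per-protein disordered fractions over a whole proteome."""
    return mean([fraction_disordered(s, threshold) for s in per_protein_scores])


if __name__ == "__main__":
    # Toy "proteome": a short fully disordered protein and a longer ordered one.
    toy_proteome = [[0.9, 0.9], [0.1] * 8]
    print(proteome_disorder_content(toy_proteome))  # per-protein average
    pooled = [s for protein in toy_proteome for s in protein]
    print(fraction_disordered(pooled))              # residue-pooled fraction
```

Note that averaging per protein, as the caption describes, weights every protein equally regardless of length, so it can differ from the fraction obtained by pooling all residues of the proteome (0.5 versus 0.2 in the toy example).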
Simply by virtue of their existence, IDPs/IDPRs add a new level of complexity to the sequence-structure relationship, dividing the population of protein sequences into two categories: sequences that yield natively ordered proteins and sequences that code for natively disordered proteins. 55 IDPs/IDPRs cannot fold spontaneously, and some of them require specific partners to gain more ordered structure. Therefore, they do not possess the entire folding code that defines the ability of foldable proteins to fold spontaneously into a unique biologically active structure. The missing portion of the IDP folding code (or at least part of it) is supplemented by binding partner(s). This defines a principal difference between structured proteins and IDPs/IDPRs: foldable proteins fold first and then bind to their partners, whereas IDPs/IDPRs remain disordered until they interact with their partners. 68, 69 Furthermore, many IDPs/IDPRs do not require folding to be functional, 1, 4, 13, 14, 70-73 and some of them form fuzzy complexes, in which they preserve a significant amount of disorder. 74, 75 All this suggests that the sequence space of IDPs (at least those which either do not fold at all or do not completely fold upon binding) is noticeably greater than the "foldable" sequence space due to the removal of restrictions posed by the need to gain ordered structure spontaneously. 68 This represents one of the conundrums of intrinsic disorder, where the apparent sequence redundancy and simplicity are combined with the lack of structural restraints, leading to an increase in the dimensions and complexity of the available sequence space. Also, the existence of a noticeable sequence-structure heterogeneity of IDPs should be emphasized. 68 Since the unique 3D structure of an ordered single-domain protein is defined by the interplay between all (or almost all) of its residues, one could expect that the structure-coding potential is homogeneously distributed within its amino acid sequence. 
On the other hand, a sequence of an IDP/IDPR contains multiple, relatively short functional elements and therefore represents a very complex structural and functional mosaic. 68 This important feature defines the known ability of an IDP/IDPR to interact with, regulate, and be controlled by multiple structurally unrelated partners. 76 Such functional "anatomy" of IDPs/IDPRs is determined by the extremely high level of their sequence heterogeneity, which is further increased by the ability of a single IDPR to bind to multiple partners, gaining very different structures in the bound state. 77 One of the crucial consequences of an extended sequence space and a non-homogeneous distribution of foldability (or the structure-coding potential) within the amino acid sequences of IDPs and IDPRs is their astonishing structural heterogeneity. In fact, a typical IDP/IDPR contains a multitude of elements coding for potentially foldable, partially foldable, differently foldable, or not foldable at all protein segments. 68 As a result, different parts of a molecule are ordered (or disordered) to a different degree. This distribution changes constantly over time, so that a given segment of a protein molecule has different structures at different time points. As a result, at any given moment, an IDP has a structure that is different from the structure viewed at another moment. 68 Another level of structural heterogeneity is determined by the fact that many proteins are hybrids of ordered and disordered domains and regions, and this mosaic structural organization is crucial for their functions. 16 Also, even when they do not possess ordered domains, IDPs are known to have various levels and depths of disorder. 78 Over the past few years, the understanding of the available conformational space of IDPs/IDPRs has undergone significant evolution. In fact, for a long time, IDPs were considered mostly "unstructured" or "natively unfolded" polypeptide chains. 
This was mostly due to the fact that the majority of IDPs analyzed at the early stages of the field contained very little ordered structure, that is, they were really mostly unstructured or unfolded. Finding and characterizing such "structure-less" proteins was important for building a strong case against the dominant view represented by the classical sequence-to-structure-to-function paradigm, especially since such fully unstructured, yet functional proteins clearly represented the other extreme of the protein structure-function spectrum. 16 The top half of Figure 3 illustrates this situation by contrasting rock-like ordered proteins with cooked spaghetti-like IDPs. However, already in some early studies, it was indicated that IDPs/IDPRs could be crudely grouped into two major structural classes, proteins with compact and extended disorder. 1, 4, 12, 13, 73 Based on these observations, protein functionality was ascribed to at least three major protein conformational states, ordered, molten globular, and coil-like, 13, 79 indicating that functional IDPs can be less or more compact and possess smaller or larger amounts of flexible secondary/tertiary structure. 1, 4, 12, 13, 73, 79 At roughly the same time, it was emphasized that the extended IDPs (known as natively unfolded proteins) do not represent a uniform entity but contain two broad structural classes, native coils and native pre-molten globules. 1 Currently available data suggest that intrinsic disorder possesses multiple flavors, can have multiple faces, and can affect different levels of protein structural organization, where whole proteins or various protein regions can be disordered to a different degree. 68 This new view of the structural space of functional proteins can be visualized as a continuous spectrum of differently disordered conformations extending from fully ordered to completely structure-less proteins, with everything in between (Fig. 3, bottom half). 
Here, functional proteins can be well-folded and be completely devoid of disordered regions (rock-like scenario). Other functional proteins may contain a limited number of disordered regions (a grass-on-the-rock scenario), or have a significant amount of disordered regions (a llama/camel hair scenario), or be molten globule-like (a greasy ball scenario), or behave as pre-molten globules (a spaghetti-and-meatballs/sausage scenario), or be mostly unstructured (a hairball scenario). Notably, in this representation, there is no boundary between ordered proteins and IDPs, and the structure-disorder space of a protein is considered as a continuum. It is important to remember that even the most ordered proteins do not resemble "solid rocks" and have some degree of flexibility. In fact, a protein molecule is an inherently flexible entity, and the presence of this flexibility (even for the most ordered proteins) is crucial for its biological activity. 80 Another important point to remember is that, due to their heteropolymeric nature, proteins are never random coils and always have some residual structure. 68

Figure 3. Structural heterogeneity of IDPs/IDPRs. Top half: Bi-colored view of functional proteins which are considered to be either ordered (folded, blue) or completely structure-less (disordered, red). Ordered proteins are taken as rigid rocks, whereas IDPs are considered as completely structure-less entities, kind of cooked noodles. Bottom half: A continuous emission spectrum representing the fact that functional proteins can extend from fully ordered to completely structure-less proteins, with everything in between. Intrinsic disorder can have multiple faces, can affect different levels of protein structural organization, and whole proteins, or various protein regions can be disordered to a different degree. Some illustrative examples include ordered proteins that are completely devoid of disordered regions (rock-like type), ordered proteins with limited number of disordered regions (grass-on-the-rock type), ordered proteins with significant amount of disordered regions (llama/camel hair type), molten globule-like collapsed IDPs (greasy ball type), pre-molten globule-like extended IDPs (spaghetti-and-sausage type), and unstructured extended IDPs (hairball type).

Protein biophysicists/biochemists working on different aspects of ordered proteins (e.g., analyzing their structural properties, functions, folding, etc.) would find the biophysical properties of functional IDPs/IDPRs rather unusual, since these highly dynamic proteins do not follow the well-accepted wisdom that a protein has to be well-folded to be biologically functional. However, unusualness is a subjective feature, and from the viewpoint of polymer physics the extended IDPs/IDPRs possess the expected behavior of flexible and charged polymers, whereas the behavior of an ordered protein is rather unexpected (i.e., due to the existence of the native ensemble that for well-folded, ordered proteins can be approximated as a harmonic well around a unique, well-defined equilibrium structure). Therefore, one definitely should keep in mind that the "unusual" biophysics of extended IDPs/IDPRs has its roots in the usual polymer physics of highly charged and flexible polypeptides. Each protein is believed to be a unique entity with a unique primary sequence that governs its 3D structure (or lack thereof) and ensures specific biological function(s). Therefore, understanding the effect of sequence variance on biological performance presents a challenging task. However, natural polypeptides originated as random copolymers of amino acids, which were adjusted or "selected" over evolution based on their functional capacities. 
56, 81 Despite their differences in primary amino acid sequences, protein molecules in a number of conformational states behave as polymer homologues, suggesting that the volume interactions can be considered as a major driving force responsible for the formation of equilibrium structures or structural ensembles. 82 For example, ordered globular proteins and molten globules (either as folding intermediates of globular proteins or as examples of collapsed IDPs) exhibit key properties of polymer globules, where the fluctuations of the molecular density are expected to be much less than the molecular density itself. Extended IDPs (both intrinsic coils and intrinsic pre-molten globules) and ordered proteins in the pre-molten globule intermediate state possess properties of squeezed coils, since water is a poor solvent for a polypeptide. In fact, even high concentrations of strong denaturants (e.g., urea and GdmCl) are very likely to be bad solvents for protein chains, resulting in the preservation of extensive residual structure even under these harsh denaturing conditions. 82 Based on these and related observations, and taking into account the fact that many IDPs/IDPRs are characterized by significant amino acid composition biases, the overall polymeric behavior of these proteins and regions can be mimicked reasonably well by the behavior of low-complexity polypeptides (e.g., homopolypeptides and block copolypeptides). Following these ideas, it was shown that water is a poor solvent for the polypeptide backbone alone and for IDPs containing long tracts of polar amino acid residues, since polar homopolypeptides without hydrophobic groups (e.g., polyglutamine or glycine-serine block copolypeptides) were shown to prefer collapsed ensembles in aqueous media. 83-88 Furthermore, even polyglycine was shown to have a tendency to form heterogeneous ensembles of collapsed structures in water. 
88 A systematic analysis of the conformational behavior of protamines, arginine-rich IDPs involved in the condensation of chromatin during spermatogenesis, and protamine-like peptides revealed that there is a charge-driven coil-to-globule transition in these highly charged polypeptides, where the net charge per residue serves as the discriminating order parameter. 89 Overall, the increase in the hydrodynamic dimensions of a polypeptide chain with increase in its net charge per residue can be attributed to the increase in the intramolecular electrostatic repulsions between similarly charged side-chains and the favorable solvation of these moieties. 89 Based on these premises, at least three different classes of globule-forming polar/charged IDPs were proposed. The first class comprises polar tracts, which collapse because water is a poor solvent for the backbone and non-charged side-chains. The second class is represented by weak polyelectrolytes and weak polyampholytes, which have low per residue net charge and low fractions of positively and/or negatively charged residues. These IDPs/IDPRs form collapsed structures since the driving force responsible for the collapse is not overcome by the intramolecular electrostatic repulsion between the charged side-chains and by their favorable free energies of solvation. Furthermore, if such IDPs/IDPRs possess polyampholytic nature, their globular state could be additionally stabilized by electrostatic interactions between the oppositely charged side-chains. Finally, IDPs/IDPRs from the third class are strong polyampholytes characterized by high fractions of positively and/or negatively charged residues but low per residue net charge. Such intrinsically disordered proteins can form collapsed structures stabilized mostly by multiple electrostatic interactions between solvated side-chains of opposite sign.
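The three classes described above are separated by two sequence-derived quantities: the net charge per residue and the fraction of charged residues. The sketch below shows one way these quantities could be computed and used for a rough assignment. It is only an illustration: the function names, the treatment of histidine as neutral, and the numeric cutoffs (`ncpr_cut`, `fcr_cut`) are assumptions made here and are not values given in the text.

```python
# Minimal sketch (illustrative only): charge-based bookkeeping for a
# polypeptide sequence given in one-letter code.
POSITIVE = set("KR")  # assumption: His is treated as neutral near pH 7
NEGATIVE = set("DE")


def charge_fractions(seq):
    """Return (fraction of positive residues, fraction of negative residues)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    f_pos = sum(aa in POSITIVE for aa in seq) / n
    f_neg = sum(aa in NEGATIVE for aa in seq) / n
    return f_pos, f_neg


def ncpr(seq):
    """Absolute net charge per residue, |f+ - f-|."""
    f_pos, f_neg = charge_fractions(seq)
    return abs(f_pos - f_neg)


def fcr(seq):
    """Fraction of charged residues, f+ + f-."""
    f_pos, f_neg = charge_fractions(seq)
    return f_pos + f_neg


def classify(seq, ncpr_cut=0.25, fcr_cut=0.3):
    """Rough class assignment; both cutoffs are hypothetical placeholders."""
    if ncpr(seq) >= ncpr_cut:
        # High net charge: repulsion is expected to favor expanded chains.
        return "high net charge"
    charged = fcr(seq)
    if charged == 0:
        return "polar tract"
    if charged < fcr_cut:
        return "weak polyelectrolyte/polyampholyte"
    return "strong polyampholyte"


if __name__ == "__main__":
    for name, s in [("polyQ", "Q" * 20), ("protamine-like", "R" * 18 + "SS"),
                    ("KE repeat", "KE" * 10), ("GS-rich", "GSGSGSGSKE")]:
        print(name, round(ncpr(s), 2), round(fcr(s), 2), classify(s))
```

With real sequences the cutoffs would need to be calibrated against experimental hydrodynamic data; the defaults here only make the example run.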
89 The extended IDPs/IDPRs were used as a model system for the analysis of the effect of electrostatic interactions on conformational properties of unfolded proteins, and for testing the quantitative descriptions and predictions of polymer theory related to the influence of charged amino acids on chain dimensions. 90 For example, based on the analysis of the conformational equilibrium of coarse-grained polypeptides as a function of sequence hydrophobicity, charge, and length, it has been concluded that the variations in sequence hydrophobicity and charge define a coil-to-globule transition comparable to that seen in the empirical CH-plot, 12, 91 suggesting that a minimal, polymer physics-based model can capture the elements of global protein conformation. 92 IDPs/IDPRs with very high net charges are expected to be more extended and to behave more like random coils (i.e., similar to conformations adopted by proteins in the denaturant GdmCl). The analysis of the GdmCl-induced expansion of the unfolded states suggested that protein charge density plays a crucial role in defining the hydrodynamic behavior of the unfolded polypeptide chain. 90 Here, highly charged proteins can exhibit a prominent expansion at low ionic strength that correlates with their net charges. 90 It has also been hypothesized that the pronounced effect of charges on the dimensions of unfolded proteins might have important implications for their cellular functions. 90 Similarly, a comprehensive analysis of the hydrodynamic dimensions of FG-nucleoporins containing large IDPRs with multiple phenylalanine-glycine repeats (FG-domains) revealed that under physiologic conditions in vitro these domains adopt distinct categories of disordered structures, such as molten globule, pre-molten globule, relaxed-coil, extended-coil (as in urea), or very extended-coil (as in GdmCl).
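The charge-hydropathy (CH) plot mentioned above places each sequence by its mean hydropathy and mean net charge. A minimal sketch of that calculation follows. It assumes the Kyte-Doolittle scale rescaled to the 0-1 range and the commonly quoted linear boundary between extended IDPs and compact proteins, <R> = 2.785<H> - 1.151; neither the scale nor the boundary coefficients are stated in this section, so both should be read as assumptions of the example, and the function names are hypothetical.

```python
# Minimal sketch (illustrative only): position of a sequence in
# charge-hydropathy space.

# Kyte-Doolittle hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean hydropathy <H> with each residue value rescaled to 0..1."""
    seq = seq.upper()
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Mean absolute net charge <R>; Lys/Arg count +1, Asp/Glu count -1."""
    seq = seq.upper()
    pos = sum(seq.count(aa) for aa in "KR")
    neg = sum(seq.count(aa) for aa in "DE")
    return abs(pos - neg) / len(seq)


def boundary_charge(hydropathy):
    """Assumed CH-plot boundary: <R> = 2.785 <H> - 1.151."""
    return 2.785 * hydropathy - 1.151


def looks_extended(seq):
    """True if the sequence falls on the high-charge/low-hydropathy side."""
    return mean_net_charge(seq) > boundary_charge(mean_hydropathy(seq))


if __name__ == "__main__":
    for name, s in [("polyE", "E" * 20), ("polyL", "L" * 20)]:
        print(name, round(mean_hydropathy(s), 3),
              round(mean_net_charge(s), 3), looks_extended(s))
```

A composition-only rule like this ignores sequence patterning and chain length, which is one reason polar tracts with no net charge can still collapse, as discussed in the preceding paragraphs.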
93 The category of intrinsically disordered structure in a given FG-domain was related to its amino acid composition, namely to the content of charged residues, where more charged FG-domains possessed larger hydrodynamic dimensions. 94 Furthermore, FG-nucleoporins with higher charge density were shown to be more dynamic than the collapsed-coil FG-domains, being also prone to repel other FG-domains. On the other hand, the collapsed-coil FG-domains were prone to oligomerize. These observations suggested that different types of FG-domains with different aggregation propensities provide a molecular basis for two different gating mechanisms operating at the nuclear pore complex at distinct locations: one acting as a hydrogel, and the other as an entropic brush. 94 Therefore, the abundance and peculiarities of the distribution of charged residues within the protein sequences might determine physical and biological properties of extended IDPs and IDPRs. Also, simple polymer physics-based reasoning can give a reasonably well-justified explanation of the conformational behavior of extended IDPs. In general, the conformational behavior of IDPs is characterized by the low cooperativity (or the complete lack thereof) of the denaturant-induced unfolding, lack of the measurable excess heat absorption peak(s) characteristic for the melting of ordered proteins, "turned out" response to heat and changes in pH, and the ability to gain structure in the presence of various binding partners. 95 The analysis of the temperature effects on structural properties of several extended IDPs revealed that native coils and native pre-molten globules partially fold as the temperature is increased. 1, 73, [95] [96] [97] [98] These heating-induced structural changes in extended IDPs were attributed to the increased strength of the hydrophobic interaction at higher temperatures, leading to a stronger hydrophobic attraction, which is the major driving force for folding.
Similarly, extended IDPs/IDPRs are characterized by the "turned out" response to changes in pH, 96, 99-102 where a decrease (or increase) in pH induces their partial folding due to the minimization of the high net charges they carry at neutral pH, thereby decreasing charge/charge intramolecular repulsion and permitting hydrophobic-driven collapse to the partially folded conformation. 95

Every Disordered Protein is Disordered in its Own Way

Data accumulated so far indicate that intrinsic disorder exists at multiple structural levels and might differently affect different regions/domains of IDPs. This defines the noted structural complexity and heterogeneity of IDPs/IDPRs, which are further enhanced by the way different proteins/protein regions respond to their environments. Furthermore, since intrinsic disorder is crucial for many biological functions and therefore must prevail in different environments, the amino acid sequences and compositions of IDPs and IDPRs are specifically shaped by the peculiarities of their global and local environments. All this makes the protein intrinsic disorder phenomenon so broad that one can even assume that every disordered protein (or at least every family of disordered proteins) is disordered in its own way. This hypothesis has far-reaching consequences since it implies that a general disorder predictor has limited accuracy and cannot predict the disorder status of all protein sequences with equally high accuracy, due to their heterogeneity. It also implies that some environmental factors definitely should be taken into account when assessing intrinsic disorder in proteins. Several examples are presented below to support the overall validity of these statements. The first example is given by transmembrane (TM) proteins, in which disorder is widely observed (e.g., 40% of human integral plasma proteins were predicted to contain long IDPRs).
[103] [104] [105] [106] [107] Furthermore, disorder is unevenly distributed between the cytoplasmic and the external surfaces of these proteins, with cytoplasmic domains being up to threefold more disordered than extracellular domains. 105 Although these analyses gave interesting hints on the abundance of disorder in TM proteins, the obvious weakness of such evaluations is in the fact that they were performed using the disorder predictors developed from structured and disordered regions found in water-soluble proteins. 108 However, the major physico-chemical properties of water-soluble and integral membrane proteins are very different due to the differences in their environments. For example, similar to typical water-soluble proteins, the TM regions of membrane proteins are often highly structured, containing α-helices 109 or β-structure, 110 which are especially likely to occur due to the low dielectric constant values within the membrane bilayers. 111, 112 On the other hand, the exterior regions of TM proteins are much more apolar than the exteriors of water-soluble proteins. [113] [114] [115] Therefore, the peculiarities of the membrane environment, with its highly nonpolar nature originating either from lipids or from protein interiors, are especially unfavorable for intrinsic disorder, since propensity for intrinsic disorder is typically encoded in a high content of polar and charged residues. Therefore, the IDPRs found in integral membrane proteins would be expected to be generally localized within the regions external to the membrane bilayer. 108 Also, the distinctive environment of the membrane bilayer imposes constraints on the amino acid composition of integral membrane proteins, even on the regions external to the membrane bilayer. 116, 117 Comprehensive bioinformatics analysis revealed that integral membrane proteins commonly possess IDPRs defined as regions of missing electron density in their crystal structures.
108 Comparison of the IDPRs found in the α-helical bundle and the β-barrel integral membrane proteins with the IDPRs observed in typical water-soluble proteins revealed the existence of statistically distinct amino acid compositional biases characteristic for these three protein classes. Therefore, the use of specific amino acid signatures of IDPRs found in TM helical bundles and β-barrels can potentially lead to significantly more accurate disorder predictions for these two classes of integral membrane proteins. 108 Another illustrative example of the specific disorder-related and environment-dependent sequence features is given by archaeal proteins. 46, 51 Based on the levels of predicted disordered residues, archaeal proteins can be grouped into three classes, with ranges of the disordered residue content of 12%-21%, 21%-32%, and 32%-38% (see Fig. 2). The archaeal proteomes with the highest disorder contents are those of halophiles and methanophiles. 46, 51 As with TM proteins, the estimation of intrinsic disorder in the extremophilic proteins of microorganisms surviving under hypersaline conditions may not be accurate when it relies on predictors developed for "normal" non-halophilic proteins existing under the normal physiological conditions of 100-150 mM NaCl. 46 In fact, one of the strategies used by the halophilic archaea, which are salt-loving extremophilic organisms that grow optimally at high salt concentrations, to maintain proper osmotic pressure in their cytoplasm is a so-called "salt-in" strategy that involves accumulation of molar concentrations of potassium and chloride in their cytosols. 118 This strategy requires extensive adaptation of the intracellular proteins to the presence of near-saturating salt concentrations.
The proteomes of such "salt-in" organisms are highly acidic, 46, 51 and their proteins are characterized by remarkable instability at conditions of low salt concentrations and by maintaining soluble and active conformations in hypersaline conditions that are generally detrimental to the non-halophilic proteins. [118] [119] [120] [121] [122] [123] [124] [125] [126] [127] Finally, peculiarities of disorder distributions in viral proteins can be used to further support the importance of considering environmental factors. 46, 51 Here, the comprehensive analysis of intrinsic disorder in various completed proteomes revealed that the viral proteomes have the largest variation of disorder content, which ranges from 7.3% disordered residues in the human coronavirus NL63 to 77.3% disordered residues in the Avian carcinoma virus proteome (see Fig. 2). 46 The high predicted intrinsic disorder content in viruses has multiple functional implications: some IDPRs are used in the functioning of viral proteins and help viruses to hijack various pathways of the host cells, others likely have evolved to help viruses accommodate to their hostile habitats, and still others evolved to help viruses manage their economical usage of genetic material via alternative splicing, overlapping genes, and anti-sense transcription. 128 These findings are in agreement with another study revealing that, in comparison with archaea and bacteria, viral and bacteriophagic proteins were significantly more enriched in polar residues and depleted in hydrophobic residues and were close to eukaryotic proteins in terms of their amino acid compositions and the reduced content of the order-promoting residues. 129

Functional protein clouds: Major functional advantages of being intrinsically disordered

The high natural abundance of IDPs/IDPRs and their specific structural features indicate that these proteins and regions might carry out important biological functions.
This hypothesis has been confirmed by several comprehensive studies, 1, [11] [12] [13] [14] [71] [72] [73] 78, [130] [131] [132] [133] [134] which revealed that these structure-less members of the protein kingdom are abundantly involved in numerous biological processes, where they are frequently found to play different roles in regulation of the functions of their binding partners and in promotion of the assembly of supra-molecular complexes. 1, 4, [11] [12] [13] [14] [15] 31, [70] [71] [72] [73] [76] [77] [78] [79] 131, 132, [134] [135] [136] [137] [138] [139] [140] [141] [142] [143] [144] [145] [146] [147] [148] [149] The conformational plasticity of IDPs/IDPRs provides them with a wide spectrum of exceptional functional advantages over the functional modes of ordered proteins and domains. 4, 10, 11, 13, 32, 71, 72, 77, 78, 131, 132, 134, 141, 142, 150, 151 Some of these advantages are:

1. Increased speed of interaction due to greater capture radius and the ability to spatially search through interaction space;
2. Increased interaction (surface) area per residue;
3. A strengthened encounter complex that allows for less stringent spatial orientation requirements;
4. Efficient regulation via rapid degradation;
5. The ability to be involved in one-to-many binding, where a single disordered region binds to several structurally diverse partners;
6. The ability to be involved in many-to-one binding, where many distinct (structured) proteins may bind a single disordered region;
7. The ability to overcome steric restrictions, enabling larger interaction surfaces in protein-protein and protein-ligand complexes than those obtained with rigid partners;
8. The ability to fold upon binding (completely or partially);
9. The ability of some IDPs/IDPRs to form very stable intertwined complexes;
10. The ability of some IDPs/IDPRs to stay substantially disordered in the bound state;
11. Binding fuzziness, where different binding mechanisms (e.g., via stabilizing the binding-competent secondary structure elements within the contacting region, or by establishing long-range electrostatic interactions, or being involved in transient physical contacts with the partner, or even without any apparent ordering) can be employed to accommodate peculiarities of interaction with various partners;
12. Binding plasticity, where an IDPR folds to specific bound conformations (which can be very different) according to the template provided by binding partners;
13. High accessibility of sites targeted for posttranslational modifications (PTMs);
14. Efficient structural and functional regulation via PTMs such as phosphorylation, acetylation, lipidation, ubiquitination, sumoylation, and so forth, allowing for a simple means of modulation of their biological functions;
15. Efficient functional control via regulatory proteolysis, the attack sites of which are frequently associated with IDPRs;
16. Ease of regulation/redirection and production of otherwise diverse forms by alternative splicing (given the existence of multiple functions in a single disordered protein, and given that each functional element is typically relatively short, alternative splicing could readily generate a set of protein isoforms with a highly diverse set of regulatory elements 152 );
17. The possibility of overlapping binding sites due to extended linear conformation;
18. Decoupled binding affinity and specificity, where, due to the induced folding, an IDP/IDPR can be involved in the formation of specific but weak complexes. In other words, an IDP/IDPR might possess high specificity for given partners combined with high k_on and k_off rates that enable rapid association with the partner without an excessive binding strength. This combination of high specificity with low affinity defines the broad utilization of intrinsic disorder in regulatory interactions where turning a signal off is as important as turning it on;
19. Diverse evolutionary rates, with some ID proteins being highly conserved and other ID proteins possessing high evolutionary rates. The latter can evolve into sophisticated and complex interaction centers (scaffolds) that can be easily tailored to the needs of divergent organisms;
20. Flexibility that allows masking (or not) of interaction sites or that allows interaction between bound partners;
21. The ability to be involved in cascade interactions, where IDP binding to the first partner induces partial folding, generating a new binding site suitable for interaction with the second partner, and so forth.

Many disorder-related functions (e.g., signaling, control, regulation, and recognition) are incompatible with well-defined, stable 3-D structures. 1, [11] [12] [13] [14] 31, 73, 78, 79, 132, 134, [138] [139] [140] 142, 144, 153 Functions of many IDPs/IDPRs rely on interactions with specific binding partners, and many IDPs/IDPRs tend to undergo disorder-to-order transitions as a result of binding to their specific targets. 12 Functionally, IDPs/IDPRs were grouped in at least six broad classes based on the mode of action. 14, 136 These broad classes included protein and RNA chaperones, entropic chains, effectors, scavengers, assemblers, and display sites, 14, 136 and 28 separate functions, including molecular recognition via binding to other proteins or to nucleic acids, were assigned for IDPRs in early studies. 71, 72 Later, a rich spectrum of biological functions associated with IDPs/IDPRs was found based on a comprehensive computational study of a correlation between the functional annotations in the Swiss-Prot database and predicted intrinsic disorder.
[138] [139] [140] The approach was based on the hypothesis that if a function described by a given keyword relies on intrinsic disorder, then the keyword-associated proteins would be expected to have a greater level of predicted disorder than proteins randomly chosen from Swiss-Prot. This analysis revealed that 44% and 34% of Swiss-Prot functional keywords were associated with ordered and disordered proteins, respectively, whereas 22% of functional keywords yielded ambiguity in the likely function-structure associations. [138] [139] [140] Interestingly, most of the structured protein-associated keywords were shown to be related to enzymatic activities, whereas the majority of the disordered protein-associated keywords were related to signaling and regulation. These results agree well with the notion that enzymatic catalysis requires ordered structure and that effectiveness of signaling is dependent on binding reversibility, a property directly associated with the thermodynamics of the disorder-to-order transition induced by binding. [138] [139] [140] Many IDPs and IDPRs undergo a disorder-to-order transition upon functioning. 11, 13, 15, 71, 72, 78, 79, [130] [131] [132] 134, [154] [155] [156] [157] When disordered regions bind to signaling partners, the free energy required to bring about the disorder-to-order transition takes away from the interfacial, contact free energy, with the net result that a highly specific interaction can be combined with a low net free energy of association. 13, 155 High specificity coupled with low affinity is a useful pair of properties for a reversible signaling interaction. Furthermore, a disordered protein can readily bind to multiple partners by changing shape to associate with different targets.
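The free energy bookkeeping behind "high specificity with low affinity" can be made concrete with a small worked example. The sketch below assumes the simplest additive picture, in which the net free energy of association is the sum of a favorable contact term and an unfavorable folding term, and converts it to a dissociation constant through the standard relation ΔG° = RT ln Kd (1 M standard state). The function names and all numeric values are hypothetical and chosen only for illustration.

```python
# Minimal sketch (illustrative only): how a folding penalty weakens binding
# without changing the contact free energy of the interface.
import math

R_GAS = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def net_binding_free_energy(dg_contact, dg_folding):
    """Additive model: net dG of association = contact term + folding cost."""
    return dg_contact + dg_folding


def dissociation_constant(dg_bind, temperature=298.15):
    """Kd in mol/L from the standard free energy of association (kJ/mol)."""
    return math.exp(dg_bind / (R_GAS * temperature))


if __name__ == "__main__":
    dg_contact = -60.0  # hypothetical interfacial free energy, kJ/mol
    dg_folding = 25.0   # hypothetical cost of disorder-to-order transition

    rigid = dissociation_constant(dg_contact)
    coupled = dissociation_constant(
        net_binding_free_energy(dg_contact, dg_folding))

    # The same interface gives a much weaker (more reversible) complex
    # when part of its free energy pays for folding.
    print(f"Kd, pre-folded partner:      {rigid:.2e} M")
    print(f"Kd, folding coupled to bind: {coupled:.2e} M")
```

In this picture the interface, and hence the specificity, is the same in both cases; only the net affinity changes, which is the property the text associates with reversible signaling interactions.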
13, 158, 159 All this clearly suggests that there is a new two-pathway protein structure-function paradigm, with sequence-to-structure-to-function for enzymes and membrane transport proteins, and sequence-to-disordered ensemble-to-function for proteins and protein regions involved in signaling, regulation, and control. 1, 13, 71, 73, 79 One of the first generalizations of this concept was given by the Protein Trinity Hypothesis, which suggested that native proteins can be in one of three states: the solid-like ordered state, the liquid-like collapsed-disordered state, or the gas-like extended-disordered state. 79 Function is then viewed to arise from any one of the three states or from transitions between them. This model was subsequently expanded to include a fourth state (pre-molten globule) and transitions between all four states. 1 In reality, based on the idea of the continuous spectrum of protein structures outlined above, functional proteins contain various amounts of intrinsic disorder, and this continuous structural spectrum of proteins defines their limitless functional variability. Among intriguing protein functions relying on intrinsic disorder are moonlighting activities, 137 actions of hub proteins, 78, 93, 134, [160] [161] [162] [163] [164] and scaffolding functions. 141, 165 These functions are considered in some detail below, since they all illustrate the notion that the intrinsic disorder concept represents a universal skeleton key (or lock-pick) that helps unlock seemingly unresolvable mysteries of protein science and therefore can be considered as a new Ariadne's thread that helps navigate the unusual twists of the sophisticated relationships between protein sequence, structure, and function. Moonlighting proteins. Moonlighting is the ability of a protein to fulfill more than one function. Often, these functions are unrelated or at least are not obviously related to each other.
137, [166] [167] [168] The capability of a protein to be involved in moonlighting or multi-tasking activities represents one of the solutions used by nature to increase the organism's complexity without expanding the genome size: by acting differently at distinct points of metabolic networks, proteins increase network complexity without increasing the actual size of the network. 137, [166] [167] [168] Among various molecular mechanisms used by the moonlighting proteins to switch between functions are changes in cellular localization, changes in ligand binding, expression in different cell types, and variations of the oligomerization state. 137 In addition to these mechanisms that can be explained within the frames of the traditional structure-function paradigm, consideration of the intrinsic disorder phenomenon opens new possibilities. 137 In fact, one of the peculiar functional advantages of IDPs/IDPRs is their binding promiscuity and ability to be involved in one-to-many signaling, whereby an IDP/IDPR binds structurally different partners in a template-induced folding process. 11, 77, 132, 169 Therefore, IDPs/IDPRs can use the same region or overlapping interaction regions/surfaces to exert distinct effects and employ disorder-based mechanisms to switch function that rely on their capability to form different conformations upon binding. 137 Such structural malleability of IDPs/IDPRs defines their ability to participate in unprecedented moonlighting events, where these disordered moonlighting proteins or regions produce opposing effects (inhibition and activation) on different partners or even on the same partner molecule. 137 Hub proteins. Signaling interactions inside the cell can be described as specific and complex networks that can be considered as "scale-free" or "small-world" networks, which have hubs, with many connections, and ends, which have only a single connection to one neighbor.
170, 171 Such scale-free networks combine the local clustering of connections characteristic of regular networks with occasional long-range connections between clusters, as can be expected to occur in random networks. In other words, the distance between nodes in these scale-free networks follows a power-law distribution. 172 Based on their spatiotemporal peculiarities, protein hubs were grouped into two broad categories: "date hubs" that bind their numerous partners sequentially, and "party hubs" that interact with their partners simultaneously. 173 Since many IDPs are known to be involved in interaction with a large number of distinct partners, they clearly can be considered as hubs in the scale-free protein-protein interaction networks. 78, 134 Based on the systematic analysis of several known hub proteins 134 followed by a series of robust bioinformatics studies, 93, 160-164 it was concluded that hubs commonly use disordered regions to bind to multiple partners and that there are at least two primary mechanisms by which disorder is utilized in protein-protein interaction networks, where one disordered region binds to many partners or many disordered regions bind to one partner. 134 Scaffold proteins. Scaffold proteins constitute an important subclass of hubs that typically have a modest number of interacting partners and that are commonly found at the central parts of functional complexes, where they interact with most of their partners at the same time and therefore act as party hubs.
160 Scaffolds are responsible for bringing together specific proteins within a signaling pathway and for providing selective spatial orientation and temporal coordination to facilitate and promote interactions among interacting proteins. In addition, some scaffolds can influence the specificity and kinetics of signaling interactions via simultaneous binding to multiple participants in a particular pathway, facilitating and/or modifying the specificity of pathway interactions; 174 other scaffolds can change conformations of individual proteins and thus modulate their activities; 174 and still other scaffold proteins may modulate the activation of alternative pathways by promoting interactions between various signaling proteins. 141 Analysis of several well-characterized signaling scaffold proteins revealed that their large IDPRs are crucial for successful scaffold function. 141 A more global bioinformatics analysis revealed that a typical design of a scaffold protein includes a set of short globular domains (80 amino acids on average) connected by long linker regions (150 residues on average) with crucial binding functions. 165 This gave further support to the notion that signaling scaffold proteins utilize the various features of highly flexible ID regions to obtain more functionality from less structure. 141 Disorder and transcription regulation. Conformational plasticity and adaptability associated with intrinsic disorder are crucial for various protein functions. Among the proteins whose functional life is strongly disorder-dependent are transcription factors (TFs) 175, 176 and other proteins involved in transcriptional regulation, such as the mediator complex, 24, 177 core and linker histones, 178 and ribosomal proteins. 179 For example, from 83 to 94% of TFs might possess long IDPRs, with the degree of disorder in eukaryotic TFs being significantly higher than in prokaryotic TFs.
175, 176 Also, TFs were shown to be depleted in order-promoting residues and enriched in disorder-promoting residues, and were characterized by high levels of α-molecular recognition features (α-MoRFs). 175 Furthermore, disorder is unevenly distributed within the TFs, with the degree of disorder in their activation regions being much higher than that in DNA-binding domains. However, the AT-hooks (DNA-binding motifs present in many proteins that bind to the (ATAA) and (TATT) repeats of DNA) and basic regions of TF DNA-binding domains are highly disordered, suggesting that eukaryotes with their well-developed gene transcription machinery require transcription factor flexibility to be more efficient. 175 A number of interesting and important roles were also ascribed to intrinsic disorder in TFs related to the regulation of heat shock response (so-called heat shock factors, HSFs) 180 and in the reprogramming TFs (the Yamanaka factors, namely Sox2, Oct3/4 (Pou5f1), Klf4, and c-Myc, and the Thomson factors, namely Sox2, Oct3, Lin28, and Nanog), overexpression of which is known to generate induced pluripotent stem (iPS) cells from terminally differentiated somatic cells. 181 Disorder in the regulation of cellular pathways. Of special interest are the vital roles of intrinsic disorder in regulation and orchestration of various cellular pathways. One of the illustrative examples of this regulatory role of intrinsic disorder is the canonical Wnt-pathway that involves five proteins, Axin, CKI-α, GSK-3β, APC (adenomatous polyposis coli, also known as deleted in polyposis 2.5 protein), and β-catenin (all shown to contain long IDPRs). This pathway is known to play a number of crucial roles in the development of organisms, and its malfunctions might lead to various diseases including cancer. 182 The comprehensive analysis of published data revealed that IDPRs found in Wnt-pathway proteins orchestrate protein-protein interactions and facilitate PTMs and signaling.
182 Furthermore, the scaffold protein Axin and another large protein, APC, are heavily enriched in disorder and act as flexible concentrators in gathering together all other proteins involved in the Wnt-pathway. 182 Intriguingly, the multifarious roles of highly disordered APC in regulation of β-catenin function were established by showing that disordered APC helps the collection of β-catenin from cytoplasm, facilitates the β-catenin delivery to the binding sites on Axin, and controls the final detachment of β-catenin from Axin. 182 Another important illustration of the involvement of intrinsic disorder in regulation of crucial pathways is given by the process of programmed cell death (PCD), which is one of the most intricate cellular processes, where the cell uses specialized cellular machinery and intracellular programs to kill itself, and which enables metazoans to control cell numbers and eliminate cells that threaten the animal's survival. 183 PCD includes several specific modules, such as apoptosis, autophagy, and programmed necrosis (necroptosis). These modules are not only tightly regulated but also intimately interconnected and are jointly controlled via a complex set of protein-protein interactions. Recently, several large sets of PCD-related proteins across 28 species were analyzed using a wide array of modern bioinformatics tools to understand the role of intrinsic disorder in controlling and regulating PCD. 183 This analysis revealed that proteins involved in regulation and execution of PCD possess a substantial amount of intrinsic disorder and that IDPRs were implemented in a number of crucial functions, such as protein-protein interactions and interactions with other partners including nucleic acids and other ligands, were shown to be enriched in post-translational modification sites, and were characterized by specific evolutionary patterns. 183 Unique catalytic function of a protein is believed to be dictated by its unique 3D structure.
This axiom constitutes a cornerstone of the lock-and-key paradigm and it seemed to be able to sustain the furious attack on protein structure-function relationship initiated by the discovery of IDPs and hybrid proteins with ordered domains and IDPRs. In fact, from the vast majority of experimental and computational studies a general conclusion was drawn over and over again, where the functional repertoire of IDPs complemented the functional arsenal of ordered proteins, with ordered proteins being mostly responsible for catalysis and transport and with IDPs doing the majority of other jobs in the cell. On the other hand, all proteins (even the most ordered and tightly folded ones) are intrinsically flexible molecules that undergo conformational changes over a wide range of timescales and amplitudes. 184 In fact, the combination of active site reactivity with the dynamic character of proteins allows enzymes to be promiscuous and remarkably efficient at the same time. 185 Furthermore, in general, dynamic fluctuations are crucial for enzyme catalysis, since they can influence substrate binding and product release, and may even adjust the effective barriers of the catalyzed reactions. [186] [187] [188] [189] [190] Often, dynamic changes in the enzyme during the catalytic reaction can be described using the induced-fit model, where a conversion of one tight conformational ensemble (free enzyme) to another distinct ensemble (bound enzyme) takes place through a series of local substrate-mediated structural rearrangements. 191 Despite this crucial role of local flexibility in the enzymatic catalysis, enzymes are still relatively stable molecules whose dynamic character is restricted to a small set of tightly folded conformations and whose unique (albeit locally flexible) structures are needed for efficient catalysis. 
From this viewpoint, the presence of intrinsic disorder is expected to be poorly compatible with enzymatic catalysis, which requires a well-organized environment in the active site of the enzyme in order to facilitate the formation of the transition state of the chemical reaction to be catalyzed. 192 In sharp contrast to this common wisdom supported by a wide array of specific examples, several enzymes were shown to be much more dynamic than catalytic machines are expected to be, clearly possessing, in their precatalytic states, many characteristic properties of molten globules and retaining unusually high flexibility in structurally defined enzyme-ligand complexes. One of the best characterized examples of such molten globular enzymes is the engineered monomeric form of chorismate mutase from Methanococcus jannaschii (MjCM). 184, [193] [194] [195] Here, a functional monomer (mMjCM) was created by inserting the hinge-loop sequence into the long, dimer-spanning N-terminal helix. 193 In its unbound form, mMjCM was shown to exist as a native molten globule that was described as a dynamic ensemble of α-helical conformers rapidly interconverting on the millisecond timescale. 193 Interaction with the natural ligand induced global conformational changes in the molten globular mMjCM, promoting formation of a defined enzyme-ligand complex, which, however, preserved unusually high flexibility. 184 The catalytic mechanism of the molten globular mMjCM was described as follows: "Though probably stochastic in nature, internal motions in the complex may generate a collective dynamic matrix that samples catalytically active conformation(s) often enough to achieve rapid turnover in the presence of the true transition state." 184 Therefore, some enzymes can represent a highly dynamic heterogeneous conformational ensemble which is still compatible with efficient catalysis. 
In agreement with this hypothesis, a molten globular character was described for circularly permuted dihydrofolate reductase (DHFR), 196, 197 and urease G from Bacillus pasteurii (BpUreG). [198] [199] [200] Of these three enzymatic molten globules, UreG is the only natural molten globular enzyme known to date, since both circularly permuted DHFR and monomeric MjCM were obtained as a result of genetic manipulations. Although the number of known native molten globules with enzymatic activity is small, their existence provides an interesting hint about early protein evolution. In fact, simple logic suggests that well-ordered enzymes appeared as a result of a long evolutionary process, whose very likely starting point was a partially folded polypeptide with some general properties of the molten globule. IDPs/IDPRs can form highly stable complexes, or be involved in signaling interactions where they undergo constant "bound-unbound" transitions, thus acting as dynamic and sensitive "on-off" switches. The ability of these proteins to return to the highly flexible conformations after the completion of a particular function, and their predisposition to gain different conformations depending on the environmental peculiarities, are unique physiological properties of IDPs which allow them to exert different functions in different cellular contexts according to a specific conformational state. 4 Due to their lack of rigid structure, combined with the high level of intrinsic dynamics and almost unrestricted flexibility at various structure levels in the non-bound state, as well as due to the unique capability to adjust to the structure of the binding partner, IDPs are characterized by a very diverse range of binding modes, creating a multitude of unusual complexes, many of which are not attainable by ordered proteins. 
201 Some of these complexes are relatively static, resemble complexes of ordered proteins, and therefore are suitable for structure determination by X-ray crystallography. Among these static complexes are: MoRFs, wrappers, chameleons, penetrators, huggers, intertwined strings, long cylindrical containers, connectors, armature, tweezers and forceps, grabbers, tentacles, pullers, and stackers or β-arcs. 201 These binding modes are shown in Supporting Information Figure 1S and briefly described in the Supporting Information Materials. In addition to the static complexes, where bound partners have fixed structures, some IDPs/IDPRs do not fold even in their bound state, forming so-called disordered, dynamic, or fuzzy complexes with ordered proteins, 97, [202] [203] [204] [205] [206] other disordered proteins, [207] [208] [209] or biological membranes. 210, 211 In complexes of some of these IDPs with their binding partners, the disordered regions flanking the interaction interface, but not the interface itself, remain disordered. Such a mode of interaction was recently described as "the flanking fuzziness," in contrast to "the random fuzziness," in which the disordered protein remains entirely disordered in the bound state. 75, 212 It is also expected that a similar binding mode can be utilized by disordered proteins while interacting with nucleic acids and other biological macromolecules. 201 Physically, binding is considered as joining objects together and suggests spatial and temporal fixation of bound partners. The formation of protein complexes with specific binding partners is expected to bring some fixation (at least at the binding site). Therefore, disordered complexes where interaction of a disordered protein with the binding partners is not accompanied by a disorder-to-order transition within the interaction interface clearly cannot be described by the classical binding paradigm. 
This contradiction can be resolved by assuming that the ordered binding partner and/or the disordered protein contain multiple low-affinity binding sites. The existence of several similar binding sites combined with a highly flexible and dynamic structure of the disordered protein creates a unique situation where any binding site of the disordered protein can interact with any binding site of its partner with almost equal probability, in a staccato manner. The low affinity of each individual contact implies that each of them is not stable and can be readily broken. Therefore, such a disordered or fuzzy complex can be envisioned as a highly dynamic ensemble in which a disordered protein does not present a single binding site to its partner but resembles a "binding cloud," in which multiple identical binding sites are dynamically distributed in a diffuse manner. In other words, in this staccato-type interaction mode, a disordered protein rapidly changes multiple binding sites while probing the binding site(s) of its partner. 201 An additional factor which can help hold a dynamic complex together could be a weak long-range attraction between protein molecules. 213 This long-range attraction is universal for all protein solutions and has a range several times that of the diameter of the protein molecule, much greater than the range of the screened electrostatic repulsion. 213 The most common outcome of these function-related structural changes is the overall increase in the amount of ordered structure. However, functions of some ordered proteins require local or even global unfolding of a unique protein structure. 68 Among specific features of these structural alterations are their induced nature and transient character combined with a wide range of molecular mechanisms by which they can be promoted. 
68 These functional unfolding-activating factors include light; mechanical force; changes in pH, temperature, or redox potential; interaction with membrane, ligands, nucleic acids, and proteins; various PTMs; release of autoinhibition due to the unfolding of autoinhibitory domains induced by their interaction with nucleic acids, proteins, membranes, PTMs, and so forth. 68 Among rather unusual factors used by nature to activate proteins via functional unfolding are light and mechanical force. For example, exposure to blue light results in the activation of the photoactive yellow protein (PYP), which is an ordered, water-soluble 14 kDa protein that contains a thioester-linked p-coumaric acid cofactor and serves as a photosensor in Ectothiorhodospira halophila. 214, 215 PYP is a bacterial blue light sensor that undergoes conformational changes upon signal transduction. The absorption of a photon triggers substantial protein unfolding and leads to the formation of the transient signaling state that interacts with the partner molecules. This allows the swimming bacterium to operate the directional switch that protects it from harmful illumination. Comprehensive analysis combining double electron-electron resonance spectroscopy (DEER), high resolution NMR, and time-resolved pump-probe X-ray solution scattering (TR-SAXS/WAXS) revealed that the transiently activated and short-lived signaling state of the PYP possessed a large degree of disorder and existed as an ensemble of multiple conformers that exchange on a millisecond time scale. 216, 217 This unusual behavior is illustrated in Figure 4, which shows structures of inactive folded PYP and its light-activated functional form, which is highly disordered. 68 Some proteins undergo local unfolding induced by mechanical force and therefore can serve as force sensors. 
68 Among these natural force sensors are mechanosensitive ion channels that recognize and respond to the membrane tension (that is, the mechanical force applied along the plane of the cell membrane) rather than to the hydrostatic pressure perpendicular to the membrane plane. 220 These ion channels are activated via partial unfolding of some of their functional parts induced by membrane tension. 221 For a long time, the fact that IDPs/IDPRs undergo disorder-to-order transitions either during their functions or in order to be functional was used as one of the strongest arguments against the idea of protein intrinsic disorder. It was stated that most IDPs (those which are not the artifacts of current methods of protein production) are in fact proteins waiting for a partner (PWPs) that serve as parts of a multi-component complex and that do not fold correctly in the absence of other components. 29 Therefore, when folded after binding to their partners, these proteins are not too different from typical ordered proteins. However, one needs to keep in mind that a portion of the "folding code" that defines the ability of ordered proteins to spontaneously gain a unique biologically active structure is missing for IDPs/IDPRs, since they cannot fold spontaneously. This missing portion of the "folding code" (or a part of it) can be supplemented by binding partner(s). As a result, ordered and disordered proteins can be discriminated on a simple basis of temporal correlation between their folding and binding: ordered proteins fold first and then bind to their partners, while the IDPs/IDPRs remain disordered until they bind their partners and often preserve substantial disorder in the bound state. 
69 Furthermore, numerous cases of functional unfolding (or transient disorder, or upside-down functionality) represent further support to the concept of functional disorder by clearly showing that many proteins possess dormant disorder that needs to be awakened in order to make these proteins functional. It is clear now that the IDPs and IDPRs are real, abundant, diversified, and vital. The highly dynamic nature of IDPs and IDPRs is a visual illustration of the chaos. However, the evolutionary persistence of these highly dynamic proteins (see below), their unique functionality, and involvement in all the major cellular processes evidence that this chaos is tightly controlled. 147

Figure 4. [...] Ground state structure was determined by multidimensional NMR spectroscopy. 218 This structure is in agreement with an earlier published 1.4 Å crystal structure, 219 and modeled structure based on combined DEER, TR-SAXS/WAXS, and NMR data. 217 It consists of an open, twisted, 6-stranded, antiparallel β-sheet, which is flanked by four α-helices on both sides. [217] [218] [219] On the contrary, the light-activated form is highly disordered. This structure satisfies DEER, SAXS/WAXS, and NMR data simultaneously. 217

To answer the question as to how these proteins are governed and regulated inside the cell, Gsponer et al. conducted a detailed study focused on the intricate mechanisms of IDP regulation. 222 To this end, all the Saccharomyces cerevisiae proteins were grouped into three classes using one of the available disorder predictors, DisoPred2 44 : (i) 1971 highly ordered proteins containing 0-10% of the predicted disorder; (ii) 2711 moderately disordered proteins with 10-30% predicted disordered residues; and (iii) 2020 highly disordered proteins containing 30-100% of the predicted disorder. Then, the correlations between intrinsic disorder and the various regulation steps of protein synthesis and degradation were evaluated. 
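The three-class grouping described above can be illustrated with a minimal Python sketch. It assumes that per-residue disorder calls (True for a residue predicted to be disordered) are already available from some predictor; the function names, the input format, and the handling of the exact 10% and 30% boundaries are illustrative assumptions and are not taken from the original study.

```python
from collections import Counter


def disorder_class(fraction_disordered):
    """Map a predicted disorder fraction (0.0-1.0) to one of three classes.

    Thresholds follow the 0-10%, 10-30%, and 30-100% ranges quoted in the
    text. Assigning the exact boundary values (0.10 and 0.30) to the lower
    class is an assumption made for this sketch.
    """
    if not 0.0 <= fraction_disordered <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if fraction_disordered <= 0.10:
        return "highly ordered"
    if fraction_disordered <= 0.30:
        return "moderately disordered"
    return "highly disordered"


def classify_proteome(per_residue_calls):
    """Count proteins per class.

    per_residue_calls: dict mapping a protein name to a sequence of booleans,
    one per residue (hypothetical input format). Empty sequences are skipped.
    """
    counts = Counter()
    for calls in per_residue_calls.values():
        if len(calls) == 0:
            continue
        fraction = sum(calls) / len(calls)
        counts[disorder_class(fraction)] += 1
    return counts


if __name__ == "__main__":
    toy_proteome = {
        "protA": [False] * 95 + [True] * 5,   # 5% predicted disorder
        "protB": [False] * 80 + [True] * 20,  # 20% predicted disorder
        "protC": [False] * 40 + [True] * 60,  # 60% predicted disorder
    }
    print(classify_proteome(toy_proteome))
```

Applied to a whole proteome, the resulting class labels could then be compared against measured quantities such as transcript decay rates or protein half-lives.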
This analysis revealed that the transcriptional rates of mRNAs encoding IDPs and ordered proteins were comparable. However the IDP-encoding transcripts were generally less abundant than transcripts encoding ordered proteins due to the increased decay rates of the transcripts of genes encoding IDPs. 222 Furthermore, IDPs were shown to be less abundant than ordered proteins due to the lower rate of protein synthesis and shorter protein half-lives. As the abundance and half-life in a cell of certain proteins can be further modulated via their PTMs such as phosphorylation, 223 the experimentally determined yeast kinase-substrate network was also analyzed. IDPs were shown to be substrates of twice as many kinases as were ordered proteins. Furthermore, the vast majority of kinases whose substrates were IDPs were either regulated in a cell-cycle dependent manner, or activated upon exposure to particular stimuli or stress. 222 Therefore, PTMs may not only serve as important mechanism for the fine-tuning of the IDP functions but possibly they are necessary to tune the IDP availability under the different cellular conditions. 222 In addition to S. cerevisiae, similar regulation trends were also found in Schizosaccharomyces pombe and Homo sapiens. 222 Based on these observations it has been concluded that both unicellular and multicellular organisms appear to use similar mechanisms for regulation of the intrinsically disordered protein availability. Overall, this study clearly demonstrated that in eukaryotes, there is an evolutionarily conserved tight control of synthesis and clearance of most IDPs. This tight control is directly related to the major roles of IDPs in signaling, where it is crucial to be available in appropriate amounts and not to be present longer than needed. 
222 It has been also pointed out that although the abundance of many IDPs is under strict control, some IDPs could be present in cells in large amounts or/and for long periods of time due to either specific PTMs or via interactions with other factors, which could promote changes in cellular localization of IDPs or protect them from the degradation machinery. 13, 70, 138, 223, 224 Overall, this study clearly showed that the chaos seemingly introduced into the protein world by the discovery of IDPs is under the tight control. 147 In an independent study, a global scale relationship between the predicted fraction of protein disorder and protein expression in E. coli was analyzed. 225 This study showed that the fraction of protein disorder was positively correlated with both measured RNA expression levels of E. coli genes in three different growth media and with predicted abundance levels of E. coli proteins. 225 When a subset of 216 E. coli proteins that are known to be essential for the survival and growth of this bacterium were analyzed, the correlation between protein disorder and expression level became even more evident. In fact, essential proteins had on average a much higher fraction of disorder (0.36), had a higher number of proteins classified as completely disordered (19% vs. 2% for E. coli proteome), and were expressed at a higher level in all three media than an average E. coli gene. 225 The manual literature mining for a group of E. coli proteins that had high levels of predicted intrinsic disorder revealed that the disorder predictions matched well with the experimentally elucidated regions of protein flexibility and disorder. 225 A direct link between protein disorder and protein level in E. coli cells could be because the IDPs may carry out the essential control and regulation functions that are needed to respond to the various environmental conditions. 
Another possibility is that IDPs might undergo more rapid degradation compared to structured proteins, which cells can counter by increasing mRNA levels of the corresponding genes. In this case, higher synthesis and degradation rates could make the levels of these proteins very sensitive to the environment, with slight changes in either production or degradation leading to significant shifts in protein levels. 225 Even more support for the tight control of IDPs inside the cell came from the analysis of cellular regulation of so-called "vulnerable" proteins. 23 The integrity of the soluble protein functional structures is maintained in part by a precise network of hydrogen bonds linking the backbone amide and carbonyl groups. In a well-ordered protein, hydrogen bonds are shielded from water attack, preventing backbone hydration and the total or partial unfolding of the soluble structure under physiological conditions. 226, 227 Since soluble protein structures may be more or less vulnerable to water attack depending on their packing quality, a structural attribute, protein vulnerability, was introduced as the ratio of solvent-exposed backbone hydrogen bonds (which represent local weaknesses of the structure) to the overall number of hydrogen bonds. 23 It has been also pointed out that structural vulnerability can be related to protein intrinsic disorder as the inability of a particular protein fold to protect intramolecular hydrogen bonds from water attack may result in backbone hydration leading to local or global unfolding. Since binding of a partner can help to exclude water molecules from the microenvironment of the preformed bonds, a vulnerable soluble structure gains extra protection of its backbone hydrogen bonds through the complex formation. 
226 To understand the role of structure vulnerability in transcriptome organization, the relationship between the structural vulnerability of a protein and the extent of co-expression of genes encoding its binding partners was analyzed. This study revealed that structural vulnerability can be considered as a determinant of transcriptome organization across tissues and temporal phases. 23 Finally, by interrelating vulnerability, disorder propensity and co-expression patterns, the role of protein intrinsic disorder in transcriptome organization was confirmed, since the correlation between the extent of intrinsic disorder of the most disordered domain in an interacting pair and the expression correlation of the two genes encoding the respective interacting domains was evident. 23 Because of the fact that IDPs are highly abundant and play crucial roles in numerous biological processes, it was not too surprising to find that some of them are involved in human diseases. For example, a number of human diseases originate from the deposition of stable, ordered, filamentous protein aggregates, commonly referred to as amyloid fibrils. In each of these pathological states, a specific protein or protein fragment changes from its natural soluble form into insoluble fibrils, which accumulate in a variety of organs and tissues. [228] [229] [230] [231] [232] [233] [234] Several unrelated proteins including many IDPs are known to be involved in these protein deposition diseases. 
234, 235 Illustrative examples of human neurodegenerative diseases associated with IDPs include Alzheimer's disease (deposition of amyloid-β, tau-protein, α-synuclein fragment NAC) [236] [237] [238] [239] ; various tauopathies (accumulation of tau-protein in the form of neurofibrillary tangles) 238 ; Down's syndrome (nonfilamentous amyloid-β deposits) 240 ; Parkinson's disease and other synucleinopathies (deposition of α-synuclein) 241 ; prion diseases (deposition of PrPSc) 242 ; and a family of polyQ diseases, a group of neurodegenerative disorders caused by expansion of CAG trinucleotide repeats coding for polyQ in the gene products. 243 Furthermore, most mutations in rigid globular proteins associated with accelerated fibrillation and protein deposition diseases have been shown to destabilize the native structure, increasing the steady-state concentration of partially folded (disordered) conformers. [228] [229] [230] [231] [232] [233] [234] The maladies given above have been called conformational diseases, as they are characterized by the conformational changes, misfolding, and aggregation of an underlying protein. However, there is another side to this coin: protein functionality. In fact, many of the proteins associated with the conformational disorders are also involved in recognition, regulation, and cell signaling. For example, functions ascribed to α-synuclein, a protein involved in several neurodegenerative disorders, include binding fatty acids and metal ions; regulation of certain enzymes, transporters, and neurotransmitter vesicles; and regulation of neuronal survival (reviewed in Ref. 241). Overall, there are about 50 proteins and ligands that interact and/or co-localize with this protein. Furthermore, α-synuclein has amazing structural plasticity and adopts a series of different monomeric, oligomeric, and insoluble conformations (reviewed in Ref. 24). 
The choice between these conformations is determined by the peculiarities of the protein environment, suggesting that α-synuclein has an exceptional ability to fold in a template-dependent manner. Therefore, the development of the conformational diseases may originate not only from misfolding but also from the misidentification, misregulation, and missignaling of the related proteins. Analysis of so-called polyglutamine diseases gives support to this hypothesis. 244 Polyglutamine diseases are a specific group of hereditary neurodegenerative disorders caused by expansion of CAG triplet repeats in an exon of disease genes, which leads to the production of a disease protein containing an expanded polyglutamine (polyQ) stretch. Nine neurodegenerative disorders, including Kennedy's disease, Huntington's disease, spinocerebellar ataxia-1, -2, -3, -6, -7, and -17, and dentatorubral pallidoluysian atrophy, are known to belong to this class of diseases. [245] [246] [247] [248] In most polyQ diseases, expansion to over 40 repeats leads to the onset. 248 It has been emphasized that such molecular processes as unfolded protein response, protein transport, synaptic transmission, and transcription are implicated in the pathology of polyQ diseases. 244 Importantly, more than 20 transcription-related factors have been reported to interact with pathological polyQ proteins. Furthermore, these interactions were shown to repress transcription, leading finally to neuronal dysfunction and death (reviewed in Ref. 244). These results suggest that polyQ diseases represent a kind of transcriptional disorder, 244 supporting our misidentification hypothesis for at least some of the conformational disorders. Disorder is very common in cancer-associated proteins too. In a 2002 study, it was found that 79% of cancer-associated and 66% of cell-signaling proteins contain predicted regions of disorder of 30 residues or longer. 
130 In contrast, only 13% of a set of proteins with well-defined ordered structures contained such long regions of predicted disorder. 130 In experimental studies, the presence of disorder has been directly observed in several cancer-associated proteins, including p53, 249 p57Kip2, 250 Bcl-XL and Bcl-2, 251 c-Fos, 252 a thyroid cancer associated protein TC-1, 253 EWS-FLI1 fusion protein that includes a potent transcriptional activator, the EWS domain, alongside the highly conserved DNA-binding domain FLI1, 254, 255 among many other examples. The best characterized example of an important cancer-related IDP is the tumor suppressor protein p53, which occupies the center of a large signaling network. p53 regulates expression of genes involved in numerous cellular processes, including cell cycle progression, apoptosis induction, DNA repair, as well as others involved in responding to cellular stress. 256 When p53 function is lost, either directly through mutation or indirectly through several other mechanisms, the cell often undergoes cancerous transformation. 257, 258 Cancers showing mutations in p53 are found in colon, lung, esophagus, breast, liver, brain, reticuloendothelial tissues, and hemopoietic tissues. 257 p53 is regulated by several different mechanisms including inhibition of its activity by interaction with E3 ubiquitin ligase Mdm2, which binds to a short stretch of p53 located within the transactivation domain. Mdm2-bound p53 cannot activate or inhibit other genes. Mdm2 ubiquitinates p53 and thus targets it for destruction. Mdm2 also contains a nuclear export signal that causes p53 to be transported out of the nucleus. 259, 260 The possibility of interrupting the action of disease-associated proteins (including through modulation of protein-protein interactions) presents an extremely attractive objective for the development of new drugs. 
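The criterion used in the 2002 study cited above (a predicted disordered region of 30 residues or longer) is straightforward to express in code. The following Python sketch assumes that per-residue disorder calls are already available as a sequence of booleans; the function names and the input format are hypothetical, and the sketch is not meant to reproduce the predictor or the pipeline used in that study.

```python
from itertools import groupby


def longest_disordered_run(calls):
    """Return the length of the longest run of consecutive True values.

    calls: sequence of booleans, one per residue, where True marks a residue
    predicted to be disordered (hypothetical input format).
    """
    return max(
        (sum(1 for _ in group) for is_disordered, group in groupby(calls) if is_disordered),
        default=0,
    )


def has_long_disordered_region(calls, min_length=30):
    """Check whether any contiguous predicted disordered region reaches min_length residues."""
    return longest_disordered_run(calls) >= min_length


if __name__ == "__main__":
    # Toy protein: a 35-residue disordered segment flanked by ordered residues.
    toy_calls = [False] * 50 + [True] * 35 + [False] * 50
    print(longest_disordered_run(toy_calls))
    print(has_long_disordered_region(toy_calls))
```

The fraction of proteins in a dataset for which such a check succeeds corresponds to the type of statistic quoted above for cancer-associated, cell-signaling, and ordered proteins.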
Since many proteins associated with various human diseases are either completely disordered or contain long disordered regions, 261, 262 and since some of these disease-related IDPs/IDPRs are involved in recognition, regulation, and signaling, these proteins/regions clearly represent novel potential drug targets. 27 Due to failure to recognize the important role of disorder in protein function, current and evolving methods of drug discovery suffer from an overly rigid view of protein function. In fact, the rational design of enzyme inhibitors depends on the classical view where 3D-structure is an obligatory prerequisite for function. While generally applicable to many enzymatic domains, this view has persisted to influence thinking concerning all protein functions despite numerous examples to the contrary. This is most apparent in the observation that the vast majority of currently available drugs target the active site of enzymes, presumably since these are the only proteins for which the "unique structure-unique function" paradigm is generally applicable. IDPs often bind their partners with relatively short regions that become ordered upon binding. [263] [264] [265] Targeting disorder-based interactions should enable the development of more effective drug discovery techniques. There are at least two potential approaches for the inhibition of disorder-based interactions, where a small molecule either binds to the binding site of the ordered partner to outcompete the IDP/IDPR or interacts directly with the IDP/IDPR. The principles of small molecule binding to IDPRs have not been well studied, but sequence-specific small molecule binding to short peptides has been observed. 266 An interesting twist here is that small molecules can inhibit disorder-based protein-protein interactions via induction of dysfunctional ordered structures in the targeted IDPR, that is, via drug-induced misfolding. 
In agreement with these concepts, small molecules "Nutlins" have been discovered that inhibited the p53-Mdm2 interaction by mimicking the inducible α-helix in p53 (residues 13-29) that binds to Mdm2. 259, 260 Although X-ray crystallographic studies of the p53-Mdm2 complex revealed that the Mdm2 binding region of p53 forms an α-helical structure that binds into a deep groove on the surface of Mdm2, 267 NMR studies showed that the unbound N-terminal region of p53 lacks fixed structure, although it does possess an amphipathic helix part of the time. 249 A close examination of the interface between the proteins reveals that Phe19, Trp23, and Leu26 of p53 are the major contributors to the interaction, with the side chains of these three amino acids pointing down into a crevice on the Mdm2 surface. 259, 260 The structure of Nutlin-2 was shown to mimic the crucial residues of p53, with two bromophenyl groups fitting into Mdm2 in the same pockets as Trp23 and Leu26, and an ethyl-ether side chain filling the spot normally taken by Phe19. [268] [269] [270] Nutlins and related small molecules increased the level of p53 in cancer cell lines. This drastically decreased the viability of these cells, causing most of them to undergo apoptosis. When one of the nutlins was given orally to mice, a 90% inhibition of tumor growth compared to the control was induced. 260, [268] [269] [270] This successful nutlin story marks the potential beginning of a new era, the signaling-modulation era, in targeting drugs to protein-protein interactions. Importantly, this druggable p53-Mdm2 interaction involves a disorder-to-order transition. Principles of such transitions are generally understood and therefore can be used to find similar drug targets, which are inducible α-helices. 271 In addition to nutlins inhibiting the p53-Mdm2 interaction, several other small molecules also act by blocking protein-protein interactions. 
272, 273 Some of these interactions involve one structured partner and one disordered partner, with disordered segments becoming α-helical upon binding. 271 Therefore, the p53-Mdm2 complex is not a unique exception, and many other disorder-based protein-protein interactions are blocked by small molecules. All this suggests that there is a cornucopia of new drug targets that would operate by blocking disorder-based protein-protein interactions. For these p53-Mdm2-type examples, the drug molecules mimic a critical region of the disordered partner (which folds upon binding) and compete with this region for its binding site on the structured partner. These druggable interaction sites operate by the coupled binding and folding mechanism. They are small enough and compact enough to be easily mimicked by small molecules. 25 Methods for predicting such binding sites in disordered regions have been developed 274 and the bioinformatics tools to identify which disordered binding regions can be easily mimicked by small molecules have been elaborated. 271 A complementary approach for small molecules to inhibit the disorder-based protein-protein interactions relies on the direct binding of drugs to the IDPs/IDPRs, which is illustrated by the c-Myc-Max story. 275 In order to bind DNA, regulate expression of target genes, and function in most biological contexts, the c-Myc transcription factor must dimerize with its obligate heterodimerization partner, Max, which lacks a transactivation segment. Both c-Myc and Max are intrinsically disordered in their monomeric forms. Upon heterodimerization, they undergo coupled binding and folding of their basic-helix-loop-helix-leucine zipper domains (bHLHZips). Since the deregulation of c-Myc is related to many types of cancer, the disruption of the c-Myc-Max dimeric complex is one of the approaches for c-Myc inhibition. Several small molecules were found to inhibit the c-Myc-Max dimer formation. 
275 These molecules were shown to bind to one of the three discrete sites within the 85-residue bHLHZip domain of c-Myc, which are composed of short contiguous stretches of amino acids that can selectively and independently bind small molecules. 275 Inhibitor binding induces only local conformational changes, preserves the overall disorder of c-Myc, and inhibits interaction with Max. 275 Furthermore, binding of inhibitors to c-Myc was shown to occur simultaneously and independently on the three independent sites. Based on these observations it has been concluded that a rational and generic approach to the inhibition of protein-protein interactions involving IDPs may therefore be possible through the targeting of intrinsically disordered sequence. 275 Recently, a functional misfolding concept was introduced to describe a mechanism preventing IDPs from unwanted interactions with non-native partners. 276 IDPs/IDPRs are characterized by high conformational dynamics and flexibility, the presence of sticky preformed binding elements, and the ability to morph into differently-shaped bound configurations. However, detailed analyses of the conformational behavior and fine structure of several IDPs revealed that the preformed binding elements might be involved in a set of non-native intramolecular interactions. Based on these observations it was proposed that an intrinsically disordered polypeptide chain in its unbound state can be misfolded to sequester the preformed elements inside the noninteractive or less-interactive cage, therefore preventing these elements from the unnecessary and unwanted interactions with non-native binding partners. 276 It is important to remember, however, that the mentioned functional misfolding is related to the ensemble behavior of transiently populated elements of structure. 
In other words, it describes the behavior of a globally disordered polypeptide chain containing highly dynamic elements of residual structure, so-called interaction-prone preformed fragments, some of which could potentially be related to protein function. 276 This ability of IDPs/IDPRs to functionally misfold can be used to find small molecules that would potentially stabilize different members of the functionally misfolded ensemble and therefore prevent the targeted protein from establishing biological interactions. 277 This approach is very different from the direct targeting of short IDPRs discussed above, since it is based on a small molecule binding to a highly dynamic surface created via the transient interaction of preformed interaction-prone fragments. In essence, this approach can be considered an extension of the well-established structure-based rational drug design elaborated for ordered proteins. In fact, if the structure of one or more members of the functionally misfolded ensemble can be guessed, then this structure can be used to find small molecules that are potentially able to interact with it, utilizing tools originally developed for the rational structure-based drug design for ordered proteins. 277 Ideally, a drug that targets a given protein-protein interaction should be tissue specific. Although some proteins are unique to a given tissue, many more proteins have a very wide distribution, being present in several tissues and organs. How can one develop tissue-specific drugs targeting such abundant proteins? Often, tissue specificity for many of the abundant proteins is achieved via the alternative splicing of the corresponding pre-mRNAs, which generates two or more protein isoforms from a single gene. Estimates indicate that between 35 and 60% of human genes yield protein isoforms by means of alternatively spliced mRNA. 
278 The added protein diversity from alternative splicing is thought to be important for tissue-specific signaling and regulatory networks in multicellular organisms. The regions of alternative splicing in proteins are enriched in intrinsic disorder, and it was proposed that associating alternative splicing with protein disorder enables the time- and tissue-specific modulation of protein function. 152 Since disorder is frequently utilized in protein binding regions, having alternative splicing of pre-mRNA coupled to IDPRs can define tissue-specific signaling and regulatory diversity. 152 These findings open a unique opportunity to develop tissue-specific drugs modulating the function of a given ID protein/region (with a unique profile of disorder distribution) in a target tissue and not affecting the functionality of this same protein (with a different disorder distribution profile) in other tissues.

Wavy pattern of global evolution of intrinsic disorder

IDPs/IDPRs are more common in eukaryotes than in less complex organisms. 43, 44, 48-52 This suggests that disorder, with its ability to be implemented in various signaling, recognition, and regulation pathways and networks, is important for the maintenance of life in eukaryotic and especially multicellular eukaryotic organisms. 4, 45, 78, 134 Also, the finding that alternatively spliced regions of mRNA code for IDPRs much more often than for structured regions suggests that there is a linkage between alternative splicing and signaling by IDPRs that constitutes a plausible mechanism that could underlie and support cell differentiation, which ultimately gave rise to the multicellular eukaryotic organisms. 152 Therefore, one can assume that intrinsic disorder represents a relatively recent evolutionary invention. However, this hypothesis would obviously be wrong if earlier stages of evolution were taken into account. 
In fact, the chances that the first polypeptides that appeared in the primordial soup of the primitive Earth possessed well-developed and unique 3D structures are minimal. The Earth formed about 4.5 billion years ago. Scientists dated the first fossils to 3.85 billion years ago. There are still debates and different theories about what happened in the years between the time the Earth was cool enough to spawn life and the time the first fossils were formed. At the beginning of the 20th century, Oparin 279 and Haldane 280 proposed that some organic molecules could have been spontaneously produced from the gases of the primitive Earth atmosphere, assuming that this primitive atmosphere was reducing (as opposed to oxygen-rich) and that there was an appropriate supply of energy, such as lightning or ultraviolet light. Thirty years later, this hypothesis (which constitutes a cornerstone of the theory of molecular evolution) received strong support from the elegant experiments of Stanley L. Miller and Harold C. Urey, who were able to synthesize various organic compounds, including some amino acids, from inorganic compounds believed to represent the major components of the early Earth's atmosphere (water vapor, hydrogen, methane, and ammonia) by putting them into a closed system and running a continuous electric current through the system to simulate the lightning storms believed to be common on the early Earth. 281, 282 However, the Miller-Urey experiment yielded only about half of the modern amino acids, 281, 282 suggesting that the first proteins on Earth may have contained only a few amino acids. These findings go in parallel with the biosynthetic theory of genetic code evolution, which suggests that the genetic code evolved from a simpler form that encoded fewer amino acids, 283 probably paralleled by the invention of biosynthetic pathways for new and chemically more complex amino acids. 
284 Furthermore, some additional support for the validity of this hypothesis can be found in the standard genetic code (which consists of 4 × 4 × 4 = 64 triplets of nucleotides, or codons), which is redundant (64 codons encode 20 amino acids). In fact, with only two exceptions, codons encoding one amino acid may differ in any of their three positions. However, only the third positions of some codons may be fourfold degenerate, that is, any nucleotide at this position specifies the same amino acid and all nucleotide substitutions at this site are synonymous. Using these observations as a reflection of the evolutionary development, it was proposed that there was a period during code evolution when the third position was not needed at all and a doublet code preceded the triplet code, giving rise to 4 × 4 = 16 codons encoding 16 or fewer amino acids, if a termination codon is taken into account. 285 Based on these and many other premises, one can discriminate evolutionarily old and new amino acids. In 2000, Eduard N. Trifonov combined 40 different single-factor criteria into a consensus scale and proposed the following temporal order of addition for the amino acids: G/A, V/D, P, S, E/L, T, R, N, K, Q, I, C, H, F, M, Y, W. 286 Even a superficial analysis of this sequence reveals that many of the early amino acids (such as G, D, E, P, and S) are disorder-promoting, as they are very abundant in modern IDPs. On the other hand, the major order-promoting residues (C, W, Y, and F) were added to the genetic code late. This observation is further illustrated by Figure 5 (A), which shows the modern genetic code together with information on the early and late codons (shown by light red and light blue colors, respectively) and on the corresponding disorder- and order-promoting residues (shown by red and blue colors, respectively). Codons with intermediate age and disorder-neutral residues are shown by light pink and pink colors, respectively. 
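The codon arithmetic and the age ordering described above can be checked with a short sketch. It is an illustration only: the tie handling (residues written as G/A, V/D, and E/L share one rank) is an assumption made here, and the two residue sets contain only the examples named in the text, not a complete disorder/order classification. The names `TRIFONOV_ORDER`, `RANK`, and `mean_rank` are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

# Enumerate every possible triplet codon and every possible doublet codon.
triplets = ["".join(c) for c in product(NUCLEOTIDES, repeat=3)]
doublets = ["".join(c) for c in product(NUCLEOTIDES, repeat=2)]

# Temporal order of amino acid addition as quoted in the text.
# Assumption: residues joined by "/" in the text are treated as tied.
TRIFONOV_ORDER = ["GA", "VD", "P", "S", "EL", "T", "R", "N", "K",
                  "Q", "I", "C", "H", "F", "M", "Y", "W"]
RANK = {aa: rank
        for rank, group in enumerate(TRIFONOV_ORDER, start=1)
        for aa in group}

# Only the example residues named in the text (not a full classification).
DISORDER_PROMOTING_EXAMPLES = "GDEPS"
ORDER_PROMOTING_EXAMPLES = "CWYF"


def mean_rank(residues):
    """Average position of the given residues in the temporal order."""
    return sum(RANK[aa] for aa in residues) / len(residues)


if __name__ == "__main__":
    print("triplet codons:", len(triplets))
    print("doublet codons:", len(doublets))
    print("mean rank, disorder-promoting examples:",
          mean_rank(DISORDER_PROMOTING_EXAMPLES))
    print("mean rank, order-promoting examples:",
          mean_rank(ORDER_PROMOTING_EXAMPLES))
```

Under these assumptions, the disorder-promoting examples sit near the start of the ordering and the order-promoting examples near the end, which is the pattern the text describes.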
Figure 5 illustrates that there is relatively good agreement between the "age" of a residue and its disorder-promoting capacity, with early residues being mostly disorder-promoting and the majority of late residues being mostly order-promoting. This conclusion follows from the abundance of the matching colors (light red-red, light blue-blue, and light pink-pink). There are only two noticeable exceptions to this rule, valine and leucine, which are early but order-promoting residues. This strongly suggests that the primordial polypeptides were intrinsically disordered. It is very unlikely that these disordered primordial polypeptides possessed catalytic activity. 287 This hypothesis is in line with the RNA world theory, which suggests that during the evolution of enzymatic activity, catalysis was transferred from RNA first to ribonucleoprotein (RNP) and only then to protein. 288 Therefore, the first proteins in the "breakthrough organism" (the first to have encoded protein synthesis) would be nonspecific chaperone-like proteins rather than catalysts. 136, 287 Such RNA chaperone activities of early proteins conferred on their carriers a significant selective advantage in the RNA world, where RNA, which is especially prone to misfolding, 289, 290 was used for both information storage and catalysis. 291 Since the variability of physicochemical properties of amino acids greatly exceeds that of nucleotides and since protein structures are noticeably more stable than RNA structures, the transition from RNAs (ribozymes) to proteins as carriers of enzymatic activity was a logical evolutionary step. However, efficient catalysis relies on the proper spatial arrangement of catalytic residues, which requires a stable structure. 292 Therefore, grafting of the enzymatic activity onto proteins generated strong evolutionary pressure toward well-folded structures. In other words, the global evolution of intrinsic disorder is characterized by a wavy pattern [see Fig. 5 (B)], where highly disordered primordial proteins with primarily RNA-chaperone activities were gradually substituted by the well-folded, highly ordered enzymes that evolved to catalyze the production of all the complex "goodies" crucial for the independent existence of the first cellular organisms. Due to its specific features crucial for the regulation of complex processes, protein intrinsic disorder was reinvented at the subsequent evolutionary steps leading to the development of more complex organisms from the last universal ancestor (i.e., the most recent organism from which all organisms now living on Earth descend 293, 294 ), and culminating in the appearance of the highly elaborated eukaryotic cells [see Fig. 5 (B)].

Figure 5. Peculiarities of disorder evolution. A: Modern genetic code with information on the early and late codons (shown by light red and light blue colors, respectively) and disorder- and order-promoting residues (shown by red and blue colors, respectively). Codons with intermediate ages (i.e., those located between early and late codons) are shown by light pink color, whereas disorder-neutral residues are shown by pink color. B: Wavy pattern of the global disorder evolution. The X-axis represents evolutionary time and the Y-axis shows disorder content in proteins at a given evolutionary time point. Here, primordial proteins are expected to be mostly disordered (left-hand side of the plot), proteins in the last universal ancestor (LUA) likely are mostly structured (center of the plot), whereas many proteins in eukaryotes are either totally disordered or hybrids containing both ordered and disordered regions (right-hand side of the plot).

There is no simple answer to the question of the comparative evolutionary rates of ordered proteins and IDPs/IDPRs in modern organisms. 
In fact, it looks like everything is possible, and intrinsically disordered sequences may evolve faster than, slower than, or at rates similar to ordered sequences. For example, disordered and ordered domains of the same protein (e.g., papillomavirus E7 oncoprotein) were shown to possess similar degrees of conservation and co-evolution. 295 Many other IDPs/IDPRs were shown to be characterized by high evolutionary rates 151, 296, 297 determined by the lack of specific structural restrictions. In fact, the analysis of calcineurins, 10 topoisomerase, 298 ribosomal protein S4, 299 β-subunits of the potassium channel Kvβ1.1, 300 and many other proteins showed that disordered regions in these proteins contained more amino acid substitutions, insertions, and deletions than the ordered regions of the same proteins. 151, 301 Furthermore, based on the observation that a significantly higher degree of positive Darwinian selection was observed in IDPRs of proteins compared to regions of α-helix, β-sheet, or tertiary structure, it was hypothesized that IDPRs may be required for genetic variation with adaptive potential and that these regions may be of "central significance for the evolvability of the organism or cell in which they occur." 302 On the other hand, some IDPs and IDPRs are highly conserved. Human α-synuclein (a canonical IDP of 140 residues 140, 303 ) differs from its mouse counterpart by merely six residues (4%), and there are just 21 residue differences (12%, which include residue differences at 18 positions and 3 insertions/deletions) between the human and canary α-synucleins. 304 In flagellin, the ordered central region has greater sequence diversity than its disordered termini. 305 Functionally important conserved regions of predicted disorder were shown to be rather common in proteins from all kingdoms of life, including viruses. 306, 307 Furthermore, many functional domains of a significant size were shown to be intrinsically disordered. 
165 Overall, a systematic study of several families of proteins with at least one structurally characterized disordered region revealed that their IDPRs are characterized by highly heterogeneous evolutionary rates, with some disordered amino acid sequences evolving slowly and others evolving more rapidly than ordered sequences. 151 Also, even different parts of the same disordered region can possess noticeable variability in their divergence during the evolutionary process. 308 Finally, in some disordered proteins, peculiarities of the amino acid composition, and not the amino acid sequence, might be conserved. 309, 310

Some Future Directions

The last 15 years witnessed a real revolution in our understanding of protein structure-function relationships. The fact that there is an entire class of polypeptides which do not have rigid structures but possess crucial biological functions was heavily underappreciated and ignored for a very long time, despite numerous examples scattered in the literature. The work which started in my group as an attempt to understand what is so special about several natively unfolded proteins produced a real explosion of interest in structure-less proteins with biological functions. A new field was created, and a lot of intriguing information was produced on the structures and functions of IDPs/IDPRs. There is no need to list once again all the discoveries and findings made in this field; they are the subjects of many recent reviews, and some of them are briefly covered in this article. Although the amount of data generated during the past decade and a half on the structural properties of IDPs and IDPRs, their abundance, distribution, functional repertoire, regulation, involvement in disease pathogenesis, and so forth is vast, it seems that the mass of data produced so far is just the small tip of a humongous iceberg. 
IDPs/IDPRs continue to bring discoveries almost on a daily basis, and even more breakthroughs are expected in the future. Modern protein science is at a turning point, but biology still waits for physics. New models explaining the various functions of IDPs, their evolution, and their involvement in diseases are in great demand, together with a general theory unifying current knowledge on protein structure and function, and with novel experimental and computational tools for focused studies of IDPs/IDPRs.
