CORD-19:f212d6366a45726107a30bde1f4615c28cb5ce22

Id Subject Object Predicate Lexical cue
T1 0-57 Sentence denotes REVIEW A decade and a half of protein intrinsic disorder:
T2 58-89 Sentence denotes Biology still waits for physics
T3 91-99 Sentence denotes Abstract
T4 100-301 Sentence denotes The abundant existence of proteins and regions that possess specific functions without being uniquely folded into unique 3D structures has become accepted by a significant number of protein scientists.
T5 302-532 Sentence denotes Sequences of these intrinsically disordered proteins (IDPs) and IDP regions (IDPRs) are characterized by a number of specific features, such as low overall hydrophobicity and high net charge, which make these proteins predictable from sequence.
T6 533-680 Sentence denotes IDPs/IDPRs possess large hydrodynamic volumes, low contents of ordered secondary structure, and are characterized by high structural heterogeneity.
T7 681-791 Sentence denotes They are very flexible, but some may undergo disorder to order transitions in the presence of natural ligands.
T8 792-868 Sentence denotes The degree of these structural rearrangements varies over a very wide range.
T9 869-1026 Sentence denotes IDPs/IDPRs are tightly controlled under the normal conditions and have numerous specific functions that complement functions of ordered proteins and domains.
T10 1027-1123 Sentence denotes When lacking proper control, they have multiple roles in pathogenesis of various human diseases.
T11 1124-1475 Sentence denotes Gaining structural and functional information about these proteins is a challenge, since they do not typically "freeze" while their "pictures are taken." However, despite or perhaps because of the experimental challenges, these fuzzy objects with fuzzy structures and fuzzy functions are among the most interesting targets for modern protein research.
T12 1476-1667 Sentence denotes This review briefly summarizes some of the recent advances in this exciting field and considers some of the basic lessons learned from the analysis of physics, chemistry, and biology of IDPs.
T13 1669-1847 Sentence denotes A bit more than ten years ago, Protein Science published a review entitled "Natively unfolded proteins: a point where biology waits for physics" (Protein Sci 2002 11(4):739-756).
T14 1848-2348 Sentence denotes 1 The major goal of that article was to bring an intriguing protein family of natively unfolded proteins (which are recognized now to constitute a subset of a very broad class of intrinsically disordered proteins, IDPs) out of shadow, to emphasize their lack of ordered structure under physiological conditions (at least ordered structure that could be detected by traditional low resolution techniques), to systemize their major structural properties, and to highlight their biological significance.
T15 2349-2658 Sentence denotes The introduction of such biologically active but essentially unstructured proteins was used to challenge the hitherto dominant structure-centric viewpoint (structure-function paradigm), according to which a specific function of a protein is determined by its unique and rigid three-dimensional (3D) structure.
T16 2659-2933 Sentence denotes The title of the review ("a point where biology waits for physics") was inspired by the observations that many of such "structure-less" proteins analyzed by that time acted as "binders" that did undergo at least partial folding after interaction with their binding partners.
T17 2934-3151 Sentence denotes These observations provoked an idea that these biologically important proteins with little or no ordered structure have to wait to become more folded (and functional) as a result of binding to their specific partners.
T18 3152-3446 Sentence denotes In other words, for these proteins, "biology," that is, the ability to have biological functions, seemed to wait for "physics" which is manifested in their ability to undergo binding-induced folding (at least partial), which is necessary to bring the functional state of these proteins to life.
T19 3447-3812 Sentence denotes 1 At the beginning, the idea that structure-less proteins can be biologically active was taken as a complete heresy by many researchers, since it was absolutely alien to the then-dominant structure-function paradigm, which represented a foundation of the long-standing belief that the specific functionality of a given protein is determined by its unique 3-D structure.
T20 3813-3989 Sentence denotes This structure-function paradigm that describes reasonably well the catalytic behavior of enzymes was based on the "lock-and-key" hypothesis formulated in 1894 by Emil Fischer.
T21 3990-4267 Sentence denotes 2 This viewpoint was solidified by the successful solution of X-ray crystallographic structures of many proteins (as of February 26, 2013 there were 81,922 protein structures in the Protein Data Bank, 3 with 72,761 of these structures being determined by X-ray crystallography).
T22 4268-4492 Sentence denotes These many crystal structures reinforced a static view of the functional protein, where a rigid active site of an enzyme can be viewed as a sturdy lock that provides an exact fit to only one key, a specific and unique substrate.
T23 4493-4647 Sentence denotes 4 Despite numerous limitations, this lock-and-key model was an extremely fruitful concept that was responsible for the creation of modern protein science.
T24 4648-4881 Sentence denotes 1 Figure 1 (A) shows some of the most obvious scientific consequences of the application of structure-function paradigm which is deservedly placed at the center of the "Big Bang" model that gives rise to the protein science universe.
T25 4882-5116 Sentence denotes 1 Obviously, the consideration of a protein as a rigid crystal-like entity is an oversimplification, since even the most stable and well-folded proteins are dynamic systems that possess different degrees of conformational flexibility.
T26 5117-5362 Sentence denotes This is because of the simple fact that so-called conformational forces, that is, forces stabilizing the secondary structure of a protein and its tertiary fold, are weak and can be broken even at ambient temperatures due to thermal fluctuations.
T27 5363-5575 Sentence denotes 4 The breaking of these weak interactions releases the groups that were involved in these interactions and gives them the possibility to be involved in the formation of new weak interactions of comparable energy.
T28 5576-5910 Sentence denotes 4 Since these structural rearrangements are of relatively small scale and since they occur typically in a time scale that is faster than the time required for structure determination by X-ray crystallography and many other physical techniques, the 3-D structures of proteins determined by these techniques represent averaged pictures.
T29 5911-6057 Sentence denotes 6 Furthermore, one should keep in mind that not all protein structures deposited in the PDB are structured throughout their entire lengths.
T30 6058-6359 Sentence denotes Instead, many PDB proteins have portions of their sequences missing from the determined structures (so-called regions of missing electron density) 7, 8 due to the failure of the unobserved atom, side chain, residue, or region to scatter X-rays coherently caused by their flexible or disordered nature.
T31 6360-6540 Sentence denotes Such flexible/disordered regions are rather common in the PDB, since only about 30% of protein structures deposited in the PDB do not have such regions of missing electron density.
T32 6541-6723 Sentence denotes 9 In addition to ordered proteins possessing disordered regions of varying length, the literature contains numerous examples of biologically active proteins with flexible structures.
T33 6724-7085 Sentence denotes 4 Therefore, there is another class of functional proteins and protein regions that contain smaller or larger highly dynamic fragments, and some proteins are even characterized by a complete or almost complete lack of ordered structure under physiological conditions (at least in vitro) which appears to be a critical aspect of these proteins' function in vivo.
T34 7086-7439 Sentence denotes 4, [10] [11] [12] [13] [14] [15] These proteins and protein regions (which are known now as IDPs and IDP regions (IDPRs)) have no single, well-defined equilibrium structure and exist as heterogeneous ensembles of conformers such that no single set of coordinates or backbone Ramachandran angles is sufficient to describe their conformational properties.
T35 7440-7597 Sentence denotes These proteins were independently discovered one-by-one over a long period of time and therefore they were considered as rare exceptions to the general rule.
T36 7598-7840 Sentence denotes Although the phenomenon of biological functionality without stable structure was repeatedly observed, for a long time it was unnoticed by a wide audience because the authors frequently invented new terms to describe their protein of interest.
T37 7841-8522 Sentence denotes 16 In fact, an incomplete list of terms coined in the literature to describe these proteins includes floppy, pliable, rheomorphic, 17 flexible, 18 mobile, 19 partially folded, 20 natively denatured, 21 natively unfolded, 12, 22 natively disordered, 15 intrinsically unstructured, 11, 14 intrinsically denatured, 21 intrinsically unfolded, 22 intrinsically disordered, 13 vulnerable, 23 chameleon, 24 malleable, 25 4D, 26 protein clouds, 27 dancing proteins, 28 proteins waiting for partners, 29 and several other names often representing different combinations of "natively/naturally/inherently/intrinsically" with "unfolded/unstructured/disordered/denatured" among several others.
T38 8523-8739 Sentence denotes Therefore, the majority of the names used in the early literature express that the "unfolded, unstructured, disordered, and denatured" state is a "native, natural, inherent, and intrinsic" property of these proteins.
T39 8740-9015 Sentence denotes 16 Although protein intrinsic disorder is considered now as an established concept and PubMed contains hundreds and hundreds of papers talking about different aspects of IDPs/IDPRs, the route to recognizing these proteins as a novel functional entity was complex and lengthy.
T40 9016-9227 Sentence denotes As is often the case for new scientific concepts, the idea of structure-less functionality went through the stages of passive ignorance and active denial to scrupulous examination and enthusiastic acceptance.
T41 9228-9453 Sentence denotes For example, it took me more than a year to publish my first paper dedicated to the systematic analysis of such proteins, and the manuscript was successively rejected by 14 journals before it was finally accepted by Proteins.
T42 9454-9662 Sentence denotes 12 However, time showed that the concept of protein intrinsic disorder was a useful invention and could be considered as a universal lock-pick that helps in solving many of the seemingly unsolvable problems in protein science.
T43 9663-9665 Sentence denotes Figure 1. A:
T44 9666-9767 Sentence denotes Protein structure-function paradigm is the "Big Bang" created universe of the modern protein science.
T45 9768-9873 Sentence denotes Some major directions based on the consideration of protein function as lock-and-key mechanism are shown.
T46 9874-9892 Sentence denotes Modified from Ref.
T47 9893-9896 Sentence denotes 1.
T48 9897-9899 Sentence denotes B:
T49 9900-10041 Sentence denotes Paradigm shift caused by the introduction of the protein intrinsic disorder concept opened a wide array of new directions in protein science.
T50 10042-10363 Sentence denotes In essence, introduction of this concept can be considered as a scientific revolution that, according to Kuhn, 5 "occurs when scientists encounter anomalies that cannot be explained by the universally accepted paradigm within which scientific progress has thereto been made" (http://en.wikipedia.org/wiki/Paradigm_shift).
T52 10401-10572 Sentence denotes One could say that this idea gave a new boost to the development of the protein science, generating a wide array of principally novel research directions [see Fig. 1(B) ].
T53 10573-11483 Sentence denotes The goals of this review are: (i) to outline some recent advances in the field of IDPs/IDPRs; (ii) to illustrate the usefulness of intrinsic disorder for protein function; (iii) to show that intrinsic disorder can affect different levels of protein structural organization; (iv) to indicate intimate involvement of intrinsic disorder in pathogenesis of various maladies; (v) to emphasize the exceptional structural heterogeneity of IDPs/IDPRs and to show that IDPs are definitely much more structurally complex than random coil-like polypeptides; (vi) to accentuate that although this structural heterogeneity is very important for protein functionality, it represents a crucial hurdle for structural characterization of IDPs; (vii) to stress that new experimental and computational approaches and new theories and models are crucially needed for future progression of this field and protein science in general.
T54 11484-11723 Sentence denotes These and other points highlight the current state of the field, where further advances in understanding of the "biology" of IDPs still wait for "physics," with "physics" now being new theories, instrumentation, and analytical approaches.
T55 11724-11928 Sentence denotes Identification of IDPs as unique entities belonging to a new protein tribe is directly related to the recognition that their amino acid sequences are dramatically different from those of ordered proteins.
T56 11929-12314 Sentence denotes 10, 12, 13, [30] [31] [32] For example, it has been pointed out that the low content of hydrophobic residues combined with the high load of charged residues that often gives rise to high net charge of a polypeptide chain represents a characteristic feature of some IDPs (so called extended IDPs or natively unfolded proteins with coil-like or close to coil-like structures, see below).
T57 12315-12620 Sentence denotes 12 Therefore, compact proteins and extended IDPs can be distinguished based only on their net charges and hydropathies using a simple charge-hydropathy (CH) plot, where the IDPs are specifically localized within a specific region of CH phase space and are reliably separated from compact ordered proteins.
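The charge-hydropathy comparison described above can be illustrated with a minimal Python sketch. The text only states that extended IDPs occupy a specific region of CH space; the details below are assumptions added for illustration: the Kyte-Doolittle hydropathy scale rescaled to the 0-1 range, a net charge that counts only Lys/Arg as positive and Asp/Glu as negative, whole-sequence averaging without a sliding window, and the commonly quoted linear boundary <R> = 2.785<H> - 1.151. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Mean hydropathy with each residue value rescaled from [-4.5, 4.5] to [0, 1]."""
    seq = sequence.upper()
    if not seq or any(res not in KYTE_DOOLITTLE for res in seq):
        raise ValueError("sequence must be non-empty and use the 20 standard residues")
    return sum((KYTE_DOOLITTLE[res] + 4.5) / 9.0 for res in seq) / len(seq)


def mean_net_charge(sequence):
    """Absolute net charge per residue, counting K/R as +1 and D/E as -1 (His ignored)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return abs(positive - negative) / len(seq)


def in_extended_idp_region(sequence, slope=2.785, intercept=-1.151):
    """True if the (hydropathy, charge) point lies above the assumed boundary line."""
    boundary_charge = slope * mean_hydropathy(sequence) + intercept
    return mean_net_charge(sequence) > boundary_charge


if __name__ == "__main__":
    for name, seq in [("acidic, polar", "E" * 20), ("hydrophobic", "ILVILVILVF")]:
        print(name, round(mean_hydropathy(seq), 3), round(mean_net_charge(seq), 3),
              in_extended_idp_region(seq))
```

A highly charged, weakly hydrophobic sequence falls on the disordered side of the line, whereas a hydrophobic, uncharged one falls on the compact side. Real analyses would use experimentally characterized sequences and whatever scale and boundary the cited work specifies.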
T58 12621-12968 Sentence denotes 12 More detailed comparison of amino acid sequences revealed that in comparison with ordered proteins and domains, the IDPs/IDPRs are significantly depleted in order-promoting amino acids (Trp, Tyr, Phe, Ile, Leu, Val, Cys, and Asn), 10, 33 being instead enriched in disorder-promoting residues, such as Ala, Arg, Gly, Gln, Ser, Glu, Lys, and Pro.
T59 12969-13106 Sentence denotes 13, 31, 32, 34, 35 Difference between ordered and disordered proteins goes far beyond these differences in their amino acid compositions.
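The compositional bias described above can be quantified with a short Python sketch that reports the fraction of order-promoting and disorder-promoting residues in a sequence. The two residue sets are taken from the lists given in the text; the function name and the example sequences are hypothetical, and residue fractions alone are not a disorder predictor.

```python
# Residue classes as listed in the text (one-letter codes).
ORDER_PROMOTING = set("WYFILVCN")      # Trp, Tyr, Phe, Ile, Leu, Val, Cys, Asn
DISORDER_PROMOTING = set("ARGQSEKP")   # Ala, Arg, Gly, Gln, Ser, Glu, Lys, Pro


def composition_bias(sequence):
    """Return (order_fraction, disorder_fraction) for an amino acid sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    order = sum(res in ORDER_PROMOTING for res in seq) / len(seq)
    disorder = sum(res in DISORDER_PROMOTING for res in seq) / len(seq)
    return order, disorder


if __name__ == "__main__":
    # Made-up sequences used only to exercise the function.
    for seq in ("AAPPWW", "SGSGEKPQ", "WYFILVCN"):
        print(seq, composition_bias(seq))
```

Residues in neither list (for example Met, Thr, Asp, and His) contribute to neither fraction, so the two values need not sum to one.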
T60 13107-13568 Sentence denotes In fact, based on the comparison of the 265 amino acid physico-chemical property-based scales (such as hydropathy, net charge, flexibility index, helix propensities, strand propensities, aromaticity, etc.) 34 and more than 6000 composition-based attributes (e.g., all possible combinations having one to four amino acids in the group) 36 it has been concluded that ordered and disordered proteins and regions can be discriminated using many of these attributes.
T61 13569-13862 Sentence denotes 13 Based on the analysis of 517 amino acid scales, a novel amino acid scale, Top-IDP (Trp, Phe, Tyr, Ile, Met, Leu, Val, Asn, Cys, Thr, Ala, Gly, Arg, Asp, His, Gln, Lys, Ser, Glu, and Pro), was built to provide ranking for the tendencies of the amino acid residues to promote order or disorder.
T62 13863-14317 Sentence denotes 30 The fact that the sequences of ordered and disordered proteins and regions are noticeably different suggested that IDPs clearly constitute a separate entity inside the protein kingdom, that these proteins can be reliably predicted using various computational tools, [37] [38] [39] [40] [41] [42] and structurally, that IDPs should be very different from ordered globular proteins since peculiarities of amino acid sequence determine protein structure.
T63 14318-14344 Sentence denotes Natural Abundance of IDPs: Touching the Tip of the Iceberg
T64 14345-14480 Sentence denotes Initial systematic analyses revealed that intrinsic disorder in proteins is a rather common phenomenon.
T65 14481-14652 Sentence denotes In fact, as of 2002, the list of experimentally validated natively unfolded proteins with chain length greater than 50 amino acid residues contained more than 100 entries.
T66 14653-14921 Sentence denotes 1 It was also pointed out that this list would probably be doubled if shorter polypeptides 30-50 residues long were included, 1 and that these 100 experimentally validated natively unfolded proteins have at least 250 homologues, which are also expected to be natively unfolded.
T67 14922-15169 Sentence denotes 1, 12 It happened that these "large" numbers (which actually were large enough to make a crucial point that biologically active structure-less proteins represent the new rule and not mere rare exceptions) constitute just a small tip of an iceberg.
T68 15170-15501 Sentence denotes In fact, using computational tools developed for sequence-based intrinsic disorder prediction, the wide spread of IDPs and hybrid proteins containing IDPRs was convincingly shown. [43] [44] [45] [46] For example, more than 15,000 out of 91,000 proteins in the then-current Swiss Protein database were identified as having long IDPRs.
T69 15502-15798 Sentence denotes 47 An analysis of 31 whole genomes spanning the three kingdoms of life, published in 2000, revealed that many proteins contained segments predicted to have 40 consecutive disordered residues and that the eukaryotes exhibited more disorder by these measures than either the prokaryotes or the archaea.
T70 15799-16043 Sentence denotes 43 Other studies on the abundance of intrinsic disorder in various evolutionarily distant species supported these findings and consistently showed that the eukaryotic proteomes had a higher fraction of intrinsic disorder than prokaryotic proteomes.
T71 16044-16440 Sentence denotes 44, [48] [49] [50] [51] [52] This conclusion is in line with the results of a comprehensive bioinformatics investigation of the disorder distribution in almost 3500 proteomes from viruses and three kingdoms of life, results of which are shown in Figure 2 as the correlation between the intrinsic disorder content and proteome size for 3484 species from viruses, archaea, bacteria, and eukaryotes.
T72 16441-16762 Sentence denotes 46 Surprisingly, Figure 2 shows that there is a well-defined gap between the prokaryotes and eukaryotes in the plot of the fraction of disordered residues versus proteome size, where almost all eukaryotes have 32% or more disordered residues, whereas the majority of the prokaryotic species have 27% or fewer disordered residues.
T73 16763-17054 Sentence denotes 46 Therefore, it looks like the fraction of 30% disordered residues serves as a boundary between the prokaryotes and eukaryotes and reflects the existence of a complex step-wise correlation between the increase in the organism complexity and the increase in the amount of intrinsic disorder.
T74 17055-17268 Sentence denotes The gap in the plot of the fraction of disordered residues versus proteome size parallels a morphological gap between prokaryotic and eukaryotic cells, the latter containing many complex innovations that seemingly arose all at once.
T75 17269-17530 Sentence denotes In other words, this sharp jump in the disorder content in proteomes associated with the transition from prokaryotic to eukaryotic cells suggests that the increase in the morphological complexity of the cell paralleled the increased usage of intrinsic disorder.
T76 17531-17940 Sentence denotes 46 The variability of disorder content in unicellular eukaryotes and rather weak correlation between disorder status and organism complexity (measured as the number of different cell types) is likely related to the wide variability of their habitats, with especially high levels of disorder being found in parasitic host-changing protozoa, the environment of which changes dramatically during their life-span.
T77 17941-18227 Sentence denotes 53 The further support for this hypothesis came from the fact that the intrinsic disorder content in multicellular eukaryotes (which are characterized by more stable and less variable environment of individual cells) was noticeably less variable than that in the unicellular eukaryotes.
T78 18228-18230 Sentence denotes 46
T79 18232-18512 Sentence denotes It was pointed out that IDPs possess noticeable amino acid biases, and many IDPs/IDPRs are characterized by sequence redundancy and low sequence complexity, containing long stretches of various repeats and being completely devoid of some (often many) types of amino acid residues.
T80 18513-18635 Sentence denotes These observations seem to indicate that the sequence space of IDPs/IDPRs should be simpler than that of ordered proteins.
T81 18636-18854 Sentence denotes However, the reality is more complex than conventional wisdom might suggest, and the sequence space attainable by simple IDPs/IDPRs is more diversified than that of the structurally more sophisticated ordered proteins.
T82 18855-19009 Sentence denotes In fact, a 100 residue-long protein in which any of the normally occurring 20 amino acids can be found has a sequence space of 20^100 (10^130) sequences.
T83 19010-19092 Sentence denotes 54 Obviously, not all random amino acid sequences can fold into unique structures.
T84 19093-19273 Sentence denotes In other words, a sequence space of a foldable protein (or "foldable" sequence space) is noticeably smaller than the entire sequence space available for a random polypeptide chain.
T85 19274-19544 Sentence denotes For decades, the actual size of the "foldable" sequence space has remained an unsolved mystery despite a large body of theoretical, biochemical, and computational work that aims to unravel the relationship between a protein's primary sequence and its resulting 3D structure.
T86 19545-20132 Sentence denotes 55 However, the actual number of different amino acid residues in a given foldable sequence can be dramatically reduced, 54 since all twenty residues are not necessary for protein folding and the actual physicochemical identity of most of the amino acids in a protein is irrelevant. [56] [57] [58] [59] [60] [61] [62] [63] In other words, folding alphabet can be noticeably reduced, 55, 64 and amino acids can be clustered based on some shared features such as homolog substitution frequency, 65 local structural environments, 66 or peculiarities of the tertiary structural environments.
T87 20133-20221 Sentence denotes 67 This simplified folding code further reduces the available "foldable" sequence space.
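The size of the sequence space quoted above for a 100-residue chain, and the effect of a reduced folding alphabet on it, can be checked with a few lines of Python. This is a minimal arithmetic sketch; the reduced alphabet size of 5 is a purely hypothetical value chosen for illustration and is not taken from the text.

```python
import math


def sequence_space_log10(length, alphabet_size=20):
    """Return log10 of the number of distinct sequences: alphabet_size ** length."""
    if length < 0 or alphabet_size < 1:
        raise ValueError("length must be >= 0 and alphabet_size >= 1")
    return length * math.log10(alphabet_size)


if __name__ == "__main__":
    full = sequence_space_log10(100)         # all 20 amino acids allowed
    reduced = sequence_space_log10(100, 5)   # hypothetical 5-letter alphabet
    print(f"20^100 is about 10^{full:.1f}")
    print(f"5^100 is about 10^{reduced:.1f}")
```

Working with the base-10 logarithm avoids printing a 131-digit integer and makes the comparison between alphabets immediate: every reduction of the alphabet removes many orders of magnitude from the number of possible sequences.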
T88 20222-20235 Sentence denotes 68 Figure 2 .
T89 20236-20370 Sentence denotes Correlation between the intrinsic disorder content and proteome size for 3484 species from viruses, archaea, bacteria, and eukaryotes.
T90 20371-20403 Sentence denotes Each symbol indicates a species.
T91 20404-20705 Sentence denotes There are six groups of species in total: viruses expressing one polyprotein precursor (small red circles filled with blue), other viruses (small red circles), bacteria (small green circles), archaea (blue circles), unicellular eukaryotes (brown squares), and multicellular eukaryotes (pink triangles).
T92 20706-20840 Sentence denotes Each viral polyprotein was analyzed as a single polypeptide chain, without parsing it into the individual proteins before predictions.
T93 20841-20942 Sentence denotes The proteome size is the number of proteins in the proteome of that species and is shown on a logarithmic scale.
T94 20943-21106 Sentence denotes The average fraction of disordered residues is calculated by averaging the fraction of disordered residues of each sequence over all sequences of that species.
T95 21107-21155 Sentence denotes Disorder prediction is evaluated by PONDR-VSL2B.
T96 21156-21174 Sentence denotes Modified from Ref.
T97 21175-21178 Sentence denotes 46.
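The averaging procedure described in the Figure 2 caption (a per-sequence disordered fraction, then a mean over all sequences of a species) can be sketched in Python as follows. The per-residue scores are made-up numbers, not PONDR-VSL2B output, and the 0.5 cutoff for calling a residue disordered is an assumption; the function names are hypothetical.

```python
def disordered_fraction(scores, threshold=0.5):
    """Fraction of residues whose disorder score is at or above the threshold."""
    if not scores:
        raise ValueError("a sequence needs at least one residue score")
    return sum(score >= threshold for score in scores) / len(scores)


def proteome_average(per_protein_scores, threshold=0.5):
    """Mean of the per-sequence disordered fractions, as described in the caption."""
    if not per_protein_scores:
        raise ValueError("a proteome needs at least one sequence")
    fractions = [disordered_fraction(s, threshold) for s in per_protein_scores]
    return sum(fractions) / len(fractions)


def pooled_fraction(per_protein_scores, threshold=0.5):
    """Alternative: fraction of disordered residues pooled over all sequences."""
    all_scores = [score for scores in per_protein_scores for score in scores]
    return disordered_fraction(all_scores, threshold)


if __name__ == "__main__":
    # Hypothetical two-protein "proteome": one short half-disordered chain,
    # one longer fully disordered chain.
    proteome = [[0.9, 0.8, 0.1, 0.2], [0.6] * 10]
    print(proteome_average(proteome))
    print(pooled_fraction(proteome))
```

The two summaries differ whenever sequence lengths differ: averaging per-sequence fractions weights every protein equally, whereas pooling residues weights long proteins more heavily.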
T98 21179-21455 Sentence denotes Simply by virtue of their existence, IDPs/IDPRs add a new level of complexity to the sequence-structure relationship, dividing the population of protein sequences into two categories: sequences that yield natively ordered proteins and sequences that code for natively disordered proteins.
T99 21456-21570 Sentence denotes 55 IDPs/IDPRs cannot fold spontaneously and some of them require specific partners to gain more ordered structure.
T100 21571-21737 Sentence denotes Therefore, they do not possess an entire folding code that defines the ability of foldable proteins to fold spontaneously into a unique biologically active structure.
T101 21738-21845 Sentence denotes The missing portion of the IDP folding code (or at least part of it) is supplemented by binding partner(s).
T102 21846-22064 Sentence denotes This defines a principal difference between structured proteins and IDPs/IDPRs: foldable proteins fold first and then bind to their partners whereas IDPs/IDPRs remain disordered until they interact with their partners.
T103 22065-22269 Sentence denotes 68, 69 Furthermore, many IDPs/IDPRs do not require folding to be functional, 1, 4, 13, 14, [70] [71] [72] [73] and some of them form fuzzy complexes, in which they preserve significant amount of disorder.
T104 22270-22563 Sentence denotes 74, 75 All this suggests that the sequence space of IDPs (at least those which either do not fold at all or do not completely fold at binding) is noticeably greater than the "foldable" sequence space due to the removal of restrictions posed by the need to gain ordered structure spontaneously.
T105 22564-22822 Sentence denotes 68 This represents one of the conundrums of intrinsic disorder, where the apparent sequence redundancy and simplicity are combined with the lack of structural restraints, leading to the increase in the dimensions and complexity of the available sequence space.
T106 22823-22920 Sentence denotes Also, the existence of a noticeable sequence-structure heterogeneity of IDPs should be emphasized.
T107 22921-23176 Sentence denotes 68 Since the unique 3D-structure of an ordered single-domain protein is defined by the interplay between all (or almost all) of its residues, one could expect that the structure-coding potential is homogeneously distributed within its amino acid sequence.
T108 23177-23351 Sentence denotes On the other hand, a sequence of an IDP/IDPR contains multiple, relatively short functional elements and therefore represents a very complex structural and functional mosaic.
T109 23352-23504 Sentence denotes 68 This important feature defines the known ability of an IDP/IDPR to interact, regulate, and be controlled by multiple structurally unrelated partners.
T110 23505-23768 Sentence denotes 76 Such functional "anatomy" of IDPs/IDPRs is determined by the extremely high level of their sequence heterogeneity, which is further increased due to the ability of a single IDPR to bind to multiple partners gaining very different structures in the bound state.
T111 23769-23771 Sentence denotes 77
T112 23772-24010 Sentence denotes One of the crucial consequences of an extended sequence space and non-homogeneous distribution of foldability (or the structure-coding potential) within amino acid sequences of IDPs and IDPRs is their astonishing structural heterogeneity.
T113 24011-24187 Sentence denotes In fact, a typical IDP/IDPR contains a multitude of elements coding for potentially foldable, partially foldable, differently foldable, or not foldable at all protein segments.
T114 24188-24284 Sentence denotes 68 As a result, different parts of a molecule are ordered (or disordered) to a different degree.
T115 24285-24428 Sentence denotes This distribution is constantly changing in time where a given segment of a protein molecule has different structures at different time points.
T116 24429-24547 Sentence denotes As a result, at any given moment, an IDP has a structure which is different from a structure viewed at another moment.
T117 24548-24768 Sentence denotes 68 Another level of structural heterogeneity is determined by the fact that many proteins are hybrids of ordered and disordered domains and regions, and this mosaic structural organization is crucial for their functions.
T118 24769-24885 Sentence denotes 16 Also, even when they do not possess ordered domains, IDPs are known to have various levels and depth of disorder.
T119 24886-25013 Sentence denotes 78 Over the past few years, the understanding of the available conformational space of IDPs/IDPRs underwent significant evolution.
T120 25014-25125 Sentence denotes In fact, for a long time, IDPs were considered mostly "unstructured" or "natively unfolded" polypeptide chains.
T121 25126-25321 Sentence denotes This was mostly due to the fact that the majority of IDPs analyzed at early stages of the field contained very little ordered structure, that is, they were really mostly unstructured or unfolded.
T122 25322-25678 Sentence denotes Finding and characterization of such "structure-less" proteins was important to build up a strong case to counter-point the dominant view represented by the classical sequence-to-structure-to-function paradigm, especially since such fully unstructured, yet functional proteins clearly represented the other extreme of the protein structure-function spectrum.
T123 25679-25808 Sentence denotes 16 The top half of Figure 3 illustrates this situation by opposing rock-like ordered proteins and cooked spaghetti-like IDPs.
T124 25809-25985 Sentence denotes However, already in some early studies, it was indicated that IDPs/IDRs could be crudely grouped into two major structural classes, proteins with compact and extended disorder.
T125 25986-26312 Sentence denotes 1, 4, 12, 13, 73 Based on these observations, the protein functionality was ascribed to at least three major protein conformational states, ordered, molten globular, and coil-like, 13, 79 indicating that functional IDPs can be less or more compact and possess smaller or larger amount of flexible secondary/tertiary structure.
T126 26313-26559 Sentence denotes 1, 4, 12, 13, 73, 79 Roughly at the same time, it was emphasized that the extended IDPs (known as natively unfolded proteins) do not represent a uniform entity but contain two broad structural classes, native coils and native pre-molten globules.
T127 26560-26829 Sentence denotes 1 Currently available data suggest that intrinsic disorder possesses multiple flavors, can have multiple faces, and can affect different levels of protein structural organization, where whole proteins, or various protein regions can be disordered to a different degree.
T128 26830-27096 Sentence denotes 68 This new view of structural space of functional proteins can be visualized to form a continuous spectrum of differently disordered conformations extending from fully ordered to completely structure-less proteins, with everything in between (Fig. 3, bottom half) .
T129 27097-27210 Sentence denotes Here, functional proteins can be well-folded and be completely devoid of disordered regions (rock-like scenario).
T130 27211-27577 Sentence denotes Other functional proteins may contain a limited number of disordered regions (a grass-on-the-rock scenario), or have a significant amount of disordered regions (a llama/camel hair scenario), or be molten globule-like (a greasy ball scenario), or behave as pre-molten globules (a spaghetti-and-meatballs/sausage scenario), or be mostly unstructured (a hairball scenario).
T131 27578-27743 Sentence denotes Notably, in this representation, there is no boundary between ordered proteins and IDPs, and the structure-disorder space of a protein is considered as a continuum.
T132 27744-27874 Sentence denotes It is important to remember that even the most ordered proteins do not resemble "solid rocks" and have some degree of flexibility.
T133 27875-28049 Sentence denotes In fact, a protein molecule is an inherently flexible entity and the presence of this flexibility (even for the most ordered proteins) is crucial for its biological activity.
T134 28050-28212 Sentence denotes 80 Also, another important point to remember is that due to their heteropolymeric nature, proteins are never random coils and always have some residual structure.
T135 28213-28215 Sentence denotes 68
T138 28743-28782 Sentence denotes Structural heterogeneity of IDPs/IDPRs.
T139 28783-28792 Sentence denotes Top half:
T140 28793-28936 Sentence denotes Bi-colored view of functional proteins which are considered to be either ordered (folded, blue) or completely structure-less (disordered, red).
T141 28937-29070 Sentence denotes Ordered proteins are taken as rigid rocks, whereas IDPs are considered as completely structure-less entities, rather like cooked noodles.
T142 29071-29083 Sentence denotes Bottom half:
T143 29084-29258 Sentence denotes A continuous emission spectrum representing the fact that functional proteins can extend from fully ordered to completely structure-less proteins, with everything in between.
T144 29259-29454 Sentence denotes Intrinsic disorder can have multiple faces, can affect different levels of protein structural organization, and whole proteins, or various protein regions can be disordered to a different degree.
T145 29455-29918 Sentence denotes Some illustrative examples include ordered proteins that are completely devoid of disordered regions (rock-like type), ordered proteins with a limited number of disordered regions (grass-on-the-rock type), ordered proteins with a significant amount of disordered regions (llama/camel hair type), molten globule-like collapsed IDPs (greasy ball type), pre-molten globule-like extended IDPs (spaghetti-and-sausage type), and unstructured extended IDPs (hairball type).
T136 28217-28594 Sentence denotes Protein biophysicists/biochemists working on different aspects of ordered proteins (e.g., analyzing their structural properties, functions, folding, etc.) would find biophysical properties of functional IDPs/IDPRs to be rather unusual since these highly dynamic proteins do not follow the well-accepted wisdom that a protein has to be well-folded to be biologically functional.
T137 28595-28742 Sentence denotes However, the unusualness is a subjective feature, and from the viewpoint of polymer physics the extended IDPs/IDPRs possess the expected behavior
T146 29919-30198 Sentence denotes of flexible and charged polymers, whereas the behavior of an ordered protein is rather unexpected (i.e., due to the existence of the native ensemble that for well-folded, ordered proteins can be approximated as a harmonic well around a unique, well-defined equilibrium structure).
T147 30199-30385 Sentence denotes Therefore, one definitely should keep in mind that the "unusual" biophysics of extended IDPs/IDPRs has its roots in the usual polymer physics of highly charged and flexible polypeptides.
T148 30386-30565 Sentence denotes Each protein is believed to be a unique entity with a unique primary sequence, which governs its 3D structure (or lack thereof) and ensures specific biological function(s).
T149 30566-30681 Sentence denotes Therefore, understanding the effect of sequence variance on the biological performance presents a challenging task.
T150 30682-30851 Sentence denotes However, natural polypeptides have originated as random copolymers of amino acids, which were adjusted or "selected" over evolution based on their functional capacities.
T151 30852-31166 Sentence denotes 56, 81 Despite their differences in primary amino acid sequences, protein molecules in a number of conformational states behave as polymer homologues, suggesting that the volume interactions can be considered as a major driving force responsible for the formation of equilibrium structures or structural ensembles.
T152 31167-31470 Sentence denotes 82 For example, ordered globular proteins and molten globules (both as folding intermediates of globular proteins or as examples of collapsed IDPs) exhibit key properties of polymer globules, where the fluctuations of the molecular density are expected to be much less than the molecular density itself.
T153 31471-31694 Sentence denotes Extended IDPs (both intrinsic coils and intrinsic pre-molten globules) and ordered proteins in the pre-molten globule intermediate state possess properties of squeezed coils, since water is a poor solvent for a polypeptide.
T154 31695-31936 Sentence denotes In fact, even high concentrations of strong denaturants (e.g., urea and GdmCl) are very likely to be bad solvents for protein chains, resulting in the preservation of extensive residual structure even under these harsh denaturing conditions.
T155 31937-32288 Sentence denotes 82 Based on these and related observations, and taking into account the fact that many IDPs/IDPRs are characterized by significant amino acid composition biases, the overall polymeric behavior of these proteins and regions can be mimicked reasonably well by the behavior of low-complexity polypeptides (e.g., homopolypeptide and block copolypeptides).
T156 32289-32784 Sentence denotes Following these ideas, it was shown that water is a poor solvent for polypeptide backbone alone and for the IDPs containing long tracts of polar amino acid residues since polar homo-polypeptides without hydrophobic groups (e.g., polyglutamine or glycine-serine block copolypeptides) were shown to prefer collapsed ensembles in aqueous media. [83] [84] [85] [86] [87] [88] Furthermore, even polyglycine was shown to have a tendency to form heterogeneous ensembles of collapsed structures in water.
T157 32785-33151 Sentence denotes 88 A systematic analysis of the conformational behavior of protamines, arginine-rich IDPs involved in the condensation of chromatin during spermatogenesis, and protamine-like peptides revealed that there is a charge-driven coil-to-globule transition in these highly charged polypeptides, where the net charge per residue serves as the discriminating order parameter.
T158 33152-33439 Sentence denotes 89 Overall, the increase in the hydrodynamic dimensions of a polypeptide chain with increase in its net charge per residue can be attributed to the increase in the intramolecular electrostatic repulsions between similarly charged sidechains and the favorable solvation of these moieties.
T159 33440-33553 Sentence denotes 89 Based on these premises, at least three different classes of globule-forming polar/charged IDPs were proposed.
T160 33554-33691 Sentence denotes The first class comprises polar tracts, which collapse because water is a poor solvent for the backbone and non-charged side chains.
T161 33692-33879 Sentence denotes The second class is represented by weak polyelectrolytes and weak polyampholytes, which have low per residue net charge and low fractions of positively and/or negatively charged residues.
T162 33880-34121 Sentence denotes These IDPs/IDPRs form collapsed structures since the driving force responsible for the collapse is not overcome by the intramolecular electrostatic repulsion between the charged side chains or by their favorable free energies of solvation.
T163 34122-34312 Sentence denotes Furthermore, if such IDPs/IDPRs possess polyampholytic nature, their globular state could be additionally stabilized by electrostatic interactions between the oppositely charged side chains.
T164 34313-34497 Sentence denotes Finally, IDPs/IDPRs from the third class are strong polyampholytes characterized by high fractions of positively and/or negatively charged residues but have low per residue net charge.
T165 34498-34669 Sentence denotes Such intrinsically disordered proteins can form collapsed structures stabilized mostly by multiple electrostatic interactions between solvated side chains of opposite sign.
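The three classes above are distinguished by two sequence-derived quantities: the net charge per residue and the fractions of positively and negatively charged residues. The following is a minimal sketch of how these quantities could be computed from a sequence. The function name, the toy sequences, and the choice to treat histidine as neutral are illustrative assumptions, not part of the reviewed studies, and no class thresholds are given because the text does not state any.

```python
# Illustrative sketch only: simplified charge bookkeeping at neutral pH.
POSITIVE = set("KR")  # assumption: His is treated as uncharged
NEGATIVE = set("DE")


def charge_parameters(sequence):
    """Return charge fractions and net charge per residue for a sequence."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    pos = sum(1 for aa in seq if aa in POSITIVE)
    neg = sum(1 for aa in seq if aa in NEGATIVE)
    return {
        "fraction_positive": pos / n,
        "fraction_negative": neg / n,
        "fraction_charged": (pos + neg) / n,
        # signed value; its magnitude is the order parameter named in the text
        "net_charge_per_residue": (pos - neg) / n,
    }


# Hypothetical toy sequences, one per qualitative regime discussed above.
examples = {
    "polar tract": "QQQQQQQQQQGS",
    "strong polyampholyte": "EKEKEKEKEKEK",
    "arginine-rich, protamine-like": "RRRRSRRRRGRRRR",
}

if __name__ == "__main__":
    for name, seq in examples.items():
        print(name, charge_parameters(seq))
```

A strong polyampholyte such as the alternating E/K toy sequence has a high fraction of charged residues but zero net charge per residue, whereas the arginine-rich sequence is high in both.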
T166 34670-34985 Sentence denotes 89 The extended IDPs/IDPRs were used as a model system for the analysis of the effect of electrostatic interactions on conformational properties of unfolded proteins, and for testing the quantitative descriptions and predictions of polymer theory related to the influence of charged amino acids on chain dimensions.
T167 34986-35436 Sentence denotes 90 For example, based on the analysis of the conformational equilibrium of coarse-grained polypeptides as a function of sequence hydrophobicity, charge, and length it has been concluded that the variations in sequence hydrophobicity and charge define a coil-to-globule transition comparable to that seen in the empirical CH-plot, 12, 91 suggesting that a minimal, polymer physics-based model can capture the elements of global protein conformation.
T168 35437-35628 Sentence denotes 92 IDPs/IDPRs with very high net charges are expected to be more extended and behave more like random coils (i.e., similar to conformations adopted by proteins in the denaturant GdmCl).
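For orientation, the charge-hydropathy (CH) plot mentioned above places each protein by its mean normalized hydropathy and its absolute mean net charge. The sketch below is a hedged illustration: it assumes the Kyte-Doolittle hydropathy scale rescaled to the 0-1 range and the commonly quoted linear boundary ⟨R⟩ = 2.785⟨H⟩ − 1.151, neither of which is stated in this text. Function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not given in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(sequence):
    residues = [aa for aa in sequence.upper() if aa in KD]
    if not residues:
        raise ValueError("no standard residues in sequence")
    return residues


def mean_hydropathy(sequence):
    """Mean hydropathy with each residue rescaled from [-4.5, 4.5] to [0, 1]."""
    residues = _standard_residues(sequence)
    return sum((KD[aa] + 4.5) / 9.0 for aa in residues) / len(residues)


def mean_net_charge(sequence):
    """Absolute mean net charge, counting K/R as +1 and D/E as -1."""
    residues = _standard_residues(sequence)
    pos = sum(1 for aa in residues if aa in "KR")
    neg = sum(1 for aa in residues if aa in "DE")
    return abs(pos - neg) / len(residues)


def boundary_charge(hydropathy):
    """Assumed CH-plot boundary: <R> = 2.785 <H> - 1.151."""
    return 2.785 * hydropathy - 1.151


def predicted_extended(sequence):
    """True if the point lies on the high-charge, low-hydropathy side."""
    return mean_net_charge(sequence) > boundary_charge(mean_hydropathy(sequence))
```

Under these assumptions a highly charged, weakly hydrophobic toy sequence falls on the extended side of the line and a purely hydrophobic one falls on the compact side; real predictions require the whole sequence and carry the usual caveats of a two-parameter classifier.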
T169 35629-35827 Sentence denotes The analysis of the GdmCl-induced expansion of the unfolded states suggested that protein charge density plays a crucial role in defining the hydrodynamic behavior of the unfolded polypeptide chain.
T170 35828-35956 Sentence denotes 90 Here, highly charged proteins can exhibit a prominent expansion at low ionic strength that correlates with their net charges.
T171 35957-36130 Sentence denotes 90 It has been also hypothesized that the pronounced effect of charges on the dimensions of unfolded proteins might have important implications for their cellular functions.
T172 36131-36542 Sentence denotes 90 Similarly, a comprehensive analysis of the hydrodynamic dimensions of FG-nucleoporins containing large IDPRs with multiple phenylalanine-glycine repeats (FG-domains) revealed that under the physiologic conditions in vitro these domains adopt distinct categories of disordered structures, such as molten globule, pre-molten globule, relaxed-coil, extended-coil (as in urea), or very extended-coil (as in GdmCl).
T173 36543-36775 Sentence denotes 93 The category of intrinsically disordered structure in a given FG-domain was related to its amino acid composition, namely to the content of charged residues, where more charged FG-domains possessed larger hydrodynamic dimensions.
T174 36776-36943 Sentence denotes 94 Furthermore, FG-nucleoporins with higher charge density were shown to be more dynamic than the collapsed-coil FG-domains, being also prone to repel other FG-domains.
T175 36944-37018 Sentence denotes On the other hand, the collapsed-coil FG-domains were prone to oligomerize.
T176 37019-37304 Sentence denotes These observations suggested that different types of FG-domains with different aggregation propensities provide a molecular basis for two different gating mechanisms operating at the nuclear pore complex at distinct locations: one acting as a hydrogel, and the other as an entropic brush.
T177 37305-37495 Sentence denotes 94 Therefore, the abundance and peculiarities of the charged residues distribution within the protein sequences might determine physical and biological properties of extended IDPs and IDPRs.
T178 37496-37636 Sentence denotes Also, simple polymer physics-based reasoning can give a reasonably well-justified explanation of the conformational behavior of extended IDPs.
T179 37637-38028 Sentence denotes In general, the conformational behavior of IDPs is characterized by the low cooperativity (or the complete lack thereof) of the denaturant-induced unfolding, lack of the measurable excess heat absorption peak(s) characteristic for the melting of ordered proteins, "turned out" response to heat and changes in pH, and the ability to gain structure in the presence of various binding partners.
T180 38029-38227 Sentence denotes 95 The analysis of the temperature effects on structural properties of several extended IDPs revealed that native coils and native pre-molten globules partially fold as the temperature is increased.
T181 38228-38501 Sentence denotes 1, 73, [95] [96] [97] [98] These heating-induced structural changes in extended IDPs were attributed to the increased strength of the hydrophobic interaction at higher temperatures, leading to a stronger hydrophobic attraction, which is the major driving force for folding.
T182 38502-38884 Sentence denotes Similarly, extended IDPs/IDPRs are characterized by the "turned out" response to changes in pH, 96, 99-102 where a decrease (or increase) in pH induces their partial folding due to the minimization of their high net charges viewed at neutral pH, thereby decreasing charge/charge intramolecular repulsion and permitting hydrophobic-driven collapse to the partially folded conformation.
T183 38885-39099 Sentence denotes 95 Every Disordered Protein is Disordered in its Own Way Data accumulated so far indicate that intrinsic disorder exists at multiple structural levels and might differently affect different regions/domains of IDPs.
T184 39100-39276 Sentence denotes This defines noted structural complexity and heterogeneity of IDPs/IDPRs which are further enhanced by the way different proteins/protein regions respond to their environments.
T185 39277-39554 Sentence denotes Furthermore, since intrinsic disorder is crucial for many biological functions and therefore must prevail in different environments, the amino acid sequences and compositions of IDPs and IDPRs are specifically shaped by the peculiarities of their global and local environments.
T186 39555-39761 Sentence denotes All this makes the protein intrinsic disorder phenomenon so broad that one can even assume that every disordered protein (or at least every family of disordered proteins) is disordered in its own way.
T187 39762-39994 Sentence denotes This hypothesis has far-reaching consequences since it implies that a general disorder predictor has limited accuracy and cannot predict with equally high accuracy disorder status of all protein sequences due to their heterogeneity.
T188 39995-40129 Sentence denotes It also implies that some environmental factors definitely should be taken into account when assessing intrinsic disorder in proteins.
T189 40130-40219 Sentence denotes Several examples are presented below to support the overall validity of these statements.
T190 40220-40635 Sentence denotes The first example is given by transmembrane (TM) proteins, in which disorder is widely observed (e.g., 40% of human integral plasma proteins were predicted to contain long IDPRs). [103] [104] [105] [106] [107] Furthermore, disorder is unevenly distributed between the cytoplasmic and the external surfaces of these proteins, with cytoplasmic domains being up to threefold more disordered than extracellular domains.
T191 40636-40924 Sentence denotes 105 Although these analyses gave interesting hints on the abundance of disorder in TM proteins, the obvious weakness of such evaluations is in the fact that they were performed using the disorder predictors developed from structured and disordered regions found in water-soluble proteins.
T192 40925-41088 Sentence denotes 108 However, the major physico-chemical properties of water-soluble and integral membrane proteins are very different due to the differences in their environments.
T193 41089-41363 Sentence denotes For example, similar to typical water-soluble proteins, the TM regions of membrane proteins are often highly structured, containing α-helices 109 or β-structure, 110 which are especially likely to occur due to the low dielectric constant values within the membrane bilayers.
T194 41364-41821 Sentence denotes 111, 112 On the other hand, the exterior regions of TM proteins are much more apolar than the exteriors of water-soluble proteins. [113] [114] [115] Therefore, the peculiarities of the membrane environment, with its highly nonpolar nature originating either from lipids or from protein interiors, are especially unfavorable for intrinsic disorder, since propensity for intrinsic disorder is typically encoded in a high content of polar and charged residues.
T195 41822-41975 Sentence denotes Therefore, the IDPRs found in integral membrane proteins would be expected to be generally localized within the regions external to the membrane bilayer.
T196 41976-42172 Sentence denotes 108 Also, the distinctive environment of the membrane bilayer imposes constraints on the amino acid composition of integral membrane proteins, even on the regions external to the membrane bilayer.
T197 42173-42359 Sentence denotes 116, 117 Comprehensive bioinformatics analysis revealed that integral membrane proteins commonly possess IDPRs defined as regions of missing electron density in their crystal structures.
T198 42360-42646 Sentence denotes 108 Comparison of the IDPRs found in the α-helical bundle and the β-barrel integral membrane proteins with the IDPRs found in typical water-soluble proteins revealed the existence of statistically distinct amino acid compositional biases characteristic for these three protein classes.
T199 42647-42876 Sentence denotes Therefore, the use of specific amino acid signatures of IDPRs found in TM helical bundles and β-barrels can potentially lead to significantly more accurate disorder predictions for these two classes of integral membrane proteins.
T200 42877-42880 Sentence denotes 108
T201 42881-43016 Sentence denotes Another illustrative example of the specific disorder-related and environment-dependent sequence features is given by archaeal proteins.
T202 43017-43223 Sentence denotes 46, 51 Based on the levels of predicted disordered residues, archaeal proteins can be grouped into three classes, with ranges of the disordered residue content of 12%-21%, 21%-32%, and 32%-38% (see Fig. 2).
T203 43224-43315 Sentence denotes The archaeal proteomes with the highest disorder contents are halophiles and methanophiles.
T204 43316-43633 Sentence denotes 46, 51 Similar to TM proteins, the estimation of intrinsic disorder in the extremophilic proteins of the microorganisms surviving under hypersaline conditions using predictors developed for the "normal" non-halophilic proteins existing under the normal physiological conditions of 100-150 mM NaCl may not be accurate.
T205 43634-43980 Sentence denotes 46 In fact, one of the strategies used by the halophilic archaea, which are salt-loving extremophilic organisms that grow optimally at high salt concentrations, to maintain proper osmotic pressure in their cytoplasm is a so-called "salt-in" strategy that involves accumulation of molar concentrations of potassium and chloride in their cytosols.
T206 43981-44114 Sentence denotes 118 This strategy requires extensive adaptation of the intracellular proteins to the presence of near-saturating salt concentrations.
T207 44115-44487 Sentence denotes The proteomes of such "salt-in" organisms are highly acidic, 46, 51 and their proteins are characterized by remarkable instability at conditions of low salt concentrations and by maintaining soluble and active conformations in hypersaline conditions that are generally detrimental to the non-halophilic proteins. [118] [119] [120] [121] [122] [123] [124] [125] [126] [127]
T208 44489-44639 Sentence denotes Finally, peculiarities of disorder distributions in viral proteins can be used to further support the importance of considering environmental factors.
T209 44640-44969 Sentence denotes 46, 51 Here, the comprehensive analysis of intrinsic disorder in various completed proteomes revealed that the viral proteomes have the largest variation of disorder content, which ranges from 7.3% disordered residues in the human coronavirus NL63 to 77.3% disordered residues in the Avian carcinoma virus proteome (see Fig. 2).
T210 44970-45446 Sentence denotes 46 The high predicted intrinsic disorder content in viruses has multiple functional implications, where some IDPRs are used in the functioning of viral proteins and help viruses to hijack various pathways of the host cells, others likely have evolved to help viruses accommodate to their hostile habitats, and still others evolved to help viruses in managing their economic usage of genetic material via alternative splicing, overlapping genes, and anti-sense transcription.
T211 45447-45816 Sentence denotes 128 These findings are in agreement with another study revealing that in comparison with archaea and bacteria, viral and bacteriophagic proteins were significantly more enriched in polar residues and depleted in hydrophobic residues and were close to eukaryotic proteins in terms of their amino acid compositions and the reduced content of the order-promoting residues.
T212 45817-45820 Sentence denotes 129
T213 45822-45848 Sentence denotes Functional protein clouds:
T214 45849-45910 Sentence denotes Major functional advantages of being intrinsically disordered
T215 45911-46080 Sentence denotes The high natural abundance of IDPs/IDPRs and their specific structural features indicate that these proteins and regions might carry out important biological functions.
T216 46081-46527 Sentence denotes This hypothesis has been confirmed by several comprehensive studies, 1, [11] [12] [13] [14] [71] [72] [73] 78, [130] [131] [132] [133] [134] which revealed that these structure-less members of the protein kingdom are abundantly involved in numerous biological processes, where they are frequently found to play different roles in regulation of the functions of their binding partners and in promotion of the assembly of supra-molecular complexes.
T217 46528-46883 Sentence denotes 1, 4, [11] [12] [13] [14] [15] 31, [70] [71] [72] [73] [76] [77] [78] [79] 131, 132, [134] [135] [136] [137] [138] [139] [140] [141] [142] [143] [144] [145] [146] [147] [148] [149] The conformational plasticity of IDPs/IDPRs provides them with a wide spectrum of exceptional functional advantages over the functional modes of ordered proteins and domains.
T218 46884-46982 Sentence denotes 4, 10, 11, 13, 32, 71, 72, 77, 78, 131, 132, 134, 141, 142, 150, 151 Some of these advantages are:
T219 46983-49415 Sentence denotes 1 Increased speed of interaction due to greater capture radius and the ability to spatially search through interaction space; 2 Increased interaction (surface) area per residue; 3 Strengthened encounter complex allows for less stringent spatial orientation requirements; 4 Efficient regulation via rapid degradation; 5 The ability to be involved in one-to-many binding, where a single disordered region binds to several structurally diverse partners; 6 The ability to be involved in many-to-one binding, where many distinct (structured) proteins may bind a single disordered region; 7 The ability to overcome steric restrictions, enabling larger interaction surfaces in protein-protein and protein-ligand complexes than those obtained with rigid partners; 8 The ability to fold upon binding (completely or partially); 9 The ability of some IDPs/IDPRs to form very stable intertwined complexes; 10 The ability of some IDPs/IDPRs to stay substantially disordered in bound state; 11 Binding fuzziness, where different binding mechanisms (e.g., via stabilizing the binding-competent secondary structure elements within the contacting region, or by establishing the longrange electrostatic interactions, or being involved in transient physical contacts with the partner, or even without any apparent ordering) can be employed to accommodate peculiarities of interaction with various partners; 12 Binding plasticity, where an IDPR folds to specific bound conformations (which can be very different) according to the template provided by binding partners; 13 High accessibility of sites targeted for posttranslational modifications (PTMs); 14 Efficient structural and functional regulation via PTMs such as phosphorylation, acetylation, lipidation, ubiquitination, sumoylation, and so forth, allowing for a simple means of modulation of their biological functions; 15 Efficient functional control via regulatory proteolytic attack sites of which are frequently associated 
with IDPRs; 16 Ease of regulation/redirection and production of otherwise diverse forms by alternative splicing (given the existence of multiple functions in a single disordered protein, and given that each functional element is typically relatively short, alternative splicing could readily generate a set of protein isoforms with a highly diverse set of regulatory elements 152 ); 17 The possibility of overlapping binding sites due to extended linear conformation;
T1 49416-49571 Sentence denotes 18 Decoupled binding affinity and specificity, where, due to the induced folding, IDP/IDPR can be involved in the formation of specific but weak complexes.
T2 49572-49772 Sentence denotes In other words, IDP/IDPR might possess high specificity for given partners combined with high k on and k off rates that enable rapid association with the partner without an excessive binding strength.
T3 49773-50100 Sentence denotes This combination of high specificity with low affinity defines the broad utilization of intrinsic disorder in regulatory interactions where turning a signal off is as important as turning it on; 19 Diverse evolutionary rates with some ID proteins being highly conserved and other ID proteins possessing high evolutionary rates.
T4 50101-50587 Sentence denotes The latter ones can evolve into sophisticated and complex interaction centers (scaffolds) that can be easily tailored to the needs of divergent organisms; 20 Flexibility that allows masking (or not) of interaction sites or that allows interaction between bound partners; 21 The ability to be involved in the cascade interactions, where IDP binding to the first partner induces partial folding generating a new binding site suitable for interaction with the second partner, and so forth.
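The link between fast on/off rates and weak binding in the list above follows from the standard relation between rate constants and the dissociation constant for a simple two-state binding reaction. The numbers below are hypothetical values chosen only to illustrate the arithmetic; they are not taken from the reviewed studies.

```latex
K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}, \qquad
\tau_{\mathrm{bound}} = \frac{1}{k_{\mathrm{off}}}
% Hypothetical example:
% k_on = 10^{7}\ \mathrm{M^{-1}\,s^{-1}},\quad k_off = 10\ \mathrm{s^{-1}}
% \Rightarrow K_d = 10^{-6}\ \mathrm{M},\quad \tau_bound = 0.1\ \mathrm{s}
```

With both rate constants high, the complex forms quickly yet lives only briefly, which is the behavior a reversible signaling interaction needs.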
T5 50588-50733 Sentence denotes Many disorder-related functions (e.g., signaling, control, regulation, and recognition) are incompatible with well-defined, stable 3-D structures.
T6 50734-51012 Sentence denotes 1, [11] [12] [13] [14] 31, 73, 78, 79, 132, 134, [138] [139] [140] 142, 144, 153 Functions of many IDPs/IDPRs rely on interactions with specific binding partners, and many IDPs/IDPRs tend to undergo disorder-to-order transitions as a result of binding to their specific targets.
T7 51013-51112 Sentence denotes 12 Functionally, IDPs/IDPRs were grouped in at least six broad classes based on the mode of action.
T8 51113-51406 Sentence denotes 14,136 These broad classes included protein and RNA chaperones, entropic chains, effectors, scavengers, assemblers and display sites, 14,136 and 28 separate functions, including molecular recognition via binding to other proteins, or to nucleic acids, were assigned for IDPRs in early studies.
T9 51407-51953 Sentence denotes 71, 72 Later, a rich spectrum of biological functions associated with IDPs/IDPRs was found based on a comprehensive computational study of a correlation between the functional annotations in the Swiss-Prot database and predicted intrinsic disorder. [138] [139] [140] The approach was based on the hypothesis that if a function described by a given keyword relies on intrinsic disorder, then the keyword-associated proteins would be expected to have a greater level of predicted disorder than proteins randomly chosen from Swiss-Prot.
T10 51954-52436 Sentence denotes This analysis revealed that 44% and 34% of Swiss-Prot functional keywords were associated with ordered and disordered proteins, respectively, whereas 22% of functional keywords yielded ambiguity in the likely function-structure associations. [138] [139] [140] Interestingly, most of the structured protein-associated keywords were shown to be related to enzymatic activities, whereas the majority of the disordered protein-associated keywords were related to signaling and regulation.
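The logic of the keyword analysis described above can be sketched as a resampling test: compare the mean predicted disorder of the proteins carrying a keyword with the means of random protein sets of the same size. This is a simplified illustration, not the procedure used in the cited studies. The function name, the input dictionary, and the absence of any length matching are assumptions.

```python
import random


def keyword_disorder_test(disorder_by_protein, keyword_proteins,
                          n_random=1000, seed=0):
    """Compare a keyword's proteins with random same-size protein sets.

    disorder_by_protein: dict mapping protein id -> predicted disorder
        fraction (hypothetical input, e.g. from any disorder predictor).
    keyword_proteins: list of ids annotated with the keyword of interest.
    Returns (observed_mean, fraction_of_random_sets_at_least_as_disordered).
    """
    rng = random.Random(seed)
    all_ids = list(disorder_by_protein)
    k = len(keyword_proteins)
    if k == 0 or k > len(all_ids):
        raise ValueError("keyword set must be non-empty and fit the database")
    observed = sum(disorder_by_protein[p] for p in keyword_proteins) / k
    at_least = 0
    for _ in range(n_random):
        sample = rng.sample(all_ids, k)  # random set of the same size
        mean = sum(disorder_by_protein[p] for p in sample) / k
        if mean >= observed:
            at_least += 1
    return observed, at_least / n_random
```

A small returned fraction suggests the keyword is associated with disorder; a fraction near one suggests association with order; intermediate values correspond to the ambiguous keywords mentioned in the text.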
T11 52437-52805 Sentence denotes These results agree well with the notion that enzymatic catalysis requires ordered structure and that effectiveness of signaling is dependent on binding reversibility, a property directly associated with the thermodynamics of disorder-to-order transition induced by binding. [138] [139] [140] Many IDPs and IDPRs undergo a disorder-to-order transition upon functioning.
T12 52806-53173 Sentence denotes 11, 13, 15, 71, 72, 78, 79, [130] [131] [132] 134, [154] [155] [156] [157] When disordered regions bind to signaling partners, the free energy required to bring about the disorder to order transition takes away from the interfacial, contact free energy, with the net result that a highly specific interaction can be combined with a low net free energy of association.
T13 53174-53295 Sentence denotes 13, 155 High specificity coupled with low affinity is a useful pair of properties for a reversible signaling interaction.
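The free energy bookkeeping behind this argument can be written out explicitly. The decomposition below is a schematic two-term sketch, and the numerical values are hypothetical, chosen only to show how a folding penalty weakens the net affinity while the interface (and hence the specificity) is unchanged.

```latex
\Delta G_{\mathrm{assoc}} = \Delta G_{\mathrm{contact}} + \Delta G_{\mathrm{fold}},
\qquad \Delta G_{\mathrm{contact}} < 0,\ \ \Delta G_{\mathrm{fold}} > 0
\\[4pt]
K_d = c^{\circ}\exp\!\left(\frac{\Delta G_{\mathrm{assoc}}}{RT}\right),
\qquad c^{\circ} = 1\ \mathrm{M}
\\[4pt]
% Hypothetical example at 298 K (RT \approx 0.59 kcal/mol):
% \Delta G_contact = -15 kcal/mol, \Delta G_fold = +7 kcal/mol
% \Rightarrow \Delta G_assoc = -8 kcal/mol, K_d \approx 10^{-6} M
% The same interface without the folding penalty would give K_d \approx 10^{-11} M.
```

In this toy case the disorder-to-order cost shifts the dissociation constant by roughly five orders of magnitude toward weaker, more readily reversible binding.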
T14 53296-53422 Sentence denotes Furthermore, a disordered protein can readily bind to multiple partners by changing shape to associate with different targets.
T15 53423-53742 Sentence denotes 13, 158, 159 All this clearly suggests that there is a new two-pathway protein structure-function paradigm, with sequence-to-structure-to-function for enzymes and membrane transport proteins, and sequence-to-disordered ensemble-to-function for proteins and protein regions involved in signaling, regulation, and control.
T16 53743-54038 Sentence denotes 1, 13, 71, 73, 79 One of the first generalizations of this concept was given by The Protein Trinity Hypothesis, which suggested that native proteins can be in one of three states: the solid-like ordered state, the liquid-like collapsed-disordered state, or the gas-like extended-disordered state.
T17 54039-54141 Sentence denotes 79 Function is then viewed to arise from any one of the three states or from transitions between them.
T18 54142-54266 Sentence denotes This model was subsequently expanded to include a fourth state (pre-molten globule) and transitions between all four states.
T19 54267-54528 Sentence denotes 1 In reality, based on the idea of the continuous spectrum of protein structures outlined above, functional proteins contain various amounts of intrinsic disorder, and this continuous structural spectrum of proteins defines their limitless functional variability.
T20 54529-54721 Sentence denotes Among intriguing protein functions relying on intrinsic disorder are moonlighting activities, 137 actions of hub proteins, 78, 93, 134, [160] [161] [162] [163] [164] and scaffolding functions.
T21 54722-55164 Sentence denotes 141, 165 All these functions illustrate the notion that the intrinsic disorder concept is a universal skeleton key (or lock-pick) that helps unlock seemingly unresolvable mysteries of protein science, a new Ariadne's thread for navigating the unusual twists of the sophisticated relationships between protein sequence, structure, and function; they are therefore considered in some detail below.
T22 55165-55187 Sentence denotes Moonlighting proteins.
T23 55188-55263 Sentence denotes Moonlighting is the ability of a protein to fulfill more than one function.
T24 55264-55353 Sentence denotes Often, these functions are unrelated or at least are not obviously related to each other.
T25 55354-55750 Sentence denotes 137, [166] [167] [168] The capability of a protein to be involved in moonlighting or multi-tasking activities represents one of the solutions used by Nature to increase the organism's complexity without the expansion of the genome size, where by acting differently at distinct points of metabolic networks proteins increase network complexity without increasing the actual size of the network.
T26 55751-56016 Sentence denotes 137, [166] [167] [168] Among various molecular mechanisms used by the moonlighting proteins to switch between functions are changes in cellular localization, changes in ligand binding, expression in different cell types, and variations of the oligomerization state.
T27 56017-56216 Sentence denotes 137 In addition to these mechanisms that can be explained within the frames of the traditional structure-function paradigm, consideration of the intrinsic disorder phenomenon opens new possibilities.
T28 56217-56464 Sentence denotes 137 In fact, one of the peculiar functional advantages of IDPs/IDPRs is their binding promiscuity and ability to be involved in one-to-many signaling, whereby an IDP/IDPR binds structurally different partners in a template-induced folding process.
T29 56465-56734 Sentence denotes 11, 77, 132, 169 Therefore, IDPs/IDPRs can use the same region or overlapping interaction regions/surfaces to exert distinct effects and can employ disorder-based mechanisms of function switching that rely on their capability to form different conformations upon binding.
T30 56735-57027 Sentence denotes 137 Such structural malleability of IDPs/IDPRs defines their ability to participate in unprecedented moonlighting events, where these disordered moonlighting proteins or regions produce opposing effects (inhibition and activation) on different partners or even on the same partner molecule.
T31 57028-57045 Sentence denotes 137 Hub proteins.
T32 57046-57302 Sentence denotes Signaling interactions inside the cell can be described as specific and complex networks that can be considered as "scale-free" or "small-world" networks, which have hubs with many connections and ends that have only a single connection to one neighbor.
T33 57303-57520 Sentence denotes 170, 171 Such scale-free networks combine the local clustering of connections characteristic of regular networks with occasional long-range connections between clusters, as can be expected to occur in random networks.
T34 57521-57625 Sentence denotes In other words, the distance between nodes in these scale-free networks follows a power-law distribution.
T35 57626-57852 Sentence denotes 172 Based on their spatiotemporal peculiarities, protein hubs were grouped into two broad categories: "date hubs" that bind their numerous partners sequentially, and "party hubs" that interact with their partners simultaneously.
T36 57853-58047 Sentence denotes 173 Since many IDPs are known to be involved in interactions with a large number of distinct partners, they clearly can be considered as hubs in the scale-free protein-protein interaction networks.
T37 58048-58494 Sentence denotes 78, 134 Based on the systematic analysis of several known hub proteins 134 followed by a series of robust bioinformatics studies, 93,160-164 it was concluded that hubs commonly use disordered regions to bind to multiple partners and that there are at least two primary mechanisms by which disorder is utilized in protein-protein interaction networks: one disordered region binds to many partners, or many disordered regions bind to one partner.
T38 58495-58517 Sentence denotes 134 Scaffold proteins.
T39 58518-58808 Sentence denotes Scaffold proteins constitute an important subclass of hubs that typically have a modest number of interacting partners and that are commonly found at the central parts of functional complexes, where they interact with most of their partners at the same time and therefore act as party hubs.
T40 58809-59526 Sentence denotes 160 Scaffolds are responsible for bringing together specific proteins within a signaling pathway and for providing selective spatial orientation and temporal coordination to facilitate and promote interactions among these proteins. In addition, some scaffolds can influence the specificity and kinetics of signaling interactions via simultaneous binding to multiple participants in a particular pathway, thereby facilitating and/or modifying the specificity of pathway interactions; 174 other scaffolds can change the conformations of individual proteins and thus modulate their activities; 174 and still other scaffold proteins may modulate the activation of alternative pathways by promoting interactions between various signaling proteins.
T41 59527-59678 Sentence denotes 141 Analysis of several well-characterized signaling scaffold proteins revealed that their large IDPRs are crucial for successful scaffold function.
T42 59679-59933 Sentence denotes 141 A more global bioinformatics analysis revealed that a typical design of a scaffold protein includes a set of short globular domains (80 amino acids on average) connected by long linker regions (150 residues on average) with crucial binding functions.
T43 59934-60119 Sentence denotes 165 This gave further support to the notion that signaling scaffold proteins utilize the various features of highly flexible ID regions to obtain more functionality from less structure.
T44 60120-60162 Sentence denotes 141 Disorder and transcription regulation.
T45 60163-60283 Sentence denotes Conformational plasticity and adaptability associated with intrinsic disorder are crucial for various protein functions.
T46 60284-60546 Sentence denotes Among the proteins whose functional life is strongly disorder-dependent are transcription factors (TFs) 175, 176 and other proteins involved in transcriptional regulation, such as the mediator complex, 24,177 core and linker histones, 178 and ribosomal proteins.
T47 60547-60709 Sentence denotes 179 For example, from 83 to 94% of TFs might possess long IDPRs, with the degree of disorder in eukaryotic TFs being significantly higher than in prokaryotic TFs.
T48 60710-60908 Sentence denotes 175, 176 Also, TFs were shown to be depleted in order-promoting residues and enriched in disorder-promoting residues, and were characterized by high levels of α-molecular recognition features (α-MoRFs).
T49 60909-61082 Sentence denotes 175 Furthermore, disorder is unevenly distributed within the TFs, with the degree of disorder in their activation regions being much higher than that in DNA-binding domains.
T50 61083-61423 Sentence denotes However, the AT-hooks (DNA-binding motifs, present in many proteins, that bind to the (ATAA) and (TATT) repeats of DNA) and the basic regions of TF DNA-binding domains are highly disordered, suggesting that eukaryotes, with their well-developed gene transcription machinery, require transcription factor flexibility to be more efficient.
T51 61424-61898 Sentence denotes 175 A number of interesting and important roles were also ascribed to intrinsic disorder in TFs related to the regulation of the heat shock response (so-called heat shock factors, HSFs) 180 and in the reprogramming TFs (the Yamanaka factors, namely Sox2, Oct3/4 (Pou5f1), Klf4, and c-Myc, and the Thomson factors, namely Sox2, Oct3, Lin28, and Nanog), overexpression of which is known to generate induced pluripotent stem (iPS) cells from terminally differentiated somatic cells.
T52 61899-61951 Sentence denotes 181 Disorder in the regulation of cellular pathways.
T53 61952-62076 Sentence denotes Of special interest are the vital roles of intrinsic disorder in the regulation and orchestration of various cellular pathways.
T54 62077-62365 Sentence denotes One illustrative example of this regulatory role of intrinsic disorder is the canonical Wnt-pathway, which involves five proteins: Axin, CKI-α, GSK-3β, APC (adenomatous polyposis coli, also known as deleted in polyposis 2.5 protein), and β-catenin (all shown to contain long IDPRs).
T55 62366-62532 Sentence denotes This pathway is known to play a number of crucial roles in organismal development, and its malfunction might lead to various diseases, including cancer.
T56 62533-62707 Sentence denotes 182 The comprehensive analysis of published data revealed that IDPRs found in Wnt-pathway proteins orchestrate protein-protein interactions and facilitate PTMs and signaling.
T57 62708-62919 Sentence denotes 182 Furthermore, the scaffold protein Axin and another large protein, APC, are heavily enriched in disorder and act as flexible concentrators in gathering together all other proteins involved in the Wnt-pathway.
T58 62920-63241 Sentence denotes 182 Intriguingly, the multifarious roles of highly disordered APC in the regulation of β-catenin function were established by showing that disordered APC helps collect β-catenin from the cytoplasm, facilitates β-catenin delivery to the binding sites on Axin, and controls the final detachment of β-catenin from Axin.
T59 63242-63674 Sentence denotes 182 Another important illustration of the involvement of intrinsic disorder in the regulation of a crucial pathway is given by programmed cell death (PCD), one of the most intricate cellular processes, in which the cell uses specialized cellular machinery and intracellular programs to kill itself and which enables metazoans to control cell numbers and eliminate cells that threaten the animal's survival.
T60 63675-63786 Sentence denotes 183 PCD includes several specific modules, such as apoptosis, autophagy, and programmed necrosis (necroptosis).
T61 63787-63944 Sentence denotes These modules are not only tightly regulated but also intimately interconnected and are jointly controlled via a complex set of protein-protein interactions.
T62 63945-64167 Sentence denotes Recently, several large sets of PCD-related proteins across 28 species were analyzed using a wide array of modern bioinformatics tools to understand the role of the intrinsic disorder in controlling and regulating the PCD.
T63 64168-64602 Sentence denotes 183 This analysis revealed that proteins involved in the regulation and execution of PCD possess a substantial amount of intrinsic disorder, and that their IDPRs are involved in a number of crucial functions (such as protein-protein interactions and interactions with other partners, including nucleic acids and other ligands), are enriched in post-translational modification sites, and are characterized by specific evolutionary patterns.
T64 64603-64606 Sentence denotes 183
T65 64608-64701 Sentence denotes Unique catalytic function of a protein is believed to be dictated by its unique 3D structure.
T66 64702-64952 Sentence denotes This axiom constitutes a cornerstone of the lock-and-key paradigm and it seemed to be able to sustain the furious attack on protein structure-function relationship initiated by the discovery of IDPs and hybrid proteins with ordered domains and IDPRs.
T67 64953-65310 Sentence denotes In fact, from the vast majority of experimental and computational studies a general conclusion was drawn over and over again, where the functional repertoire of IDPs complemented the functional arsenal of ordered proteins, with ordered proteins being mostly responsible for catalysis and transport and with IDPs doing the majority of other jobs in the cell.
T68 65311-65511 Sentence denotes On the other hand, all proteins (even the most ordered and tightly folded ones) are intrinsically flexible molecules that undergo conformational changes over a wide range of timescales and amplitudes.
T69 65512-65681 Sentence denotes 184 In fact, the combination of active site reactivity with the dynamic character of proteins allows enzymes to be promiscuous and remarkably efficient at the same time.
T70 65682-66241 Sentence denotes 185 Furthermore, in general, dynamic fluctuations are crucial for enzyme catalysis, since they can influence substrate binding and product release, and may even adjust the effective barriers of the catalyzed reactions. [186] [187] [188] [189] [190] Often, dynamic changes in the enzyme during the catalytic reaction can be described using the induced-fit model, where a conversion of one tight conformational ensemble (free enzyme) to another distinct ensemble (bound enzyme) takes place through a series of local substrate-mediated structural rearrangements.
T71 66242-66541 Sentence denotes 191 Despite this crucial role of local flexibility in the enzymatic catalysis, enzymes are still relatively stable molecules whose dynamic character is restricted to a small set of tightly folded conformations and whose unique (albeit locally flexible) structures are needed for efficient catalysis.
T72 66542-66839 Sentence denotes From this viewpoint, the presence of intrinsic disorder is expected to be poorly compatible with enzymatic catalysis, which requires a well-organized environment in the active site of the enzyme in order to facilitate the formation of the transition state of the chemical reaction to be catalyzed.
T73 66840-67222 Sentence denotes 192 In sharp contrast to this common wisdom supported by a wide array of specific examples, several enzymes were shown to be much more dynamic than catalytic machines are expected to be, clearly possessing, in their precatalytic states, many characteristic properties of molten globules and retaining unusually high flexibility in structurally defined enzyme-ligand complexes.
T74 67223-67385 Sentence denotes One of the best characterized examples of such molten globular enzymes is the engineered monomeric form of chorismate mutase from Methanococcus jannaschii (MjCM).
T75 67386-67540 Sentence denotes 184, [193] [194] [195] Here, a functional monomer (mMjCM) was created by inserting the hinge-loop sequence into the long, dimer-spanning N-terminal helix.
T76 67541-67737 Sentence denotes 193 In its unbound form, mMjCM was shown to exist as a native molten globule that was described as a dynamic ensemble of α-helical conformers rapidly interconverting on the millisecond timescale.
T77 67738-67950 Sentence denotes 193 Interaction with natural ligand induced global conformational changes in the molten globular mMjCM promoting formation of a defined enzyme-ligand complex, which, however, preserved unusually high flexibility.
T78 67951-68426 Sentence denotes 184 Catalytic mechanism of the molten globular mMjCM was described as follows: "Though probably stochastic in nature, internal motions in the complex may generate a collective dynamic matrix that samples catalytically active conformation(s) often enough to achieve rapid turnover in the presence of the true transition state." 184 Therefore, some enzymes can represent a highly dynamic heterogeneous conformational ensemble which is still compatible with efficient catalysis.
T79 68427-68849 Sentence denotes In agreement with this hypothesis, a molten globular character was described for circularly permuted dihydrofolate reductase (DHFR), 196, 197 and urease G from Bacillus pasteurii (BpUreG). [198] [199] [200] Of these three enzymatic molten globules UreG is the only natural molten globular enzyme known to date, since both circularly permuted DHFR and monomeric MjCM were obtained as a result of some genetic manipulations.
T80 68850-69008 Sentence denotes Although the number of known native molten globules with enzymatic activity is small, their existence provides an interesting hint on early protein evolution.
T81 69009-69235 Sentence denotes In fact, simple logic suggests that well-ordered enzymes appeared as a result of a long evolutionary process, whose very likely starting point was a partially folded polypeptide with some general properties of the molten globule.
T82 69236-69434 Sentence denotes IDPs/IDPRs can form highly stable complexes, or be involved in signaling interactions where they undergo constant "bound-unbound" transitions, thus acting as dynamic and sensitive "on-off" switches.
T83 69435-69828 Sentence denotes The ability of these proteins to return to highly flexible conformations after the completion of a particular function, and their predisposition to adopt different conformations depending on the environmental peculiarities, are unique physiological properties of IDPs that allow them to exert different functions in different cellular contexts according to a specific conformational state.
T84 69829-69830 Sentence denotes 4
T85 69831-70256 Sentence denotes Due to their lack of rigid structure, combined with the high level of intrinsic dynamics and almost unrestricted flexibility at various structure levels in the non-bound state, as well as due to the unique capability to adjust to structure of the binding partner, IDPs are characterized by a very diverse range of binding modes, creating a multitude of unusual complexes, many of which are not attainable by ordered proteins.
T86 70257-70433 Sentence denotes 201 Some of these complexes are relatively static, resemble complexes of ordered proteins, and, therefore are suitable for the structure determination by X-ray crystallography.
T87 70434-70467 Sentence denotes Among these static complexes are:
T88 70468-70666 Sentence denotes MoRFs, wrappers, chameleons, penetrators, huggers, intertwined strings, long cylindrical containers, connectors, armature, tweezers and forceps, grabbers, tentacles, pullers, and stackers or β-arcs.
T89 70667-70799 Sentence denotes 201 These binding modes are shown in Supporting Information Figure 1S and briefly described in the Supporting Information Materials.
T90 70800-71120 Sentence denotes In addition to the static complexes, where bound partners have fixed structures, some IDPs/IDPRs do not fold even in their bound state, forming so-called disordered, dynamic, or fuzzy complexes with ordered proteins, 97, [202] [203] [204] [205] [206] other disordered proteins, [207] [208] [209] or biological membranes.
T91 71121-71299 Sentence denotes 210, 211 In complexes of some of these IDPs with their binding partners, the disordered regions flanking the interaction interface but not the interface itself remain disordered.
T92 71300-71489 Sentence denotes Such mode of interaction was recently described as "the flanking fuzziness" in contrast to "the random fuzziness" when the disordered protein remains entirely disordered in the bound state.
T93 71490-71659 Sentence denotes 75, 212 It is also expected that similar binding modes can be utilized by disordered proteins when interacting with nucleic acids and other biological macromolecules.
T94 71660-71787 Sentence denotes 201 Physically, binding is considered as joining objects together and suggests spatial and temporal fixation of bound partners.
T95 71788-71920 Sentence denotes The formation of protein complexes with specific binding partners is expected to bring some fixation (at least at the binding site).
T96 71921-72169 Sentence denotes Therefore, disordered complexes where interaction of a disordered protein with the binding partners is not accompanied by a disorder-to-order transition within the interaction interface clearly cannot be described by the classical binding paradigm.
T97 72170-72317 Sentence denotes This contradiction can be resolved assuming that the ordered binding partner and/or disordered protein contain multiple low affinity binding sites.
T98 72318-72615 Sentence denotes The existence of several similar binding sites combined with a highly flexible and dynamic structure of disordered protein creates a unique situation where any binding site of disordered protein can interact with any binding site of its partner with almost equal probability, in a staccato manner.
T99 72616-72726 Sentence denotes The low affinity of each individual contact implies that each of them is not stable and can be readily broken.
T100 72727-73024 Sentence denotes Therefore, such a disordered or fuzzy complex can be envisioned as a highly dynamic ensemble in which a disordered protein does not present a single binding site to its partner but resembles a "binding cloud," in which multiple identical binding sites are dynamically distributed in a diffuse manner.
T101 73025-73187 Sentence denotes In other words, in this staccato-type interaction mode, a disordered protein rapidly changes multiple binding sites while probing the binding site(s) of its partner.
T102 73188-73326 Sentence denotes 201 An additional factor that can help hold a dynamic complex together could be a weak long-range attraction between protein molecules.
T103 73327-73541 Sentence denotes 213 This long-range attraction is universal for all protein solutions and has a range several times that of the diameter of the protein molecule, much greater than the range of the screened electrostatic repulsion.
T104 73542-73545 Sentence denotes 213
T105 73547-73675 Sentence denotes The most common outcome of these function-related structural changes is the overall increase in the amount of ordered structure.
T106 73676-73789 Sentence denotes However, functions of some ordered proteins require local or even global unfolding of a unique protein structure.
T107 73790-73979 Sentence denotes 68 Among specific features of these structural alterations are their induced nature and transient character combined with a wide range of molecular mechanisms by which they can be promoted.
T108 73980-74351 Sentence denotes 68 These functional unfolding-activating factors include light; mechanical force; changes in pH, temperature, or redox potential; interaction with membrane, ligands, nucleic acids, and proteins; various PTMs; release of autoinhibition due to the unfolding of autoinhibitory domains induced by their interaction with nucleic acids, proteins, membranes, PTMs, and so forth.
T109 74352-74476 Sentence denotes 68 Among rather unusual factors used by nature to activate proteins via functional unfolding are light and mechanical force.
T110 74477-74756 Sentence denotes For example, exposure to blue light results in the activation of the photoactive yellow protein (PYP), which is an ordered, water-soluble 14 kDa protein that contains a thioester-linked p-coumaric acid cofactor and serves as a photosensor in Ectothiorhodospira halophila.
T111 74757-74866 Sentence denotes 214, 215 PYP is a bacterial blue light sensor that undergoes conformational changes upon signal transduction.
T112 74867-75035 Sentence denotes The absorption of a photon triggers substantial protein unfolding and leads to the formation of the transient signaling state that interacts with the partner molecules.
T113 75036-75148 Sentence denotes This allows the swimming bacterium to operate the directional switch that protects it from harmful illumination.
T114 75149-75540 Sentence denotes Comprehensive analysis combining double electron-electron resonance spectroscopy (DEER), high-resolution NMR, and time-resolved pump-probe X-ray solution scattering (TR-SAXS/WAXS) revealed that the transiently activated and short-lived signaling state of PYP possessed a large degree of disorder and existed as an ensemble of multiple conformers that exchange on a millisecond time scale.
T115 75541-75712 Sentence denotes 216, 217 This unusual behavior is illustrated in Figure 4 that shows structures of inactive folded PYP and its light-activated functional form, which is highly disordered.
T116 75713-75827 Sentence denotes 68 Some proteins undergo local unfolding induced by the mechanical force and therefore can serve as force sensors.
T117 75828-76104 Sentence denotes 68 Among these natural force sensors are mechanosensitive ion channels that recognize and respond to membrane tension, that is, the mechanical force applied along the plane of the cell membrane, rather than to the hydrostatic pressure perpendicular to the membrane plane.
T118 76105-76226 Sentence denotes 220 These ion channels are activated via partial unfolding of some of their functional parts induced by membrane tension.
T119 76227-76460 Sentence denotes 221 For a long time, the fact that IDPs/IDPRs undergo disorder-to-order transitions either during their functions or in order to be functional was used as one of the strongest arguments against the idea of protein intrinsic disorder.
T120 76461-76732 Sentence denotes It was stated that most IDPs (those which are not the artifacts of current methods of protein production) are in fact proteins waiting for a partner (PWPs) that serve as parts of a multi-component complex and that do not fold correctly in the absence of other components.
T121 76733-76859 Sentence denotes 29 Therefore, when folded after binding to their partners, these proteins are not too different from typical ordered proteins.
T122 76860-77096 Sentence denotes However, one needs to keep in mind that a portion of the "folding code" that defines the ability of ordered proteins to spontaneously gain a unique biologically active structure is missing for IDPs/IDPRs, since they cannot fold spontaneously.
T123 77097-77200 Sentence denotes This missing portion of the "folding code" (or a part of it) can be supplemented by binding partner(s).
T124 77201-77533 Sentence denotes As a result, ordered and disordered proteins can be discriminated on a simple basis of temporal correlation between their folding and binding: ordered proteins fold first and then bind to their partners while the IDPs/IDPRs remain disordered until they bind their partners and often preserve substantial disorder in the bound state.
T125 77534-77839 Sentence denotes 69 Furthermore, numerous cases of functional unfolding (or transient disorder, or upside-down functionality) represent further support to the concept of functional disorder by clearly showing that many proteins possess dormant disorder that needs to be awakened in order to make these proteins functional.
T126 77840-77923 Sentence denotes It is clear now that the IDPs and IDPRs are real, abundant, diversified, and vital.
T127 77924-78006 Sentence denotes The highly dynamic nature of IDPs and IDPRs is a visual illustration of the chaos.
T128 78007-78218 Sentence denotes However, the evolutionary persistence of these highly dynamic proteins (see below), their unique functionality, and involvement in all the major cellular processes evidence that this chaos is tightly controlled.
T129 78219-78253 Sentence denotes 147
T130 78254-78329 Sentence denotes Ground state structure was determined by multidimensional NMR spectroscopy.
T131 78330-78493 Sentence denotes 218 This structure is in agreement with an earlier published 1.4 Å crystal structure, 219 and modeled structure based on combined DEER, TR-SAXS/WAXS, and NMR data.
T132 78494-78695 Sentence denotes 217 It consists of an open, twisted, 6-stranded, antiparallel β-sheet, which is flanked by four α-helices on both sides. [217] [218] [219] On the contrary, the light-activated form is highly disordered.
T133 78696-78767 Sentence denotes This structure satisfies DEER, SAXS/WAXS, and NMR data simultaneously.
T134 78768-78931 Sentence denotes 217 To answer the question as to how these proteins are governed and regulated inside the cell, Gsponer et al. conducted a detailed study focused on the intricate mechanisms of IDP regulation.
T135 78932-79335 Sentence denotes 222 To this end, all the Saccharomyces cerevisiae proteins were grouped into three classes using one of the available disorder predictors, DisoPred2 44: (i) 1971 highly ordered proteins containing 0-10% of the predicted disorder; (ii) 2711 moderately disordered proteins with 10-30% predicted disordered residues; and (iii) 2020 highly disordered proteins containing 30-100% of the predicted disorder.
T136 79336-79471 Sentence denotes Then, the correlations between intrinsic disorder and the various regulation steps of protein synthesis and degradation were evaluated.
T137 79472-79586 Sentence denotes This analysis revealed that the transcriptional rates of mRNAs encoding IDPs and ordered proteins were comparable.
T138 79587-79771 Sentence denotes However, the IDP-encoding transcripts were generally less abundant than transcripts encoding ordered proteins due to the increased decay rates of the transcripts of genes encoding IDPs.
T139 79772-79921 Sentence denotes 222 Furthermore, IDPs were shown to be less abundant than ordered proteins due to the lower rate of protein synthesis and shorter protein half-lives.
T140 79922-80131 Sentence denotes As the abundance and half-life in a cell of certain proteins can be further modulated via their PTMs such as phosphorylation, 223 the experimentally determined yeast kinase-substrate network was also analyzed.
T141 80132-80215 Sentence denotes IDPs were shown to be substrates of twice as many kinases as were ordered proteins.
T142 80216-80400 Sentence denotes Furthermore, the vast majority of kinases whose substrates were IDPs were either regulated in a cell-cycle dependent manner, or activated upon exposure to particular stimuli or stress.
T143 80401-80605 Sentence denotes 222 Therefore, PTMs may not only serve as an important mechanism for fine-tuning IDP functions but may also be necessary to tune IDP availability under different cellular conditions.
T144 80606-80728 Sentence denotes 222 In addition to S. cerevisiae, similar regulation trends were also found in Schizosaccharomyces pombe and Homo sapiens.
T145 80729-80934 Sentence denotes 222 Based on these observations it has been concluded that both unicellular and multicellular organisms appear to use similar mechanisms for regulation of the intrinsically disordered protein availability.
T146 80935-81087 Sentence denotes Overall, this study clearly demonstrated that in eukaryotes, there is an evolutionarily conserved tight control of synthesis and clearance of most IDPs.
T147 81088-81268 Sentence denotes This tight control is directly related to the major roles of IDPs in signaling, where it is crucial to be available in appropriate amounts and not to be present longer than needed.
T148 81269-81629 Sentence denotes 222 It has also been pointed out that although the abundance of many IDPs is under strict control, some IDPs could be present in cells in large amounts and/or for long periods of time, owing either to specific PTMs or to interactions with other factors, which could promote changes in the cellular localization of IDPs or protect them from the degradation machinery.
T149 81630-81798 Sentence denotes 13, 70, 138, 223, 224 Overall, this study clearly showed that the chaos seemingly introduced into the protein world by the discovery of IDPs is under tight control.
T150 81799-81954 Sentence denotes 147 In an independent study, a global scale relationship between the predicted fraction of protein disorder and protein expression in E. coli was analyzed.
T151 81955-82187 Sentence denotes 225 This study showed that the fraction of protein disorder was positively correlated with both measured RNA expression levels of E. coli genes in three different growth media and with predicted abundance levels of E. coli proteins.
T152 82188-82409 Sentence denotes 225 When a subset of 216 E. coli proteins that are known to be essential for the survival and growth of this bacterium was analyzed, the correlation between protein disorder and expression level became even more evident.
T153 82410-82684 Sentence denotes In fact, essential proteins had on average a much higher fraction of disorder (0.36), had a higher number of proteins classified as completely disordered (19% vs. 2% for E. coli proteome), and were expressed at a higher level in all three media than an average E. coli gene.
T154 82685-82934 Sentence denotes 225 The manual literature mining for a group of E. coli proteins that had high levels of predicted intrinsic disorder revealed that the disorder predictions matched well with the experimentally elucidated regions of protein flexibility and disorder.
T155 82935-83167 Sentence denotes 225 A direct link between protein disorder and protein level in E. coli cells could be because the IDPs may carry out the essential control and regulation functions that are needed to respond to the various environmental conditions.
T156 83168-83348 Sentence denotes Another possibility is that IDPs might undergo more rapid degradation compared to structured proteins, which cells can counter by increasing mRNA levels of the corresponding genes.
T157 83349-83581 Sentence denotes In this case, higher synthesis and degradation rates could make the levels of these proteins very sensitive to the environment, with slight changes in either production or degradation leading to significant shifts in protein levels.
T158 83582-83731 Sentence denotes 225 Even more support for the tight control of IDPs inside the cell came from the analysis of cellular regulation of so-called "vulnerable" proteins.
T159 83732-83902 Sentence denotes 23 The integrity of the soluble protein functional structures is maintained in part by a precise network of hydrogen bonds linking the backbone amide and carbonyl groups.
T160 83903-84098 Sentence denotes In a well-ordered protein, hydrogen bonds are shielded from water attack, preventing backbone hydration and the total or partial unfolding of the soluble structure under physiological conditions.
T161 84099-84434 Sentence denotes 226, 227 Since soluble protein structures may be more or less vulnerable to water attack depending on their packing quality, a structural attribute, protein vulnerability, was introduced as the ratio of solvent-exposed backbone hydrogen bonds (which represent local weaknesses of the structure) to the overall number of hydrogen bonds.
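The ratio defined above can be written out as a small calculation. This is only a minimal sketch: the function name and the hydrogen-bond counts are hypothetical, and it assumes the exposed and total backbone hydrogen bonds have already been counted from a structure by some other tool.

```python
def structural_vulnerability(exposed_hbonds: int, total_hbonds: int) -> float:
    """Ratio of solvent-exposed backbone hydrogen bonds to all backbone hydrogen bonds.

    Both counts are assumed to come from a prior structural analysis;
    this function only forms the ratio described in the text.
    """
    if total_hbonds <= 0:
        raise ValueError("total_hbonds must be positive")
    if not 0 <= exposed_hbonds <= total_hbonds:
        raise ValueError("exposed_hbonds must lie between 0 and total_hbonds")
    return exposed_hbonds / total_hbonds


# Hypothetical counts, for illustration only (not taken from the review).
example = structural_vulnerability(exposed_hbonds=12, total_hbonds=48)
print(example)  # 0.25
```

A value near 0 would correspond to a well-packed structure, and larger values to a structure with more local weaknesses.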
T162 84435-84725 Sentence denotes 23 It has also been pointed out that structural vulnerability can be related to protein intrinsic disorder, as the inability of a particular protein fold to protect intramolecular hydrogen bonds from water attack may result in backbone hydration leading to local or global unfolding.
T163 84726-84954 Sentence denotes Since binding of a partner can help to exclude water molecules from the microenvironment of the preformed bonds, a vulnerable soluble structure gains extra protection of its backbone hydrogen bonds through the complex formation.
T164 84955-85191 Sentence denotes 226 To understand the role of structure vulnerability in transcriptome organization, the relationship between the structural vulnerability of a protein and the extent of co-expression of genes encoding its binding partners was analyzed.
T165 85192-85342 Sentence denotes This study revealed that structural vulnerability can be considered as a determinant of transcriptome organization across tissues and temporal phases.
T166 85343-85739 Sentence denotes 23 Finally, by interrelating vulnerability, disorder propensity and co-expression patterns, the role of protein intrinsic disorder in transcriptome organization was confirmed, since the correlation between the extent of intrinsic disorder of the most disordered domain in an interacting pair and the expression correlation of the two genes encoding the respective interacting domains was evident.
T167 85740-85742 Sentence denotes 23
T168 85744-85934 Sentence denotes Because IDPs are highly abundant and play crucial roles in numerous biological processes, it was not too surprising to find that some of them are involved in human diseases.
T169 85935-86097 Sentence denotes For example, a number of human diseases originate from the deposition of stable, ordered, filamentous protein aggregates, commonly referred to as amyloid fibrils.
T170 86098-86441 Sentence denotes In each of these pathological states, a specific protein or protein fragment changes from its natural soluble form into insoluble fibrils, which accumulate in a variety of organs and tissues. [228] [229] [230] [231] [232] [233] [234] Several unrelated proteins including many IDPs are known to be involved in these protein deposition diseases.
T171 86442-87089 Sentence denotes 234, 235 Illustrative examples of human neurodegenerative diseases associated with IDPs include Alzheimer's disease (deposition of amyloid-b, tau-protein, a-synuclein fragment NAC) [236] [237] [238] [239]; various tauopathies (accumulation of tau-protein in the form of neurofibrillary tangles) 238; Down's syndrome (nonfilamentous amyloid-b deposits) 240; Parkinson's disease and other synucleinopathies (deposition of a-synuclein) 241; prion diseases (deposition of PrPSc) 242; and a family of polyQ diseases, a group of neurodegenerative disorders caused by expansion of CAG trinucleotide repeats coding for polyQ in the gene products.
T172 87090-87581 Sentence denotes 243 Furthermore, most mutations in rigid globular proteins associated with accelerated fibrillation and protein deposition diseases have been shown to destabilize the native structure, increasing the steady-state concentration of partially folded (disordered) conformers. [228] [229] [230] [231] [232] [233] [234] The maladies given above have been called conformational diseases, as they are characterized by the conformational changes, misfolding, and aggregation of an underlying protein.
T173 87582-87649 Sentence denotes However, there is another side to this coin: protein functionality.
T174 87650-87790 Sentence denotes In fact, many of the proteins associated with the conformational disorders are also involved in recognition, regulation, and cell signaling.
T175 87791-88071 Sentence denotes For example, functions ascribed to a-synuclein, a protein involved in several neurodegenerative disorders, include binding fatty acids and metal ions; regulation of certain enzymes, transporters, and neurotransmitter vesicles; and regulation of neuronal survival (reviewed in Ref.
T176 88072-88078 Sentence denotes 241) .
T177 88079-88179 Sentence denotes Overall, there are about 50 proteins and ligands that interact and/or co-localize with this protein.
T178 88180-88340 Sentence denotes Furthermore, a-synuclein has amazing structural plasticity and adopts a series of different monomeric, oligomeric, and insoluble conformations (reviewed in Ref.
T179 88341-88346 Sentence denotes 24) .
T180 88347-88542 Sentence denotes The choice between these conformations is determined by the peculiarities of the protein environment, suggesting that a-synuclein has an exceptional ability to fold in a template-dependent manner.
T181 88543-88733 Sentence denotes Therefore, the development of the conformational diseases may originate not only from misfolding but also from the misidentification, misregulation, and missignaling of the related proteins.
T182 88734-88812 Sentence denotes Analysis of so-called polyglutamine diseases gives support to this hypothesis.
T183 88813-89068 Sentence denotes 244 Polyglutamine diseases are a specific group of hereditary neurodegeneration caused by expansion of CAG triplet repeats in an exon of disease genes which leads to the production of a disease protein containing an expanded polyglutamine, polyQ, stretch.
T184 89069-89387 Sentence denotes Nine neurodegenerative disorders, including Kennedy's disease, Huntington's disease, spinocerebellar ataxia-1, -2, -3, -6, -7, and -17, and dentatorubral-pallidoluysian atrophy, are known to belong to this class of diseases. [245] [246] [247] [248] In most polyQ diseases, expansion to over 40 repeats leads to the onset.
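The repeat-length criterion mentioned above can be illustrated by counting consecutive CAG codons in a DNA string. This is a toy sketch only: the function names and the test sequence are hypothetical, the default threshold of 40 simply mirrors the "over 40 repeats" figure quoted in the text, and actual disease thresholds differ between genes.

```python
import re


def longest_cag_run(dna: str) -> int:
    """Return the length, in repeats, of the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def exceeds_repeat_threshold(dna: str, threshold: int = 40) -> bool:
    """True if the longest CAG run is longer than `threshold` repeats.

    The default of 40 follows the figure quoted in the text and is an
    illustrative assumption, not a gene-specific clinical cut-off.
    """
    return longest_cag_run(dna) > threshold


# Hypothetical toy sequence: a start codon, 45 CAG repeats, and a stop codon.
toy_gene = "ATG" + "CAG" * 45 + "TAA"
print(longest_cag_run(toy_gene))           # 45
print(exceeds_repeat_threshold(toy_gene))  # True
```

Only CAG codons are counted here; CAA also encodes glutamine, so this does not measure the full polyQ tract at the protein level.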
T185 89388-89585 Sentence denotes 248 It has been emphasized that such molecular processes as unfolded protein response, protein transport, synaptic transmission, and transcription are implicated in the pathology of polyQ diseases.
T186 89586-89710 Sentence denotes 244 Importantly, more than 20 transcription-related factors have been reported to interact with pathological polyQ proteins.
T187 89711-89855 Sentence denotes Furthermore, these interactions were shown to repress the transcription, leading finally to the neuronal dysfunction and death (reviewed in Ref.
T188 89856-89862 Sentence denotes 244) .
T189 89863-90047 Sentence denotes These results suggest that polyQ diseases represent a kind of transcriptional disorder, 244 supporting our misidentification hypothesis for at least some of the conformational disorders.
T190 90048-90106 Sentence denotes Disorder is very common in cancer-associated proteins too.
T191 90107-90265 Sentence denotes In a 2002 study, it was found that 79% of cancer-associated and 66% of cell-signaling proteins contain predicted regions of disorder of 30 residues or longer.
T192 90266-90400 Sentence denotes 130 In contrast, only 13% of a set of proteins with well-defined ordered structures contained such long regions of predicted disorder.
T193 90401-90815 Sentence denotes 130 In experimental studies, the presence of disorder has been directly observed in several cancer-associated proteins, including p53, 249 p57kip2, 250 Bcl-XL and Bcl-2, 251 c-Fos, 252 a thyroid cancer associated protein TC-1, 253 and the EWS-FLI1 fusion protein that includes a potent transcriptional activator, the EWS domain, alongside the highly conserved DNA-binding domain FLI1, 254,255 among many other examples.
T194 90816-91177 Sentence denotes The best-characterized example of an important cancer-related IDP is the tumor suppressor protein p53, which occupies the center of a large signaling network. p53 regulates expression of genes involved in numerous cellular processes, including cell cycle progression, apoptosis induction, DNA repair, as well as others involved in responding to cellular stress.
T195 91178-91340 Sentence denotes 256 When p53 function is lost, either directly through mutation or indirectly through several other mechanisms, the cell often undergoes cancerous transformation.
T196 91341-91495 Sentence denotes 257, 258 Cancers showing mutations in p53 are found in colon, lung, esophagus, breast, liver, brain, reticuloendothelial tissues, and hemopoietic tissues.
T197 91496-91712 Sentence denotes 257 p53 is regulated by several different mechanisms including inhibition of its activity by interaction with E3 ubiquitin ligase Mdm2, which binds to a short stretch of p53 located within the transactivation domain.
T198 91713-91767 Sentence denotes Mdm2-bound p53 cannot activate or inhibit other genes.
T199 91768-91827 Sentence denotes Mdm2 ubiquitinates p53 and thus targets it for destruction.
T200 91828-91924 Sentence denotes Mdm2 also contains a nuclear export signal that causes p53 to be transported out of the nucleus.
T201 91925-91933 Sentence denotes 259, 260
T202 91935-92147 Sentence denotes The possibility of interrupting the action of disease-associated proteins (including through modulation of protein-protein interactions) presents an extremely attractive objective for the development of new drugs.
T203 92148-92462 Sentence denotes Since many proteins associated with various human diseases are either completely disordered or contain long disordered regions, 261, 262 and since some of these disease-related IDPs/IDPRs are involved in recognition, regulation, and signaling, these proteins/regions clearly represent novel potential drug targets.
T204 92463-92646 Sentence denotes 27 Due to failure to recognize the important role of disorder in protein function, current and evolving methods of drug discovery suffer from an overly rigid view of protein function.
T205 92647-92789 Sentence denotes In fact, the rational design of enzyme inhibitors depends on the classical view where 3D-structure is an obligatory prerequisite for function.
T206 92790-92965 Sentence denotes While generally applicable to many enzymatic domains, this view has persisted to influence thinking concerning all protein functions despite numerous examples to the contrary.
T207 92966-93221 Sentence denotes This is most apparent in the observation that the vast majority of currently available drugs target the active site of enzymes, presumably since these are the only proteins for which the "unique structure-unique function" paradigm is generally applicable.
T208 93222-93447 Sentence denotes IDPs often bind their partners with relatively short regions that become ordered upon binding. [263] [264] [265] Targeting disorder-based interactions should enable the development of more effective drug discovery techniques.
T209 93448-93693 Sentence denotes There are at least two potential approaches for the inhibition of disorder-based interactions, where a small molecule either binds to the binding site of the ordered partner to outcompete the IDP/IDPR or interacts directly with the IDP/IDPR.
T210 93694-93854 Sentence denotes The principles of small molecule binding to IDPRs have not been well studied, but sequence specific, small molecule binding to short peptides has been observed.
T211 93855-94077 Sentence denotes 266 An interesting twist here is that small molecules can inhibit disorder-based protein-protein interactions via induction of dysfunctional ordered structures in the targeted IDPR, that is, via drug-induced misfolding.
T212 94078-94277 Sentence denotes In agreement with these concepts, small molecules "Nutlins" have been discovered that inhibited the p53-Mdm2 interaction by mimicking the inducible a-helix in p53 (residues 13-29) that binds to Mdm2.
T213 94278-94630 Sentence denotes 259, 260 Although X-ray crystallographic studies of the p53-Mdm2 complex revealed that the Mdm2 binding region of p53 forms an a-helical structure that binds into a deep groove on the surface of Mdm2, 267 NMR studies showed that the unbound N-terminal region of p53 lacks fixed structure, although it does possess an amphipathic helix part of the time.
T214 94631-94886 Sentence denotes 249 A close examination of the interface between the proteins reveals that Phe19, Trp23, and Leu26 of p53 are the major contributors to the interaction, with the side chains of these three amino acids pointing down into a crevice on the Mdm2 surface.
T215 94887-95235 Sentence denotes 259, 260 The structure of Nutlin-2 was shown to mimic the crucial residues of p53, with two bromophenyl groups fitting into Mdm2 in the same pockets as Trp23 and Leu26, and an ethyl-ether side chain filling the spot normally taken by Phe19. [268] [269] [270] Nutlins and related small molecules increased the level of p53 in cancer cell lines.
T216 95236-95335 Sentence denotes This drastically decreased the viability of these cells, causing most of them to undergo apoptosis.
T217 95336-95455 Sentence denotes When one of the nutlins was given orally to mice, a 90% inhibition of tumor growth compared to the control was induced.
T218 95456-95633 Sentence denotes 260, [268] [269] [270] This successful nutlin story marks the potential beginning of a new era, the signaling-modulation era, in targeting drugs to protein-protein interactions.
T219 95634-95723 Sentence denotes Importantly, this druggable p53-Mdm2 interaction involves a disorder-to-order transition.
T220 95724-95862 Sentence denotes Principles of such transitions are generally understood and can therefore be used to find similar drug targets, which are inducible a-helices.
T221 95863-96002 Sentence denotes 271 In addition to nutlins inhibiting the p53-Mdm2 interaction, several other small molecules also act by blocking protein-protein interactions.
T222 96003-96153 Sentence denotes 272, 273 Some of these interactions involve one structured partner and one disordered partner, with disordered segments becoming a-helix upon binding.
T223 96154-96306 Sentence denotes 271 Therefore, the p53-Mdm2 complex is not a unique exception, and many other disorder-based protein-protein interactions can be blocked by small molecules.
T224 96307-96446 Sentence denotes All this suggests that there is a cornucopia of new drug targets that would operate by blocking disorder-based protein-protein interactions.
T1 96447-96653 Sentence denotes For these p53-Mdm2-type examples, the drug molecules mimic a critical region of the disordered partner (which folds upon binding) and compete with this region for its binding site on the structured partner.
T2 96654-96741 Sentence denotes These druggable interaction sites operate by the coupled binding and folding mechanism.
T3 96742-96824 Sentence denotes They are small enough and compact enough to be easily mimicked by small molecules.
T4 96825-97053 Sentence denotes 25 Methods for predicting such binding sites in disordered regions have been developed 274 and the bioinformatics tools to identify which disordered binding regions can be easily mimicked by small molecules have been elaborated.
T5 97054-97263 Sentence denotes 271 A complementary approach for small molecules to inhibit the disorder-based protein-protein interactions relies on the direct binding of drugs to the IDPs/IDPRs, which is illustrated by the c-Myc-Max story.
T6 97264-97499 Sentence denotes 275 In order to bind DNA, regulate expression of target genes, and function in most biological contexts, c-Myc transcription factor must dimerize with its obligate heterodimerization partner, Max, which lacks a transactivation segment.
T7 97500-97573 Sentence denotes Both c-Myc and Max are intrinsically disordered in their monomeric forms.
T8 97574-97705 Sentence denotes Upon heterodimerization, they undergo coupled binding and folding of their basic-helix-loop-helix-leucine zipper domains (bHLHZips).
T9 97706-97868 Sentence denotes Since the deregulation of c-Myc is related to many types of cancer, the disruption of the c-Myc-Max dimeric complex is one of the approaches for c-Myc inhibition.
T10 97869-97945 Sentence denotes Several small molecules were found to inhibit the c-Myc-Max dimer formation.
T11 97946-98191 Sentence denotes 275 These molecules were shown to bind to one of the three discrete sites within the 85-residue bHLHZip domain of c-Myc, which are composed of short contiguous stretches of amino acids that can selectively and independently bind small molecules.
T12 98192-98332 Sentence denotes 275 Inhibitor binding induces only local conformational changes, preserves the overall disorder of c-Myc, and inhibits interaction with Max.
T13 98333-98464 Sentence denotes 275 Furthermore, binding of inhibitors to c-Myc was shown to occur simultaneously and independently on the three independent sites.
T14 98465-98702 Sentence denotes Based on these observations it has been concluded that a rational and generic approach to the inhibition of protein-protein interactions involving IDPs may therefore be possible through the targeting of intrinsically disordered sequence.
T15 98703-98856 Sentence denotes 275 Recently, a functional misfolding concept was introduced to describe a mechanism preventing IDPs from unwanted interactions with non-native partners.
T16 98857-99060 Sentence denotes 276 IDPs/IDPRs are characterized by high conformational dynamics and flexibility, the presence of sticky preformed binding elements, and the ability to morph into differently-shaped bound configurations.
T17 99061-99268 Sentence denotes However, detailed analyses of the conformational behavior and fine structure of several IDPs revealed that the preformed binding elements might be involved in a set of non-native intramolecular interactions.
T18 99269-99606 Sentence denotes Based on these observations it was proposed that an intrinsically disordered polypeptide chain in its unbound state can be misfolded to sequester the preformed elements inside the noninteractive or less-interactive cage, therefore preventing these elements from the unnecessary and unwanted interactions with non-native binding partners.
T19 99607-99773 Sentence denotes 276 It is important to remember, however, that the mentioned functional misfolding is related to the ensemble behavior of transiently populated elements of structure.
T20 99774-100029 Sentence denotes In other words, it describes the behavior of a globally disordered polypeptide chain containing highly dynamic elements of residual structure, so-called interaction-prone preformed fragments, some of which could potentially be related to protein function.
T21 100030-100303 Sentence denotes 276 This ability of IDPs/IDPRs to functionally misfold can be used to find small molecules that could stabilize different members of the functionally misfolded ensemble and thereby prevent the targeted protein from establishing biological interactions.
T22 100304-100550 Sentence denotes 277 This approach is very different from the direct targeting of short IDPRs discussed above, since it is based on a small molecule binding to a highly dynamic surface created via the transient interaction of preformed interaction-prone fragments.
T23 100551-100704 Sentence denotes In essence, this approach can be considered as an extension of the well-established structure-based rational drug design elaborated for ordered proteins.
T24 100705-101019 Sentence denotes In fact, if the structure of a member(s) of the functionally misfolded ensemble can be guessed, then this structure can be used to find small molecules that are potentially able to interact with this structure, utilizing tools originally developed for the rational structure-based drug design for ordered proteins.
T25 101020-101115 Sentence denotes 277 Ideally, a drug that targets a given protein-protein interaction should be tissue specific.
T26 101116-101262 Sentence denotes Although some proteins are unique for a given tissue, many more proteins have very wide distribution, being present in several tissues and organs.
T27 101263-101338 Sentence denotes How can one develop tissue-specific drugs targeting such abundant proteins?
T28 101339-101536 Sentence denotes Often, tissue specificity for many of the abundant proteins is achieved via the alternative splicing of the corresponding pre-mRNAs, which generates two or more protein isoforms from a single gene.
T29 101537-101657 Sentence denotes Estimates indicate that between 35 and 60% of human genes yield protein isoforms by means of alternatively spliced mRNA.
T30 101658-101828 Sentence denotes 278 The added protein diversity from alternative splicing is thought to be important for tissue-specific signaling and regulatory networks in the multicellular organisms.
T31 101829-102061 Sentence denotes The regions of alternative splicing in proteins are enriched in intrinsic disorder, and it was proposed that associating alternative splicing with protein disorder enables the time-and tissue-specific modulation of protein function.
T32 102062-102251 Sentence denotes 152 Since disorder is frequently utilized in protein binding regions, having alternative splicing of pre-mRNA coupled to IDPRs can define tissue-specific signaling and regulatory diversity.
T33 102252-102572 Sentence denotes 152 These findings open a unique opportunity to develop tissue-specific drugs modulating the function of a given ID protein/region (with a unique profile of disorder distribution) in a target tissue and not affecting the functionality of this same protein (with different disorder distribution profile) in other tissues.
T34 102573-102700 Sentence denotes Wavy pattern of global evolution of intrinsic disorder IDPs/IDPRs are more common in eukaryotes than in less complex organisms.
T35 102701-102975 Sentence denotes 43, 44, [48] [49] [50] [51] [52] This suggests that disorder, with its ability to be implemented in various signaling, recognition, and regulation pathways and networks, is important for the maintenance of life in eukaryotic and especially multicellular eukaryotic organisms.
T36 102976-103358 Sentence denotes 4, 45, 78, 134 Also, the finding that alternatively spliced regions of mRNA code for IDPRs much more often than for structured regions suggests that there is a linkage between alternative splicing and signaling by IDPRs that constitutes a plausible mechanism that could underlie and support cell differentiation, which ultimately gave rise to the multicellular eukaryotic organisms.
T37 103359-103467 Sentence denotes 152 Therefore, one can assume that intrinsic disorder represents a relatively recent evolutionary invention.
T38 103468-103577 Sentence denotes However, this hypothesis would obviously be wrong if earlier stages of evolution were taken into account.
T39 103578-103749 Sentence denotes In fact, the chances that the first polypeptides that appeared in the primordial soup of the primitive Earth possessed well-developed and unique 3D structures are minimal.
T40 103750-103795 Sentence denotes The Earth formed about 4.5 billion years ago.
T41 103796-103857 Sentence denotes Scientists dated the first fossils to 3.85 billion years ago.
T42 103858-104040 Sentence denotes There are still debates and different theories about what happened in those years between the time the earth was cool enough to spawn life and the time the first fossils were formed.
T43 104041-104400 Sentence denotes At the beginning of the 20th century, Oparin 279 and Haldane 280 proposed that some organic molecules could have been spontaneously produced from the gases of the primitive Earth atmosphere, assuming that this primitive atmosphere was reducing (as opposed to oxygen-rich), and there was an appropriate supply of energy, such as lightning or ultraviolet light.
T44 104401-104572 Sentence denotes Thirty years later, this hypothesis (that constitutes a cornerstone of the theory of molecular evolution) received strong support from the elegant experiments of Stanley L.
T45 104573-104593 Sentence denotes Miller and Harold C.
T46 104594-105008 Sentence denotes Urey who were able to synthesize various organic compounds including some amino acids from non-organic compounds which were believed to represent the major components of the early Earth's atmosphere (water vapor, hydrogen, methane, and ammonia) by putting them into a closed system and running a continuous electric current through the system, to simulate lightning storms believed to be common on the early Earth.
T47 105009-105199 Sentence denotes 281, 282 However, the Miller-Urey experiment yielded only about half of the modern amino acids 281, 282 suggesting that the first proteins on Earth may have contained only a few amino acids.
T48 105200-105496 Sentence denotes These findings go in parallel with the biosynthetic theory of the genetic code evolution suggesting that the genetic code evolved from a simpler form that encoded fewer amino acids, 283 probably paralleled by the invention of biosynthetic pathways for new and chemically more complex amino acids.
T49 105497-105740 Sentence denotes 284 Furthermore, some additional support of the validity of this hypothesis can be found in the standard genetic code (that consists of 4 × 4 × 4 = 64 triplets of nucleotides, codons), which is redundant (64 codons encode 20 amino acids).
T50 105741-105850 Sentence denotes In fact, with only two exceptions, codons encoding one amino acid may differ in any of their three positions.
T51 105851-106060 Sentence denotes However, only the third positions of some codons may be fourfold degenerate, that is, any nucleotide at this position specifies the same amino acid and all nucleotide substitutions at this site are synonymous.
T52 106061-106406 Sentence denotes Using these observations as a reflection of the evolutionary development, it was proposed that there was a period during code evolution where the third position was not needed at all and a doublet code preceded the triplet code, giving rise to 4 × 4 = 16 codons encoding 16 or fewer amino acids, if a termination codon is taken into account.
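The codon arithmetic above (4 × 4 × 4 = 64 triplets, 4 × 4 = 16 doublets) and the notion of fourfold-degenerate third positions can be checked by enumeration. The sketch below is illustrative rather than part of the original analysis; it uses the standard genetic code written as a compact string in T, C, A, G order, and all variable names are assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in
# T, C, A, G order at each position ("*" marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4 x 4 x 4 = 64 triplets to its amino acid.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The 4 x 4 = 16 doublets formed by the first two codon positions.
doublets = ["".join(d) for d in product(BASES, repeat=2)]

# A doublet is fourfold degenerate when any third base gives the same amino acid.
fourfold = {
    d: CODON_TABLE[d + "T"]
    for d in doublets
    if len({CODON_TABLE[d + b] for b in BASES}) == 1
}

print(len(CODON_TABLE))         # 64
print(len(doublets))            # 16
print(len(set(AMINO) - {"*"}))  # 20
print(fourfold)
```

In the standard code, eight of the sixteen doublets turn out to be fourfold degenerate (those encoding S, L, P, R, T, V, A, and G), so for half of the doublets the third position carries no information.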
T53 106407-106509 Sentence denotes 285 Based on these and many other premises, one can discriminate evolutionary old and new amino acids.
T54 106510-106528 Sentence denotes In 2000, Eduard N.
T55 106529-106737 Sentence denotes Trifonov combined 40 different single-factor criteria into a consensus scale and proposed the following temporal order of addition for the amino acids: G/A, V/D, P, S, E/L, T, R, N, K, Q, I, C, H, F, M, Y, W.
T56 106738-106921 Sentence denotes 286 Even superficial analysis of this sequence reveals that many of the early amino acids (such as G, D, E, P, and S) are disorder-promoting, as they are very abundant in modern IDPs.
T57 106922-107028 Sentence denotes On the other hand, the major order-promoting residues (C, W, Y, and F) were added to the genetic code late.
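The comparison between residue "age" and disorder propensity can be made concrete by ranking the residues named in the text within the temporal order quoted above. This is a rough sketch: it treats slash-separated pairs such as G/A as ties (an assumption about how to read the list), and it uses only the residues the text explicitly calls disorder-promoting or order-promoting, not a full propensity scale.

```python
# Temporal order of amino acid addition as quoted in the text;
# residues written together (e.g. "GA" for G/A) are treated as ties.
TRIFONOV_ORDER = ["GA", "VD", "P", "S", "EL", "T", "R", "N", "K",
                  "Q", "I", "C", "H", "F", "M", "Y", "W"]

# Rank 1 is the earliest group, rank 17 the latest.
RANK = {aa: i for i, group in enumerate(TRIFONOV_ORDER, start=1) for aa in group}

# Only the residues explicitly named in the text.
DISORDER_PROMOTING = "GDEPS"
ORDER_PROMOTING = "CWYF"


def mean_rank(residues: str) -> float:
    """Average position of the given residues in the temporal order."""
    return sum(RANK[aa] for aa in residues) / len(residues)


print(mean_rank(DISORDER_PROMOTING))  # 3.0
print(mean_rank(ORDER_PROMOTING))     # 14.75
print(RANK["V"], RANK["L"])           # 2 5
```

The named disorder-promoting residues cluster at the start of the order and the named order-promoting residues at the end, while V and L (ranks 2 and 5) are early despite being order-promoting.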
T58 107029-107335 Sentence denotes This observation is further illustrated by Figure 5 (A) which represents modern genetic code, contains information on the early and late codons (shown by light red and light blue colors, respectively), and on corresponding disorder-and order-promoting residues (shown by red and blue colors, respectively).
T59 107336-107449 Sentence denotes Codons with intermediate age and disorder-neutral residues are shown by light pink and pink colors, respectively.
T60 107450-107710 Sentence denotes Figure 5 illustrates that there is relatively good agreement between the "age" of the residue and its disorder-promoting capacity, with early residues being mostly disorder-promoting, and with the majority of late residues being mostly order-promoting.
T62 107711-107830 Sentence denotes This conclusion follows from the abundance of the matching colors (light red-red, light blue-blue, and light pink-pink).
T63 107831-107954 Sentence denotes There are only two noticeable exceptions to this rule, valine and leucine, which are early but order-promoting residues.
T64 107955-108041 Sentence denotes This strongly suggests that the primordial polypeptides were intrinsically disordered.
T65 108042-108137 Sentence denotes It is very unlikely that these disordered primordial polypeptides possessed catalytic activity.
T66 108138-108348 Sentence denotes 287 This hypothesis is in line with the RNA world theory suggesting that during the evolution of enzymatic activity, catalysis was transferred from RNA first to ribonucleoprotein (RNP) and only then to protein.
T67 108349-108527 Sentence denotes 288 Therefore, the first proteins in the "breakthrough organism" (the first to have encoded protein synthesis) would be nonspecific chaperone-like proteins rather than catalysts.
T68 108528-108778 Sentence denotes 136, 287 Such RNA chaperone activities of early proteins conferred to their carriers a significant selective advantage in the RNA world, where RNA, which is especially prone to misfolding, 289, 290 was used for both information storage and catalysis.
T69 108779-108884 Sentence denotes 291 Since the variability of physicochemical properties of amino acids greatly exceeds that of nucleotides and since protein structures are noticeably more stable than RNA structures, the transition from RNAs (ribozymes) to proteins as carriers of enzymatic activity was a logical evolutionary step.
T70 108885-108921 Sentence denotes Figure 5. Peculiarities of disorder evolution.
T71 108922-108924 Sentence denotes A:
T72 108925-109135 Sentence denotes Modern genetic code with information on the early and late codons (shown by light red and light blue colors, respectively) and disorder- and order-promoting residues (shown by red and blue colors, respectively).
T73 109136-109307 Sentence denotes Codons with intermediate ages (i.e., those located between early and late codons) are shown by light pink color, whereas disorder-neutral residues are shown by pink color.
T74 109308-109310 Sentence denotes B:
T75 109311-109357 Sentence denotes Wavy pattern of the global disorder evolution.
T76 109358-109473 Sentence denotes The X-axis represents evolutionary time and the Y-axis shows disorder content in proteins at a given evolutionary time point.
T77 109474-109992 Sentence denotes Here, primordial proteins are expected to be mostly disordered (left-hand side of the plot), proteins in LUA likely are mostly structured (center of the plot), whereas many proteins in eukaryotes are either totally disordered or hybrids containing both ordered and disordered regions (right-hand side of the plot).
T78 109993-110119 Sentence denotes However, efficient catalysis relies on the proper spatial arrangement of catalytic residues which requires a stable structure.
T79 110120-110255 Sentence denotes 292 Therefore, grafting of the enzymatic activity to proteins generated strong evolutionary pressure toward the well-folded structures.
T80 110256-110662 Sentence denotes In other words, the global evolution of intrinsic disorder is characterized by a wavy pattern [see Fig. 5 (B)], where highly disordered primordial proteins with primarily RNA-chaperone activities were gradually substituted by the well-folded, highly ordered enzymes that evolved to catalyze the production of all the complex "goodies" crucial for the independent existence of the first cellular organisms.
T81 110663-111084 Sentence denotes Due to its specific features crucial for the regulation of complex processes, protein intrinsic disorder was reinvented at the subsequent evolutionary steps leading to the development of more complex organisms from the last universal ancestor (i.e., the most recent organism from which all organisms now living on Earth descend 293, 294 ) , and culminating in the appearance of the highly elaborated eukaryotic cells [see
T82 111086-111218 Sentence denotes There is no simple answer to the question of the comparative evolutionary rates of ordered and intrinsically disordered proteins and regions in modern organisms.
T83 111219-111363 Sentence denotes In fact, it looks like everything is possible, and intrinsically disordered sequences may evolve faster than, slower than, or at rates similar to ordered sequences.
T84 111364-111537 Sentence denotes For example, disordered and ordered domains of the same protein (e.g., papillomavirus E7 oncoprotein) were shown to possess similar degrees of conservation and co-evolution.
T85 111538-111693 Sentence denotes 295 Many other IDPs/IDPRs were shown to be characterized by high evolutionary rates 151,296,297 determined by the lack of specific structural restrictions.
T86 111694-112015 Sentence denotes In fact, the analysis of calcineurins, 10 topoisomerase, 298 ribosomal protein S4, 299 β-subunits of the potassium channel Kvβ1.1, 300 and many other proteins showed that disordered regions in these proteins contained more amino acid substitutions, insertions, and deletions than the ordered regions of the same proteins.
T87 112016-112511 Sentence denotes 151, 301 Furthermore, based on the observation that a significantly higher degree of positive Darwinian selection was observed in IDPRs of proteins compared to regions of α-helix, β-sheet or tertiary structures, it was hypothesized that IDPRs may be required for the genetic variation with adaptive potential and that these regions may be of "central significance for the evolvability of the organism or cell in which they occur." 302 On the other hand, some IDPs and IDPRs are highly conserved.
T88 112512-112806 Sentence denotes Human α-synuclein (a canonical IDP of 140 residues 140,303) differs from its mouse counterpart by merely six residues (4%), and there are just 21 residue differences (12%, which include residue differences at 18 positions and 3 insertions/deletions) between the human and canary α-synucleins.
T89 112807-112911 Sentence denotes 304 In flagellin, the ordered central region has greater sequence diversity than its disordered termini.
T90 112912-113067 Sentence denotes 305 Functionally important conserved regions of predicted disorder were shown to be rather common in proteins from all kingdoms of life, including viruses.
T91 113068-113178 Sentence denotes 306, 307 Furthermore, many functional domains of a significant size were shown to be intrinsically disordered.
T92 113179-113509 Sentence denotes 165 Overall, a systematic study of several families of proteins with at least one structurally characterized disordered region revealed that their IDPRs are characterized by highly heterogeneous evolutionary rates, with some disordered amino acid sequences evolving slowly, and others evolving more rapidly than ordered sequences.
T93 113510-113658 Sentence denotes 151 Also, even different parts of the same disordered region can possess noticeable variability in their divergence during the evolutionary process.
T94 113659-113797 Sentence denotes 308 Finally, in some disordered proteins, peculiarities of the amino acid composition, and not the amino acid sequence, might be conserved.
T95 113798-113829 Sentence denotes 309, 310 Some Future Directions
T96 113830-113945 Sentence denotes The last 15 years witnessed a real revolution in our understanding of protein structure-function relationships.
T97 113946-114188 Sentence denotes The fact that there is an entire class of polypeptides which do not have rigid structures but possess crucial biological functions was heavily underappreciated and ignored for a very long time despite numerous examples scattered in the literature.
T98 114189-114403 Sentence denotes The work which started in my group as an attempt to understand what is so special about several natively unfolded proteins produced a real explosion of interest in structure-less proteins with biological functions.
T99 114404-114527 Sentence denotes A new field was created and a lot of intriguing information was produced related to structures and functions of IDPs/IDPRs.
T100 114528-114710 Sentence denotes There is no need to list once again all the discoveries and findings made in this field; they are subjects of many recent reviews and some of them are briefly covered in this article.
T101 114711-115082 Sentence denotes Although the amount of data generated during the past decade and a half on specific features related to the structural properties of IDPs and IDPRs, their abundance, distribution, functional repertoire, regulation, involvement in disease pathogenesis, and so forth is vast, it seems that the mass of data produced so far is just a small tip of a humongous iceberg.
T102 115083-115199 Sentence denotes IDPs/IDPRs continue to bring discoveries almost on a daily basis and even more breakthroughs are expected in the future.
T103 115200-115284 Sentence denotes Modern protein science is at a turning point, but biology still waits for physics.
T104 115285-115581 Sentence denotes New models explaining various functions of IDPs, their evolution, and involvement in diseases are in great demand, together with the general theory unifying current knowledge on protein structure and function, and with novel experimental and computational tools for focused studies of IDPs/IDPRs.