CORD-19:01e3b313e78a352593be2ff64927192af66619b5 / 422-532 JSONTXT

Title: Viruses are a dominant driver of protein adaptation in mammals

Abstract

Viruses interact with hundreds to thousands of proteins in mammals, yet adaptation against viruses has only been studied in a few proteins specialized in antiviral defense. Whether adaptation to viruses typically involves only specialized antiviral proteins or affects a broad array of proteins is unknown. Here, we analyze adaptation in ~1,300 virus-interacting proteins manually curated from a set of 9,900 proteins conserved across mammals. We show that viruses (i) use the more evolutionarily constrained proteins from the cellular functions they hijack and that (ii) despite this high constraint, virus-interacting proteins account for a high proportion of all protein adaptation in humans and other mammals. Adaptation is elevated in virus-interacting proteins across all functional categories, including both immune and non-immune functions. Our results demonstrate that viruses are one of the most dominant drivers of evolutionary change across mammalian and human proteomes.

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. bioRxiv preprint, doi: https://doi.org/10.1101/029397

[...] specialized in antiviral defense, and do not even have any known role in immunity. Many virus-interacting proteins (VIPs) instead have key functions in basic cellular processes subverted by viruses, and viruses tend to interact with proteins that are functionally important hubs in the protein- [...] 2015). It is plausible that many VIPs might evolve to limit the impact of the viruses on the host. However, it is unknown whether the war against viruses is fought by a "professional" army of specifically antiviral proteins, or whether it is a global war fought by a broad range of VIPs.
One reason to believe that the war against viruses might not affect evolution of a broad array of VIPs is that, contrary to the pattern observed for specifically antiviral proteins, [...]

[...] viruses have driven a substantial proportion of all adaptations across the human and mammalian proteomes, establishing that the war against viruses does indeed affect the proteome as a whole.

We finally showcase the power of our global scan for adaptation in VIPs by studying the case of aminopeptidase N, a well-known multifunctional enzyme (Mina-Osorio, 2008) used by coronaviruses as a receptor (Delmas et al., 1992; Yeager et al., 1992). Using our approach we reach an amino-acid level understanding of parallel adaptive evolution in aminopeptidase N in response to coronaviruses in a wide range of mammals.

Here we analyze patterns of both adaptive evolution and evolutionary constraint/purifying selection in a large set of 1,256 manually curated VIPs from the low-throughput virology literature (Methods and Table S1 available online). We exclude [...] Table S2 and Methods). VIPs in our dataset interact with viral proteins, viral RNA, or viral DNA. Most of them (95%) correspond to an interaction between a human protein and a virus infecting humans (Table S1). Human Immunodeficiency Virus type 1 (HIV-1) is the best-represented virus with 240 VIPs, with nine other viruses having at least 50 VIPs (Table S1).

This dataset represents the largest, most up-to-date set of VIPs backed up by individual low-throughput publications. Nonetheless, given that many VIPs were discovered only [...] (Methods and Table S4). These 241 immune VIPs include the VIPs classified as antiviral (Table S4) throughout this manuscript. In total, 162 overlapping GO cellular and supracellular processes have more than 50 VIPs (Table S3).
These observations confirm that viruses interact with proteins involved in the majority of basic cellular processes. [...]

To disentangle whether the slower evolution of VIPs is due to stronger purifying selection or to a lower rate of adaptation, we use the ratio of non-synonymous polymorphisms to synonymous polymorphisms, pN/pS, rather than the dN/dS ratio. Unlike dN/dS, which is strongly influenced by both purifying selection and adaptation, pN/pS is primarily determined by the efficiency of purifying selection in removing deleterious non-synonymous mutations.

Genome-wide polymorphisms required to measure pN/pS at the scale of the proteome have become available for humans (1,000 Genomes Project; Abecasis et al., 2012; Table S5) and for chimpanzees, gorillas, and orangutans (Great Apes Genome Project; Prado-Martinez et al., 2013; Table S6). The 1,000 Genomes Project and the Great [...] suited for the estimation of the pN/pS ratio in as many proteins as possible. Specifically, we measure pN/pS as the average across non-human great apes (or as the average in the 1,000 Genomes African populations; Supplemental Methods) using the data from the largest chimpanzee, gorilla, and orangutan populations in order to further limit the noise [...]

In line with this, VIPs also show an excess of low frequency (≤10%) deleterious non-synonymous variants compared to non-VIPs (Figure S3). In great apes, the average [...]

The higher level of purifying selection in VIPs might be due to the fact that VIPs participate in the more constrained host functions, or, alternatively, because within each specific host function, viruses tend to interact with the more constrained proteins.
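As a minimal sketch of the pN/pS quantity introduced above, the ratio can be computed per gene from polymorphism counts and then averaged over populations. The function names, the choice to skip populations with no synonymous polymorphisms, and the example counts are illustrative assumptions, not the authors' pipeline, which is described in their Supplemental Methods. Normalization by the number of non-synonymous and synonymous sites is omitted here.

```python
from statistics import mean


def pn_ps(pn, ps):
    """Ratio of non-synonymous to synonymous polymorphism counts.

    Returns None when there are no synonymous polymorphisms,
    because the ratio is undefined in that case.
    """
    if ps == 0:
        return None
    return pn / ps


def mean_pn_ps(counts_by_population):
    """Average pN/pS over populations, skipping undefined ratios."""
    ratios = [pn_ps(pn, ps) for pn, ps in counts_by_population.values()]
    defined = [r for r in ratios if r is not None]
    return mean(defined) if defined else None


# Hypothetical (pN, pS) counts for one gene in three great ape populations.
example_counts = {
    "chimpanzee": (2, 8),
    "gorilla": (1, 5),
    "orangutan": (3, 10),
}
example_ratio = mean_pn_ps(example_counts)
```

Under this sketch, a lower average ratio for a set of genes would be read as stronger purifying selection on that set, which is the comparison made between VIPs and non-VIPs.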
In order to assess these two non-mutually exclusive scenarios we generated 10^4 control sets of non-VIPs chosen to be in the same 162 Gene Ontology processes as VIPs (GO processes with more than 50 VIPs; Table S3 and Methods).

In great apes, GO-matched non-VIPs still have a much higher pN/pS ratio compared to VIPs, suggesting that VIPs tend to be more conserved than non-VIPs from the same GO [...] permutation test P=0 after 10^9 iterations). The stronger purifying selection acting on VIPs is apparent within most functions. Figure 1C shows stronger purifying selection in the 20 high level GO categories with the most VIPs. In all the 20 GO categories pN/pS is lower in VIPs than in non-VIPs, and the difference is significant for 17 of these categories (Table S3). This shows that within a wide range of host functions, viruses tend to target the most conserved proteins.

Interestingly, even immune VIPs (Table S4) have a significantly reduced pN/pS ratio compared to immune non-VIPs (Figure 1C), which suggests that immune proteins in direct contact with viruses are more constrained. The reduction in pN/pS in non-immune VIPs (no antiviral or any other immune function, Table S4) is very similar to the reduction observed in the entire set of VIPs (Figure 1C).

[...] Table S3 [...] Table S5). Since VIPs are more constrained than non-VIPs and tend to have more non-synonymous deleterious low frequency variants than non-VIPs (Figures 1 and S3 [...]

The classic MK test is known to be biased downward by the presence of slightly deleterious non-synonymous variants and this bias is difficult to eliminate fully even by excluding low frequency variants (Messer and Petrov, 2013). Note that our application of the classic MK test to discover the higher rate of adaptation in VIPs compared to non-VIPs is conservative given that the VIPs have a higher proportion of slightly deleterious [...]

We therefore apply an asymptotic modification of the MK test known to provide estimates of α without a downward bias in the presence of slightly deleterious variants (Messer and Petrov, 2013). To further validate the asymptotic MK test we carry out extensive population simulations (Messer, 2013) to show that this test is indeed robust to a number of potential biases (Supplemental Methods and Table S8).

Using the asymptotic MK test we estimate that in VIPs, ~27% of the 1,897 amino acid substitutions along the human lineage were adaptive (Figure 1D). This proportion is three times higher than the estimated proportion of ~9% in non-VIPs (Figure 1D). Thus, although VIPs represent only 13% of the orthologs in our dataset, we estimate that in human evolution they account for almost 30% of all adaptive amino-acid changes. Note [...]

The high α in VIPs is not explained by higher rates of adaptation in the host GO processes where VIPs are well represented (Figure S4 and Methods). Furthermore, the large difference in α observed between VIPs and non-VIPs is robust to a number of potentially confounding factors such as recombination, GC content or gene length (Table S9 and Supplemental Methods). The lower pN/pS in VIPs does not explain their higher α either (Table S9).
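To make the classic MK estimate of α concrete, the sketch below uses the standard estimator α = 1 - (Ds × Pn) / (Dn × Ps), where Dn and Ds are non-synonymous and synonymous substitution counts and Pn and Ps are the corresponding polymorphism counts after removing low frequency variants. The function names, the 10% default cutoff, and all counts and frequencies are hypothetical and chosen only for illustration. The asymptotic variant, which fits α as a function of allele frequency and extrapolates it, is not shown.

```python
def count_common(frequencies, cutoff=0.10):
    """Count variants whose derived allele frequency is at or above cutoff."""
    return sum(1 for f in frequencies if f >= cutoff)


def mk_alpha(dn, ds, pn, ps):
    """Classic MK estimate of the proportion of adaptive substitutions.

    alpha = 1 - (Ds * Pn) / (Dn * Ps). The estimate can be negative when
    slightly deleterious polymorphisms inflate Pn relative to Ps.
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical derived allele frequencies for one set of genes.
nonsyn_freqs = [0.02, 0.05, 0.15, 0.40, 0.08, 0.60]
syn_freqs = [0.03, 0.12, 0.25, 0.50, 0.70, 0.09, 0.33]

pn = count_common(nonsyn_freqs)  # variants below 10% are excluded
ps = count_common(syn_freqs)
alpha = mk_alpha(dn=30, ds=40, pn=pn, ps=ps)
```

Raising the frequency cutoff removes more slightly deleterious variants but also discards data, which is the trade-off that motivates the asymptotic approach.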
We further use the classic MK test (excluding variants below 10%) to investigate the excess of adaptation for the specific VIPs of ten human viruses and in the 20 high level GO categories with the most VIPs (Figure 1E and F). We do not use the asymptotic MK [...]

Finally and importantly, the 80% of VIPs with no known antiviral or broader immune function (Table S4) have a strongly increased rate of adaptation according to both the classic MK test (α=0.26 in VIPs versus -0.02 in non-VIPs, permutation test P=3×10^-7; Figure 1F) and the asymptotic MK test, with the latter estimating α=38% in non-immune VIPs against only 11% for non-immune non-VIPs. Intriguingly, unlike for non-immune VIPs or all VIPs considered together (top of Figure 1F), immune VIPs, including antiviral VIPs (Table S4), do not show any increase of adaptation compared to immune non-VIPs. We speculate that this pattern might reflect the masking effect of balancing [...]

The increased rate of adaptation in VIPs in the human lineage strongly suggests that VIPs in our dataset, 95% of which interact with modern viruses (Table S1), were also VIPs during past human evolution. It is also plausible that a substantial proportion of the [...]

[...] mammalian tree used for the analysis (Methods). For a specific coding sequence, the BS-REL test estimates the proportion of codons where the rate of non-synonymous substitutions is higher than the rate of synonymous substitutions (dN/dS>1), which is a hallmark of adaptive evolution. The BS-REL test then compares two competing models of evolution, one with adaptive substitutions and one without adaptive substitutions, and decides which of the two models is the best fit.
For each branch of the tree, the BS-REL test provides a P-value that corresponds to the probability that no adaptation occurred in the branch. The product of P-values across all branches in the tree then gives the probability that no adaptation occurred anywhere along the entire tree (Supplemental Methods). The product of P-values is a good measure of whether a specific protein experienced [...]

The purifying selection-wise permutation test shows that adaptation has been much more common in VIPs than in non-VIPs across mammals (Figure 2). We estimate that VIPs have experienced 77% more adaptation compared to non-VIPs (Figure 2A). In total, this represents ~76,000 more adaptive amino acid changes in VIPs compared to non-VIPs. We further use an increasingly strict level of evidence for the presence of adaptation, by including only proteins with increasingly low products of P-values; that is, increasingly low probability that no adaptation occurred (Figure 2A). Figure [...]

GO processes with a strong excess of adaptation include cellular processes such as transcription, signal transduction, apoptosis, or post-translational protein modification, but also supracellular processes related to development (Figure 2C and Table S3). Importantly, VIPs with no known immune function (Table S4) [...]

Since 95% of the VIPs were discovered for viruses infecting humans, it is possible that the observed excess of adaptation in VIPs in mammals is due to higher rates of adaptation exclusively in the primate branches of the mammalian tree (Figure S1). However, all mammalian clades in the tree show a similar excess of adaptation in VIPs (Figure 2D).
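The product of per-branch BS-REL P-values described earlier in this section becomes extremely small for trees with many branches, so a practical sketch would work in log space. The code below is an illustration under that assumption only. The function names, the threshold, and the P-values are hypothetical, and the authors' exact procedure is given in their Supplemental Methods.

```python
import math


def log10_pvalue_product(pvalues):
    """Return log10 of the product of per-branch P-values.

    Summing logarithms avoids floating-point underflow when a tree
    has many branches with small P-values.
    """
    total = 0.0
    for p in pvalues:
        if not 0.0 < p <= 1.0:
            raise ValueError("each P-value must lie in (0, 1]")
        total += math.log10(p)
    return total


def passes_threshold(pvalues, max_product):
    """True if the product of P-values is at or below max_product."""
    return log10_pvalue_product(pvalues) <= math.log10(max_product)


# Hypothetical per-branch P-values for one protein.
branch_pvalues = [0.5, 0.1, 0.01]
log_product = log10_pvalue_product(branch_pvalues)
strict = passes_threshold(branch_pvalues, max_product=1e-3)
lenient = passes_threshold(branch_pvalues, max_product=1e-5)
```

Lowering `max_product` corresponds to the increasingly strict level of evidence for adaptation used when restricting the comparison to proteins with increasingly low products of P-values.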
Primates stand out due to their low overall proportions of positively selected codons compared to the other mammalian clades in the tree (Figure 2D). This is most [...]

[...] well-known antiviral VIPs (Figure 3A), antiviral VIPs where adaptation was previously unknown (Figure 3B), and non-antiviral VIPs with diverse, well-studied functions in the mammalian hosts (Figure 3C). This phylogenetically widespread excess of adaptation implies that many of the VIPs annotated in humans were also VIPs for a substantial [...]

To identify a new non-antiviral protein we first exclude all VIPs with a well-known antiviral activity (Table S4) and then select all remaining VIPs with strong overall evidence of adaptation (Table S10) and at least 10 branches with signals of adaptation. Because we want to understand how adaptation to viruses proceeded, we then select proteins with i) at least one available tertiary structure, ii) amino acid level resolution of the interaction with one or more viruses, and iii) host tropism. The most positively selected non-antiviral VIP that fulfills all these requirements is [...]

[...] Figure 4D). The consensus was regained only two times after loss (Figure 4D). This means that the signals of adaptation detected at the first and third [...]

Although we already find a strong signal of increased adaptation, the amount of adaptive evolution that can be attributed to viruses is probably underestimated by our analysis. First, there may still be many undiscovered VIPs. Within the past few years, there has been no sign that the pace of discovery of new VIPs is slowing down (Figure S2 [...]

[...] host? We show that there has been so much adaptation in VIPs that it is very hard to imagine that none of these adaptive events had any consequences on host phenotypes.
Interestingly, VIPs tend to be multifunctional proteins. Indeed they represent 13% of all the orthologs in the analysis, 33% of the orthologs with 60 or more annotated GO processes, and 40% of orthologs with 100 or more GO processes (Figure S8A). Pleiotropy is more likely in proteins with many functions (He and Zhang, 2006), and the subset of VIPs with many annotated GO processes has an excess of adaptation that is very similar to the one observed when using all VIPs (Figure S8B). Adaptation to viruses could thus have affected the evolution of host phenotypes in unexpected ways. In this respect, it is particularly intriguing that VIPs have experienced highly increased rates of adaptation within host functions such as development or neurogenesis (Table S3).

We identified 1,256 VIPs out of a total of 9,861 proteins with orthologs in the genomes of the 24 mammals included in the analysis (Figure S1 and Tables S1 and S2). Annotation [...]

The pN/pS-based purifying selection-wise permutation test

We created a permutation test that compares VIPs and non-VIPs with the same amount of purifying selection. This is achieved by using the pN/pS ratio as a proxy for purifying [...]

Retrieval of ANPEP mammalian coding sequences

We analyzed patterns of adaptation in ANPEP in a tree of mammals including 84 species. These species are the ones with annotated, known or predicted mRNAs (see Table S11 for their GenBank identifiers). The coding sequences were extracted from the mRNAs and aligned with PRANK.

Gene Ontology-matching control samples

We created a permutation scheme that compares VIPs with random samples of non- [...]

[...] processes with the highest number of VIPs. The full GO process name for "protein modification" as written in the figure is "post-translational protein modification". D) Asymptotic MK test (Supplemental Methods) for the proportion of adaptive amino acid substitutions (α) in VIPs (blue dots and curve) and non-VIPs (red dots and curve). Pink area: superposition of fitted logarithmic curves for 5,000 random sets of 1,256 non-VIPs (as many as VIPs) where the estimated α falls within α's 95% confidence interval. E) Classic MK test (Supplemental Methods) for VIPs (blue dot) and non-VIPs (red dot and 95% confidence interval) for the ten viruses with 50 or more VIPs. F) Same as E) but, below the dotted black line, for the 20 top high level GO processes with the most VIPs. Above the dotted black line: the classic MK test for all VIPs, for non-immune VIPs and for immune VIPs (Table S4). See also Tables S3, S4, S5, S6, S7, S8 and S9 and [...]

[...] VIPs (blue dot) and non-VIPs (red dot and 95% confidence interval) in the mammalian clades represented by more than one species in the tree. All: entire tree. Primata: [...]
