Glycoproteomics of viruses

While glycoprofiling studies provide important information on the distribution and composition of the different glycan structures present on a given protein, they lack information on individual glycosite location, occupancy and glycan structure heterogeneity. The recent development of instruments equipped with ECD or ETD MS2 fragmentation has allowed for simultaneous determination of the peptide sequence and the position to which the carbohydrate moiety is attached (Levery et al. 2015). When relevant reporter glycan oxonium ions are present and it is easy to predict the glycosylation position within peptides, HCD MS2 fragmentation can also be used (Wuhrer et al. 2007). Interfacing the mass spectrometer with capillary liquid chromatography enables separation of complex proteome-scale glycopeptide mixtures and allows identification of thousands of glycopeptides in a single run. At the single protein level, it is now possible to determine the exact position, structure heterogeneity, and site occupancy for each individual glycosite (Wuhrer et al. 2007).

Both N- and O-glycosylation can be analyzed by tandem MS. Biosynthetic features of N-glycosylation enable relatively easy identification and quantification of deglycosylated sites by MS2 sequencing. Because enzymatic N-glycan removal deamidates the formerly glycosylated asparagine to aspartate, the site occupancy can be determined by calculating the intensity ratio of naked (Asn) and deglycosylated (Asp) peptides (Pabst et al. 2012; An et al. 2015). However, the more appropriate methodology is to carry out the reaction in heavy-oxygen water (H₂¹⁸O), which discriminates deglycosylated sites from spontaneous Asn deamidation (Palmisano et al. 2012; Cao et al. 2017). It is also possible to evaluate the overall distribution of N-glycosite occupancy on intact glycoproteins by metabolically simplifying the glycans to homogeneous structures (Struwe et al. 2017).
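The occupancy estimate described above can be illustrated with a minimal sketch. It assumes that occupancy is taken as the deglycosylated (Asp) fraction of the summed Asn and Asp peptide intensities, and that both forms ionize equally well. The function names, the 0.01 Da tolerance and the example intensities are hypothetical. The classification step further assumes that any spontaneous deamidation took place in normal water before the enzymatic reaction, which is a simplification.

```python
# Monoisotopic mass shifts (Da) for the Asn -> Asp conversion.
DEAMIDATION_16O = 0.984016                          # deamidation in normal water
O18_MINUS_O16 = 2.004245                            # mass difference between 18O and 16O
DEAMIDATION_18O = DEAMIDATION_16O + O18_MINUS_O16   # about 2.988 Da in 18O-water


def site_occupancy(intensity_asn, intensity_asp):
    """Fraction of the site that was glycosylated before glycan removal.

    intensity_asn: signal of the naked (never glycosylated) Asn peptide.
    intensity_asp: signal of the deglycosylated (Asp) peptide.
    Assumes equal ionization efficiency of both peptide forms.
    """
    total = intensity_asn + intensity_asp
    if total <= 0:
        raise ValueError("summed intensity must be positive")
    return intensity_asp / total


def classify_shift(observed_shift, tol=0.01):
    """Assign an observed mass shift (Da) on Asn to its likely origin."""
    if abs(observed_shift - DEAMIDATION_18O) <= tol:
        return "enzymatic"      # 18O incorporated during glycan removal
    if abs(observed_shift - DEAMIDATION_16O) <= tol:
        return "spontaneous"    # deamidated in normal water
    if abs(observed_shift) <= tol:
        return "unmodified"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical intensities for one glycosite.
    print(site_occupancy(1.0e6, 9.0e6))
    print(classify_shift(2.988))
```

In practice the tolerance would be set from the mass accuracy of the instrument, and the intensities would come from extracted ion chromatograms of the two peptide forms.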
For O-linked glycans, complete O-glycan removal does not result in chemical peptide modification and therefore cannot be used for identification of glycosylated amino acid positions (Levery et al. 2015). ETD and ECD MS2 techniques allow analysis of intact O-glycopeptides by fragmentation of the peptide backbone without loss of the O-glycan modification; however, they are of limited use for determining site occupancy (Wuhrer et al. 2007; Sihlbom et al. 2009; Zauner et al. 2012). Approximations are nevertheless often made by comparing intensities of nonmodified and glycosylated peptides (Brautigam et al. 2013; Stansell et al. 2015). The relative quantitation of both N- and O-glycosylated peptides is not fully accurate, because nonglycosylated peptides and peptides carrying complex glycans differ in ionization efficiency. Depending on the complexity of the glycans, glycopeptides generally ionize more poorly than unmodified peptides (Stavenhagen et al. 2013). The analysis of complex proteins therefore often requires enrichment with hydrophilic interaction chromatography or specific lectins, which is particularly relevant for proteome-wide applications (Bunkenborg et al. 2004; Zielinska et al. 2010; Khatri et al. 2014; Levery et al. 2015). Enrichment, however, comes at the cost of information on site occupancy.

N-glycoproteomics

Tandem mass spectrometry has been widely used for characterization of N-glycosylation sites on viral proteins, with respect to individual site occupancy status (macroheterogeneity) and site-specific structural diversity (microheterogeneity), as these are important features that can affect protein–protein interactions and immunogenicity. Comprehensive glycoproteomic analysis has, for example, mapped N-linked glycosylation of seasonal influenza A virus H3N2 HA, identifying more than 90% site occupancy of all putative N-linked glycan sites (An et al. 2015).
Moreover, the globular head glycosites, associated with host immune receptor interaction, were strictly high-mannose type (An et al. 2015). In a separate study on the highly pathogenic H5N1 influenza A virus, it was also shown that all potential N-glycosites were consistently occupied between several different strains (Blake et al. 2009), suggesting that N-glycosylation of HA is conserved between the different isolates of the same virus subtype, with a high occupancy of potential N-glycosites. In a similar way, it has been demonstrated that all predicted N-glycan sequons were utilized in hepatitis C virus (HCV) E2 and Murray Valley encephalitis virus NS1 (Blitvich et al. 2001; Iacob et al. 2008), where the majority of HCV E2 glycosites were modified with high-mannose type glycans (Iacob et al. 2008). On the densely glycosylated HIV-1 envelope glycoprotein, several strategies have been employed to characterize the nature and location of the many glycans, with up to 27 mostly highly occupied N-glycosites identified, some of which were exclusively high-mannose type (Pabst et al. 2012; Go et al. 2013; Yang W et al. 2014). Tandem mass spectrometry has also been used to address differences in cell type-specific glycosylation and the influence of protein conformation. For convenience, soluble HIV-1 gp120 or gp140 preparations are often analyzed. HIV-1 gp120 produced in CHO and 293T cells had very similar occupancy, degree of fucosylation, sialylation, and glycan maturation, with a larger share of glycosites predominantly carrying hybrid and complex-type N-glycans (Go et al. 2013). Some of the exclusively high-mannose N-glycans were located within the intrinsic mannose patch of gp120 (Go et al. 2013; Behrens et al. 2016). In contrast, a much higher proportion of the N-glycosites on gp140 expressed in CHO, recombinant trimers, or those on gp120 purified from native virions were modified with high-mannose type N-glycans (Pabst et al. 2012; Behrens et al.
2016; Panico et al. 2016; Go et al. 2017). This again underlines the importance of analyzing proteins in their native conformation in order to obtain relevant glycosylation patterns. Glycosylation in different cell lines was also investigated for Hendra virus recombinant glycoprotein G, where all seven potential N-glycan sites were occupied in HeLa cells (Colgrave et al. 2012). In contrast, only four sites were N-glycosylated in HEK293, although the degree of glycan maturation was similar in both cell lines (Colgrave et al. 2012).

To summarize the results obtained from various N-glycoprofiling and glycoproteomic studies, it seems that most of the putative N-glycosylation sequons on viral proteins are glycosylated with high occupancy. However, the site occupancy of viral protein N-glycosylation can vary between producer cell lines. Moreover, high-mannose type N-glycans may constitute a substantial, if not the major, proportion of viral N-glycosites, particularly when native protein structure and oligomerization are taken into account. Thus, results obtained from analysis of recombinant monomeric proteins should be interpreted with caution, both with respect to the occupancy of individual N-glycosylation sites and with respect to the complexity of glycan structures, as discussed above.

O-glycoproteomics

For decades, very little information has been available regarding site-specific O-glycosylation of viral proteins. Some of the first described virus-derived site-specific O-glycans were on isoform M of HBV surface antigen purified from patient-derived viral particles (Schmitt et al. 1999). The O-glycosylation site was identified by combining MALDI analysis of exoglycosidase-treated glycopeptide and Edman sequencing of the underlying peptide.
The position of the glycan attachment site was deduced by carboxypeptidase digestion and confirmed by collision-induced dissociation tandem mass spectrometry, representing some of the early glycoproteomic experiments of viral proteins (Schmitt et al. 1999). The recent advances in mass spectrometry-based proteomics have resulted in numerous studies addressing O-glycosylation of individual viral glycoproteins. O-glycoproteomic analyses have been performed for several recombinant or isolated viral glycoproteins, including HIV-1 gp120, influenza A virus HA1, HCV E2, HSV-1 gC and Hendra virus glycoprotein G (Colgrave et al. 2012; Brautigam et al. 2013; Go et al. 2013; Yang W et al. 2014; Norden et al. 2015; Stansell et al. 2015). Recombinantly expressed gp120 and HA1 were glycosylated at a single position each, which in both cases was modified with core 1 or core 2 elongated O-glycans (Stansell et al. 2015). Interestingly, site occupancy was much lower in recombinant gp140 undergoing proteolytic cleavage to gp120, assuming equivalent ionization efficiencies of nonmodified and glycosylated peptides. Recombinant gp140 possessed shorter less sialylated, predominantly core 1 O-glycans, again underlining the importance of native protein conformation for analogous studies (Stansell et al. 2015). In contrast, gp120 purified from T-cell derived virions was devoid of the single site found glycosylated in recombinantly expressed gp120 (Stansell et al. 2015). This, however, might be related to the viral strain used, as the single site was reported to be glycosylated in a separate study using virions from a different HIV-1 strain (Yang W et al. 2014). In addition, treatment of passaged or plasma-derived HIV-1 virions with antibodies against O-linked carbohydrate structures resulted in inhibition of cell entry, and virus neutralization (Hansen et al. 1990, 1991), suggesting that HIV-1 gp120 can indeed be O-glycosylated in vivo. 
Comparison of recombinant gp120 O-glycosylation in two different cell lines revealed predominant core 1 O-glycosylation in CHO cells, compared to core 1, core 2 and core 4 in 293T cells (Go et al. 2013). In a similar manner, Hendra virus glycoprotein G expressed in HeLa and HEK293 cells differed considerably, both in the number of O-glycosites identified and in the core structures they carried (Colgrave et al. 2012). Recombinant HCV E2 was O-glycosylated at six positions, with predominantly core 1 and core 2 O-glycan structures (Brautigam et al. 2013). More than 80% occupancy was estimated for five of the six sites, whereas one site had very low occupancy. Moreover, a high level of structural heterogeneity was observed for the O-glycans localized at the individual sites, with up to 14 different structures identified (Brautigam et al. 2013). A recent study on O-linked glycosylation of HSV-1 mucin-like protein gC provided some insight into O-glycan synthesis, suggesting that the eleven O-glycosites were glycosylated in an orderly fashion before elongation took place (Norden et al. 2015). These studies underscore the high heterogeneity of O-glycan structures, which are both cell type and protein specific, and the need for careful selection of candidates and expression cell lines for clinical applications. Moreover, comprehensive analysis of immune responses mounted against these different structures would be highly beneficial.

Global O-glycoproteomics

While it is clear that single protein-targeted mass spectrometry approaches can provide comprehensive information on single site occupancy and structure heterogeneity, we lack robust methods for analysis of site-specific O-glycosylation in complex or proteome-wide samples.
To solve this problem, we recently introduced a method for globally mapping O-glycosylation sites in glycoengineered cell lines lacking O-linked glycan elongation, which is based on Vicia villosa lectin (VVA) affinity enrichment of simple glycopeptides coupled to tandem mass spectrometry (Steentoft et al. 2011). The method is also applicable for analysis of wild type cells, predominantly expressing core 1 O-glycans, by using peanut agglutinin (PNA) enrichment of desialylated glycopeptides (Yang Z et al. 2014). We have applied these methods in the analysis of O-glycosylation of HSV-1 infected human fibroblasts by performing a sequential enrichment with PNA and VVA, thus reporting the first comprehensive viral O-glycoproteome (Bagdonaite et al. 2015). This approach provides several clear advantages. First, it allows simultaneous analysis of all viral glycoproteins expressed in an infected cell. Second, the strategy takes into account the endogenous glycosylation of a permissive cell, dictated by its repertoire of glycosyltransferases, as well as native conformations of proteins and the cytopathic effects of viral infection. Irrelevant cell lines are often chosen for recombinant expression of viral proteins, and the glycosylation obtained in these cell lines does not always reflect the glycosylation pattern in a natural host. Using herpesviruses as a model system, we have applied the same method for defining the O-glycoproteomes of other members of the Herpesviridae family: HSV-2, VZV, HCMV and EBV (Bagdonaite et al. 2016; Iversen et al. 2016). The wide occurrence, associated complications and shortage of prophylactic measures make herpesviruses a relevant model system to analyze O-glycans and their importance in the viral life cycle (Vazquez et al. 2001; Cohen et al. 2006; Adjei et al. 2008; Kramer et al. 2008; Oxman 2010; Shiley and Blumberg 2010; Lopo et al. 2011; Sauerbrei et al. 2011; Levine et al. 2012; van Rijckevorsel et al. 2012; Astuto et al.
2013; Conde-Glez et al. 2013; Fishman 2013; Gorfinkel et al. 2013; Odland et al. 2013; Pembrey et al. 2013; Rowe et al. 2013; Awasthi and Friedman 2014; Bradley et al. 2014; Fu et al. 2014; Sabugo et al. 2014; Sili et al. 2014; Chen et al. 2015; Cohen 2015; Korndewal et al. 2015; Shaiegan et al. 2015). In addition, the large proteomes of herpesviruses highlight the benefits of global viral O-glycoproteomics. Human herpesviruses encode seven to 12 glycoproteins associated with the viral particle; however, many more viral proteins possess signal peptides and transit through the host secretory pathway. Some of them have previously been investigated for glycan modifications in focused studies. N-linked glycans have been identified on viral envelope glycoproteins from all 8 human herpesviruses (Wenske et al. 1982; Edson and Thorley-Lawson 1983; Friedrichs and Grose 1984; Serafini-Cessi et al. 1984, 1985, 1989; Montalvo et al. 1985; Montalvo and Grose 1986, 1987; Gong et al. 1987; Britt and Vugler 1989; Gong and Kieff 1990; Okuno et al. 1990, 1992; Foa-Tomasi et al. 1992; Nolan and Morgan 1995; Pfeiffer et al. 1995; Hata et al. 1996; Mukai et al. 1997; Chandran et al. 1998; Pertel et al. 1998; Huber and Compton 1999; Li et al. 1999; Zhu et al. 1999; Baghian et al. 2000; Skrincosky et al. 2000; Wu et al. 2000; Maresova et al. 2000; Theiler and Compton 2002; Koyano et al. 2003; Paulsen et al. 2005; Yamagishi et al. 2008; Gore and Hutt-Fletcher 2009; Luo et al. 2015), where individual glycoproteins have been demonstrated to exhibit variable extent and pattern of glycan chain maturation (Wenske et al. 1982; Edson and Thorley-Lawson 1983; Friedrichs and Grose 1984; Serafini-Cessi et al. 1984, 1985, 1989; Montalvo et al. 1985; Montalvo and Grose 1986, 1987; Britt and Vugler 1989; Gong and Kieff 1990; Okuno et al. 1990, 1992; Huber and Compton 1999; Maresova et al. 2000; Theiler and Compton 2002; Yamagishi et al. 2008). 
A relatively smaller proportion of envelope glycoproteins of herpesviruses have been investigated in terms of O-glycosylation, and in some of these envelope proteins O-glycans have been detected by biochemical assays (Serafini-Cessi, Dall’Olio, Scannavini, Costanzo et al. 1983; Montalvo et al. 1985; Gong et al. 1987; Montalvo and Grose 1987; Serafini-Cessi et al. 1988, 1989; Britt and Vugler 1989; Kari et al. 1992; Yao et al. 1993; Nolan and Morgan 1995; Borza and Hutt-Fletcher 1998; Cardinali et al. 1998; Lake et al. 1998; Peng et al. 1998; Torrisi et al. 1999; Zhu et al. 1999; Wu et al. 2000; Theiler and Compton 2002; Xiao et al. 2007). Only a few of these proteins have merited more thorough investigation with most attention devoted to proteins containing mucin-like domains. HSV-1 attachment factor gC was the first envelope glycoprotein described to carry O-glycans, acquiring distinct structures in different cell types (Olofsson et al. 1981, 1983; Dall’Olio et al. 1985; Lundstrom et al. 1987), and specific O-glycosites have recently been mapped to the mucin-like region (Bagdonaite et al. 2015; Norden et al. 2015). Furthermore, the HSV-2 and VZV orthologs were also found to be O-glycosylated (Zezulak and Spear 1983; Bagdonaite et al. 2016). Similarly, other mucin-like region-containing proteins such as HSV-1 gI, HSV-2 gG, EBV gp150 and gp350 have been shown to accommodate high density of O-glycosylation, and the types of O-glycan structures were identified for some of these proteins (Serafini-Cessi et al. 1985, 1989; Nolan and Morgan 1995; Borza and Hutt-Fletcher 1998; Norberg et al. 2007). The conserved viral fusion effector gB has been shown or predicted to be O-glycosylated in all herpesvirus subfamilies (Serafini-Cessi, Dall’Olio, Scannavini, Costanzo et al. 1983; Gong et al. 1987; Montalvo and Grose 1987; Britt and Vugler 1989). 
The era of proteome-wide mass spectrometry-based applications allowed robust characterization of viral O-glycoproteomes (Bagdonaite et al. 2015, 2016; Iversen et al. 2016). The characterizations confirmed the identity of the majority of previously described O-glycoproteins of herpesviruses, and provided a tremendous expansion of site-specific O-glycosylation (Bagdonaite et al. 2015, 2016; Iversen et al. 2016). While GalNAc-type O-glycosylation is often associated with dense glycosylation in mucin-like regions, it is also abundantly found in isolation or small clusters in human proteins (Steentoft et al. 2013), which is more difficult to predict. In agreement with this, we have demonstrated ample presence of isolated O-glycan sites on viral glycoproteins of HSV-1, HSV-2, VZV, HCMV and EBV by glycoproteomic approaches (Bagdonaite et al. 2015, 2016; Iversen et al. 2016). Location of O-glycosites identified via proteome-wide MS/MS approaches with respect to protein structural features suggests possible involvement in the protein–protein interactions (Bagdonaite et al. 2015, 2016), as exemplified in subsequent sections. Large scale glycoproteomic analyses of human herpesviruses of varying phylogeny (HSV-1, HSV-2, VZV, HCMV and EBV) have made it possible to compare the O-glycosite patterns in homologous proteins (Bagdonaite et al. 2015, 2016; Iversen et al. 2016). Comparison of O-glycosite conservation between alphaherpesviruses HSV-1 and HSV-2 suggests that sequence homology is an important determinant for O-glycosylation in closely related viruses (Figure 1A and F). Isolated homologous glycosites were mainly situated on highly homologous peptide stretches, whereas densely spaced glycosites in Pro/Ser/Thr-rich regions were glycosylated irrespective of low sequence identity, as expected (Bagdonaite et al. 2016). 
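The comparison of glycosite positions between homologous proteins can be sketched as follows. This is a minimal illustration, not the procedure used in the cited studies: it assumes two sequences that have already been aligned (for example by Clustal Omega), with "-" marking gaps, and glycosites given as 1-based residue numbers in each ungapped sequence. The function names and the toy sequences are hypothetical.

```python
def residue_to_column(aligned_seq, gap="-"):
    """Map 1-based residue numbers of a sequence to 0-based alignment columns."""
    mapping = {}
    residue = 0
    for column, char in enumerate(aligned_seq):
        if char != gap:
            residue += 1
            mapping[residue] = column
    return mapping


def shared_glycosites(aligned_a, sites_a, aligned_b, sites_b):
    """Return (site_in_a, site_in_b) pairs that fall in the same alignment column."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    col_a = residue_to_column(aligned_a)
    col_b = residue_to_column(aligned_b)
    # Index each glycosite by the alignment column it occupies.
    by_column_a = {col_a[site]: site for site in sites_a}
    by_column_b = {col_b[site]: site for site in sites_b}
    common = by_column_a.keys() & by_column_b.keys()
    return sorted((by_column_a[c], by_column_b[c]) for c in common)


if __name__ == "__main__":
    # Toy aligned fragments; not real herpesvirus sequences.
    seq_a = "MS-TPAAT"
    seq_b = "MSQTP-AS"
    print(shared_glycosites(seq_a, [2, 3, 7], seq_b, [4, 7]))
```

Pairs returned by this sketch share an alignment column only; whether such sites are functionally conserved would still need to be judged against the local sequence identity, as discussed above.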
Several glycoproteins are homologous between all herpesviruses, including gB, gH, gL, gM and gN, of which gB, gH and gL comprise the conserved cell entry machinery (McGeoch et al. 2006). We identified a large number of O-glycosites on HSV-1 fusogenic effector gB, and predicted that a number of O-glycosites could be conserved in most, if not all, human herpesviruses (Figure 1A–C) (Bagdonaite et al. 2016). Based on multiple sequence alignments across investigated herpesvirus family members, enrichment of O-glycosylation was found in the extreme N-terminus of gB regardless of the considerable underlying sequence variation between different herpesviruses. This suggests that glycosylation patches are less dependent on the underlying sequence, and might serve a glycan-specific function, such as protection of the exposed N-terminal region of gB from proteolytic degradation or immune recognition. In contrast, conserved single glycosites were predominantly found between HSV-1, HSV-2, and, to a smaller extent, VZV, suggesting that they mainly exert subfamily-specific functions. The conserved protein gH, which is another essential component of the fusion machinery, was found glycosylated in four out of five investigated viruses. Although no clear conserved pattern of glycosylation was observed, the O-glycosites were predominantly localized to the two exposed N-terminal domains involved in interaction with other viral proteins (Figure 1D and E) (Bagdonaite et al. 2016).

Fig. 1. O-glycosylation of herpesvirus conserved fusion machinery. (A), Crystal structure representation of HSV-1 gB monomer. From “Heldwein EE Lou H Bender FC Cohen GH Eisenberg RJ Harrison S 2006. Crystal structure of glycoprotein B from herpes simplex virus 1. Science, 313:217–220”. Reprinted with permission from AAAS. Blue boxes mark the parts of the molecule where O-glycans are consistently found between at least two investigated herpesviruses. Modified with permission from the authors.
(B), (D) and (F), Conservation of O-linked glycosylation sites on homologous envelope glycoproteins of human herpesviruses (from Bagdonaite et al., 2016). Reprinted with permission. © 2008 The American Society for Biochemistry and Molecular Biology. All rights reserved. Clustal Omega server was used to align amino acid sequences of gB (B), gH (D) and gL (F) between HSV-1 (Bagdonaite et al. 2015), HSV-2 (Iversen et al. 2016), VZV (Bagdonaite et al. 2016), HCMV (Bagdonaite et al. 2016) and EBV (Bagdonaite et al. 2016). Protein backbones are depicted as broken black lines, where spaces represent gaps in the alignment. Individual alignments were drawn to scale (indicated below each graph). Sequence conservation is indicated above the aligned sequences for each set, and is represented by a greyscale barcode that maps to the clustal alignment score, as shown in the legend. In brief, for the clustal alignment score, an asterisk indicates positions with fully conserved residues, a colon indicates conservation of amino acids with strongly similar properties, whereas a period indicates conservation of amino acids with weakly similar properties. Predicted signal peptides and transmembrane regions are shaded in pink and blue, respectively. Unambiguous O-glycosylation sites are shown as yellow squares, whereas ambiguous sites are marked as yellow lines within the protein backbone, where the number below indicates the number of glycosites. An ambiguous O-glycosylation site from our previous publication (Bagdonaite et al. 2015, HSV-1 gB 109–123 (HexHexNAc)) was omitted from the graph, as we cannot exclude the possibility it could be part of an elongated structure on an adjacent site. Reference strain sequences were used for HSV-2, VZV and EBV due to incomplete or unavailable annotation of investigated strains. 
HSV-1: human herpes simplex virus type 1 (strain 17); HSV-2: human herpes simplex virus type 2 (strain HG52); VZV: varicella-zoster virus (strain Dumas); HCMV: human cytomegalovirus (strain Towne); EBV: Epstein-Barr virus (strain AG876). (C) and (E) Cartoon depiction of HSV-1 gB trimers (C) or gH–gL complexes and accessory proteins (E) of the five herpesviruses. O-glycosylation sites are shown as yellow squares. (B) and (C) Colored boxes mark association with herpesvirus gB domains as defined in (A).

In summary, global O-glycoproteomics of viruses opens up possibilities to rapidly “scan” the proteome of viruses for O-glycan modifications. Although the occupancy and the relevance of the individual glycan sites are still unknown, the information can be followed up with complementary techniques at the individual protein and glycosite level. The method can be applied to any human virus of interest, provided that relevant propagation systems are available. It does, of course, have its limitations, such as the limited number of glycoforms that can be captured and its dependence on protein sequences being available in the databases, which is challenging when analyzing emerging or poorly annotated viruses, as well as clinical isolates. Another aim for the future is to make the results broadly available to the scientific community not only by means of publishing, but also by inclusion into public protein databases. Ideally, a virus database compiling structural data, sequence variability, available glycomic and glycoproteomic data as well as antigenic sites could be created to advance basic and applied research in virology. If sufficient experimental data are compiled, machine learning techniques could be applied to predict glycosylation patterns of emerging viral strains within distinct virus species or even families.