4.2 Sialic acid binding residues

Alignment studies like those above are helpful in forming initial hypotheses, but they are not convincing by themselves when the degree of match between aligned residues is weak. It is possible, however, that the amino acid composition of the subsequence and potential motif of interest is more important than the order of the residues within it. Based on many studies exemplified by the above, it was noted that tryptophan is a recurrent, albeit not absolutely essential, feature of sugar binding, including the binding of sialic acids. References to this also recur throughout the biochemical literature; see, for example, Ref. [14]. Fig. 2 shows the tryptophan interaction with sialic acid in the influenza virus B neuraminidase (PDB entry 2BAT). As might be expected from their similar aromatic character, the alternative sugar-binding residues, and the residues that frequently support tryptophan in the binding site, tend to have aromatic side chains, notably tyrosine (Y), sometimes phenylalanine (F), and also histidine (H). A preliminary survey of sequence motif patterns in sugar binding proteins suggests that the invariant amino acid residues across a family of proteins tend to be one or more of the above residues, supported by negatively charged aspartate (D), asparagine (N), serine (S), threonine (T), glycine (G), and sometimes alanine (A) that provide the hydrogen bonding. However, particularly in regard to the non-aromatic residues, the binding of acidic and non-acidic sugars should probably be distinguished. As discussed later below, the charged amino acid residues glutamate (E), arginine (R), and lysine (K) also frequently make intimate contact with sialic acids, but they do so in three dimensions, not together in a subsequence.
A likely relevant observation was that the first set of amino acid residues binding sialic acids (the set containing aspartate) tended to occur in a subsequence that adopted a local loop conformation, while the second set (that containing glutamate) was frequently associated with α-helices and particularly their termini. However, this was an empirical and qualitative observation regarding a tendency, and a more objective quantification of the importance of the aspartate set is the purpose of the prediction algorithm developed below. Three-dimensional considerations, however, give insight and sometimes explain why influences can be somewhat indirect. Hydrogen bonding between the hydroxyl groups of carbohydrate ligands and polar amino acid residues at the binding site is typically supported by water-mediated hydrogen bonding networks in which serine and threonine are fairly commonly involved. Nonetheless, the most outstanding feature of carbohydrate binding sites from a three-dimensional perspective would appear to be the position and orientation of tryptophan (W), tyrosine (Y), and/or phenylalanine (F), which usually provide a hydrophobic plate for close interaction with the planar face of sugar rings, an interaction resembling hydrophobic stacking, as in Fig. 2. The importance of these residues, and to some extent of histidine (H), in a sequence motif therefore seems reasonable.

Fig. 2 The influenza virus B neuraminidase tryptophan interaction with sialic acid (PDB entry 2BAT).

Along with the occasional appearance of cysteine (C), sometimes as a serine (S) and particularly a threonine (T) substitution, the residues mentioned above can be used as the basis of a preliminary and essentially qualitative model for assessment of sugar binding, as given in column 4 of Table 1. A valine (V) substitution was often seen in potential glycan binding sites, although this was not significantly supported by the optimization of the predictive technique described below.
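The idea that composition, rather than residue order, characterizes a candidate subsequence can be illustrated with a short sketch. This is not the prediction algorithm developed in this paper. It is a hypothetical illustration that counts, in each sliding window, the aromatic residues (W, Y, F, H) and the supporting residues of the aspartate set (D, N, S, T, G, A) named above. The function names, the window width, and the thresholds are all assumptions chosen only for the example.

```python
from collections import Counter

# Residue groups as named in the text; using them as two simple
# counting classes is an assumption made for this illustration.
AROMATIC = frozenset("WYFH")
SUPPORTING = frozenset("DNSTGA")


def window_composition(seq, width=7):
    """Yield (start, aromatic_count, supporting_count) for each window.

    Only the composition of a window is counted, so the order of the
    residues inside the window does not affect the result.
    """
    seq = seq.upper()
    for start in range(len(seq) - width + 1):
        counts = Counter(seq[start:start + width])
        aromatic = sum(counts[r] for r in AROMATIC)
        supporting = sum(counts[r] for r in SUPPORTING)
        yield start, aromatic, supporting


def candidate_windows(seq, width=7, min_aromatic=1, min_supporting=3):
    """Return (start, subsequence) for windows meeting both thresholds.

    The thresholds are hypothetical and not fitted to any data.
    """
    seq = seq.upper()
    return [
        (start, seq[start:start + width])
        for start, aromatic, supporting in window_composition(seq, width)
        if aromatic >= min_aromatic and supporting >= min_supporting
    ]


if __name__ == "__main__":
    # Toy sequence, not taken from any real protein.
    for start, sub in candidate_windows("KKKKWDSTGKKKK", width=5):
        print(start, sub)
```

Because only counts are used, a window and any permutation of it receive the same score, which is the order-independence suggested above. Such a count is at best a crude filter and says nothing about the three-dimensional contacts discussed in this section.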
However, glycans containing sialic acids appear to bind somewhat differently from other sugars. Taking this as a hypothesis and focusing on these glycans, the protein sites that bind them may have a variety of affinities for different subtypes. For example, all influenza A virus strains critically depend on sialic acid to bind to host cells, and the different forms of sialic acid show different affinities that vary with the influenza A virus variety; this is important because it determines which species can be infected. There has also been relevant work that complicates what can be meant by, for example, a “sugar binding motif”. Indeed, Zhang and Yap [13] themselves noted that tryptophan was frequently involved in strong protein fold interactions that stabilized the sugar binding domain fold, yet lacked any direct interactions with sugars. In the influenza neuraminidase these comprised three pairs of main chain-side chain interactions: tryptophan (W) 171 (donor) and phenylalanine (F) 179 (acceptor), alanine (A) 210 (donor) and phenylalanine (F) 29 (acceptor), and leucine (L) 209 (donor) and tryptophan (W) 171 (acceptor). For example, tryptophan 171 accepts one N–H⋯π bond from leucine (L) 209 and donates one N–H⋯π bond to phenylalanine (F) 179. It is possible that the aromatic residues could be induced to be more exposed in certain types of binding, but the above interactions appeared to stabilize the structure of the S1 fold motif while reducing the active site cavity for ligand binding [13].