3.4 Sequence variants in NPC transcriptome

Sequence variant analysis to detect SNPs and short INDELs was carried out using VarScan [41]. A total of 17,389 SNPs/INDELs were detected in NP460, HK1, and C666, and the results are summarized in Fig. 5A and Supplement 5. Variants that are not exclusive to X666 are also listed in Supplement 5, whereas variants found solely in X666 were removed because of possible contamination by mouse sequences. About 63% (10,929/17,389) of the variants are in non-protein-coding regions, while about 37% (6460/17,389) are in exonic regions of the genome (Fig. 5A). Of the exonic variants, about 42% (2743/6460) are non-synonymous, frameshift, or stop-gain/loss variants, which affect the protein product of the gene. Moreover, 98 of the 2743 protein-affecting variants are in genes from the Catalogue of Somatic Mutations in Cancer (COSMIC) panel [49]. Of note, we discovered a novel TP53 mutation at Chr17:7,579,335T>G in the NP460, HK1, and C666 cell lines. Differences in the GSTP1 polymorphism (Chr11:67,352,689A>G) across NP460, HK1, and C666-1 were also found in the NGS data (Fig. 5B).

Sequence variation in the UTR may affect the binding of regulatory miRNAs and thereby influence the level of mRNA expression [50]. We therefore sought UTR variants that disrupt miRNA binding and affect mRNA expression in the cell lines. In total, 712 UTR sequences from the reference were predicted to be miRNA binding sites. Of these, 452 binding sites were disrupted by the variation and 184 target sites were predicted to be enhanced (Supplement 6). We further examined the disrupted/enhanced miRNA-binding pairs against the expression of their corresponding mRNAs and miRNAs. Compared to NP460, almost 100% of the genes with enhancing UTR variants in HK1 and C666 responded to changes in the paired miRNAs, whereas only 41% and 50% of the genes with disrupting UTR variants in HK1 and C666, respectively, did so.
Three examples of genes with and without disrupted/enhanced variants are illustrated in Supplemental Fig. 2.

From the small RNA library NGS data, we analyzed human and EBV-encoded isomiRs based on miRBase; these are listed in Supplements 7, 8 and 9. Supplement 9 shows the top three EBV-encoded isomiRs selected from C666 and X666, which showed substantial isomiR expression in the RNA-Seq data. For 18 EBV miRNAs in C666 and 19 in X666, the most abundant sequence is not the reference sequence. Four miRNAs differed in their most abundant isomiRs between C666 and X666. No reference reads of BART19-5p were detected in X666, and only one read was detected in C666. For mir-BART18-3p, the mature reference sequence was not among the top three detected sequences in either C666 or X666 (Supplement 9). Mature BHRF1-1-5p was detected in both C666 and X666, while BHRF1-2-3p and BHRF1-2-5p were detected in X666 only (Table 1a).

Four novel EBV-encoded miRNAs were reported by Chen et al. [26] from clinical samples; they are not included in miRBase, and further reports are lacking. We detected three of them (BART16-3p, BART22-5p, BART12-5p) in both C666 and X666 samples. The mature sequences described by Chen et al. [26] are the most abundant sequences for BART16-3p and BART22-5p (Table 1b).
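
The isomiR comparison described above (ranking the detected sequences of each miRNA by read count and checking whether the mature reference sequence is the most abundant) can be illustrated with a minimal Python sketch. This is not the pipeline used in this study: the function name `rank_isomirs`, the input layout, and the toy sequences are all assumptions made for illustration, and reads are assumed to have already been assigned to a single miRNA.

```python
from collections import Counter


def rank_isomirs(reads, reference, top_n=3):
    """Rank the sequences observed for one miRNA by read count.

    reads:     iterable of read sequences already assigned to this miRNA
               (hypothetical input layout, not the study's format).
    reference: mature reference sequence, e.g. as annotated in miRBase.
    Returns (top_n most abundant (sequence, count) pairs,
             read count of the reference sequence,
             whether the reference is the most abundant sequence).
    """
    counts = Counter(reads)
    # Sort by descending count, then by sequence so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ranked[:top_n]
    ref_count = counts.get(reference, 0)
    ref_is_top = bool(ranked) and ranked[0][0] == reference
    return top, ref_count, ref_is_top


# Toy data invented for illustration only; not real EBV miRNA sequences.
toy_reference = "ACGUACGUACGUACGUACGUAC"
toy_reads = (
    [toy_reference[:-1]] * 5      # 3' truncated isomiR
    + [toy_reference + "U"] * 3   # 3' extended isomiR
    + [toy_reference] * 1         # mature reference sequence
)

top, ref_count, ref_is_top = rank_isomirs(toy_reads, toy_reference)
for sequence, count in top:
    print(sequence, count)
print("reference reads:", ref_count, "| reference most abundant:", ref_is_top)
```

In this toy case the truncated isomiR outranks the reference, which is the kind of pattern reported for the EBV miRNAs whose most abundant sequence is not the reference one. A miRNA with no reference reads at all would simply return a reference count of zero.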