Identification of a Somatic DUP4 Variant

However, we also visualized fibers with an extra partial repeat unit, which we called DUP4b (Figure 1C). This novel variant carries an extra copy of the partial A-B repeat, which harbors the GYPB/GYPA fusion gene. We selected a fosmid probe that spanned the 16 kb insertion specific to the GYPA repeat and showed that the extra copy was at least partly derived from the A repeat, consistent with it being an additional copy of the partial A-B repeat (Figure S1).

To rule out large-scale karyotype changes as the cause of the additional novel variant (DUP4b), we analyzed metaphase spreads of the HG02554 lymphoblastoid cell line using metaphase-FISH, interphase-FISH, and multiplex-FISH karyotyping (Figure S2). DUP4 and reference chromosomes could be distinguished by interphase-FISH on the basis of the hybridization intensity of a fosmid probe mapping to GYPB (Figure S2B). No evidence of large-scale inter- or intrachromosomal rearrangements or aneuploidy was found in any of our experiments.

We hypothesized that DUP4b is a somatic variant that arose through rearrangement of the original DUP4 variant (which we call DUP4a), but not of the reference variant. If this is true, we would expect to observe equal numbers of reference and DUP4 fibers, one class from each parental chromosome, confirming the heterozygous DUP4 genotype of the source cells, with the DUP4 fibers subdivided into DUP4a and DUP4b variants. Of 24 fibers examined from HG02554, 12 were reference and 12 were DUP4; of the 12 DUP4 fibers, 7 were DUP4a and 5 were DUP4b. This strongly supports a model in which DUP4b is a somatic rearrangement of DUP4a and the culture contains two sub-clones (populations) of cells, one with reference and DUP4a haplotypes and the other with reference and DUP4b haplotypes. We also analyzed the HG02554 cell line from the Oxford laboratory used in their study⁵ and confirmed the existence of DUP4b by fiber-FISH.
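The expectation of equal numbers of reference and DUP4 fibers can be checked informally with an exact binomial test. The sketch below is illustrative only and is not part of the analysis reported here; the function name `binom_two_sided_p` and the choice of test are assumptions. It uses only the fiber counts given above (12 reference and 12 DUP4 of 24; 5 DUP4b of 12 DUP4).

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value (hypothetical helper).

    Sums the probabilities of every outcome that is no more likely
    than the observed count k under success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= threshold))

# Counts taken from the text above.
n_fibers, n_dup4 = 24, 12
n_dup4_total, n_dup4b = 12, 5

# Test of the 1:1 reference:DUP4 ratio expected for a heterozygote.
p_heterozygous = binom_two_sided_p(n_dup4, n_fibers)

# Proportion of DUP4 fibers that carry the DUP4b rearrangement.
dup4b_fraction = n_dup4b / n_dup4_total
```

With 12 of 24 fibers being DUP4, the observed split is exactly the expected one, so the test gives no evidence against heterozygosity; `dup4b_fraction` is 5/12, roughly 0.42.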
The high frequency of DUP4b variant chromosomes within the cell lines, together with the observation of DUP4b in two cell line cultures, suggests that DUP4b is a somatic variant of DUP4a that arose prior to the passage received by the Oxford laboratory or the Wellcome Sanger Institute, either in the donor individual or early in the cell-culturing process, perhaps increasing in frequency because of the cell bottleneck associated with transformation.³⁷

To further characterize the somatic variation observed in HG02554, we sequenced HG02554 DNA at high depth (50×) using Illumina technology. This DNA was purchased directly from Coriell Cell Repositories and had been extracted from their HG02554 lymphoblastoid cell line rather than from our cell lines. We sequenced it together with peripheral-blood-derived genomic DNA from two Tanzanian DUP4 homozygotes and two Tanzanian DUP4 heterozygotes. Analysis of sequence read depth across the glycophorin repeat region showed the same pattern as that observed previously,⁵ leading to a model that is confirmed by our fiber-FISH data (Figure 2A). DUP4 homozygotes show the expected increase to four copies and six copies in duplicated and triplicated regions, respectively.

Figure 2. Sequence Read Depth Analysis of DUP4 Homozygotes and Heterozygotes
(A) Normalized sequence read depth of 5 kb windows spanning the reference sequence glycophorin region for five samples. The lines show the Loess regression line (f = 0.1) for homozygotes (blue) and heterozygotes (green). Gene positions and repeats, with respect to the reference sequence, are shown above the plot.
(B) The difference between the HG02554 sequence read depth and the average sequence read depth of the two other heterozygotes, C05E and C05P, is shown in 5 kb windows across the glycophorin region. Points highlighted in red are significantly different (p < 0.01).
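The windowed read depth described in the Figure 2 legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the figure: the function names are hypothetical, reads are assigned to windows by start position only, and normalization assumes that the median of a set of baseline windows corresponds to two copies. The normalization actually used is not specified in this section.

```python
from collections import Counter
from statistics import median

WINDOW = 5_000  # window size in bp, matching the 5 kb windows in the legend

def window_depth(read_starts, region_start, region_end, window=WINDOW):
    """Count reads whose start lies in each window of [region_start, region_end)."""
    n_windows = (region_end - region_start + window - 1) // window
    counts = Counter(
        (pos - region_start) // window
        for pos in read_starts
        if region_start <= pos < region_end
    )
    return [counts.get(i, 0) for i in range(n_windows)]

def to_copy_number(counts, baseline_counts, baseline_copies=2):
    """Scale counts so the median baseline window equals baseline_copies.

    Assumption: baseline_counts come from windows of known diploid
    copy number outside the glycophorin repeats.
    """
    per_copy = median(baseline_counts) / baseline_copies
    return [c / per_copy for c in counts]
```

A smoothed trend such as the Loess line in panel A would be fitted on top of these per-window values; that step is omitted here.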
We then compared the sequence read depth of HG02554 to that of the other two DUP4 heterozygotes to search for evidence of an increased copy number of the BAB partial repeat carrying the GYPB/GYPA fusion gene (Figure 1), as suggested by our fiber-FISH data, which would reflect somatic mosaicism. HG02554 indeed shows a significant increase in DNA dosage of about 0.5 in regions matching the BAB partial repeat, reflecting an extra copy of the region in ∼50% of cells (Figure 2B).
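The reasoning from a dosage increase to a mosaic cell fraction can be written out explicitly. In the sketch below, the function names are hypothetical and the per-window values are invented purely for illustration; they are not data from this study. It assumes depth is expressed in copies per cell, so that a mean excess of 0.5 over the comparison heterozygotes corresponds to one extra copy in about half of the cells.

```python
from statistics import mean

def depth_difference(sample, ref_a, ref_b):
    """Per-window difference between a sample and the mean of two references."""
    return [s - (a + b) / 2 for s, a, b in zip(sample, ref_a, ref_b)]

def mosaic_fraction(diffs, extra_copies=1):
    """Fraction of cells carrying the gain, assuming each affected cell
    has `extra_copies` additional copies and depth is in copies per cell."""
    return mean(diffs) / extra_copies

# Invented illustrative values for three windows in the gained region.
sample = [3.5, 3.4, 3.6]
het_1 = [3.0, 3.0, 3.0]
het_2 = [3.0, 2.9, 3.1]

diffs = depth_difference(sample, het_1, het_2)
fraction = mosaic_fraction(diffs)  # close to 0.5 for these toy values
```

This simple ratio ignores window-to-window noise and does not reproduce the significance test behind the red points in Figure 2B.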