Haplotype Classification Based on Linkage Disequilibrium

We defined haplotypes centered on rs334 in the 504 continental Africans from the 1000 Genomes Project. We recorded pairwise linkage disequilibrium (LD) between rs334 and all phased, diallelic markers on chromosome 11 (Figure 2). The largest value of r² with rs334 was 0.407. There was only one marker, rs149481026, with r²≥0.4. This one marker was more strongly associated with rs334 (Ṽ²=0.406) than the set of four RFLP-predicting markers was (Figure 1A).

Figure 2. Linkage Disequilibrium with rs334
We calculated pairwise r² with rs334 across chromosome 11 among 504 continental Africans from the 1000 Genomes Project. We plotted r² for all 4,024,958 phased diallelic markers.

To strengthen the association with rs334, we lowered the r² threshold. There were three markers with r²≥0.3: rs183055323, rs149481026, and rs73404549. These three markers spanned 132.6 kb, including the entire β-globin cluster. This interval was substantially smaller than the average distance between phasing errors. On the basis of these three markers, we identified five unique haplotypes (Table 3). One haplotype contained all occurrences of the Arabian/Indian, Cameroon, and CAR haplotypes. This haplotype contained the ancestral allele at all three markers. The Senegal haplotype was distributed across all five haplotypes, and the Benin haplotype was distributed across four of the five haplotypes. Ṽ² between these three markers and rs334 was 0.753 (Figure 1B).

Table 3. Distribution of Sickle Haplotypes under a Sequence-Based Classification Scheme Using Three Markers

| Name | Haplotype^a | Arabian/Indian | Benin | Cameroon | CAR | Senegal | Atypical |
|---|---|---|---|---|---|---|---|
| Ancestor | 0000 | NA | NA | NA | NA | NA | NA |
| HAPA | 0010 | 1 | 4 | 2 | 38 | 1 | 1 |
| HAPB | 0011 | 0 | 2 | 0 | 0 | 1 | 0 |
| HAPC | 0110 | 0 | 1 | 0 | 0 | 1 | 1 |
| HAPD | 0111 | 0 | 40 | 0 | 0 | 15 | 0 |
| HAPE | 1010 | 0 | 0 | 0 | 0 | 43 | 1 |

^a 0 indicates the reference allele, and 1 indicates the alternate allele according to the coding scheme in the 1000 Genomes Project VCF files. The sickle site rs334 is underlined.

To improve cross-classification with the Senegal and Benin haplotypes, we identified 27 markers with r²≥0.2. These markers extended across 725.3 kb, which is still less than the average distance between phasing errors. On the basis of these markers, we identified 59 unique haplotypes, of which 18 carried the sickle allele at rs334. A 19th sickle haplotype was observed once in the ACB sample, and a 20th sickle haplotype was observed once in the Baganda sample (Table 4). Ṽ² between these 27 markers and rs334 was 0.728 (Figure 1C). The most common haplotype carried the ancestral allele at all 28 sites and accounted for 68.5% of all haplotypes in the continental Africans. Globally, the ancestral haplotype had a frequency of 91.9%, was the most frequent haplotype in all 26 samples in the 1000 Genomes Project, and was the only haplotype in 15 of those samples, including all ten samples from East Asia and Europe.

Thirteen of the sickle haplotypes in the Baganda, the one in the Zulu, and all four in the Qatari were identical to HAP1, the haplotype most commonly designated CAR (Table 4). Additionally, the four Qatari carriers had (1) >7.8% autosomal African ancestry, (2) an African mitochondrial haplogroup, or (3) an African Y chromosome haplogroup. The three most common haplotypes (HAP1, HAP16, and HAP20) correlated primarily with the CAR, Benin, and Senegal haplotypes, respectively (Table 4).
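The two steps described above, selecting markers by their r² with a target site and then collapsing the phased alleles at the selected sites into haplotype strings, can be sketched in Python on toy data. This is only an illustration under stated assumptions: the function names (`r_squared`, `select_markers`, `count_haplotypes`) and the eight toy haplotypes are hypothetical and are not taken from the study, and the section does not say which software was used. The r² formula is the standard one for two diallelic loci, r² = D² / (p_A(1 − p_A) p_B(1 − p_B)); returning 0 for a monomorphic marker is a convenience assumed here.

```python
from collections import Counter


def r_squared(a, b):
    """r^2 between two phased diallelic markers.

    `a` and `b` are equal-length lists of 0/1 alleles, one entry per
    haplotype (0 = reference, 1 = alternate).
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("markers must be non-empty and of equal length")
    p_a = sum(a) / n                              # alternate-allele frequency at marker a
    p_b = sum(b) / n                              # alternate-allele frequency at marker b
    p_ab = sum(x & y for x, y in zip(a, b)) / n   # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # assumption: treat a monomorphic marker as uninformative
    d = p_ab - p_a * p_b                          # LD coefficient D
    return d * d / denom


def select_markers(target, markers, threshold):
    """Names of markers whose r^2 with `target` is at least `threshold`."""
    return [name for name, alleles in markers.items()
            if r_squared(target, alleles) >= threshold]


def count_haplotypes(columns):
    """Count unique haplotype strings across ordered per-site allele lists."""
    return Counter("".join(str(allele) for allele in hap)
                   for hap in zip(*columns))


# Hypothetical toy data: eight phased haplotypes, one target site, three markers.
toy_target = [1, 1, 1, 0, 0, 0, 0, 0]
toy_markers = {
    "m1": [1, 1, 1, 0, 0, 0, 0, 0],
    "m2": [1, 1, 0, 0, 0, 0, 0, 0],
    "m3": [1, 0, 0, 1, 0, 0, 0, 0],
}

selected = select_markers(toy_target, toy_markers, threshold=0.3)
haplotypes = count_haplotypes([toy_markers[m] for m in selected] + [toy_target])
print(selected)
print(haplotypes.most_common())
```

Lowering `threshold` admits more markers, which splits the haplotype strings into more distinct classes. The same trade-off appears in the text when moving from r²≥0.3 (five haplotypes) to r²≥0.2 (59 haplotypes).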
Table 4. Distribution of Sickle Haplotypes under a Sequence-Based Classification Scheme Using 27 Markers

| Name | Haplotype^a | Arabian/Indian | Benin | Cameroon | CAR | Senegal | Atypical |
|---|---|---|---|---|---|---|---|
| Ancestor | 0000000000000000000000000010 | NA | NA | NA | NA | NA | NA |
| HAP1 | 0000000000000000000010000010 | 1 | 0 | 2 | 37 | 0 | 1 |
| HAP2 | 0000000000000000000010110010 | 0 | 4 | 0 | 0 | 0 | 0 |
| HAP3 | 0000000000000000000010110101 | 0 | 0 | 0 | 0 | 1 | 0 |
| HAP4 | 0000000000000000001110110101 | 0 | 2 | 0 | 0 | 0 | 0 |
| HAP5 | 0000000000000000111110110101 | 0 | 5 | 0 | 0 | 0 | 0 |
| HAP6 | 0000000000000001100011001010 | 0 | 0 | 0 | 0 | 3 | 0 |
| HAP7 | 0000000100000100111110110101 | 0 | 1 | 0 | 0 | 0 | 0 |
| HAP8 | 0000000100000100111110110110 | 0 | 0 | 0 | 0 | 1 | 0 |
| HAP9 | 0000111011111011100011001010 | 0 | 0 | 0 | 0 | 1 | 0 |
| HAP10 | 0001000000000000000010000010 | 0 | 0 | 0 | 1 | 0 | 0 |
| HAP11 | 0001000100000100111110110101 | 0 | 2 | 0 | 0 | 1 | 0 |
| HAP12 | 0011000100000100101110000010 | 0 | 0 | 0 | 0 | 1 | 0 |
| HAP13 | 0011000100000100111110000010 | 0 | 1 | 0 | 0 | 0 | 1 |
| HAP14 | 0011000100000100111110110010 | 0 | 0 | 0 | 0 | 1 | 0 |
| HAP15 | 0011000100000100111110110100 | 0 | 2 | 0 | 0 | 1 | 0 |
| HAP16 | 0011000100000100111110110101 | 0 | 23 | 0 | 0 | 7 | 0 |
| HAP17 | 0011000100000100111110110110 | 0 | 6 | 0 | 0 | 5 | 0 |
| HAP18 | 0011000100000100111110110111 | 0 | 1 | 0 | 0 | 0 | 0 |
| HAP19 | 1100111011111011100010000010 | 0 | 0 | 0 | 0 | 3 | 1 |
| HAP20 | 1100111011111011100011001010 | 0 | 0 | 0 | 0 | 36 | 0 |

^a 0 indicates the reference allele, and 1 indicates the alternate allele according to the coding scheme in the 1000 Genomes Project VCF files. The sickle site rs334 is underlined.

Given that our sets of 1, 3, and 27 markers more strongly associated with rs334 than did the set of RFLP-predicting markers, we assessed association between our sets of markers and the set of RFLP-predicting markers. Our sets of markers were moderately associated with the set of RFLP-predicting markers when conditioned on the presence of βS and weakly associated when conditioned on the presence of βA (Figure 1).
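The cross-classification in Table 4 can be checked with a few lines of Python. The sketch below transcribes the counts from Table 4, totals each sequence-based haplotype, and reports the RFLP-based class that contributes most to it. The names `CLASSES`, `TABLE4`, and `summarize` are hypothetical helpers introduced for illustration; this is simple tabulation, not the Ṽ² association measure reported in the text.

```python
# Column order follows Table 4.
CLASSES = ["Arabian/Indian", "Benin", "Cameroon", "CAR", "Senegal", "Atypical"]

# Counts transcribed from Table 4 (the Ancestor row has no counts and is omitted).
TABLE4 = {
    "HAP1":  [1, 0, 2, 37, 0, 1],
    "HAP2":  [0, 4, 0, 0, 0, 0],
    "HAP3":  [0, 0, 0, 0, 1, 0],
    "HAP4":  [0, 2, 0, 0, 0, 0],
    "HAP5":  [0, 5, 0, 0, 0, 0],
    "HAP6":  [0, 0, 0, 0, 3, 0],
    "HAP7":  [0, 1, 0, 0, 0, 0],
    "HAP8":  [0, 0, 0, 0, 1, 0],
    "HAP9":  [0, 0, 0, 0, 1, 0],
    "HAP10": [0, 0, 0, 1, 0, 0],
    "HAP11": [0, 2, 0, 0, 1, 0],
    "HAP12": [0, 0, 0, 0, 1, 0],
    "HAP13": [0, 1, 0, 0, 0, 1],
    "HAP14": [0, 0, 0, 0, 1, 0],
    "HAP15": [0, 2, 0, 0, 1, 0],
    "HAP16": [0, 23, 0, 0, 7, 0],
    "HAP17": [0, 6, 0, 0, 5, 0],
    "HAP18": [0, 1, 0, 0, 0, 0],
    "HAP19": [0, 0, 0, 0, 3, 1],
    "HAP20": [0, 0, 0, 0, 36, 0],
}


def summarize(table, classes):
    """Return (name, total, dominant class, dominant count), largest total first.

    Ties for the dominant class are broken by column order, which is an
    arbitrary choice made for this sketch.
    """
    rows = []
    for name, counts in table.items():
        top = max(range(len(counts)), key=counts.__getitem__)
        rows.append((name, sum(counts), classes[top], counts[top]))
    rows.sort(key=lambda row: row[1], reverse=True)  # stable sort keeps table order on ties
    return rows


summary = summarize(TABLE4, CLASSES)
for name, total, dominant, count in summary[:3]:
    print(f"{name}: {count} of {total} sickle haplotypes are {dominant}")
```

On these counts, the three largest rows are HAP1, HAP20, and HAP16, dominated by CAR, Senegal, and Benin, respectively, which agrees with the statement in the text.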