Neck-Region Length Variation in Worldwide Populations

The identical genomic organization of CD209 and CD209L extends to the neck region, which, in both genes, encodes a tract of seven coding repeats of 23 aa each (fig. 1) (Soilleux et al. 2000). A previous study showed that the length of the neck region of CD209L varies among individuals of European descent (Bashirova et al. 2001). To investigate the degree of polymorphism of the neck region in both CD209 and CD209L, we genotyped it in the entire HGDP-CEPH panel (1,064 individuals from 52 worldwide populations). Striking differences were observed between the two genes (see fig. 5 and table 4 for detailed allele frequencies in each population). For CD209, virtually no variation was observed, and the 7-repeat allele accounted for 99% of all chromosomes. Despite this limited variation, eight different alleles were observed, ranging in size from 2 to 10 repeats, with no 9-repeat allele. The geographic region with the highest variability was the Middle East, where five of the eight alleles were observed (fig. 5A and table 4). For CD209L, a completely different pattern emerged, with strong variation in the frequencies of alleles of different repeat numbers. Of the seven alleles observed (the 4- to 10-repeat size classes), the three most common overall were the 7-repeat (57.42%), the 5-repeat (23.92%), and the 6-repeat (11.37%) alleles. European, Asian, and Pacific populations presented a mosaic of different allelic classes, whereas the 7- and 6-repeat alleles accounted for most (96%) of the African diversity (fig. 5B). This marked difference in neck-region length variation between the two genes was reflected in the heterozygosity values: CD209 exhibited an overall heterozygosity of only 2%, whereas CD209L presented a value of 54% (table 5).
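As an illustration of how a heterozygosity value relates to the allele frequencies quoted above, the following minimal sketch computes gene diversity, 1 - sum(p_i^2), from the pooled "Total" frequencies of table 4. This is the standard Hardy-Weinberg expectation and is an assumption here: the text does not state which estimator produced the reported values, so the result need not match them. The function and variable names are hypothetical.

```python
def gene_diversity(freqs_percent):
    """Return 1 - sum(p_i^2) for allele frequencies given in percent.

    Frequencies are renormalized so that rounding in the published
    table (totals of 99.99 or 100.01) does not bias the result.
    """
    freqs = list(freqs_percent)
    total = sum(freqs)
    return 1.0 - sum((f / total) ** 2 for f in freqs)


# Pooled allele frequencies (%) keyed by repeat number, taken from the
# "Total" row of table 4.
CD209_TOTAL = {10: 0.05, 8: 0.14, 7: 98.97, 6: 0.47,
               5: 0.09, 4: 0.09, 3: 0.14, 2: 0.05}
CD209L_TOTAL = {10: 0.14, 9: 5.73, 8: 0.33, 7: 57.42,
                6: 11.37, 5: 23.92, 4: 1.08}

if __name__ == "__main__":
    for gene, freqs in (("CD209", CD209_TOTAL), ("CD209L", CD209L_TOTAL)):
        print(gene, round(gene_diversity(freqs.values()), 3))
```

With these pooled frequencies the sketch gives about 0.02 for CD209 and about 0.60 for CD209L; the latter is computed from pooled frequencies under the stated assumption and is not a substitute for the 54% reported in the text.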
Our results showed that the levels of heterozygosity observed at CD209 were considerably lower than expected, regardless of the mutation model considered (i.e., the infinite-site or the stepwise mutation model) (table 5). In strong contrast, CD209L showed an excess of heterozygosity in all populations, although this excess was not statistically significant for individual populations.

Figure 5 Geographical distribution of the neck-region repeat variation in CD209 (A) and CD209L (B). Population codes are (1) Algerians; (2) Mandenka; (3) Yoruba; (4) Biaka Pygmies; (5) Northeastern Bantu from Kenya; (6) Mbuti Pygmies; (7) San; (8) South African Bantu southeastern/southwestern; (9) French and Basque from France; (10) Italian composite from Bergamo, Tuscany, and Sardinia; (11) Orcadian; (12) Russians; (13) Adygei; (14) Middle Eastern composite sample of Druze, Palestinian, and Bedouin; (15) Yakut; (16) Pakistani composite sample; (17) Chinese composite sample; (18) Japanese; (19) Cambodian; (20) Papuan; (21) Melanesian; (22) Pima; (23) Maya; (24) Piapoco and Curripaco; (25) Surui; and (26) Karitiana. For populations 16 and 17, we have pooled the different Pakistani and Chinese individual populations, respectively. For population details of these two composite groups, see the HGDP-CEPH Web site.

Table 4 Allele Relative Frequencies of Neck-Region Repeat Variation in CD209 and CD209L in Individual Populations

Location and Population; Geographic Origin; No. of Chromosomes; CD209 Relative Frequency (%) by No. of Repeats: 10 8 7 6 5 4 3 2; HZ^a; CD209L Relative Frequency (%) by No. of Repeats: 10 9 8 7 6 5 4; HZ^b

Africa: 254 .39 99.21 .39 .02 .39 62.20 33.86 3.54 .50
  Biaka Pygmies Central African Republic 72 100 65.28 30.56 4.17 .47
  Mbuti Pygmies Democratic Republic of Congo 30 100 43.33 56.67 .47
  Bantu, northeastern Kenya 24 100 50.00 37.50 12.50 .83
  San Namibia 14 100 35.71 64.29 .71
  Yoruban Nigeria 50 2.00 98.00 .04 2.00 78.00 20.00 .32
  Mandenkan Senegal 48 97.92 2.08 .04 66.67 29.17 4.17 .54
  Bantu, southeastern/southwestern South Africa 16 100 62.50 31.25 6.25 .50
Europe: 322 99.69 .31 .01 1.86 43.17 14.91 33.54 6.52 .62
  French France 58 100 48.28 12.07 36.21 3.45 .55
  French (Basque) France 48 100 39.58 8.33 39.58 12.50 .50
  Sardinian Italy 72 100 1.39 31.94 22.22 34.72 9.72 .61
  North Italian Italy (Bergamo) 28 100 .00 46.43 21.43 28.57 3.57 .79
  Orcadian Orkney Islands 32 100 9.38 46.88 9.38 28.13 6.25 .69
  Russian Russia 50 100 2.00 48.00 12.00 34.00 4.00 .84
  Adygei Russian Caucasus 34 97.06 2.94 .06 2.94 50.00 17.65 26.47 2.94 .35
Middle East: 356 .28 97.19 1.97 .28 .28 .06 .84 .28 56.46 17.13 24.72 .56 .61
  Druze Israel (Carmel) 96 96.88 3.13 .06 1.04 1.04 53.13 21.88 22.92 .67
  Palestinian Israel (Central) 102 .98 99.02 .02 .98 56.86 14.71 27.45 .65
  Bedouin Israel (Negev) 98 96.94 3.06 .06 1.02 58.16 14.29 24.49 2.04 .51
  Mozabite Algeria (Mzab) 60 95.00 1.67 1.67 1.67 .1 58.33 18.33 23.33 .60
Central/South Asia: 420 .24 99.29 .24 .24 .01 3.81 .95 63.57 4.29 27.38 .52
  Pakistani^b Pakistan 400 .25 99.25 .25 .25 .02 3.50 1.00 63.50 4.25 27.75 .52
  Uygur China 20 100 10.00 65.00 5.00 20.00 .50
East Asia: 482 .21 99.38 .21 .21 .01 11.83 .21 70.12 2.49 15.35 .47
  Cambodian Cambodia 22 100 18.18 68.18 4.55 9.09 .36
  Chinese^c China 348 99.43 .29 .29 .01 12.07 .29 71.26 2.30 14.08 .45
  Japanese Japan 62 1.61 98.39 .03 6.45 62.90 3.23 27.42 .58
  Yakut Siberia 50 100 14.00 72.00 2.00 12.00 .48
Oceania: 78 100 3.85 26.92 30.77 21.79 16.67 .72
  Papuan New Guinea 34 100 41.18 29.41 11.76 17.65 .65
  NAN Melanesian Bougainville 44 100 6.82 15.91 31.82 29.55 15.91 .77
Americas: 216 98.61 1.39 .03 8.80 43.98 47.22 .45
  Karitiana Brazil 48 100 4.17 56.25 39.58 .54
  Surui Brazil 42 92.86 7.14 .14 16.67 83.33 .33
  Piapoco and Curripaco Colombia 26 100 19.23 26.92 53.85 .46
  Pima Mexico 50 100 8.00 64.00 28.00 .36
  Mayan Mexico 50 100 16.00 44.00 40.00 .56
Total 2,128 .05 .14 98.97 .47 .09 .09 .14 .05 .02 .14 5.73 .33 57.42 11.37 23.92 1.08 .54

a Heterozygosity values.
b Pakistani populations include Balochi, Brahui, Makrani, Sindhi, Pathan, Burusho, Hazara, and Kalash.
c Chinese populations include Han, Dai, Daur, Hezhen, Lahu, Miao, Oroqen, She, Tujia, Tu, Xibo, Yi, Mongola, and Naxi.

Table 5 Observed and Expected Heterozygosities for the Number of Repeats in the Neck Regions of CD209 and CD209L

| Population | CD209 Observed | CD209 Expected^a | CD209 P (ISM^b) | CD209 P (SMM^c) | CD209L Observed | CD209L Expected^a | CD209L P (ISM^b) | CD209L P (SMM^c) |
|---|---|---|---|---|---|---|---|---|
| African | 1.6 | 27.9 | .030 | .000 | 50 | 37 | .328 | .229 |
| European | .6 | 15.3 | .158 | .094 | 62 | 44 | .179 | .304 |
| Middle Eastern | 5.6 | 43.1 | .018 | .000 | 61 | 49 | .299 | .095 |
| Central/South Asian | 1.4 | 35.1 | .003 | .000 | 52 | 43 | .387 | .098 |
| East Asian | 1.2 | 34.5 | .003 | .000 | 47 | 42 | .472 | .054 |
| Oceanian | .0 | … | … | … | 72 | 53 | .071 | .337 |
| American | 2.8 | 16.3 | .323 | .205 | 45 | 29 | .273 | .440 |
| Total sample | 2.0 | 49.7 | .002 | .000 | 54 | 47 | .405 | .013 |

Note: We presented only the expected heterozygosity under the infinite-site model, because no evidence for recurrent mutations was observed in our data, as suggested by the composite CD209L haplotypes that included the repeat variation (fig. 2), as well as by the median-joining networks (results not shown). Significant P values are shown in bold italics.
a Under the infinite-site model.
b Probability of the observed heterozygosity under the infinite-site model.
c Probability of the observed heterozygosity under the stepwise mutational model.
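Because the bold-italic emphasis marking significant P values is not visible in plain text, the following minimal sketch lists the table 5 entries that fall below a significance threshold. The 0.05 cutoff is an assumption (the text does not state the threshold used), and the names `ALPHA`, `P_VALUES`, and `significant` are hypothetical.

```python
ALPHA = 0.05  # assumed threshold; not stated in the source

# (ISM P, SMM P) per population group, copied from table 5.
# None marks entries given as "…" in the table.
P_VALUES = {
    "CD209": {
        "African": (0.030, 0.000),
        "European": (0.158, 0.094),
        "Middle Eastern": (0.018, 0.000),
        "Central/South Asian": (0.003, 0.000),
        "East Asian": (0.003, 0.000),
        "Oceanian": (None, None),
        "American": (0.323, 0.205),
        "Total sample": (0.002, 0.000),
    },
    "CD209L": {
        "African": (0.328, 0.229),
        "European": (0.179, 0.304),
        "Middle Eastern": (0.299, 0.095),
        "Central/South Asian": (0.387, 0.098),
        "East Asian": (0.472, 0.054),
        "Oceanian": (0.071, 0.337),
        "American": (0.273, 0.440),
        "Total sample": (0.405, 0.013),
    },
}


def significant(p_values, alpha=ALPHA):
    """Return (gene, population, model) triples whose P value is below alpha."""
    hits = []
    for gene, rows in p_values.items():
        for population, (p_ism, p_smm) in rows.items():
            for model, p in (("ISM", p_ism), ("SMM", p_smm)):
                if p is not None and p < alpha:
                    hits.append((gene, population, model))
    return hits


if __name__ == "__main__":
    for hit in significant(P_VALUES):
        print(*hit)
```

Under this assumed cutoff, the only CD209L entry flagged is the total sample under the stepwise mutation model, which agrees with the statement that the CD209L excess is not significant in individual populations.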