Fig. 4 Estimated population structure. Each individual is represented by a thin vertical line, which is partitioned by STRUCTURE into K colored segments representing the individual's estimated membership fractions in each of the K clusters. Black lines separate individuals of different populations. Populations are labeled below the figure. All population IDs except the 4 HapMap samples (YRI, CEU, CHB, and JPT) are denoted by 4 characters. The first 2 letters indicate the country where the samples were collected or (in the case of Affymetrix) genotyped, according to the following convention: AX, Affymetrix; CN, China; ID, Indonesia; IN, India; JP, Japan; KR, Korea; MY, Malaysia; PI, Philippines; SG, Singapore; TH, Thailand; TW, Taiwan. The last 2 letters are a unique ID for the population. Populations from the same linguistic group or neighboring geographic locations tend to share the same cluster. At K = 14, each language family can be specified by a cluster (color), although Sino-Tibetan-speaking populations tend to cluster with both Altaic- and Tai-Kadai-speaking populations. The figure shown for a given K is based on the highest-probability run of 10 STRUCTURE runs at that K.
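The population ID convention given in the caption can be illustrated with a minimal Python sketch. The function name `decode_population_id`, the returned dictionary layout, and the example ID `TWXX` are hypothetical and used only for illustration; only the 2-letter prefix table and the 4 HapMap labels come from the caption.

```python
# Prefix table taken from the caption: first 2 letters of a 4-character ID.
SOURCE_BY_PREFIX = {
    "AX": "Affymetrix",
    "CN": "China",
    "ID": "Indonesia",
    "IN": "India",
    "JP": "Japan",
    "KR": "Korea",
    "MY": "Malaysia",
    "PI": "Philippines",
    "SG": "Singapore",
    "TH": "Thailand",
    "TW": "Taiwan",
}

# The 4 HapMap samples keep their own 3-letter labels.
HAPMAP_LABELS = {"YRI", "CEU", "CHB", "JPT"}


def decode_population_id(pop_id: str) -> dict:
    """Split a population ID into its source and population-specific suffix.

    Hypothetical helper: the return format is an assumption, not part of
    the original figure or analysis.
    """
    label = pop_id.strip().upper()
    if label in HAPMAP_LABELS:
        return {"id": label, "hapmap": True, "source": None, "suffix": None}
    if len(label) != 4:
        raise ValueError(f"expected a 4-character ID, got {pop_id!r}")
    prefix, suffix = label[:2], label[2:]
    if prefix not in SOURCE_BY_PREFIX:
        raise ValueError(f"unknown prefix {prefix!r} in {pop_id!r}")
    return {
        "id": label,
        "hapmap": False,
        "source": SOURCE_BY_PREFIX[prefix],
        "suffix": suffix,
    }


if __name__ == "__main__":
    # "TWXX" is a made-up ID; "XX" stands in for a real population suffix.
    print(decode_population_id("TWXX"))
    print(decode_population_id("YRI"))
```

For Affymetrix (AX) samples the decoded source is where the samples were genotyped rather than the country of collection, so the prefix alone does not give the geographic origin of those populations.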