Results
To investigate the genetic history of the present-day Greenlandic population, we analyzed genetic data from 4,127 Greenlandic individuals from 15 different locations in Greenland (Figure 1), 547 Greenlandic individuals living in Denmark, 50 Danish individuals, and 208 unrelated individuals from the original HapMap project. All of these individuals were genotyped for 196,224 SNPs on the Illumina MetaboChip, and a small subset of them were exome sequenced as well. The Danish individuals were included to represent the European ancestors of the Greenlanders, which are mainly from Denmark and Norway, and the HapMap individuals were included for comparison to other populations from the rest of the world. Note that some of the results presented below are based on analyses of SNP chip data from all 4,127 Greenlanders and the 50 Danes. Other results are based on analyses of SNP chip data from a restricted subset of the Greenlanders. This subset consists of individuals who are not closely related, do not have any European ancestry (<5% estimated European ancestry), and have not recently migrated within Greenland. Because most of the 15 locations had very few such individuals, only individuals from Qaanaaq (north), Upernavik villages (west), South villages (south), Tasiilaq (east), and Tasiilaq villages (east) were included in these latter analyses. In the following sections, we will refer to the two data sets on which the results below are based as (1) the full data set and (2) the restricted Greenlandic data set. For details about the data sets, including how many of the 196,224 SNP sites did not pass filtering prior to the different analyses, see the Material and Methods.
Recent European Gene Flow and Population Structure
Using a subset of the genotyped Greenlandic individuals (2,733 individuals from the IHIT cohort), we previously showed in a study focused on disease mapping that there has been a large amount of gene flow from Europe into Greenland and that most Greenlanders have both European and Inuit ancestry.8 To further explore the genetic structure of the Greenlandic population, we here estimated admixture proportions for the full data set by using the program ADMIXTURE23 and stratified the results according to location. First, we assumed that the Greenlandic individuals have ancestry from two ancestral populations (K = 2), so all Danish individuals were assigned one ancestral population, and the Greenlandic individuals were assigned a mixture of both ancestral populations (Figure 2). We interpreted the two ancestry components of the Greenlandic individuals to be European ancestry and Inuit ancestry. In doing so, we found that there has been gene flow from Europeans into most locations in Greenland and that more than 80% of Greenlanders have European ancestry (Figures 2 and 3). On average, the Greenlanders have ∼25% European ancestry; however, some locations in Greenland have a considerably smaller amount of European ancestry. Specifically, participants from Tasiilaq in East Greenland, the small villages in South Greenland (South villages), and Qaanaaq in North Greenland (Thule) have less European ancestry. In fact, most individuals in Tasiilaq and the South villages have only Inuit ancestry (Figure 3). To investigate the population structure within the Inuit ancestry, we inferred ancestry proportions with higher numbers of assumed ancestral populations (K = 3–5). When three ancestral populations (K = 3) were assumed, the Danes were again assigned one ancestry component, but Greenlanders in Qaanaaq in North Greenland and in Tasiilaq in East Greenland were also each assigned their own component (Figure 2).
The rest of the Greenlandic locations were inferred to be mixtures of all three components. When four ancestral populations (K = 4) were assumed, the results remained similar, except in this case, all the Greenlandic locations other than Qaanaaq and Tasiilaq were inferred to be mixtures of all four components. These results do not support the claim of a shared genetic component between North and East Greenlanders,13 but they fit well with the geography of Greenland, where the two geographically extreme locations are Qaanaaq in the north and Tasiilaq in the east. Both of these locations are fairly isolated from the west and south of Greenland, where most Greenlanders live. The physical distance between Tasiilaq and Qaanaaq and the rest of the locations might also explain why these locations have less gene flow from Europe. Further increasing the number of assumed ancestry components (K = 5), we found that the areas around Upernavik and Maniitsoq also received their own predominant ancestry component. Interestingly, the South village population was not assigned a unique ancestry component but was the only West Greenlandic population to be assigned a large amount of Tasiilaq ancestry. The results of the analyses performed with higher K values (K > 2) should be interpreted with care because the model underlying the program ADMIXTURE might not represent the nature of the data well. First, the fact that none of the individuals from Upernavik villages and Maniitsoq villages were inferred to be 100% from the components that are predominant in these locations at K = 5 could be an indication that this K value is too high. Second, the fact that the individuals from West Greenland were inferred to have ancestry from both Qaanaaq and Tasiilaq under the K = 3–4 models does not necessarily indicate that the West Greenlanders are admixed.
These results could also be caused by a scenario where Greenland was settled by Inuit who entered North Greenland and from there migrated to South Greenland along the west coast and from there to East Greenland. We will return to this point later. We also visualized the population structure by using PCA. As can be seen in Figure 4, there are three extreme locations: Denmark representing Europe, Tasiilaq representing East Greenland, and Qaanaaq representing North Greenland. The first principal component reflects an Inuit-to-Europe gradient, whereas the second principal component reflects a within-Inuit gradient from north to east and with intermediate populations in the south. The existence of such a gradient could suggest that modern East Greenlanders are descendants of people who first migrated from north to south along the west coast of Greenland. An alternative explanation is that the South Greenlanders are admixed between East and West Greenlanders and that East Greenlanders are descendants of a separate wave of migration from the north down the east coast. We will return to this point in the section on migration routes. To further investigate the population structure within the Inuit ancestry, we also inferred ancestry proportions and performed PCA of the restricted Greenlandic data set combined with Danish samples. The inferred ancestry proportions are shown in Figure S4, in which the structure among Inuit is clearly visible. However, it is not sufficiently pronounced for each location to be assigned a unique ancestry component. Even though Upernavik villages and South villages represent the extreme ends of West Greenland (including South Greenland), they were not assigned two different components. Instead, they were assigned the same component, although individuals from Upernavik villages harbor a substantial fraction of the Qaanaaq ancestry component. 
The PCA in Figure S5 suggests the same: Upernavik villages and South villages cluster closely together even though they are physically located far from each other. These results are consistent with FST values estimated by the Weir and Cockerham estimator35 from the same restricted Greenlandic data set combined with HapMap samples (Table S1): FST is a measure of how different populations are genetically, and the fact that the estimated FST between Upernavik villages and South villages is smaller than the estimated FST between Upernavik villages and either of the two other locations suggests that Upernavik villages are genetically closer to South villages than to the other locations.
Sex-Biased Gene Flow
Bosch et al.6 analyzed mtDNA and Y chromosome DNA from 69 Inuit and demonstrated that the mtDNA, which is maternally inherited, was exclusively of Inuit origin, whereas more than 50% of the Y chromosomes, which are paternally inherited, were of European origin. However, their study was based on a small number of individuals from a limited number of locations in Greenland. To more broadly assess and quantify the sex bias in the European gene flow, we estimated the amount of mtDNA of European origin in our much larger full data set. We distinguished between European and Inuit mtDNA by using a single diagnostic mtDNA SNP: the MT1736 marker that defines the A haplogroup. This marker perfectly separates the two populations for all unadmixed individuals (Table S2), and we used it to obtain the estimates of the amount of mtDNA of European origin shown in Figure S6. We found that although the mtDNA in Greenland is not exclusively Inuit, the amount of European female ancestry based on mtDNA is only ∼1.0%. This is about 25 times lower than the proportion of autosomal DNA of European origin.
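The relation between these two estimates and male-biased gene flow can be made explicit with a back-of-the-envelope calculation. The sketch below is a simplification and not a model fitted in this study: it assumes that autosomal ancestry is the plain average of the female and male ancestral contributions and that the mtDNA estimate approximates the female contribution. The function name is illustrative.

```python
def implied_male_fraction(autosomal, female):
    """European fraction among male ancestors implied by a two-sex average.

    Assumption (illustrative only): autosomal = (female + male) / 2,
    i.e. autosomes are inherited equally through mothers and fathers.
    """
    male = 2.0 * autosomal - female
    if not 0.0 <= male <= 1.0:
        raise ValueError("inputs are inconsistent with the two-sex average model")
    return male


# Approximate values quoted in the text: ~25% autosomal, ~1.0% mtDNA.
male_european = implied_male_fraction(autosomal=0.25, female=0.01)
print(round(male_european, 2))
```

Under these assumptions the implied European share of male ancestry is roughly one half.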
The large discrepancy between admixture proportions at autosomal and mtDNA markers is in line with the results of Bosch et al.,6 who concluded that 50% of the male ancestry in Greenland is European.
Consequences of Being a Small and Historically Isolated Population
The Inuit have not experienced the same population growth as many of the standard reference populations, such as Han Chinese and Europeans. Furthermore, they might have lived in relatively small subpopulations and undergone a series of bottlenecks as they colonized the Arctic. For this reason, we would expect the Greenlanders to have a relatively small effective population size in comparison to East Asian or European populations. Populations with historically small effective sizes are expected to harbor less nucleotide variability than larger populations. To assess whether this is the case for the Greenlandic population, we estimated the SFS for the Greenlandic population by using exome sequencing data generated by Moltke et al.8 from 18 parents from nine trios of Greenlanders with no Danish ancestry. We also estimated SFSs for four HapMap22 populations (CEU, JPT, CHB, and YRI) by using data from 18 unrelated individuals from each of these populations, which were sequenced as part of the 1000 Genomes Project.25 From these five SFSs, we then estimated variability levels for each of the five populations (Table S3). Most notably, this table shows that the variability, measured as the fraction of polymorphic sites, is markedly lower in Greenland than in the other populations. Additionally, the SFSs show that the Greenlandic population harbors proportionally fewer rare variants than the four HapMap populations (Figure 5). Both observations are consistent with a history of small population sizes, isolation, and founder events.
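The two summaries used above, the fraction of polymorphic sites and the share of rare variants, can be read directly off an SFS. The following is a minimal sketch on made-up data. It assumes called genotypes coded as derived-allele counts with no missing data, which is a simplification; the SFS estimation actually used is described in the Material and Methods and may differ. All names and the toy genotype table are hypothetical.

```python
def site_frequency_spectrum(genotypes):
    """Unfolded SFS from a sites x individuals table of derived-allele counts (0, 1, 2)."""
    n_chrom = 2 * len(genotypes[0])
    sfs = [0] * (n_chrom + 1)
    for site in genotypes:
        sfs[sum(site)] += 1  # bin each site by its derived-allele count
    return sfs


def fraction_polymorphic(sfs):
    """Share of all sites that are neither fixed ancestral nor fixed derived."""
    return sum(sfs[1:-1]) / sum(sfs)


def singleton_share(sfs):
    """Share of segregating sites whose derived allele was seen exactly once."""
    segregating = sum(sfs[1:-1])
    return sfs[1] / segregating if segregating else 0.0


# Toy table: 6 sites (rows) x 3 diploid individuals (columns).
toy = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [2, 1, 0],
    [2, 2, 2],
]
toy_sfs = site_frequency_spectrum(toy)
print(toy_sfs, fraction_polymorphic(toy_sfs), singleton_share(toy_sfs))
```

Singletons are used here only as a simple stand-in for "rare variants"; comparing populations this way requires equal sample sizes, as in the 18-individual samples described above.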
We note that the reason we used (exome) sequencing data instead of SNP chip data for the above comparison is that the MetaboChip is biased toward SNPs with high frequency in European populations, and this ascertainment bias could strongly affect the results of a comparison between Greenlanders and other populations, especially Europeans. However, because the SNP ascertainment bias should affect all Greenlandic locations equally, the SNP chip data can be used for comparing nucleotide variation levels between the different locations in Greenland. We did this by estimating nucleotide variation levels, measured as mean MAF, for each Greenlandic location from the full data set without LD (Figure S7). To account for the European admixture, we corrected the allele frequencies for the estimated European ancestry (see Material and Methods for details). Interestingly, a slight decay of genetic variation following a gradient from north to west to south to east can be observed among the Greenlandic locations (Figure S8), which could again indicate support for only one migration wave that moved from north to west to south to east. To investigate whether historical demography has had an effect on LD in the Greenlandic population, we estimated LD in the Greenlandic individuals and compared it to LD patterns in the Danes, both on the basis of data from the full data set (Figure 6). LD among the Greenlanders was markedly higher than among the Danes, which has also been indicated by previous studies.9 However, because of the recent European admixture, these LD estimates do not only reflect the more ancient demographic history of the Greenlandic population. To correct the LD estimates for the admixture and provide LD estimates for the ancestral Inuit and European populations, we also inferred the ancestral haplotype frequencies from the Greenlandic individuals. 
The model used for inferring the ancestral haplotype frequency assumes that the ancestry is the same for both alleles on the same haplotype and that the ancestries of an individual’s haplotype are conditionally independent on the admixture proportions. The consequence of violating these assumptions seems to have a minimal impact given that the estimates from the unadmixed individuals are similar to their inferred ancestral haplotype frequency (Figure 6). The analyses showed that LD in the ancestral Inuit population was markedly higher than the LD in the present-day Greenlandic population, whereas the ancestral European population had approximately the same amount of LD as the 50 present-day Danish individuals from the full data set (Figure 6). Thus, the recent European admixture has reduced the LD of the Greenlanders significantly. The decreased LD due to gene flow from Europe might seem somewhat counterintuitive, given that admixture creates LD where there previously was none. However, when a population with high LD mixes with a population with lower LD, the resulting population can have intermediate or even lower levels of LD. For example, if two SNPs are in perfect LD in one population, then gene flow from another population without perfect LD will always result in a decrease in LD. This scenario can clearly be seen in Figure S9, where perfect haplotype blocks are present in the Inuit component but almost absent in the European component. Finally, in addition to having a historically small population size, Inuit populations have traditionally lived in small groups, where the probability of mating with a comparatively closely related partner is increased. To investigate to what extent this has affected the population genetically, we estimated inbreeding coefficients for all the individuals and stratified the results according to location (Figure S10). 
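Why admixture matters for inbreeding estimates can be illustrated with a toy calculation. The sketch below uses a simple moment estimator, F = 1 − (observed heterozygosity) / (expected heterozygosity), which is an assumption made for illustration and not necessarily the estimator used in this study. It considers a hypothetical outbred individual with only Inuit ancestry and compares expected heterozygosity computed from pooled population frequencies with that computed from ancestry-specific frequencies. All allele frequencies are invented.

```python
def expected_het(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def moment_f(observed_het, reference_freqs):
    """F = 1 - sum(observed het) / sum(expected het under the reference frequencies)."""
    expected = sum(expected_het(p) for p in reference_freqs)
    return 1.0 - sum(observed_het) / expected


def individual_freqs(european_share, european_freqs, inuit_freqs):
    """Allele frequencies expected for an individual with the given European share."""
    return [
        european_share * e + (1.0 - european_share) * i
        for e, i in zip(european_freqs, inuit_freqs)
    ]


# Hypothetical per-SNP frequencies in the two ancestral populations.
inuit_freqs = [0.10, 0.05, 0.90, 0.30]
european_freqs = [0.50, 0.40, 0.45, 0.60]

# Pooled frequencies for a population with ~25% European ancestry on average.
pooled = individual_freqs(0.25, european_freqs, inuit_freqs)

# Per-site heterozygosity expected for an outbred individual with 0% European ancestry.
observed = [expected_het(p) for p in inuit_freqs]

f_naive = moment_f(observed, pooled)
f_corrected = moment_f(observed, individual_freqs(0.0, european_freqs, inuit_freqs))
print(round(f_naive, 3), round(f_corrected, 3))
```

In this toy setting the uncorrected estimate is strongly positive even though the individual is not inbred, whereas the ancestry-aware estimate is zero.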
Without admixture correction, the inbreeding coefficients in some locations were estimated to be extremely high, with an average value above 0.13. However, after correction for admixture, the average inbreeding coefficients were similar among locations in Greenland; they ranged from F = 0.008 to F = 0.014 and were comparable to coefficients estimated for the Danes (F = 0.007). The individuals with the lowest amount of inbreeding were Greenlanders living in Denmark. The above results suggest that the Greenlandic population is indeed affected by being a historically small and isolated founder population in several ways. However, we note that the population stands out in at least one important way in comparison to well-studied founder populations, such as the Finnish and the Icelandic populations: these other founder populations are all genetically similar to at least one large population, whereas the Inuit are not closely related to any large population. For example, estimates of genetic differentiation are very low between the Icelandic population and both the Norwegian population (FST = 0.0016) and the Scottish population (FST = 0.0020).36 For comparison, on the basis of our SNP chip data, we estimated FST to be 0.12 between the Greenlandic population and the Han Chinese (CHB) HapMap samples (Table S1), and the FST estimate based on sequencing data for the same two populations, which does not suffer from SNP ascertainment bias, was also 0.12 (Table S3). FST between locations in Greenland and Europe ranged from 0.15 to 0.17 for the SNP chip data (Table S1) and was estimated to be 0.16 for the sequencing data (Table S3). We note that these values are higher than the recently reported FST values ranging from 0.039 to 0.101.9 The large difference in FST values between locations in Greenland reported by Pereira et al.9 and the difference between their estimates and ours most likely arise because Pereira et al. did not exclude European admixture when estimating FST, whereas our estimates are based only on data from Greenlandic individuals without any European ancestry. This observed difference between the Greenlandic population and other founder populations, such as the Icelandic and Finnish, is most likely due to the fact that the Greenlanders’ ancestral Inuit population was an isolated and small population for a longer period of time than these other populations.
Coastal Migration Route
The inferred admixture proportions and the estimated geographic variation in levels of nucleotide variation both suggest a model of Greenland settlement from the north to the south and subsequently from the south to the east, given that there is no evidence of shared genetic components between the east (Tasiilaq) and the north (Qaanaaq) or between the east and the northwest (Upernavik). SNP-chip-based FST estimates from individuals without European ancestry (based on the restricted Greenlandic data set combined with HapMap samples) are also consistent with this model: Tasiilaq in East Greenland is genetically furthest away from Qaanaaq in North Greenland and closest to villages in South Greenland (FST = 0.04 and 0.02, respectively; see Table S1). To investigate this further, we used TreeMix,29 with Danish individuals as an outgroup to root the tree, to infer the maximum-likelihood genetic-drift tree topology relating people from the different locations (Figure 7). We performed this analysis by using allele frequencies estimated from the full data set without LD and corrected for European admixture. The resulting tree is consistent with a single coastal route migration in which each location sequentially splits off along the coastline from the north to the south and subsequently from the south to the east.
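For readers who want to see how a pairwise FST of the kind quoted above is computed from allele frequencies, the following is a minimal sketch. Note that it implements Hudson's estimator as a ratio of averages, swapped in here because it is compact; it is not the Weir and Cockerham estimator used for Table S1, and the two can give somewhat different values. The frequencies and sample sizes are invented.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson's FST between two populations, computed as a ratio of sums over SNPs.

    freqs1, freqs2: per-SNP sample allele frequencies in each population.
    n1, n2: number of sampled chromosomes in each population.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1.0 - p1) / (n1 - 1)
            - p2 * (1.0 - p2) / (n2 - 1)
        )
        # Probability that two alleles, one from each population, differ.
        denominator += p1 * (1.0 - p2) + p2 * (1.0 - p1)
    return numerator / denominator


# Hypothetical frequencies at three SNPs, 100 chromosomes sampled per population.
fst_toy = hudson_fst([0.1, 0.5, 0.9], [0.2, 0.5, 0.6], n1=100, n2=100)
print(round(fst_toy, 3))
```

Summing numerators and denominators across SNPs before dividing, rather than averaging per-SNP ratios, keeps rare variants from dominating the estimate.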
We obtained a similar TreeMix tree when we performed the same analysis with the restricted Greenlandic data set combined with HapMap samples (this data set includes only Greenlandic individuals with no European ancestry, no close relatives, and no recent migrations between Greenlandic regions; Figure S11). The topology of this tree differs in one respect, though: the placement of the root of the Greenlandic subtree. Whereas the tree inferred from the full data set without LD has Qaanaaq as an outgroup, this tree has the root placed so Upernavik villages are on the same side of the root as Qaanaaq. However, the amount of drift from the root to the split between Qaanaaq and Upernavik villages is at the same time inferred to be very small, which means that this TreeMix result is also consistent with a single migration event. Note that we also tried to run TreeMix on both data sets while allowing for one admixture event. However, the results were inconclusive and seemed likely to reflect artifacts of the method rather than real admixture events, and we therefore have not included them here. To formally test the single coastal migration event, we used D statistics30 estimated from the restricted Greenlandic data set combined with HapMap samples. As shown in Figure 8, this led to the rejection of all tree topologies in which Qaanaaq was not an outgroup to the South villages and Tasiilaq. The same results were obtained when Qaanaaq was replaced by Upernavik villages (Figure S12). If Tasiilaq (East Greenland) was reached via a migration route along the northern coast of Greenland, and thus from the north rather than the south, we would not expect Tasiilaq and the South villages to form an ingroup to both Qaanaaq and Upernavik villages. Likewise, if there were migrations to the east from both the south and the north, then we would expect to reject the topologies where Qaanaaq or Upernavik villages were the outgroup. 
Hence, the D-statistic-based test results provide further support for the single coastal migration route.
Admixture with the Dorset and the Norse Vikings in Greenland
The results from the D-statistic-based tests mentioned above also suggest that the Inuit did not, as hypothesized by Helgason et al.,13 interbreed with the Dorset in East Greenland. If individuals in Tasiilaq had ancestry from a previous migration, e.g., the Dorset, then we would expect all trees where Tasiilaq is an ingroup to be rejected. However, the tree with South villages and Tasiilaq as ingroups and Qaanaaq as an outgroup (Figure 8) was not rejected. To further address this question, we also performed a more direct D-statistic-based test of admixture. It has recently been shown with ancient DNA that individuals from the Saqqaq and the Dorset cultures are genetically similar.10 Therefore, using the high-coverage genome of a ∼4,000-year-old sample from the Saqqaq culture,7 we can test for Dorset admixture in East Greenland by estimating D statistics for topologies with a Greenlandic location in East Greenland (Tasiilaq or Tasiilaq villages) and a Greenlandic location in the rest of Greenland (Qaanaaq or South villages) as ingroups and the Saqqaq sample as an outgroup. If the Inuit in East Greenland interbred with the Dorset, we would expect these D statistics (and the Z values estimated from them) to differ significantly from 0. We did not find the Saqqaq sample to be significantly (Z > 3) closer to Tasiilaq than to Qaanaaq or South villages (Figure S13). However, one test (the test of the topology (((H1 = South villages, H2 = Tasiilaq villages), H3 = Saqqaq), H4 = CHB)) was suggestive with D = 0.008 and Z = 2.58, the latter of which is considered significant in some studies. On the basis of these analyses, we cannot exclude the possibility that interbreeding took place, given that the D-statistic-based tests used do not have full power to detect admixture events if they involve only low amounts of gene flow.
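As an illustration of what a D statistic and its Z value measure, the sketch below computes D from population allele frequencies for a topology (((H1, H2), H3), H4) and obtains a standard error with a delete-one-block jackknife. It is a simplified sketch under stated assumptions: the sign convention and the unweighted jackknife are choices made here and may differ from the implementation used for Figures 8 and S13, and all frequencies are invented.

```python
import math


def abba_baba_terms(p1, p2, p3, p4):
    """Per-SNP numerator and denominator of D for frequencies in H1..H4."""
    numerator = (p1 - p2) * (p3 - p4)
    denominator = (p1 + p2 - 2.0 * p1 * p2) * (p3 + p4 - 2.0 * p3 * p4)
    return numerator, denominator


def d_statistic(blocks):
    """Return (D, Z) from a list of genomic blocks of (p1, p2, p3, p4) tuples."""
    block_sums = []
    for block in blocks:
        num = den = 0.0
        for freqs in block:
            a, b = abba_baba_terms(*freqs)
            num += a
            den += b
        block_sums.append((num, den))

    total_num = sum(s[0] for s in block_sums)
    total_den = sum(s[1] for s in block_sums)
    d_hat = total_num / total_den

    # Delete-one-block jackknife (unweighted, i.e. treats blocks as equal-sized).
    g = len(block_sums)
    leave_one_out = [
        (total_num - num) / (total_den - den) for num, den in block_sums
    ]
    mean_loo = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((x - mean_loo) ** 2 for x in leave_one_out)
    se = math.sqrt(variance)
    z = d_hat / se if se > 0 else float("nan")
    return d_hat, z


# Hypothetical data: H2 is consistently shifted toward H3 relative to H1.
shifted = [
    [(0.2, 0.4, 0.6, 0.1)],
    [(0.3, 0.5, 0.7, 0.2)],
    [(0.1, 0.2, 0.5, 0.1)],
]
d_shifted, z_shifted = d_statistic(shifted)

# Hypothetical data: H1 and H2 have identical frequencies, so D is zero.
symmetric = [[(0.2, 0.2, 0.6, 0.1)], [(0.5, 0.5, 0.7, 0.2)]]
d_symmetric, _ = d_statistic(symmetric)
print(d_shifted, z_shifted, d_symmetric)
```

With this sign convention a negative D indicates that H2 shares more alleles with H3 than H1 does; real analyses use many more and much larger blocks so that the jackknife standard error is meaningful.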
However, they do suggest that the Dorset have not contributed much gene flow to the modern East Greenlanders. The question of whether the Inuit interbred with the Norse Vikings is more difficult to answer given that the Norse Vikings were Europeans just like the later colonizers, whom we know interbred with the Inuit. Hence, to answer this question, one has to separate the recent admixture (from 1721 onward) from any potential older European admixture, which can be difficult. One approach to address this question is to take advantage of the fact that most individuals in the south show no recent European gene flow. The largest Viking settlement was located in Southwest Greenland, and the ancestors of the individuals in the South villages passed this Viking settlement before settling in the south. Thus, if the Inuit and the Norse Vikings interbred in the west before the Inuit settled in the south or they interbred later in South Greenland, then we would expect individuals in the south to have some Norse Viking ancestry. In contrast, it is very unlikely that the individuals in Qaanaaq would have any such ancestry given that they descend from Inuit who entered Greenland after the Norse Vikings left Greenland. If the Inuit interbred with the Norse Vikings, we would therefore expect to see signatures of ∼600-year-old European admixture in the Greenlanders in the South villages, but not in the Greenlanders in Qaanaaq. However, individuals in the South villages overall have less European ancestry than individuals in most other locations, including Qaanaaq (Figure 3), and importantly, more than half of the individuals from the South villages are estimated to have no European ancestry. Out of the 169 individuals from the South villages, only 40 are estimated to have more than 5% European ancestry.
As the variance in admixture proportions among individuals decreases fast with time since admixture,37, 38 finding such a large proportion of individuals without admixture is unlikely if the time of admixture is old. Genomes with both Inuit and European ancestry can be divided into alternating “ancestry tracts” along the length of each chromosome, and the distribution of tract lengths in an admixed population carries information about the timing of admixture and the admixture proportion in a population.39, 40, 41, 42 More recent admixture results in longer admixture tracts. To investigate whether European ancestry in the individuals who are estimated to have more than 5% European ancestry can be attributed to Norse Viking admixture, we inferred the length of European ancestry tracts in admixed Greenlander genomes. This analysis showed that admixed individuals from the South villages all have at least one European ancestry tract that is longer than 39 cM. The presence of such large European admixture tracts suggests that a substantial proportion of European admixture originated from interbreeding during the time of Danish colonization, because, as shown in the Material and Methods, the chance that an individual will harbor such a long tract if the admixture time is 25 generations is ∼0.005. However, it does not exclude the possible presence of admixture tracts originating from interbreeding with Norse Viking populations. Because inferring a short ancestry tract with certainty is very difficult, especially with data from the sparse MetaboChip, we did not directly look for specific instances of short tracts expected from more ancient admixture. Instead, we compared the tract-length distributions from Qaanaaq and the South villages. 
If Norse Vikings are among the ancestors of the Greenlanders in the South villages and not of the Greenlanders in Qaanaaq, we would expect to see a difference in their tract-length distributions such that the South villages have more short tracts. However, when we matched the inferred global admixture proportions between the two locations, the two tract-length distributions were very similar (Figure S14). Thus, the estimated admixture tract distributions do not provide any evidence of Norse Viking admixture.
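The reasoning that long European tracts point to recent admixture can be illustrated with a simple calculation. The sketch below assumes a single admixture pulse and exponentially distributed tract lengths with a rate equal to the number of generations per Morgan; it ignores chromosome ends and the admixture proportion. It is therefore not the calculation from the Material and Methods and is not expected to reproduce the ∼0.005 figure quoted above. The value of 5 generations for recent admixture and the tract count are hypothetical.

```python
import math


def tract_tail_probability(length_cm, generations):
    """P(a single tract exceeds length_cm) under an exponential tract-length model.

    Assumption (illustrative only): rate = `generations` per Morgan.
    """
    return math.exp(-generations * length_cm / 100.0)


def prob_at_least_one(length_cm, generations, n_tracts):
    """P(at least one of n_tracts independent tracts exceeds length_cm)."""
    p = tract_tail_probability(length_cm, generations)
    return 1.0 - (1.0 - p) ** n_tracts


# 39 cM threshold from the text; 25 generations for Norse-era admixture,
# 5 generations as a hypothetical value for recent admixture.
old_tail = tract_tail_probability(39.0, 25)
recent_tail = tract_tail_probability(39.0, 5)
print(old_tail, recent_tail, prob_at_least_one(39.0, 25, n_tracts=50))
```

Under these assumptions a tract longer than 39 cM is orders of magnitude less likely after 25 generations than after a few generations, which is the qualitative point made in the text.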