Material and Methods

Subjects and Comparative Datasets

We sampled 926 Lebanese men who had three generations of paternal ancestry in the country and who gave informed consent for this study, which was approved by the Institutional Review Board of the American University of Beirut. Each provided information on his geographical origin, classified into five regions: (1) Beirut (the capital city), (2) Mount Lebanon in the center, (3) the Bekaa Valley in the east, (4) the north, and (5) the south. Each also provided information on his religious affiliation: (1) Muslim, including the Shiite and Sunnite sects; (2) Christian, including the major Maronite, Orthodox, and Catholic sects; and (3) Druze, a distinct religion with a 1000-year history whose followers live mainly in Syria and Lebanon.

Comparative data on haplogroup frequencies were obtained from published sources and from consenting individuals in the Genographic Public Participation dataset, whose participants can choose to make their data available for subsequent studies. For the Arabian Peninsula, we used published data from Omani Arabs10 and from Qatar, the United Arab Emirates, and Yemen11, together with Genographic Public Participation data for individuals originating from Oman, Qatar, the United Arab Emirates, Yemen, and Saudi Arabia (Table S2 in the Supplemental Data). Data from France12, Germany13, England14, and Italy15 were used to construct a representative western European sample as described below; data from Turkey16 were also available. Combined Y-SNP plus Y-STR datasets were available from the Arabian Peninsula10, 11 and Turkey16. European data were extracted from the consented Genographic Project Public Participation database (Table S2).

Historical Data

In addition to the contemporary subjects, we needed estimates of the likely genetic composition of the Crusaders.
Historical sources17, 18, 19 show that four Crusades reached Lebanon (the first, second, third, and sixth) and that the main contributing populations were the French, Germans, English, and Italians; these sources suggest that the approximate numbers of men participating from the four countries were similar (Table 1). Y haplogroup frequencies are known for each of these modern populations12, 13, 14, 15, so if we assume that haplogroup frequencies were similar at the time of the Crusades, a weighted average western European haplogroup composition can be constructed (Table 2). The tests described below require counts rather than frequencies, so this composition had to be expressed as numbers. We therefore first scaled the total contribution from each country relative to the smallest sample (the French12, n = 45) to produce the "weighted total" column in Table 2. We then multiplied each weighted total by the haplogroup frequency in that country to give a weighted number for each haplogroup from each country. Finally, we summed these weighted numbers for each haplogroup and used the nearest integer (bottom row in Table 2) in the analyses below.

Genotyping

Samples were genotyped with a set of 58 Y-chromosomal binary markers by standard methods20 (Figure 2). These markers define 53 haplogroups (including paragroups), 27 of which were present in the Lebanese sample. We also typed a subset (the first 587 individuals collected, and thus with unbiased ascertainment) with 11 Y-STRs by standard methods21, 22 (Table S1). STR alleles were named according to current recommendations23, except that "389b" was used in place of "DYS389II", where 389b = DYS389II − DYS389I.

General Statistical Analyses

Analysis of molecular variance (AMOVA)24, population pairwise genetic distances, and Mantel tests25 were performed with the package Arlequin 3.11.26 Admixture analyses were carried out with Admix2_0.27 Median-joining networks28 were calculated with Network 4.2 (Fluxus-Engineering).
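The Mantel tests above were run in Arlequin. As an illustration of the technique only (not Arlequin's implementation or its options), here is a minimal permutation Mantel test in Python. It assumes two symmetric distance matrices over the same populations in the same order, uses the Pearson correlation of the upper-triangle entries, and reports a one-sided p-value; the names `mantel`, `upper_triangle`, and `pearson` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    """Entries above the diagonal after relabelling rows and columns by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(a: Matrix, b: Matrix, permutations: int = 999,
           seed: int = 1) -> Tuple[float, float]:
    """Mantel test between two symmetric distance matrices.

    Returns the observed correlation r and a one-sided permutation p-value:
    the fraction of arrangements (observed one included) with r >= observed.
    """
    n = len(a)
    identity = list(range(n))
    x = upper_triangle(a, identity)
    r_obs = pearson(x, upper_triangle(b, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the populations of b; a stays fixed
        if pearson(x, upper_triangle(b, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

Permuting the rows and columns of one matrix together, rather than shuffling individual entries, preserves the dependence among distances that share a population; this is why a Mantel test is used instead of an ordinary correlation test. Which pairs of matrices were compared in the study is not specified in this section.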
The median-joining networks were highly reticulated, so we reduced reticulation by first weighting each locus by the inverse of its variance in the dataset29 and then constructing a reduced-median network30, which formed the input to the median-joining network. Male effective population sizes were estimated with BATWING31 under a demographic model that assumed a period of constant size followed by exponential growth; prior values for the other parameters were set as described previously.20

Computation of Drift Probabilities

We wished to calculate the probability that a haplotype could increase by chance from a deduced initial frequency to the observed current frequency over a period specified by the historical record. We also wished to evaluate how admixture with an outside population might affect this probability. We had combined Y-SNP and Y-STR data for some relevant groups and relied on the YHRD database for data from other populations. Several applications can estimate migration rates while accounting for coalescence, mutation, and migration, including variation in migration over time.32, 33, 34, 35, 36, 37, 38 However, none of these packages addresses the specific question of whether drift alone could reasonably account for the modern haplogroup or haplotype frequencies in the population, or how much migration during a specified epoch could change these frequencies when the available historical information is incorporated. We therefore directly employed a Wright-Fisher model with sampled migration to compute the effects of drift given an admixture event of known duration. In the Wright-Fisher model39, 40, each generation is entirely replaced by the next, and the offspring select their parents at random.
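The random parent selection of the Wright-Fisher model can be illustrated with a short Monte Carlo simulation. This is a minimal Python sketch, not the authors' code (the calculation in this section is exact rather than simulated). It assumes haploid Y chromosomes, each tracked only as a boolean flag for carrying the haplotype of interest; the names `wright_fisher_step`, `simulate_carriers`, and `estimate_pf` are hypothetical.

```python
import random
from typing import List


def wright_fisher_step(parents: List[bool], rng: random.Random) -> List[bool]:
    """One Wright-Fisher generation for N haploid (Y-chromosome) individuals.

    The parental generation is replaced entirely: each of the N offspring
    picks a parent uniformly at random, with replacement, and inherits its
    state (True = carries the haplotype of interest).
    """
    return [rng.choice(parents) for _ in range(len(parents))]


def simulate_carriers(n: int, initial: int, generations: int,
                      rng: random.Random) -> int:
    """Number of carriers after `generations` of pure drift."""
    population = [True] * initial + [False] * (n - initial)
    for _ in range(generations):
        population = wright_fisher_step(population, rng)
    return sum(population)


def estimate_pf(n: int, initial: int, generations: int, f: float,
                replicates: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the chance that at least a fraction f
    of the population carries the haplotype after drift alone."""
    rng = random.Random(seed)
    hits = sum(simulate_carriers(n, initial, generations, rng) >= f * n
               for _ in range(replicates))
    return hits / replicates
```

For small N, `estimate_pf` should approach the exact probability of reaching at least a fraction f as the number of replicates grows, which makes it a convenient cross-check on an exact implementation.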
The following calculation outlines the Wright-Fisher drift model, describing how the probability of observing a particular number of carriers of a haplotype in a population evolves over time. It then considers the following scenario: two populations evolve according to the Wright-Fisher model and the island model of Haldane41. First, a European population carrying a particular haplotype of interest, described below (Western European Specific 1, WES1), drifts freely. Over some period of time, a random sample of this population travels to Lebanon. In each generation, the children select their parents at random from the mixed Lebanese and migrant European population.

If a proportion $p$ of the parents carry a particular haplotype, the probability that the number $X(t+1)$ of children carrying it equals $l$, in an effective population of size $N$, is

$$P(X(t+1)=l)=\binom{N}{l}p^{l}(1-p)^{N-l}.$$

If $j$ out of $N$ parents carry the haplotype of interest, then $p = j/N$. Therefore, the probability of finding $l$ children with the haplotype given $j$ carrier parents is

$$P(X(t+1)=l \mid X(t)=j)=\binom{N}{l}\left(\frac{j}{N}\right)^{l}\left(1-\frac{j}{N}\right)^{N-l}.$$

Given the distribution $P(X(t)=j)$ of finding $j$ carriers at generation $t$, the probability of finding $l$ carriers at generation $t+1$ is

$$P(X(t+1)=l)=\sum_{j=0}^{N}P(X(t+1)=l \mid X(t)=j)\,P(X(t)=j).$$

The probability $p_f$ of finding at least a fraction $f$ of the population carrying the haplotype after $t = T$ generations is

$$p_f=\sum_{j \ge f N}P(X(T)=j).$$

We can extend this argument to the admixture of one population with another by replacing the parental pool sampled by the children with an expanded pool that includes contributions from the incoming population. In this case, a population labeled W, some of whose members carry the WES1 haplotype, mixes with a native Lebanese population labeled L.
Given an effective population size $N_L$ of Lebanese Christians and $N_W$ of migrant Europeans, the fraction of migrants in the pool from which the next generation chooses its parents is

$$m=\frac{N_W}{N_L+N_W}.$$

The fraction of Lebanese Christians bearing the WES1 marker is $p_L = j_L/N_L$, and that of the migrant Europeans is $p_W = j_W/N_W$. The total admixed fraction of WES1 presented to the next generation is

$$p_A(j_L,j_W)=(1-m)\,p_L+m\,p_W=\frac{j_L+j_W}{N_L+N_W}.$$

The number $j_W$ of WES1 carriers that traveled to Lebanon is a random variable $X_W(t)$ whose distribution is determined by sampling $N_W$ migrants from the European population, which itself drifts with probability $P(X_E(t)=j_E)$ in an effective European population of size $N_E$. Therefore, the distribution of $j_W$ is

$$P(X_W(t)=j_W)=\sum_{j_E=0}^{N_E}\binom{N_W}{j_W}\left(\frac{j_E}{N_E}\right)^{j_W}\left(1-\frac{j_E}{N_E}\right)^{N_W-j_W}P(X_E(t)=j_E).$$

The admixed probability that $l$ children select WES1 parents from the $N_L$ Lebanese and the $N_W$ migrants is

$$P(X_L(t+1)=l \mid X_L(t)=j_L,X_W(t)=j_W)=\binom{N_L}{l}\,p_A(j_L,j_W)^{l}\,\bigl(1-p_A(j_L,j_W)\bigr)^{N_L-l}.$$

Summing over the distributions of $j_L$ and $j_W$ gives the final probability distribution of the number of children selecting WES1 parents:

$$P(X_L(t+1)=l)=\sum_{j_L=0}^{N_L}\sum_{j_W=0}^{N_W}P(X_L(t+1)=l \mid X_L(t)=j_L,X_W(t)=j_W)\,P(X_L(t)=j_L)\,P(X_W(t)=j_W).$$

The initial condition, with an assumed initial Lebanese WES1 fraction $p_0$, is

$$P(X_L(0)=j)=\begin{cases}1 & \text{if } j=\lfloor p_0 N_L\rfloor,\\ 0 & \text{otherwise.}\end{cases}$$

Computations were performed in C++ with the binomial distribution function implemented in the GNU Scientific Library.42
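The recursions above can be sketched directly in Python. This is a minimal illustration under stated assumptions, not the authors' C++/GSL program: population sizes are integers, the European population is assumed (hypothetically) to start from the same point-mass form as the Lebanese one, and the driver `lebanese_distribution` hypothetically places admixture in a contiguous block of generations at the start of the period. All function names are hypothetical. The binomial probability is evaluated in log space with `math.lgamma` so that it stays finite for larger N. Each drift step costs O(N²) and each admixed step O(N_L² N_W), so this direct form suits only modest sizes.

```python
from math import exp, floor, lgamma, log, log1p
from typing import List


def binom_pmf(k: int, n: int, p: float) -> float:
    """C(n, k) p^k (1 - p)^(n - k), evaluated in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_c = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_c + k * log(p) + (n - k) * log1p(-p))


def point_mass(n: int, p0: float) -> List[float]:
    """Initial condition: all probability on j = floor(p0 * n)."""
    dist = [0.0] * (n + 1)
    dist[floor(p0 * n)] = 1.0
    return dist


def resample(dist: List[float], n_out: int) -> List[float]:
    """Binomial sampling of n_out individuals from a population whose carrier
    count j (out of len(dist) - 1) has distribution dist. With n_out equal to
    the population size this is one drift step; with n_out = N_W it gives
    the distribution of WES1 carriers among the migrants."""
    n_in = len(dist) - 1
    out = [0.0] * (n_out + 1)
    for j, pj in enumerate(dist):
        if pj == 0.0:
            continue
        for l in range(n_out + 1):
            out[l] += binom_pmf(l, n_out, j / n_in) * pj
    return out


def admixed_step(dist_l: List[float], dist_w: List[float]) -> List[float]:
    """P(X_L(t+1) = l) when parents come from the pooled Lebanese + migrant
    pool with WES1 fraction p_A = (j_L + j_W) / (N_L + N_W); X_L(t) and
    X_W(t) are treated as independent, as in the double sum."""
    n_l, n_w = len(dist_l) - 1, len(dist_w) - 1
    out = [0.0] * (n_l + 1)
    for j_l, pl in enumerate(dist_l):
        for j_w, pw in enumerate(dist_w):
            weight = pl * pw
            if weight == 0.0:
                continue
            p_a = (j_l + j_w) / (n_l + n_w)
            for l in range(n_l + 1):
                out[l] += binom_pmf(l, n_l, p_a) * weight
    return out


def prob_at_least(dist: List[float], f: float) -> float:
    """p_f: probability that at least a fraction f of the population carries WES1."""
    n = len(dist) - 1
    return sum(p for j, p in enumerate(dist) if j >= f * n)


def lebanese_distribution(n_l: int, n_e: int, n_w: int, p0_l: float,
                          p0_e: float, generations: int,
                          admix_generations: int) -> List[float]:
    """Hypothetical driver: admixture in the first `admix_generations`
    generations, drift alone afterwards; Europeans drift throughout."""
    dist_l, dist_e = point_mass(n_l, p0_l), point_mass(n_e, p0_e)
    for t in range(generations):
        if t < admix_generations:
            dist_l = admixed_step(dist_l, resample(dist_e, n_w))
        else:
            dist_l = resample(dist_l, n_l)
        dist_e = resample(dist_e, n_e)
    return dist_l
```

For example, `prob_at_least(lebanese_distribution(...), f)` gives $p_f$ for an admixture scenario; with `admix_generations=0` the driver reduces to drift alone, which allows the two cases to be compared directly.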