Appendix A: Derivation of Likelihood

Here, we derive the likelihood for the multinomial model when estimated haplotypes are used to estimate the parent of origin of alleles. The data considered consist of case-parent trios and control-parent trios, and the methodology extends to other sub-pedigrees as previously described.$^{20,21}$ We consider data observed at one SNP given that some, possibly all, of the data have been phased and parent of origin deduced. This gives an expression for the likelihood as follows:

\[
L(\theta; D) = \prod_{i} P(g_{mi}, g_{fi}, g_{ci}, \Psi_i \mid \theta, \mathrm{dis}_i),
\]

where $D$ is the data (consisting of genotype data $g_{mi}$, $g_{fi}$, and $g_{ci}$ for the mother, father, and child, respectively, in the $i$th trio, and the event that the trio has been phased, $\Psi_i = 1$, or not, $\Psi_i = 0$), conditional on the child’s disease status $\mathrm{dis}_i$. The full set of multinomial model parameters is given by $\theta = (R_1, R_2, S_1, S_2, I_m, I_p, \gamma_{11}, \gamma_{12}, \gamma_{21}, \gamma_{22}, \mu_1, \mu_2, \mu_3, \mu_4, \mu_5, \mu_6)$ as defined by Ainsworth et al.$^{20}$ We have

\[
L(\theta; D) = \prod_{i \in \text{diseased}} P(g_{mi}, g_{fi}, g_{ci}, \Psi_i \mid \theta, \mathrm{dis}_i = 1) \prod_{i \in \text{not diseased}} P(g_{mi}, g_{fi}, g_{ci}, \Psi_i \mid \theta, \mathrm{dis}_i = 0),
\]

where “diseased” and “not diseased” refer to the sets of case subjects ($\mathrm{dis}_i = 1$) and control subjects ($\mathrm{dis}_i = 0$), respectively. The first term of this product, where the child is diseased, gives

\[
\begin{aligned}
\prod_{i \in \text{diseased}} P(g_{mi}, g_{fi}, g_{ci}, \Psi_i \mid \theta, \mathrm{dis}_i = 1)
={} & \prod_{i \in \text{diseased, phased}} P(g_{mi}, g_{fi}, g_{ci} \mid \theta, \mathrm{dis}_i = 1, \Psi_i = 1)\, P(\Psi_i = 1 \mid \theta, \mathrm{dis}_i = 1) \\
& \times \prod_{i \in \text{diseased, not phased}} P(g_{mi}, g_{fi}, g_{ci} \mid \theta, \mathrm{dis}_i = 1, \Psi_i = 0)\, P(\Psi_i = 0 \mid \theta, \mathrm{dis}_i = 1).
\end{aligned}
\]

Now, define $n_j$ and $n_j'$ as the number of case-parent trios with genotype data for cell $j \in \{1, \dots, 8, 9a, 9b, 10, \dots, 15\}$ for not phased and phased data, respectively, where the 16 cells are as defined by Ainsworth et al.,$^{20}$ and $N_j = n_j + n_j'$ as the total number of case-parent trios with diseased children for cell $j$. For phased data, $n_{9a}'$ and $n_{9b}'$ are calculated from the estimated haplotypes, and we can define cell 9 as the total of cells 9a and 9b so that $n_9' = n_{9a}' + n_{9b}'$.
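For unphased data, cells 9a and 9b are not observed or estimated, so we again define cell 9 as the total of cells 9a and 9b, which is observed, so that $n_9 = n_{9a} + n_{9b}$. Define $r$ as the probability of a case-parent trio being phased, assumed to be independent of the genotypes. This is given by the proportion of trios that have been phased in the dataset. Referring to column 9 of Table 2 in Ainsworth et al.,$^{20}$ we then get

\[
\begin{aligned}
& \prod_{i \in \text{diseased, phased}} P(g_{mi}, g_{fi}, g_{ci} \mid \theta, \mathrm{dis}_i = 1, \Psi_i = 1)\, P(\Psi_i = 1 \mid \theta, \mathrm{dis}_i = 1) \\
& \quad = [r R_2 S_2 I_m I_p \gamma_{22} \mu_1]^{n_1'} \times \dots \times [r R_2 S_1 I_m I_p \gamma_{12} \mu_4]^{n_8'} \times [r R_1 S_1 I_p \gamma_{11} \mu_4]^{n_{9a}'} \times [r R_1 S_1 I_m \gamma_{11} \mu_4]^{n_{9b}'} \\
& \qquad \times [r S_1 \mu_4]^{n_{10}'} \times \dots \times [r \mu_6]^{n_{15}'}.
\end{aligned}
\]

When the data are not phased, only the pooled count $n_9 = n_{9a} + n_{9b}$ is available for cell 9. This then gives

\[
\begin{aligned}
& \prod_{i \in \text{diseased, not phased}} P(g_{mi}, g_{fi}, g_{ci} \mid \theta, \mathrm{dis}_i = 1, \Psi_i = 0)\, P(\Psi_i = 0 \mid \theta, \mathrm{dis}_i = 1) \\
& \quad = [(1-r) R_2 S_2 I_m I_p \gamma_{22} \mu_1]^{n_1} \times \dots \times [(1-r) R_2 S_1 I_m I_p \gamma_{12} \mu_4]^{n_8} \times [(1-r) R_1 S_1 (I_p + I_m) \gamma_{11} \mu_4]^{n_9} \\
& \qquad \times [(1-r) S_1 \mu_4]^{n_{10}} \times \dots \times [(1-r) \mu_6]^{n_{15}}.
\end{aligned}
\]

Therefore,

\[
\begin{aligned}
\prod_{i \in \text{diseased}} P(g_{mi}, g_{fi}, g_{ci}, \Psi_i \mid \theta, \mathrm{dis}_i = 1)
={} & r^{n_1'} (1-r)^{n_1} \times \dots \times r^{n_8'} (1-r)^{n_8} \times r^{n_{10}'} (1-r)^{n_{10}} \times \dots \times r^{n_{15}'} (1-r)^{n_{15}} \\
& \times [R_2 S_2 I_m I_p \gamma_{22} \mu_1]^{N_1} \times \dots \times [R_2 S_1 I_m I_p \gamma_{12} \mu_4]^{N_8} \\
& \times [r R_1 S_1 I_p \gamma_{11} \mu_4]^{n_{9a}'} \times [r R_1 S_1 I_m \gamma_{11} \mu_4]^{n_{9b}'} \times [(1-r) R_1 S_1 (I_p + I_m) \gamma_{11} \mu_4]^{n_9} \\
& \times [S_1 \mu_4]^{N_{10}} \times \dots \times [\mu_6]^{N_{15}}.
\end{aligned}
\]

Similarly, when the child is not diseased we use

\[
\begin{aligned}
\prod_{i \in \text{not diseased}} P(g_{mi}, g_{fi}, g_{ci}, \Psi_i \mid \theta, \mathrm{dis}_i = 0)
={} & \prod_{i \in \text{not diseased, phased}} P(g_{mi}, g_{fi}, g_{ci} \mid \theta, \mathrm{dis}_i = 0, \Psi_i = 1)\, P(\Psi_i = 1 \mid \theta, \mathrm{dis}_i = 0) \\
& \times \prod_{i \in \text{not diseased, not phased}} P(g_{mi}, g_{fi}, g_{ci} \mid \theta, \mathrm{dis}_i = 0, \Psi_i = 0)\, P(\Psi_i = 0 \mid \theta, \mathrm{dis}_i = 0).
\end{aligned}
\]

Define $m_j$ and $m_j'$ as the number of control-parent trios with genotype data for each cell $j$ for not phased and phased data, respectively, and $M_j = m_j + m_j'$ as the total number of control-parent trios with non-diseased children for cell $j$. Furthermore, define $r'$ as the probability of a control-parent trio being phased, assumed to be independent of the genotypes. As before, referring to Table 2 in Ainsworth et al.,$^{20}$ we obtain

\[
\begin{aligned}
& \prod_{i \in \text{not diseased, phased}} P(g_{mi}, g_{fi}, g_{ci} \mid \theta, \mathrm{dis}_i = 0, \Psi_i = 1)\, P(\Psi_i = 1 \mid \theta, \mathrm{dis}_i = 0) \\
& \quad = [r' \mu_1]^{m_1'} \times \dots \times [r' \mu_4]^{m_8'} \times [r' \mu_4]^{m_{9a}'} \times [r' \mu_4]^{m_{9b}'} \times [r' \mu_4]^{m_{10}'} \times \dots \times [r' \mu_6]^{m_{15}'}.
\end{aligned}
\]

As with case-parent trios, we define cell 9 as the total of cells 9a and 9b, which is observed, so that $m_9 = m_{9a} + m_{9b}$.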
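This then gives

\[
\begin{aligned}
& \prod_{i \in \text{not diseased, not phased}} P(g_{mi}, g_{fi}, g_{ci} \mid \theta, \mathrm{dis}_i = 0, \Psi_i = 0)\, P(\Psi_i = 0 \mid \theta, \mathrm{dis}_i = 0) \\
& \quad = [(1-r') \mu_1]^{m_1} \times \dots \times [(1-r') \mu_4]^{m_8} \times [(1-r') \mu_4]^{m_9} \times [(1-r') \mu_4]^{m_{10}} \times \dots \times [(1-r') \mu_6]^{m_{15}}.
\end{aligned}
\]

Therefore,

\[
\begin{aligned}
\prod_{i \in \text{not diseased}} P(g_{mi}, g_{fi}, g_{ci}, \Psi_i \mid \theta, \mathrm{dis}_i = 0)
={} & (r')^{m_1'} (1-r')^{m_1} \times \dots \times (r')^{m_{15}'} (1-r')^{m_{15}} \\
& \times [\mu_1]^{M_1} \times \dots \times [\mu_4]^{M_8} \times [\mu_4]^{M_9} \times [\mu_4]^{M_{10}} \times \dots \times [\mu_6]^{M_{15}},
\end{aligned}
\]

where, for cell 9, the phased count is $m_9' = m_{9a}' + m_{9b}'$, because cells 9a and 9b have the same probability for control-parent trios. Therefore, the likelihood of the data is

\[
\begin{aligned}
L(\theta; D) ={} & r^{n_1'} (1-r)^{n_1} \times \dots \times r^{n_8'} (1-r)^{n_8} \times r^{n_{10}'} (1-r)^{n_{10}} \times \dots \times r^{n_{15}'} (1-r)^{n_{15}} \\
& \times (r')^{m_1'} (1-r')^{m_1} \times \dots \times (r')^{m_{15}'} (1-r')^{m_{15}} \\
& \times [R_2 S_2 I_m I_p \gamma_{22} \mu_1]^{N_1} \times [R_2 S_2 I_m I_p \gamma_{22} \mu_2]^{N_2} \times [R_1 S_2 I_m \gamma_{21} \mu_2]^{N_3} \times [R_2 S_1 I_m I_p \gamma_{12} \mu_2]^{N_4} \\
& \times [R_1 S_1 I_p \gamma_{11} \mu_2]^{N_5} \times [R_1 S_2 I_m \gamma_{21} \mu_3]^{N_6} \times [R_1 I_p \mu_3]^{N_7} \times [R_2 S_1 I_m I_p \gamma_{12} \mu_4]^{N_8} \\
& \times [r R_1 S_1 I_p \gamma_{11} \mu_4]^{n_{9a}'} \times [r R_1 S_1 I_m \gamma_{11} \mu_4]^{n_{9b}'} \times [(1-r) R_1 S_1 (I_p + I_m) \gamma_{11} \mu_4]^{n_9} \\
& \times [S_1 \mu_4]^{N_{10}} \times [R_1 S_1 I_m \gamma_{11} \mu_5]^{N_{11}} \times [S_1 \mu_5]^{N_{12}} \times [R_1 I_p \mu_5]^{N_{13}} \times [\mu_5]^{N_{14}} \times [\mu_6]^{N_{15}} \\
& \times [\mu_1]^{M_1} \times [\mu_2]^{M_2} \times [\mu_2]^{M_3} \times [\mu_2]^{M_4} \times [\mu_2]^{M_5} \times [\mu_3]^{M_6} \times [\mu_3]^{M_7} \times [\mu_4]^{M_8} \\
& \times [\mu_4]^{M_9} \times [\mu_4]^{M_{10}} \times [\mu_5]^{M_{11}} \times [\mu_5]^{M_{12}} \times [\mu_5]^{M_{13}} \times [\mu_5]^{M_{14}} \times [\mu_6]^{M_{15}}.
\end{aligned}
\]

The log likelihood is therefore given by

\[
\begin{aligned}
l(\theta; D) ={} & \log L(\theta; D) \\
={} & n_1' \log(r) + n_1 \log(1-r) + \dots + n_8' \log(r) + n_8 \log(1-r) \\
& + n_{10}' \log(r) + n_{10} \log(1-r) + \dots + n_{15}' \log(r) + n_{15} \log(1-r) \\
& + m_1' \log(r') + m_1 \log(1-r') + \dots + m_{15}' \log(r') + m_{15} \log(1-r') \\
& + N_1 \log(R_2 S_2 I_m I_p \gamma_{22} \mu_1) + N_2 \log(R_2 S_2 I_m I_p \gamma_{22} \mu_2) + N_3 \log(R_1 S_2 I_m \gamma_{21} \mu_2) \\
& + N_4 \log(R_2 S_1 I_m I_p \gamma_{12} \mu_2) + N_5 \log(R_1 S_1 I_p \gamma_{11} \mu_2) + N_6 \log(R_1 S_2 I_m \gamma_{21} \mu_3) \\
& + N_7 \log(R_1 I_p \mu_3) + N_8 \log(R_2 S_1 I_m I_p \gamma_{12} \mu_4) \\
& + n_{9a}' \log(r) + n_{9b}' \log(r) + n_9 \log(1-r) \\
& + n_{9a}' \log(R_1 S_1 I_p \gamma_{11} \mu_4) + n_{9b}' \log(R_1 S_1 I_m \gamma_{11} \mu_4) + n_9 \log(R_1 S_1 (I_p + I_m) \gamma_{11} \mu_4) \\
& + N_{10} \log(S_1 \mu_4) + N_{11} \log(R_1 S_1 I_m \gamma_{11} \mu_5) + N_{12} \log(S_1 \mu_5) + N_{13} \log(R_1 I_p \mu_5) \\
& + N_{14} \log(\mu_5) + N_{15} \log(\mu_6) \\
& + M_1 \log(\mu_1) + M_2 \log(\mu_2) + M_3 \log(\mu_2) + M_4 \log(\mu_2) + M_5 \log(\mu_2) + M_6 \log(\mu_3) + M_7 \log(\mu_3) \\
& + M_8 \log(\mu_4) + M_9 \log(\mu_4) + M_{10} \log(\mu_4) + M_{11} \log(\mu_5) + M_{12} \log(\mu_5) + M_{13} \log(\mu_5) \\
& + M_{14} \log(\mu_5) + M_{15} \log(\mu_6).
\end{aligned}
\]

We note that, for fixed $r$ and $r'$, the terms involving $r$ and $r'$ enter the log likelihood only as additive constants that do not depend on $\theta$, so these terms can be dropped without changing the maximum likelihood estimates of $\theta$. For convenience, in our software implementation, we retain $r$ and $r'$ in the likelihood calculation, but we find the results are invariant to the choice of $r$ and $r'$.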
This is equivalent to considering the overall likelihood as the product of the conditional likelihoods $P(g_{mi}, g_{fi}, g_{ci} \mid \theta, \mathrm{dis}_i, \Psi_i = 0)$ and $P(g_{mi}, g_{fi}, g_{ci} \mid \theta, \mathrm{dis}_i, \Psi_i = 1)$ for the unphased (15 cell) and phased (16 cell) tables, respectively. For other types of sub-pedigrees, including case-mother duos and case-father duos, the calculation proceeds similarly.
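
The invariance to the phasing probabilities can be checked numerically. The following is a minimal Python sketch of the log likelihood derived above, not the authors' software: the function names, the dictionary layout of the parameters, and all cell counts are hypothetical and chosen only for illustration. The per-cell products are copied from the expressions in this appendix and are evaluated exactly as written, without imposing any constraint on the parameters. Unphased counts are assumed to use the pooled cell "9", and phased counts the split cells "9a" and "9b".

```python
import math

# Index k of the parameter mu_k attached to each cell, as listed in the derivation.
MU_INDEX = {"1": 1, "2": 2, "3": 2, "4": 2, "5": 2, "6": 3, "7": 3, "8": 4,
            "9a": 4, "9b": 4, "9": 4, "10": 4, "11": 5, "12": 5, "13": 5,
            "14": 5, "15": 6}


def case_risk_factors(t):
    """Product of risk parameters for each case-parent cell (mu excluded)."""
    R1, R2, S1, S2, Im, Ip = t["R1"], t["R2"], t["S1"], t["S2"], t["Im"], t["Ip"]
    g11, g12, g21, g22 = t["g11"], t["g12"], t["g21"], t["g22"]
    return {
        "1": R2 * S2 * Im * Ip * g22, "2": R2 * S2 * Im * Ip * g22,
        "3": R1 * S2 * Im * g21, "4": R2 * S1 * Im * Ip * g12,
        "5": R1 * S1 * Ip * g11, "6": R1 * S2 * Im * g21,
        "7": R1 * Ip, "8": R2 * S1 * Im * Ip * g12,
        "9a": R1 * S1 * Ip * g11, "9b": R1 * S1 * Im * g11,
        "9": R1 * S1 * (Ip + Im) * g11,  # pooled cell for unphased trios
        "10": S1, "11": R1 * S1 * Im * g11, "12": S1,
        "13": R1 * Ip, "14": 1.0, "15": 1.0,
    }


def log_likelihood(theta, n, n_phased, m, m_phased, r, r_prime):
    """Log likelihood for case (n, n_phased) and control (m, m_phased) counts."""
    mu = theta["mu"]
    risk = case_risk_factors(theta)
    # (cell counts, probability of that phase status, case trios?)
    groups = [(n, 1.0 - r, True), (n_phased, r, True),
              (m, 1.0 - r_prime, False), (m_phased, r_prime, False)]
    total = 0.0
    for counts, phase_prob, is_case in groups:
        for cell, count in counts.items():
            if count == 0:
                continue  # empty cells contribute nothing and avoid log(0)
            cell_prob = mu[MU_INDEX[cell]] * (risk[cell] if is_case else 1.0)
            total += count * (math.log(phase_prob) + math.log(cell_prob))
    return total


# Hypothetical parameter values and counts, for illustration only.
theta_null = dict(R1=1.0, R2=1.0, S1=1.0, S2=1.0, Im=1.0, Ip=1.0,
                  g11=1.0, g12=1.0, g21=1.0, g22=1.0,
                  mu={k: 0.1 for k in range(1, 7)})
theta_alt = dict(theta_null, Im=1.5)
n, n_phased = {"3": 5, "9": 12}, {"3": 4, "9a": 7, "9b": 3}
m, m_phased = {"9": 10}, {"9a": 4, "9b": 5}


def ll_difference(r, r_prime):
    """Log-likelihood difference between theta_alt and theta_null."""
    args = (n, n_phased, m, m_phased, r, r_prime)
    return log_likelihood(theta_alt, *args) - log_likelihood(theta_null, *args)


print(ll_difference(0.3, 0.4), ll_difference(0.8, 0.6))
```

In this sketch the two printed differences agree up to floating-point error, because the terms in $r$ and $r'$ cancel when log likelihoods at two values of $\theta$ are compared under the same counts. The same cancellation is why any comparison of parameter values, including maximization over $\theta$, is unaffected by the choice of $r$ and $r'$.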