PMC:4564992 / 10823-28624
Annotations
{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/4564992","sourcedb":"PMC","sourceid":"4564992","source_url":"https://www.ncbi.nlm.nih.gov/pmc/4564992","text":"Material and Methods\nFor an overview of the methodology implemented in PREMIM and EMIM, see our previous work.20,21 Here, we shall describe only the essential components relevant to the current manuscript. EMIM uses genotype counts from pedigree data to estimate relative-risk parameters through the use of multinomial modeling. The accompanying program PREMIM pre-processes the pedigree data to supply EMIM with the required genotype count information. Parameters estimable by EMIM include child genotype effects R1 and R2 (the relative risks conferred by the presence of one or two copies of the risk allele in the child), maternal-genotype effects S1 and S2 (the relative risks conferred by the presence of one or two copies of the risk allele in the mother), and maternal and paternal parent-of-origin (or, imprinting) parameters Im and Ip, respectively, which correspond to the factor by which a child’s disease risk is multiplied if they inherit a risk allele from their mother or father. EMIM calculates a log likelihood at each SNP of interest, on the basis of the chosen parameters and assumptions, such as Hardy-Weinberg equilibrium or conditioning on parental genotypes.21\nHere, we aim to increase the power to detect parent-of-origin effects by improving the information regarding the parental origin of a child’s alleles. First, we consider case-parent trios and then case-mother duos. By symmetry, all results for maternally inherited imprinting effects can be applied to paternally inherited imprinting effects, and, similarly, results for case-mother duos can be applied to case-father duos.\nFor case-parent trios, 15 possible genotype combinations (gm,gf,gc) can occur in a mother, father, and child at any given SNP (see Table 2 of Ainsworth et al.20). 
EMIM fits a multinomial model to the observed counts in these 15 categories. The only configuration in which it is not possible to determine the parental origin is when all three individuals are heterozygous. That is, if we denote the minor allele by “2” and the major allele by “1,” then if both parents and the child have genotype “1/2,” it is not known whether the “2” allele came from the father or the mother. (Throughout this paper we use “2” to denote the minor allele, which is also considered, for convenience, to be the risk allele, although in practice either allele can be modeled as the risk allele).\nIn previous versions of EMIM (≤ 2.07), the multinomial likelihood contribution from such a trio wasP(gm=12,gf=12,gc=12 or 21|dis=1)=μ4R1S1(Ip+Im)γ11where dis = 1 denotes the event that the child is diseased, gc denotes the ordered (maternal and paternal) alleles in the child, μ4 denotes a nuisance (mating-type stratification) parameter, and γ11 denotes an optional mother-child genotype-interaction parameter that can be estimated if desired. This likelihood contribution comes from combining into a single cell (cell 9) the contributions from cells 9a and 9b of Table 2 of Ainsworth et al.,20 in which cell 9a corresponds to the (unobservable) situation that the “2” allele in the child came from the father and 9b to the (unobservable) situation that the “2” allele in the child came from the mother.\nIn our updated version of EMIM, we use the software package SHAPEIT231,32 to estimate haplotypes in the trio and then use this information to infer the parental origin of the “2” allele in the child. We thus consider cells 9a and 9b separately, resulting in a likelihood contribution of μ4R1S1Ipγ11 if a trio is deemed to fall into category 9a or μ4R1S1Imγ11 if a trio is deemed to fall into category 9b. 
(Trios where there is still some ambiguity regarding parental origin could, in theory, contribute fractional counts to both cells; however, as noted later, we did not find any advantage in allowing for this as compared to just using the most likely parent-of-origin assignment).\nSeparation of cell 9 into two cells, 9a and 9b, leads to a situation in which the multinomial likelihood uses counts from 16 (rather than the originally considered 15) cells. However, we might want to analyze datasets in which only a proportion of the case-parent trios have been phased. This could occur, for example, if some of the case-parent trios have been genotyped only at a candidate SNP (so there are no surrounding SNPs to provide phase information), whereas other trios have been more densely genotyped. Thus, we actually need to consider the counts from 17 cells, with cells 9, 9a, and 9b all considered as separate categories. In Appendix A, we derive the multinomial likelihood that includes data for cells 9, 9a, and 9b in terms of the genotype relative-risk parameters of interest.\nA similar approach can be used for case-mother or case-father duos. Table 3 of Ainsworth et al.20 shows the seven observable genotype combinations in case-mother duos (a similar table can be constructed for case-father duos). Here, cell 4 is the only configuration in which parental origin is not observed. Cell 4 can be divided into two cells: cell 4a, where the risk allele, “2,” is inherited from the father, and cell 4b, where it is inherited from the mother. In previous versions of EMIM (≤ 2.07), the multinomial likelihood contribution from cell 4 was R1S1γ11(Im(μ4 + μ5) + Ip(μ2 + μ4)), whereas now this is separated out into two contributions, R1S1γ11(Ip(μ2 + μ4)) for counts in cell 4a and R1S1γ11(Im(μ4 + μ5)) for counts in cell 4b. 
To allow for datasets in which only a proportion of the case-mother duos have been phased, we fit a likelihood to the counts from nine cells, with cells 4, 4a, and 4b all considered as separate categories (see Appendix A). A similar process can be carried out with the table for case-father duos. The overall likelihood of the data corresponds to the product of the likelihoods for the tables for different observed family units (including case-parent trios, case-mother duos, case-father duos, and various other case- and control-based tables, see Howey and Cordell21 for details).\n\nWorkflow\nThe following steps are carried out when using PREMIM and EMIM in conjunction with SHAPEIT2 to analyze data from multiple nearby SNPs (including, but not limited to, genome-wide association study [GWAS] data). Note that steps 1–7 are carried out through a single command line call to PREMIM, which automatically invokes SHAPEIT2 as required.1. PREMIM: Case-parent trios and duos are chosen from pedigrees. PREMIM processes the pedigree data and summarizes for each SNP the possible genotype combinations in case-parent trios and duos. The previous version of PREMIM processed pedigrees on a SNP-by-SNP basis so that the chosen case-parent trio (from a larger pedigree) for one SNP might be different from that for another SNP. For haplotype estimation, it is necessary that the same case-parent trio is chosen for every pedigree and SNP, so PREMIM chooses the case-parent trio with the least missing SNP data, but only if the amount of missing data is above a user-specified threshold (default, 50%). Following selection of case-parent trios, case-mother duos are next selected by PREMIM in the same manner except with the extra constraint that only pedigrees that have not had a case-parent trio selected are considered. The case-father duos are then selected with the constraint that only pedigrees that have not had a case-parent trio or case-mother duo selected are considered.\n2. 
PREMIM: Binary pedigree files are created for case-parent trios and duos. The case-parent trios and duos selected for haplotype estimation are collected together into one PLINK14-format binary pedigree (.bed) file with associated family (.fam) and map (.bim) files.\n3. SHAPEIT2: Haplotype graph is calculated. PREMIM invokes SHAPEIT2 to calculate the haplotype graph with data created in step 2. Several different options are available in SHAPEIT2 to try to improve the accuracy of the phasing at the cost of increasing the processing time. In our experience, we found that the default (slower) settings were best for duos, and thus these are used by default, but for trios these could be changed in order to speed up the processing time. It is also possible to use an external reference panel with SHAPEIT2, and to use a known recombination map, which might be beneficial for some datasets. However, in our experience, case-parent trios and duos generally provide sufficient information for excellent resolution of parental origin even without making use of a reference panel or a known recombination map. SHAPEIT2 also imputes missing data and handles Mendelian errors by setting genotypes to missing.\n4. SHAPEIT2: Haplotypes are estimated. PREMIM then evokes SHAPEIT2 to return the most probable haplotype estimates from the haplotype graph calculated in step 3. It was found (data not shown) that allowing for phase uncertainty through sampling possible haplotypes from the haplotype graph did not improve performance in terms of power or type I error, although in theory this could be done (with the results averaged to generate non-integer cell counts for cells 9a and 9b or for cells 4a and 4b) if desired.\n5. PREMIM: Phased case-parent trio and duo data processed. PREMIM estimates the parent of origin of alleles for ambiguous scenarios by using the phased haplotypes from SHAPEIT2. 
The total counts for trios and duos that are phased and not phased are also recorded for each SNP and are used to calculate the likelihood. The resolution of ambiguous trios and duos is recorded as cell counts 9a and 9b for case-parent trios and cell counts 4a and 4b for duos.\n6. PREMIM: Phased duo data are adjusted. The estimated counts in cell 4a and 4b for duos have been found to sometimes lead to an inflated test statistic in EMIM. Therefore, these counts are adjusted to reduce the inflation to an acceptable level, see Appendix B for details.\n7. PREMIM: Remaining pedigrees are processed. Any pedigrees without a case-parent trio or duo selected for phasing are processed by PREMIM in the usual manner on a SNP-by-SNP basis, possibly creating other pedigree subunits, such as parents of a case subject, lone case subjects, and control subjects. Each pedigree subunit has a file created with the genotype counts for each SNP. Any case-parent trio or duo data processed without phasing is combined with the phased data to create EMIM input files with counts in all three relevant categories (9, 9a, and 9b for trios or 4, 4a, and 4b for duos).\n8. EMIM: Case-parent trio and duo data are analyzed together with other pedigree data. The genotype count files created by PREMIM are analyzed by EMIM (with a slightly updated-format parameter file that specifies the parameters to estimate and the model assumptions).\n\nAdjustment of Genotype Counts for Duos\nInitial investigations indicated that for ambiguous duos (in which the parent and child are both heterozygous), when the number of minor alleles inherited from the father and mother were estimated with SHAPEIT2, the estimates could be biased, depending on the minor allele frequency and which parent (mother or father) was genotyped, leading to inflated test statistics in EMIM. 
To correct this bias, we devised an adjustment procedure that relies on the fact that we will have tested many SNPs, most of which will not display parent-of-origin effects. (Our adjustment is thus suitable for GWAS data or data from a set of SNPs that are not expected to show parent-of-origin effects; it would not be suitable for analyzing a small number of candidate SNPs.) See Appendix B and Figure S1 for details and an example of the proposed adjustment procedure. Our adjustment procedure involves fitting curves to the estimated counts that correspond to adjusted versions of the curves expected under the null hypothesis. The fitted curves include an adjustment function, f(p), where p is the minor allele frequency. The cell counts for minor alleles inherited from the father (cell 4a) and mother (cell 4b) are then adjusted by respectively subtracting and adding f(p). (This can result in non-integer values for the adjusted counts, which is not a problem given that the multinomial likelihood maximized by EMIM does not specifically require the counts to be integers, see Appendix A.) This procedure ensures that, for the adjusted counts, there should, on average, be far less bias toward transmissions being estimated as coming from one particular parent. A particular SNP that displays clear evidence of transmission from one parent rather than from another, as expected if genuine parent-of-origin effects exist, will, however, be only marginally affected by this adjustment, given that the vast majority of SNPs are assumed to be non-causal. It will be shown later (see Results) that this reduces inflation of the test statistic and slightly increases the power.\n\nSimulations for Investigating Power and Type I Error\nWe carried out a simulation study to investigate the performance of our proposed new approach. 
SimPed33 was used to simulate 1,000 (for investigation of power) or 5,000 (for investigation of type I error) replicates of datasets, each with 1,500 family units (case-parent trios, case-mother duos, or case-father duos) typed at 200 SNPs across a “chromosome.” Haplotype blocks of eight SNPs in length were simulated; this was repeated 25 times to give the total of 200 SNPs. If a causal SNP was required (as when estimating power), then the 100th SNP was used. The power or type I error of PREMIM and EMIM under various models was then calculated; detection at SNP numbers 97 to 104 was used as evidence of a true or false finding. Several different PREMIM and EMIM tests were considered: (1) using the parent of origin of alleles as estimated from SHAPEIT2, with and without genotype-count adjustment, (2) using the previous version of PREMIM and EMIM, which categorizes ambiguous trios or duos into a single cell (cell 9 for trios or cell 4 for duos) without estimating parent of origin, and (3) using the known (simulated) parent of origin of alleles. The p value thresholds used to examine power were 10−12, 10−10, and 10−6. For type I error, the p value thresholds used were 6.25 × 10−3, 1.25 × 10−3, and 1.25 × 10−4, which correspond to family-wise error rates (FWERs) of 0.05, 0.01, and 0.001, under the assumption that the eight SNPs tested are independent. Unless otherwise stated, the default options in SHAPEIT2 were used (“−burn 7 −prune 8 −main 20”). For faster analysis (used in the simulation study for case-parent trios), SHAPEIT2 with fast MCMC options (“−burn 1 −prune 1 −main 1”) was used. 
Tests were performed with PREMIM and EMIM to detect (1) maternally inherited imprinting effects, (2) maternally inherited imprinting effects while allowing for child effects, and, for case-parent trios and type I errors only, (3) maternally inherited imprinting effects while allowing for maternal effects and (4) maternally inherited imprinting effects while allowing for maternal and child effects.\n\nApplication to SLI Data\nSLI is a neurodevelopmental disorder that affects linguistic abilities when development is otherwise normal. In a recent GWAS of 297 affected children in 278 pedigrees, Nudel et al.6 found two chromosomal regions of interest: chromosome 14, with a paternally inherited parent-of-origin effect (Ip, p value = 3.74 × 10−8) and chromosome 5, with a maternally inherited parent-of-origin effect (Im, p value = 1.16 × 10−7). We applied the latest versions of PREMIM and EMIM (using SHAPEIT2 to estimate the parent-of-origin of alleles) to a slightly updated version of this SLI dataset, testing for paternally inherited parent-of-origin effects on chromosome 14 and maternally inherited parent-of-origin effects on chromosome 5. The pedigrees were subjected to quality control measures as described in Anderson et al.34 and Nudel et al,6 but note that the threshold used for exclusion on the basis of heterozygosity rates was ± 3 SD from the mean, and the Hardy-Weinberg equilibrium p value exclusion threshold used in PLINK was 10−6 (and not ± 2 SD and 0.001, respectively, as previously incorrectly specified in Nudel et al.6).\n\nApplication to Tetralogy of Fallot Data\nTetralogy of Fallot (TOF) is the most common form of congenital heart disease, a major source of morbidity and mortality in childhood. 
In a recent GWAS using a European discovery set of 835 case subjects, 717 additional family members (including both parents for 293 of the case subjects), and 5,159 control subjects, Cordell et al.35 found regions on chromosomes 12 and 13 to be significantly and replicably associated with TOF. Although not reported by Cordell et al.,35 further modeling of the replicating regions via EMIM indicated that the top result on chromosome 12 (at rs11065987) could be equally well modeled by a paternally inherited imprinting effect (Ip, p value = 2.10 × 10−8) as by an allelic effect of a child’s own genotype (p value = 4.06 × 10−8). Also, the top result on chromosome 13 (at rs7982677) could potentially be equally well modeled by a maternally inherited imprinting effect (Im, p value = 9.54 × 10−7) as by an allelic effect of a child’s own genotype (p value 6.41 × 10−7), although there wasn’t sufficient power in either case to distinguish between imprinting and child-genotype effects. Evidence for a maternally inherited imprinting effect on chromosome 12 or a paternally inherited imprinting effect on chromosome 13 was less well supported (Im, p value = 9.04 × 10−5 at rs11065987; Ip, p value 0.00022 at rs7982677). 
Here, we investigate these findings further by using our updated version of PREMIM and EMIM, which uses SHAPEIT2 to estimate the parent-of-origin of alleles, to test for paternally inherited imprinting effects on chromosome 12 and maternally inherited imprinting effects on chromosome 13.","divisions":[{"label":"title","span":{"begin":0,"end":20}},{"label":"p","span":{"begin":21,"end":1183}},{"label":"p","span":{"begin":1184,"end":1607}},{"label":"p","span":{"begin":1608,"end":2384}},{"label":"p","span":{"begin":2385,"end":3189}},{"label":"p","span":{"begin":3190,"end":3873}},{"label":"p","span":{"begin":3874,"end":4671}},{"label":"p","span":{"begin":4672,"end":5998}},{"label":"sec","span":{"begin":6000,"end":10705}},{"label":"title","span":{"begin":6000,"end":6008}},{"label":"p","span":{"begin":6009,"end":10705}},{"label":"label","span":{"begin":6350,"end":6352}},{"label":"p","span":{"begin":6353,"end":7390}},{"label":"label","span":{"begin":7391,"end":7393}},{"label":"p","span":{"begin":7394,"end":7659}},{"label":"label","span":{"begin":7660,"end":7662}},{"label":"p","span":{"begin":7663,"end":8597}},{"label":"label","span":{"begin":8598,"end":8600}},{"label":"p","span":{"begin":8601,"end":9107}},{"label":"label","span":{"begin":9108,"end":9110}},{"label":"p","span":{"begin":9111,"end":9563}},{"label":"label","span":{"begin":9564,"end":9566}},{"label":"p","span":{"begin":9567,"end":9838}},{"label":"label","span":{"begin":9839,"end":9841}},{"label":"p","span":{"begin":9842,"end":10437}},{"label":"label","span":{"begin":10438,"end":10440}},{"label":"p","span":{"begin":10441,"end":10705}},{"label":"sec","span":{"begin":10707,"end":12804}},{"label":"title","span":{"begin":10707,"end":10745}},{"label":"p","span":{"begin":10746,"end":12804}},{"label":"sec","span":{"begin":12806,"end":14966}},{"label":"title","span":{"begin":12806,"end":12858}},{"label":"p","span":{"begin":12859,"end":14966}},{"label":"sec","span":{"begin":14968,"end":16116}},{"label":"title","span":{"beg
in":14968,"end":14991}},{"label":"p","span":{"begin":14992,"end":16116}},{"label":"title","span":{"begin":16118,"end":16157}}],"tracks":[{"project":"2_test","denotations":[{"id":"26320892-21181895-2052527","span":{"begin":113,"end":115},"obj":"21181895"},{"id":"26320892-22738121-2052527","span":{"begin":113,"end":115},"obj":"22738121"},{"id":"26320892-22738121-2052528","span":{"begin":1181,"end":1183},"obj":"22738121"},{"id":"26320892-21181895-2052529","span":{"begin":1766,"end":1768},"obj":"21181895"},{"id":"26320892-21181895-2052530","span":{"begin":2978,"end":2980},"obj":"21181895"},{"id":"26320892-22138821-2052531","span":{"begin":3261,"end":3263},"obj":"22138821"},{"id":"26320892-23269371-2052531","span":{"begin":3261,"end":3263},"obj":"23269371"},{"id":"26320892-21181895-2052532","span":{"begin":4767,"end":4769},"obj":"21181895"},{"id":"26320892-22738121-2052533","span":{"begin":5982,"end":5984},"obj":"22738121"},{"id":"26320892-17701901-2052534","span":{"begin":7570,"end":7572},"obj":"17701901"},{"id":"26320892-16224189-2052535","span":{"begin":12960,"end":12962},"obj":"16224189"},{"id":"26320892-24571439-2052536","span":{"begin":15173,"end":15174},"obj":"24571439"},{"id":"26320892-21085122-2052537","span":{"begin":15804,"end":15806},"obj":"21085122"},{"id":"26320892-24571439-2052538","span":{"begin":15823,"end":15824},"obj":"24571439"},{"id":"26320892-24571439-2052539","span":{"begin":16113,"end":16114},"obj":"24571439"},{"id":"26320892-23297363-2052540","span":{"begin":16490,"end":16492},"obj":"23297363"},{"id":"26320892-23297363-2052541","span":{"begin":16628,"end":16630},"obj":"23297363"}],"attributes":[{"subj":"26320892-21181895-2052527","pred":"source","obj":"2_test"},{"subj":"26320892-22738121-2052527","pred":"source","obj":"2_test"},{"subj":"26320892-22738121-2052528","pred":"source","obj":"2_test"},{"subj":"26320892-21181895-2052529","pred":"source","obj":"2_test"},{"subj":"26320892-21181895-2052530","pred":"source","obj":"2_test"},{"subj":"26320892-
22138821-2052531","pred":"source","obj":"2_test"},{"subj":"26320892-23269371-2052531","pred":"source","obj":"2_test"},{"subj":"26320892-21181895-2052532","pred":"source","obj":"2_test"},{"subj":"26320892-22738121-2052533","pred":"source","obj":"2_test"},{"subj":"26320892-17701901-2052534","pred":"source","obj":"2_test"},{"subj":"26320892-16224189-2052535","pred":"source","obj":"2_test"},{"subj":"26320892-24571439-2052536","pred":"source","obj":"2_test"},{"subj":"26320892-21085122-2052537","pred":"source","obj":"2_test"},{"subj":"26320892-24571439-2052538","pred":"source","obj":"2_test"},{"subj":"26320892-24571439-2052539","pred":"source","obj":"2_test"},{"subj":"26320892-23297363-2052540","pred":"source","obj":"2_test"},{"subj":"26320892-23297363-2052541","pred":"source","obj":"2_test"}]}],"config":{"attribute types":[{"pred":"source","value type":"selection","values":[{"id":"2_test","color":"#9493ec","default":true}]}]}}