PMC:4564992 / 16823-21528
Annotations
{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/4564992","sourcedb":"PMC","sourceid":"4564992","source_url":"https://www.ncbi.nlm.nih.gov/pmc/4564992","text":"Workflow\nThe following steps are carried out when using PREMIM and EMIM in conjunction with SHAPEIT2 to analyze data from multiple nearby SNPs (including, but not limited to, genome-wide association study [GWAS] data). Note that steps 1–7 are carried out through a single command line call to PREMIM, which automatically invokes SHAPEIT2 as required.1. PREMIM: Case-parent trios and duos are chosen from pedigrees. PREMIM processes the pedigree data and summarizes for each SNP the possible genotype combinations in case-parent trios and duos. The previous version of PREMIM processed pedigrees on a SNP-by-SNP basis so that the chosen case-parent trio (from a larger pedigree) for one SNP might be different from that for another SNP. For haplotype estimation, it is necessary that the same case-parent trio is chosen for every pedigree and SNP, so PREMIM chooses the case-parent trio with the least missing SNP data, but only if the amount of missing data is below a user-specified threshold (default, 50%). Following selection of case-parent trios, case-mother duos are next selected by PREMIM in the same manner except with the extra constraint that only pedigrees that have not had a case-parent trio selected are considered. The case-father duos are then selected with the constraint that only pedigrees that have not had a case-parent trio or case-mother duo selected are considered.\n2. PREMIM: Binary pedigree files are created for case-parent trios and duos. The case-parent trios and duos selected for haplotype estimation are collected together into one PLINK14-format binary pedigree (.bed) file with associated family (.fam) and map (.bim) files.\n3. SHAPEIT2: Haplotype graph is calculated. PREMIM invokes SHAPEIT2 to calculate the haplotype graph with data created in step 2. 
Several different options are available in SHAPEIT2 to try to improve the accuracy of the phasing at the cost of increasing the processing time. In our experience, we found that the default (slower) settings were best for duos, and thus these are used by default, but for trios these could be changed in order to speed up the processing time. It is also possible to use an external reference panel with SHAPEIT2, and to use a known recombination map, which might be beneficial for some datasets. However, in our experience, case-parent trios and duos generally provide sufficient information for excellent resolution of parental origin even without making use of a reference panel or a known recombination map. SHAPEIT2 also imputes missing data and handles Mendelian errors by setting genotypes to missing.\n4. SHAPEIT2: Haplotypes are estimated. PREMIM then reruns SHAPEIT2 to return the most probable haplotype estimates from the haplotype graph calculated in step 3. It was found (data not shown) that allowing for phase uncertainty through sampling possible haplotypes from the haplotype graph did not improve performance in terms of power or type I error, although in theory this could be done (with the results averaged to generate non-integer cell counts for cells 9a and 9b or for cells 4a and 4b) if desired.\n5. PREMIM: Phased case-parent trio and duo data processed. PREMIM estimates the parent of origin of alleles for ambiguous scenarios by using the phased haplotypes from SHAPEIT2. The total counts for trios and duos that are phased and not phased are also recorded for each SNP and are used to calculate the likelihood. The resolution of ambiguous trios and duos is recorded as cell counts 9a and 9b for case-parent trios and cell counts 4a and 4b for duos.\n6. PREMIM: Phased duo data are adjusted. The estimated counts in cell 4a and 4b for duos have been found to sometimes lead to an inflated test statistic in EMIM. 
Therefore, these counts are adjusted to reduce the inflation to an acceptable level; see Appendix B for details.\n7. PREMIM: Remaining pedigrees are processed. Any pedigrees without a case-parent trio or duo selected for phasing are processed by PREMIM in the usual manner on a SNP-by-SNP basis, possibly creating other pedigree subunits, such as parents of a case subject, lone case subjects, and control subjects. Each pedigree subunit has a file created with the genotype counts for each SNP. Any case-parent trio or duo data processed without phasing is combined with the phased data to create EMIM input files with counts in all three relevant categories (9, 9a, and 9b for trios or 4, 4a, and 4b for duos).\n8. EMIM: Case-parent trio and duo data are analyzed together with other pedigree data. The genotype count files created by PREMIM are analyzed by EMIM (with a slightly updated-format parameter file that specifies the parameters to estimate and the model assumptions).","divisions":[{"label":"title","span":{"begin":0,"end":8}},{"label":"label","span":{"begin":350,"end":352}},{"label":"p","span":{"begin":353,"end":1390}},{"label":"label","span":{"begin":1391,"end":1393}},{"label":"p","span":{"begin":1394,"end":1659}},{"label":"label","span":{"begin":1660,"end":1662}},{"label":"p","span":{"begin":1663,"end":2597}},{"label":"label","span":{"begin":2598,"end":2600}},{"label":"p","span":{"begin":2601,"end":3107}},{"label":"label","span":{"begin":3108,"end":3110}},{"label":"p","span":{"begin":3111,"end":3563}},{"label":"label","span":{"begin":3564,"end":3566}},{"label":"p","span":{"begin":3567,"end":3838}},{"label":"label","span":{"begin":3839,"end":3841}},{"label":"p","span":{"begin":3842,"end":4437}},{"label":"label","span":{"begin":4438,"end":4440}}],"tracks":[{"project":"2_test","denotations":[{"id":"26320892-17701901-2052534","span":{"begin":1570,"end":1572},"obj":"17701901"}],"attributes":[{"subj":"26320892-17701901-2052534","pred":"source","obj":"2_test"}]}],"config"
:{"attribute types":[{"pred":"source","value type":"selection","values":[{"id":"2_test","color":"#93ece5","default":true}]}]}}
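Step 1 of the workflow in the record above selects at most one case-parent trio per pedigree. If no trio is selected, it falls back to one case-mother duo, and then to one case-father duo. The following is a minimal Python sketch of that selection order. It is not PREMIM's implementation. The `Person` record and the `select_units` function are hypothetical. Two further choices are assumptions made for illustration: a unit's missingness is scored as the mean fraction of missing SNP genotypes across its members, and the threshold (default 0.5) is treated as an upper bound on that score.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical record for one pedigree member (not a PREMIM data structure).
@dataclass(frozen=True)
class Person:
    pid: str
    father: Optional[str]  # father's ID, or None if absent from the pedigree
    mother: Optional[str]  # mother's ID, or None if absent from the pedigree
    affected: bool         # True for a case subject
    missing: float         # fraction of SNP genotypes missing, 0.0 to 1.0


Unit = Tuple[str, ...]  # member IDs: (case, mother, father) or (case, parent)


def _candidates(kind: str, people: Dict[str, Person]) -> List[Unit]:
    """List every possible unit of one kind built around a case subject."""
    units: List[Unit] = []
    for p in people.values():
        if not p.affected:
            continue
        has_mother = p.mother in people
        has_father = p.father in people
        if kind == "trio" and has_mother and has_father:
            units.append((p.pid, p.mother, p.father))
        elif kind == "mother_duo" and has_mother:
            units.append((p.pid, p.mother))
        elif kind == "father_duo" and has_father:
            units.append((p.pid, p.father))
    return units


def _least_missing(units: List[Unit], people: Dict[str, Person],
                   threshold: float) -> Optional[Unit]:
    """Return the unit with the least missing data, if within the threshold.

    Assumption: missingness is the mean over the unit's members, and the
    threshold is an upper bound on that mean.
    """
    def mean_missing(unit: Unit) -> float:
        return sum(people[i].missing for i in unit) / len(unit)

    eligible = [u for u in units if mean_missing(u) <= threshold]
    return min(eligible, key=mean_missing) if eligible else None


def select_units(pedigrees: Dict[str, List[Person]],
                 threshold: float = 0.5) -> Dict[str, Tuple[str, Unit]]:
    """Choose at most one phasing unit per pedigree, in three passes."""
    chosen: Dict[str, Tuple[str, Unit]] = {}
    # Trios first, then mother duos, then father duos; each later pass
    # skips pedigrees that already had a unit chosen in an earlier pass.
    for kind in ("trio", "mother_duo", "father_duo"):
        for fam_id, members in pedigrees.items():
            if fam_id in chosen:
                continue
            people = {p.pid: p for p in members}
            best = _least_missing(_candidates(kind, people), people, threshold)
            if best is not None:
                chosen[fam_id] = (kind, best)
    return chosen


# Toy pedigrees (invented for illustration).
families = {
    "F1": [Person("c1", "f1", "m1", True, 0.02),
           Person("f1", None, None, False, 0.01),
           Person("m1", None, None, False, 0.03)],
    # Father almost entirely missing: the trio's mean (0.55) exceeds 0.5.
    "F2": [Person("c2", "f2", "m2", True, 0.60),
           Person("f2", None, None, False, 1.00),
           Person("m2", None, None, False, 0.05)],
    # Mother not in the pedigree: only a case-father duo is possible.
    "F3": [Person("c3", "f3", None, True, 0.00),
           Person("f3", None, None, False, 0.00)],
}

for fam_id, (kind, unit) in select_units(families).items():
    print(fam_id, kind, unit)
# F1 trio ('c1', 'm1', 'f1')
# F2 mother_duo ('c2', 'm2')
# F3 father_duo ('c3', 'f3')
```

Because pedigrees are independent, running the three passes over all pedigrees gives the same result as trying a trio, then a mother duo, then a father duo within each pedigree. The pass structure simply mirrors the order described in step 1.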