A Program for Efficient Phasing of Three-Generation Trio SNP Genotype Data
Here, we report a computer program, written in Python, that phases SNP genotypes and infers inherited deletions based on the pattern of Mendelian inheritance within a trio pedigree. When tiered trio genotypes encompassing three generations are available, it narrows each recombination event down to the region between two consecutive heterozygous markers. In addition, the phase information inferred from the upper trio, formed by one of the parents and the grandparents, can be propagated to phase the genotypes of the lower trio, formed by the parents and an offspring.
Availability: The software is freely available to nonprofit users upon request.
Keywords: SNP, phasing, trio, recombination, CNV, Mendelian inconsistency
A single nucleotide polymorphism (SNP) is a variation at a single nucleotide position of DNA that is common within a population (typically with a minor allele frequency of at least 1%). Owing to their high density in the human genome, SNPs are widely used in disease gene mapping. Humans are diploid, carrying two copies of each autosomal chromosome. A haplotype is the combination of alleles on a single chromosome that are transmitted together from a parent. Phasing is the process of resolving a series of consecutive genotypes into haplotypes of transmitted and untransmitted alleles. Because disease-causing mutations segregate with a particular haplotype, an association analysis based on haplotypes may be more powerful than one based on genotypes (Gabriel et al., 2002). Hence, there is strong interest in phasing population genotypes. Many statistical algorithms exist for phasing genotype data from unrelated individuals (Stephens et al., 2003; Browning, 2008). Because of their statistical nature, these methods are subject to some error and uncertainty.
On the other hand, phasing based on Mendelian inheritance patterns within a trio family is straightforward and deterministic, as long as at least one member is homozygous at the locus of interest. This approach fails if all members of a trio are heterozygous. A number of statistical approaches deal with this problem (Marchini et al., 2006); however, these methods are known to be slow or to introduce errors (Iliadis et al., 2010).
For a pedigree in which tiered trios encompass three generations, the upper trio is formed by one of the parents and his or her parents (the grandparents), while the lower trio is formed by the parents and their offspring. At loci with a minor allele frequency of 20%, 5.1% are expected to be heterozygous in all members of a trio (Marchini et al., 2006), whereas this drops to 0.8% for a three-generation pedigree. The deterministic method described above can be extended to such multigeneration trios. In such cases, it is also possible to infer an accurate de novo recombination map by comparing the parental phases inherited from the grandparents with those transmitted to the offspring. In this paper, we report an efficient software program that implements all of these steps in a single package.
During trio phasing, some markers may deviate from Mendelian inheritance patterns. This is called Mendelian inconsistency, and it has served as a basis for the detection of copy number variation (CNV) (Conrad et al., 2006; Freeman et al., 2006). Our program also reports a list of putative deletion CNVs. By considering three generations of trios simultaneously, it can identify more individuals who might carry a CNV. An overview of the program's processes is shown in Fig. 1.
Phasing amounts to deducing which parental allele has been transmitted. If a parent has a homozygous genotype, and thus two copies of the same allele, it is immaterial which one has been transmitted.
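The homozygous-parent rule, together with the deletion case discussed next, can be sketched in Python for a single biallelic SNP. This is a minimal illustration, not the authors' program: rather than branching on which member is homozygous, it enumerates every (paternal, maternal) allele pair allowed by the parents' genotypes and accepts the locus as phased only if exactly one pair reproduces the child's genotype. Genotypes are assumed to be two-character strings such as 'TC'; the function name `phase_trio_locus`, the status labels, and the '-' placeholder for a deleted allele are hypothetical.

```python
from typing import Optional, Tuple

DELETION = "-"  # hypothetical placeholder for a deleted (null) allele


def phase_trio_locus(
    father: str, mother: str, child: str
) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Phase one biallelic SNP in a father-mother-child trio.

    Returns (status, (paternal_allele, maternal_allele)) where status is
    'phased', 'deletion', 'ambiguous' (e.g. all three heterozygous) or
    'inconsistent' (no Mendelian or single-deletion explanation).
    """
    target = sorted(child)
    # Every transmission (one allele from each parent) matching the child.
    solutions = {
        (p, m)
        for p in set(father)
        for m in set(mother)
        if sorted((p, m)) == target
    }
    if len(solutions) == 1:
        return "phased", solutions.pop()
    if len(solutions) > 1:
        return "ambiguous", None  # the 'all-heterozygote' case

    # Mendelian inconsistency. If the child looks homozygous, test whether
    # one parent transmitted a deletion, i.e. the child is hemizygous.
    if child[0] == child[1]:
        allele = child[0]
        candidates = []
        if allele in mother:
            candidates.append((DELETION, allele))  # paternal deletion
        if allele in father:
            candidates.append((allele, DELETION))  # maternal deletion
        if len(candidates) == 1:
            return "deletion", candidates[0]
    return "inconsistent", None
```

For example, `phase_trio_locus("TT", "TC", "TC")` yields `("phased", ("T", "C"))`, three heterozygous 'TC' genotypes yield `("ambiguous", None)`, and `phase_trio_locus("AA", "AG", "GG")` yields `("deletion", ("-", "G"))`. A single flagged deletion may equally be a genotyping error; as the text notes, it becomes credible only when the pattern recurs across individuals.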
As we know which allele of the child has been transmitted from the homozygous parent, the child's other allele must have been transmitted from the other parent.
Some SNPs may violate Mendelian inheritance patterns and cannot be phased in this way. For example, a parent's homozygous allele may not be found in the child. If this pattern recurs in multiple individuals, and is thus due to neither experimental error nor de novo mutation, it may reflect a parental deletion that has been transmitted to the child. This concept has been instrumental in discovering deletion CNV polymorphisms (Conrad et al., 2006; Freeman et al., 2006). By treating such a deletion as one of the alleles, our software can phase it as well.
The trio phasing described above cannot determine the phases of the parental chromosomes before meiosis; it determines only those of the chromosomes transmitted after recombination. If the genotype of a parent can be phased independently, we can compare the two phases before and after meiosis and deduce the approximate locus where de novo recombination has occurred. The parental phase before meiosis can be inferred through the same trio phasing if the genotypes of both grandparents are available (Fig. 2). For an SNP at which the parent is heterozygous, it can then be determined whether the allele passed on to the offspring came from the parent's maternal or paternal chromosome. The location of a recombination can hence be localized to the region spanned by the two closest flanking heterozygous markers in the parent (Kong et al., 2010). We call this the ‘recombination region’.
We have mentioned above the so-called ‘all-heterozygote’ case, in which all three members of a trio have heterozygous genotypes. In such a case, phasing is still possible if the genotypes of both grandparents of either parent are available and at least one of them is homozygous. For example, let us assume that the genotypes of the mother's father and mother (the maternal grandparents) are available (Fig. 3).
In this case, the lower trio members all have heterozygous ‘TC’ genotypes (‘all-heterozygote’), while the grandfather has a homozygous genotype, which enables phasing of the upper trio (Fig. 3b). At this stage, we still cannot tell which allele of the mother has been transmitted to the offspring. By phasing the neighboring SNPs in both the upper and lower trios, we can construct a recombination map as described above and tell whether the maternal part of this locus originated from the grandfather or the grandmother (Fig. 3a). The locus must lie outside the ‘recombination region’ defined above, within which the grandparental origin of the allele is uncertain. If we can be sure that the locus inherited the allele from the maternal grandfather, as shown in Fig. 3a, the genotype of the offspring can be phased (Fig. 3c), and subsequently that of the father as well (Fig. 3d).
The algorithm has been implemented as a computer program, written in Python, that runs on both Linux and Windows. The software reads input files in the PLINK formats (PED and MAP) and writes three output files: one each for the ‘recombination region’, the ‘all-heterozygote’ information, and the final phases. It is freely available upon request.
We applied this software to a genotype dataset of 194 trio families collected in Korea using Illumina 370K Duo SNP chips (Park et al., 2009). About 6% of the SNPs were ‘all-heterozygous’, that is, all three trio members had heterozygous genotypes (Table 1). This is similar to the frequency expected for trios: 5.1% of loci with a minor allele frequency of 20% (Marchini et al., 2006). Among the families in the dataset, three-generation trio genotype data were available for 29 families. After phasing based on single-tier trio genotypes, followed by recombination mapping based on two-tier trio genotypes, the ‘all-heterozygous’ genotypes were phased as described.
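The all-heterozygote resolution described above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the published code. The mother's informative heterozygous markers are assumed to be given as sorted (position, origin) pairs, where origin is a hypothetical label, 'GF' or 'GM', recording whether the allele she transmitted to the child came from her father's or her mother's chromosome (known once that marker is phased in both the upper and lower trios). The function names `recombination_regions`, `origin_at` and `phase_all_het` are hypothetical. A locus inherits the origin of its flanking markers only when both flanks agree; refusing to call loci beyond the outermost informative markers is a conservative assumption made for this sketch, not a rule stated in the text.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Marker = Tuple[int, str]  # (position, origin), origin in {"GF", "GM"}


def recombination_regions(markers: List[Marker]) -> List[Tuple[int, int]]:
    """Intervals between consecutive informative markers where the
    grandparental origin of the transmitted allele switches."""
    return [
        (p1, p2)
        for (p1, o1), (p2, o2) in zip(markers, markers[1:])
        if o1 != o2
    ]


def origin_at(markers: List[Marker], pos: int) -> Optional[str]:
    """Grandparental origin at pos, or None if pos lies inside a
    recombination region or outside the outermost informative markers."""
    positions = [p for p, _ in markers]
    i = bisect_left(positions, pos)
    if i < len(positions) and positions[i] == pos:
        return markers[i][1]
    if i == 0 or i == len(positions):
        return None  # no flanking marker on one side (assumption)
    left, right = markers[i - 1][1], markers[i][1]
    return left if left == right else None


def phase_all_het(
    mother_phase: Tuple[str, str], child: str, origin: str
) -> Tuple[str, str]:
    """Phase an all-heterozygote child.

    mother_phase: (allele from grandfather, allele from grandmother),
    taken from the upper-trio phasing. Returns (paternal, maternal).
    """
    maternal = mother_phase[0] if origin == "GF" else mother_phase[1]
    # The child's other allele must have come from the father.
    paternal = child[1] if child[0] == maternal else child[0]
    return paternal, maternal
```

With markers `[(100, "GF"), (250, "GF"), (400, "GM"), (600, "GM")]`, the only recombination region is `(250, 400)`, so an all-heterozygote locus at position 200 is assigned 'GF' while one at 300 stays unresolved. If the mother's upper-trio phase there is `("T", "C")` (grandfather's allele first) and the child is 'TC', `phase_all_het` returns `("C", "T")`: the child received T from the mother and therefore C from the father, mirroring Fig. 3c and 3d.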
Phasing by this method failed for about 1% of the SNPs, because no member of the pedigree had a homozygous genotype (Table 1). As a byproduct of our algorithm, the de novo recombination events per chromosome were counted, ranging from 7 to 10 (Table 1). In addition, putative CNVs were inferred from Mendelian inconsistency; 2237 of them had a frequency of 1% or more. Park et al. analyzed an expanded dataset that included essentially the same trio families as this analysis, as well as 199 additional families of Korean-Vietnamese origin (Park et al., 2009). They filtered the candidate CNVs by examining their cluster images, to which we did not have access, and reported 1029 putative CNVs, about half the number in our result. Of the 1029 CNVs reported by Park et al., 1014 were found in our results, a recovery rate of 98.5%. The small portion of CNVs observed by Park et al. but not by us may be due to the difference in sample sizes.
The genotype phasing information obtained here covers more than 99% of the SNP markers in the dataset. The phasing result from our software can be fed into statistical phasing programs, such as PHASE, to resolve the phases of the remaining markers. The highly accurate phased haplotypes obtained here may also serve as seeds for, and thus facilitate, the phasing of genotypes of unrelated individuals (Howie et al., 2009).
The authors are grateful to Dr. Jong Young Lee and colleagues at the National Institute of Health, Korea Center for Disease Control, for kindly providing the trio genotype dataset used in this work. This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science, and Technology (NRF-2010-0021811).