PMC:1687185 / 15885-22462 JSONTXT

Annnotations TAB JSON ListView MergeView

{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1687185","sourcedb":"PMC","sourceid":"1687185","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1687185","text":"3.2 Hessian Computation and Dynamical System for the Scoring Function\nIn order to present our algorithm, we have defined the dynamical system corresponding to the log-likelihood function and the PSSM. The key contribution of the paper is the development of this nonlinear dynamical system which will enable us to realize the geometric and dynamic nature of the likelihood surface by allowing us to understand the topology and convergence behaviour of any given subspace on the surface. We construct the following gradient system in order to locate critical points of the objective function (4):\nQ˙ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGrbqugaGaaaaa@2DE0@(t) = -∇ A(Q)     (5)\nOne can realize that this transformation preserves all of the critical points [20]. Now, we will describe the construction of the gradient system and the Hessian in detail. In order to reduce the dominance of one variable over the other, the values of each of the nucleotides that belong to the consensus pattern at the position k will be represented in terms of the other three nucleotides in that particular column. Let Pikdenote the kth position in the segment Pi. This will also minimize the dominance of the eigenvector directions when the Hessian is obtained. The variables in the scoring function are transformed into new variables described in Table 2. Thus, Eq. (4) can be rewritten in terms of the 3l variables as follows:\nTable 2 Position Weight Matrix. A count of nucleotides j ∈ {A, T, G, C} at each position k = 1..l in all the sequences of the data set. Ck is the kth nucleotide of the consensus pattern which represents the nucleotide with the highest value in that column. Let the consensus pattern be GACT...G and bj be the background.\nj k = b k = 1 k = 2 K = 3 k = 4 ... k = l\nA b A w 1 C 2 w 7 w 10 ... w 3l-2\nT b T w 2 w 4 w 8 C 4 ... w 3l-1\nG b G C 1 w 5 w 9 w 11 ... C l\nC b C W 3 w 6 C 3 w 12 ... W 3l A ( Q ) = ∑ i = 1 t ∑ k = 1 l l o g   f i k ( w 3 k − 2 , w 3 k − 1 , w 3 k ) i       ( 6 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGbbqqcqGGOaakcqWGrbqucqGGPaqkcqGH9aqpdaaeWbqaamaaqahabaacbiGae8hBaWMae83Ba8Mae83zaCgaleaacqWGRbWAcqGH9aqpcqaIXaqmaeaacqWGSbaBa0GaeyyeIuoaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabdsha0bqdcqGHris5aOGaeeiiaaIaemOzay2aaSbaaSqaaiabdMgaPjabdUgaRbqabaGccqGGOaakcqWG3bWDdaWgaaWcbaGaeG4mamJaem4AaSMaeyOeI0IaeGOmaidabeaakiabcYcaSiabdEha3naaBaaaleaacqaIZaWmcqWGRbWAcqGHsislcqaIXaqmaeqaaOGaeiilaWIaem4DaC3aaSbaaSqaaiabiodaZiabdUgaRbqabaGccqGGPaqkdaWgaaWcbaGaemyAaKgabeaakiaaxMaacaWLjaWaaeWaaeaacqaI2aGnaiaawIcacaGLPaaaaaa@6150@\nwhere fik can take the values {w3k-2, w3k-1, w3k, 1 - (w3k-2 + w3k-1 + w3k)} depending on the Pik value. The first derivative of the scoring function is a one dimensional vector with 3l elements.\n∇ A = [ ∂ A ∂ w 1 ∂ A ∂ w 2 ∂ A ∂ w 3 .... ∂ A ∂ w 3 l ] T       ( 7 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqGHhis0cqWGbbqqcqGH9aqpdaWadaqaamaalaaabaGaeyOaIyRaemyqaeeabaGaeyOaIyRaem4DaC3aaSbaaSqaaiabigdaXaqabaaaaOWaaSaaaeaacqGHciITcqWGbbqqaeaacqGHciITcqWG3bWDdaWgaaWcbaGaeGOmaidabeaaaaGcdaWcaaqaaiabgkGi2kabdgeabbqaaiabgkGi2kabdEha3naaBaaaleaacqaIZaWmaeqaaaaakiabc6caUiabc6caUiabc6caUiabc6caUmaalaaabaGaeyOaIyRaemyqaeeabaGaeyOaIyRaem4DaC3aaSbaaSqaaiabiodaZiabdYgaSbqabaaaaaGccaGLBbGaayzxaaWaaWbaaSqabeaacqWGubavaaGccaWLjaGaaCzcamaabmaabaGaeG4naCdacaGLOaGaayzkaaaaaa@5671@\nand each partial derivative is given by\n∂ A ∂ w p = ∑ i = 1 t ∂ f i p ∂ w p f i k ( w 3 k − 2 , w 3 k − 1 , w 3 k )       ( 8 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaWcaaqaaiabgkGi2kabdgeabbqaaiabgkGi2kabdEha3naaBaaaleaacqWGWbaCaeqaaaaakiabg2da9maaqahabaWaaSaaaeaadaWcaaqaaiabgkGi2kabdAgaMnaaBaaaleaacqWGPbqAcqWGWbaCaeqaaaGcbaGaeyOaIyRaem4DaC3aaSbaaSqaaiabdchaWbqabaaaaaGcbaGaemOzay2aaSbaaSqaaiabdMgaPjabdUgaRbqabaGccqGGOaakcqWG3bWDdaWgaaWcbaGaeG4mamJaem4AaSMaeyOeI0IaeGOmaidabeaakiabcYcaSiabdEha3naaBaaaleaacqaIZaWmcqWGRbWAcqGHsislcqaIXaqmaeqaaOGaeiilaWIaem4DaC3aaSbaaSqaaiabiodaZiabdUgaRbqabaGccqGGPaqkaaaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWG0baDa0GaeyyeIuoakiaaxMaacaWLjaWaaeWaaeaacqaI4aaoaiaawIcacaGLPaaaaaa@614C@\n∀p = 1, 2 ... 3l and k = round(p/3)+ 1\nThe Hessian ∇2A is a block diagonal matrix of block size 3 × 3. For a given sequence, the entries of the 3 × 3 block will be the same if that nucleotide belongs to the consensus pattern (Ck). The gradient system is mainly obtained for enabling us to identify the stability boundaries and stability regions on the likelihood surface. The theoretical details of these concepts are published in [20]. The stability region of each local maximum is an approximate convergence zone of the EM algorithm. If we can identify all the saddle points on the stability boundary of a given local maximum, then we will be able to find all the corresponding Tier-1 local maxima. Tier-1 local maximum is defined as the new local maximum that is connected to the original local maximum through one decomposition point. Similarly, we can define Tier-2 and Tier-k local maxima that will take 2 and k decomposition points respectively. However, finding every saddle point is computationally intractable and hence we have adopted a heuristic by generating the eigenvector directions of the PSSM at the local maximum. Also, for such a complicated likelihood function, it is not efficient to compute all saddle points on the stability boundary. Hence, one can obtain new local maxima by obtaining the exit points instead of the saddle points. The point along a particular direction where the function has the lowest value starting from the given local maximum is called the exit point. The next section details our approach and explains the different phases of our algorithm.","divisions":[{"label":"title","span":{"begin":0,"end":69}},{"label":"p","span":{"begin":70,"end":594}},{"label":"p","span":{"begin":595,"end":881}},{"label":"p","span":{"begin":882,"end":1614}},{"label":"table-wrap","span":{"begin":1615,"end":2174}},{"label":"label","span":{"begin":1615,"end":1622}},{"label":"caption","span":{"begin":1624,"end":1936}},{"label":"p","span":{"begin":1624,"end":1936}},{"label":"table","span":{"begin":1937,"end":2174}},{"label":"tr","span":{"begin":1937,"end":1987}},{"label":"td","span":{"begin":1937,"end":1939}},{"label":"td","span":{"begin":1940,"end":1946}},{"label":"td","span":{"begin":1948,"end":1953}},{"label":"td","span":{"begin":1955,"end":1960}},{"label":"td","span":{"begin":1962,"end":1967}},{"label":"td","span":{"begin":1969,"end":1974}},{"label":"td","span":{"begin":1976,"end":1979}},{"label":"td","span":{"begin":1981,"end":1987}},{"label":"tr","span":{"begin":1988,"end":2035}},{"label":"td","span":{"begin":1988,"end":1990}},{"label":"td","span":{"begin":1991,"end":1996}},{"label":"td","span":{"begin":1998,"end":2002}},{"label":"td","span":{"begin":2004,"end":2008}},{"label":"td","span":{"begin":2010,"end":2014}},{"label":"td","span":{"begin":2016,"end":2021}},{"label":"td","span":{"begin":2023,"end":2026}},{"label":"td","span":{"begin":2028,"end":2035}},{"label":"tr","span":{"begin":2036,"end":2082}},{"label":"td","span":{"begin":2036,"end":2038}},{"label":"td","span":{"begin":2039,"end":2044}},{"label":"td","span":{"begin":2046,"end":2050}},{"label":"td","span":{"begin":2052,"end":2056}},{"label":"td","span":{"begin":2058,"end":2062}},{"label":"td","span":{"begin":2064,"end":2068}},{"label":"td","span":{"begin":2070,"end":2073}},{"label":"td","span":{"begin":2075,"end":2082}},{"label":"tr","span":{"begin":2083,"end":2128}},{"label":"td","span":{"begin":2083,"end":2085}},{"label":"td","span":{"begin":2086,"end":2091}},{"label":"td","span":{"begin":2093,"end":2097}},{"label":"td","span":{"begin":2099,"end":2103}},{"label":"td","span":{"begin":2105,"end":2109}},{"label":"td","span":{"begin":2111,"end":2116}},{"label":"td","span":{"begin":2118,"end":2121}},{"label":"td","span":{"begin":2123,"end":2128}},{"label":"tr","span":{"begin":2129,"end":2174}},{"label":"td","span":{"begin":2129,"end":2131}},{"label":"td","span":{"begin":2132,"end":2137}},{"label":"td","span":{"begin":2139,"end":2143}},{"label":"td","span":{"begin":2145,"end":2149}},{"label":"td","span":{"begin":2151,"end":2155}},{"label":"td","span":{"begin":2157,"end":2162}},{"label":"td","span":{"begin":2164,"end":2167}},{"label":"td","span":{"begin":2169,"end":2174}},{"label":"p","span":{"begin":2175,"end":3041}},{"label":"p","span":{"begin":3042,"end":3237}},{"label":"p","span":{"begin":3238,"end":4037}},{"label":"p","span":{"begin":4038,"end":4077}},{"label":"p","span":{"begin":4078,"end":4987}},{"label":"p","span":{"begin":4988,"end":5026}}],"tracks":[]}