CORD-19:07e18fb2ba3bac9456e8afb29735fb91679840f9

Annotations

Id Subject Object Predicate Lexical cue
T1 290-473 Epistemic_statement denotes These profiles can be used to define a very simple, computationally efficient, alignment-free distance measure that reflects the evolutionary relationships between genomic sequences.
T2 707-828 Epistemic_statement denotes The large volume of available genomic data has made it possible to analyze large sets of organisms at the whole-genome scale.
T3 829-1066 Epistemic_statement denotes However, given that most genomes contain millions to billions of nucleotides, traditional molecular analysis methods based on multiple sequence alignment become impractical due to their high computational complexity (Vinga and Almeida, 2003).
T4 1533-1692 Epistemic_statement denotes One of the aims of graphical representation is to identify regions of interest, or the distribution of bases along the sequence, visually (Zhang and Zhang, 1994).
T5 1693-1788 Epistemic_statement denotes The second approach has been proposed to characterize DNA sequences (Akhtar et al., 2007).
T6 2301-2389 Epistemic_statement denotes Any DNA sequence can be converted into a unique numerical sequence with the same length.
T7 2750-2838 Epistemic_statement denotes However, Akhtar and Epps (2008) showed that it has poor accuracy in exon prediction.
T8 3268-3382 Epistemic_statement denotes In fact, we can extract more information about the genome sequence from its inter-nucleotide distance sequences.
T9 3524-3859 Epistemic_statement denotes In the case of the inter-nucleotide distance sequence for nucleotide A, the numbers of nucleotides C, G and T would follow a multinomial distribution, given that the inter-nucleotide distance is $k$. This multinomial distribution will be called the conditional multinomial distribution.
T10 3860-4009 Epistemic_statement denotes The relative error vector derived from the conditional multinomial distribution can then be used as a genomic signature that identifies each species.
T11 4101-4273 Epistemic_statement denotes In fact, we propose a new representation of evolutionary information, the complete multinomial composition vector (CMCV), built from a collection of multinomial composition vectors.
T12 4842-4956 Epistemic_statement denotes A DNA sequence of length $n$ can be viewed as a linear sequence of $n$ symbols from the finite alphabet $N = \{A, C, G, T\}$.
T13 5177-5384 Epistemic_statement denotes $\ldots, s_n$, $\mathrm{GIN}(m) = k$, where $k = \min\{\, i : s_m = s_{m+i},\ m + i \le n \,\}$ if such an $i$ exists, and $k = n - m$ otherwise. As an example, the GIN of the short DNA fragment AGTTCTACCAGC is $\mathrm{GIN} = 6, 9, 1, 2, 3, 6, 3, 1, 3, 2, 1, 0$.
T14 5385-5617 Epistemic_statement denotes From the global inter-nucleotide distance sequence GIN, we can obtain the inter-nucleotide distance sequence for each nucleotide $x \in N$. The four inter-nucleotide distance sequences for the short DNA segment used previously are given as
T15 5618-5716 Epistemic_statement denotes A similar inter-nucleotide distance sequence for the nucleotide $x \in N$ was defined by Afreixo et al.
T16 6349-6471 Epistemic_statement denotes In fact, we can count the number of each nucleotide in the genome sequence from its inter-nucleotide distance sequences.
T17 6472-6602 Epistemic_statement denotes Consequently, we can derive four conditional multinomial distributions from the corresponding inter-nucleotide distance sequences.
T18 6884-7119 Epistemic_statement denotes If the nucleotide sequence was generated by an independent and identically distributed (i.i.d.) random process, the numbers of nucleotides C, G and T between two nearest occurrences of nucleotide A would follow a multinomial distribution.
T19 7548-7640 Epistemic_statement denotes The nucleotide occurrence probabilities $p_{G|A}$ and $p_{T|A}$ can be obtained in a similar way.
T20 7641-8006 Epistemic_statement denotes The term reference conditional multinomial distribution, applied to a DNA sequence, describes the distribution that the numbers of nucleotides C, G and T would follow, given the inter-nucleotide distance sequence for nucleotide A, if the nucleotides were drawn at random and independently of each other, with probabilities equal to the relative conditional frequencies.
T21 8007-8161 Epistemic_statement denotes From the perspective of molecular evolution, the conditional multinomial distribution may reflect the results of both random mutation and selective evolution.
T22 8280-8351 Epistemic_statement denotes Many neutral mutations may remain and play the role of a random background.
T23 8352-8549 Epistemic_statement denotes One should subtract the random background from the simple counting result in order to highlight the contribution of selective evolution (Chang and Wang, 2011; Ding et al., 2010; Gao et al., 2006).
T24 8550-8783 Epistemic_statement denotes In this work, we propose a new conditional multinomial distribution representation that removes the random background by revealing the relative difference between a biological sequence and a sequence generated by an independent random process.
T25 8784-8953 Epistemic_statement denotes For a fixed $k$, we can obtain a measured conditional multinomial distribution and a reference conditional multinomial distribution for a given nucleotide $x \in \{A, C, G, T\}$.
T26 8954-9089 Epistemic_statement denotes For a given pattern $a$ from the conditional multinomial distribution, we can define the multinomial composition value $p(a)$ as follows:
T27 9090-9294 Epistemic_statement denotes where $f'_x(a|k)$ is the measured relative frequency of the pattern $a$, and $f_x(a|k)$, the relative frequency of the pattern $a$ under the reference conditional multinomial distribution, can be computed by (1).
T28 9295-9422 Epistemic_statement denotes All these multinomial composition values can be sorted in a fixed order to form a vector $V_x(S|k) = (p_x(a_1|k), p_x(a_2|k), \ldots$
T29 9427-9525 Epistemic_statement denotes $\ldots, p_x(a_m|k))$ for the genome $S$, where $m$ denotes the total number of patterns under consideration.
T30 9814-9938 Epistemic_statement denotes With only a single fixed $k$, the $k$-order multinomial composition vector of the whole genome $S$ may lose some evolutionary information.
T31 10055-10154 Epistemic_statement denotes $\ldots, V(S|k)$, denoted by $\mathrm{CMCV}(S, k)$, with the intention of using as much genomic information as possible.
T32 10406-10588 Epistemic_statement denotes In the case of the inter-nucleotide distance sequence $\mathrm{CIN}_A$ ($k = 5$), we first convert the possible values of $(N_{C|A}, N_{G|A}, N_{T|A})$ into one-dimensional values in alphabetical order.
T33 12237-12359 Epistemic_statement denotes The signature can be used in applications where evolutionary relationships need to be deduced from large genomic sequences.
T34 12360-12469 Epistemic_statement denotes Distances between sets of genomic sequences can be obtained without the need for multiple sequence alignment.
T35 12604-12814 Epistemic_statement denotes The 2003 outbreak of atypical pneumonia referred to as severe acute respiratory syndrome (SARS) drew more attention to the relationship between the SARS coronaviruses (SARS-CoVs) and other coronaviruses.
T36 13076-13158 Epistemic_statement denotes Generally, coronaviruses can be classified into three groups according to serotype.
T37 13333-13845 Epistemic_statement denotes However, this is still a controversial topic: alignment-based methods showed that SARS-CoVs are not closely related to any group and form a new group (Marra et al., 2003; Rota et al., 2003); a maximum likelihood tree built from a fragment of the spike protein favored clustering SARS-CoVs with group II (Liò and Goldman, 2004); and an information-based method that uses whole genome sequences indicated that SARS-CoVs are close to group I rather than forming a new group (Yang et al., 2005).
T38 14375-14399 Epistemic_statement denotes As can be seen from Fig.
T39 14400-14554 Epistemic_statement denotes 4, our method indicates that SARS-CoVs are not closely related to any of the previously characterized coronaviruses and form a distinct group (group IV).
T40 15434-15667 Epistemic_statement denotes However, many patterns will not occur in the conditional multinomial distribution when $k$ is large. From the viewpoint of information theory, some information may be lost and noise will dominate if a large value of $k$ is considered.
T41 15668-15819 Epistemic_statement denotes To determine the upper bound of the value of k, we will introduce a scoring scheme to estimate how important a conditional multinomial distribution is.
T42 15820-16012 Epistemic_statement denotes $\chi^2$-test scoring scheme: For a fixed $k$, let $a$ be a pattern in the conditional multinomial distribution, with multinomial composition value $p(a, i|k)$ in genome $i$ (as found in the $k$-MCV).
T43 16449-16539 Epistemic_statement denotes Thus, we may define a score for the conditional multinomial distribution with a fixed k as
T44 16540-16865 Epistemic_statement denotes where the first sum runs over all patterns of the conditional multinomial distribution with a fixed $k$. We believe that, by considerably extending the basic pattern-counting idea and studying the underlying distribution, we are able to discover unusual patterns and automatically distinguish their roles in shaping evolution.
T45 16866-17080 Epistemic_statement denotes In this case, the $k$-MCV with the largest conditional multinomial distribution score might be considered the most representative of the species, rather than an abnormal outlier from a purely statistical analysis.
T46 17436-17624 Epistemic_statement denotes Moreover, we can define the relative ratio of information carried by the conditional multinomial distribution with a fixed $k$ as the ratio of the $k$-MCV to the CMCV that contains it.
T47 17625-17718 Epistemic_statement denotes From Table 3, we can clearly see that the relative ratio of the 7-MCV is the maximum 839 1408.
T48 17923-18280 Epistemic_statement denotes DNA sequence databases have accumulated a great deal of data on biological evolution over billions of years; consequently, novel concepts and methods are urgently needed to reveal the biological functions of DNA sequence information and to investigate the relationships of DNA sequences with biological evolution, cellular function, genetic mechanisms and the occurrence of illness.
T49 18451-18758 Epistemic_statement denotes From the conditional multinomial distribution profiles of nine chromosomes, we note that the relative error vector between the measured and the reference conditional multinomial distributions can be used as a genomic signature, thus allowing the comparison of species.
T50 19046-19135 Epistemic_statement denotes The phylogenetic tree can be obtained from the distance matrices using the UPGMA method.
T51 19389-19480 Epistemic_statement denotes 4 also indicates that SARS-CoVs are not closely related to any group and form a new group.
T52 19637-19813 Epistemic_statement denotes Thus, this view can be used to guide the development of more powerful measures for sequence comparison, with possible future improvements based on the correlation structure of DNA.
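The GIN definition in row T13 can be checked with a short script. What follows is a minimal Python sketch, not the authors' code. Positions are 0-based, so the "otherwise $k = n - m$" branch becomes `n - 1 - m`. The helper `cin` is hypothetical. It reads off the per-nucleotide distance sequence of row T14 by keeping the GIN values at the positions that hold $x$. That reading is an assumption, because the four sequences themselves are not reproduced in this listing.

```python
def gin(seq: str) -> list[int]:
    """Global inter-nucleotide distance sequence (GIN).

    For each position m, the distance to the next occurrence of the same
    nucleotide; if there is none, the distance to the end of the sequence.
    Positions are 0-based, so the 1-based 'k = n - m' becomes n - 1 - m.
    """
    n = len(seq)
    next_pos: dict[str, int] = {}  # nearest later position of each base
    result = [0] * n
    # Scan right to left so next_pos always holds the next occurrence.
    for m in range(n - 1, -1, -1):
        base = seq[m]
        result[m] = next_pos[base] - m if base in next_pos else n - 1 - m
        next_pos[base] = m
    return result


def cin(seq: str, x: str) -> list[int]:
    """Hypothetical per-nucleotide distance sequence: the GIN values at the
    positions holding nucleotide x (an assumed reading of row T14)."""
    return [d for base, d in zip(seq, gin(seq)) if base == x]


if __name__ == "__main__":
    fragment = "AGTTCTACCAGC"
    print(gin(fragment))  # [6, 9, 1, 2, 3, 6, 3, 1, 3, 2, 1, 0]
    for x in "ACGT":
        print(x, cin(fragment, x))
```

The printed GIN matches the worked example in row T13. The right-to-left scan computes every distance in a single pass, instead of searching forward from each position.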
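Rows T9, T18 to T20 and T25 to T32 describe how a $k$-MCV and the CMCV are built. The Python sketch below makes that pipeline concrete, but it rests on several assumptions, because equation (1) and the formula for $p(a)$ are not reproduced in this listing:

- The conditional probabilities $p_{C|A}$, $p_{G|A}$ and $p_{T|A}$ are taken as relative frequencies among the non-A nucleotides.
- The reference frequency $f_x(a|k)$ is the multinomial probability of pattern $a$ over the $k-1$ nucleotides lying between two consecutive occurrences of $x$ that are $k$ apart.
- Patterns are enumerated in lexicographic (alphabetical) order.
- $p(a)$ is taken to be the relative error $(f'_x(a|k) - f_x(a|k)) / f_x(a|k)$. This is in the spirit of the "relative error vector" of row T10, but it is not confirmed against the paper's formula.

All function names are hypothetical.

```python
from collections import Counter
from math import factorial

ALPHABET = "ACGT"


def gap_patterns(seq: str, x: str, k: int) -> list[tuple[int, ...]]:
    """Counts of the other three bases (alphabetical order) in every gap
    between two consecutive occurrences of x that lie exactly k apart."""
    others = [b for b in ALPHABET if b != x]
    pos = [i for i, b in enumerate(seq) if b == x]
    return [
        tuple(seq[i + 1:j].count(b) for b in others)
        for i, j in zip(pos, pos[1:])
        if j - i == k
    ]


def conditional_probs(seq: str, x: str) -> list[float]:
    """Assumed p_{b|x}: frequency of b among all non-x bases.
    Assumes seq contains at least one base other than x."""
    counts = Counter(seq)
    others = [b for b in ALPHABET if b != x]
    total = sum(counts[b] for b in others)
    return [counts[b] / total for b in others]


def multinomial_pmf(pattern: tuple[int, ...], probs: list[float]) -> float:
    """Probability of a count pattern under a multinomial with n = sum(pattern)."""
    coef = factorial(sum(pattern))
    prob = 1.0
    for c, p in zip(pattern, probs):
        coef //= factorial(c)  # exact: every partial quotient is an integer
        prob *= p ** c
    return coef * prob


def all_patterns(k: int) -> list[tuple[int, int, int]]:
    """All (N_b1, N_b2, N_b3) summing to k - 1, in lexicographic order."""
    m = k - 1
    return [(a, b, m - a - b) for a in range(m + 1) for b in range(m + 1 - a)]


def mcv(seq: str, x: str, k: int) -> list[float]:
    """k-MCV for nucleotide x: relative error of the measured frequency
    against the reference frequency for each pattern (assumed form of p(a))."""
    observed = Counter(gap_patterns(seq, x, k))
    total = sum(observed.values())
    probs = conditional_probs(seq, x)
    vector = []
    for a in all_patterns(k):
        ref = multinomial_pmf(a, probs)
        measured = observed[a] / total if total else 0.0
        vector.append((measured - ref) / ref if ref > 0 else 0.0)
    return vector


def cmcv(seq: str, max_k: int) -> list[float]:
    """Concatenate the k-MCVs of all four nucleotides for k = 1..max_k."""
    return [v for k in range(1, max_k + 1) for x in ALPHABET for v in mcv(seq, x, k)]


if __name__ == "__main__":
    fragment = "AGTTCTACCAGC"
    print(gap_patterns(fragment, "A", 6))  # [(1, 1, 3)]
    print(len(mcv(fragment, "A", 5)))      # 15
```

For $k = 5$, the four nucleotides in each gap can be split among three bases in 15 ways, so each $k$-MCV for $\mathrm{CIN}_A$ has 15 entries. A pattern that is never observed gets the value $-1$, so the vector records deviation from the random background rather than raw counts.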
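Rows T1, T34 and T50 say that distances computed from the signature vectors are used to build a UPGMA tree. The distance formula itself, however, is not part of this listing. As a stand-in only, the sketch below uses the cosine-based distance $D = (1 - \cos\theta)/2$ between two vectors. That choice is an assumption, not the authors' measure.

The UPGMA step is the standard average-linkage procedure: when two clusters merge, the new cluster's distance to any other cluster is the average of the two old distances, weighted by cluster size. Branch lengths are omitted, and only a Newick topology is returned. Function names are hypothetical.

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """Stand-in distance (assumption): (1 - cos(u, v)) / 2, ranging over [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return (1.0 - dot / norm) / 2.0


def upgma(names: list[str], dist: list[list[float]]) -> str:
    """UPGMA (size-weighted average linkage); returns a Newick topology string.
    Requires at least one name and a symmetric distance matrix."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of current clusters
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # Distance to the merged cluster: average weighted by cluster size.
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = (f"({ti},{tj})", ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 0.5
    names = ["s1", "s2", "s3"]
    dist = [[0.0, 0.1, 0.5], [0.1, 0.0, 0.6], [0.5, 0.6, 0.0]]
    print(upgma(names, dist))  # (s3,(s1,s2));
```

In practice, each row of `dist` would come from applying the chosen distance to pairs of CMCV vectors. Any other vector distance can be substituted without changing the UPGMA step.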