PMC:1524773 / 3339-4386 JSONTXT

Annnotations TAB JSON ListView MergeView

{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1524773","sourcedb":"PMC","sourceid":"1524773","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1524773","text":"One kind of clustering can be obtained by using BlastClust from the NCBI-BLAST package [2]. It is a clustering tool designed especially to cluster protein or DNA sequences based on pairwise matches returned by the BLAST algorithm. It uses BLAST scores to assign statistical significance, matches pairs that reach that level of significance, then constructs clusters using a simple, greedy, single-linkage clustering method. However, empirically it is known that BlastClust is very limited with respect to the size of the input set. Furthermore, in many instances, the application of this heuristic is also limited by the fact that one unique clustering does not exist and that in fact many clusterings are possible and reasonable for different applications: for example, when annotating putative domains of a protein sequence, it is important to find common regions that are shared by large groups of sequences, while for the computation of phylogenetic trees it is important to find groups that share the largest possible part of their sequences.","tracks":[]}