PMC:100321 / 8247-11538 JSONTXT

{"target":"http://pubannotation.org/docs/sourcedb/PMC/sourceid/100321","sourcedb":"PMC","sourceid":"100321","source_url":"https://www.ncbi.nlm.nih.gov/pmc/100321","text":"Probe finding\nThe probe finding strategy is devised in a way (i) to avoid the need for exact alignments, (ii) to check probe specificity along the whole available sequence and (iii) to optimize performance. The workflow is depicted in Figure 1. It starts with a database in which each organism is represented by a single continuous sequence, such as a defined region of the 18S or 28S ribosomal genes. From this it takes first the sequences of the In-group organism(s) for which specific probes should be found and cuts these into short pieces of the specified oligo-nucleotide length (set as a program variable), following an approach proposed by Bavykin et al [11]. This is accomplished by a sliding window scheme with 1-nucleotide shifts across the whole length of the sequence(s). Two separate lists are created in this way. The first list is simply a straight list of all possible fragments from all In-group organisms. The second one consists of an array of lists for each of the In-group organisms (the two lists are identical if only one In-group organism is chosen). All duplicate oligos from the first list are then removed and each of the remaining oligos is checked whether it matches with each of the In-group organisms in the second list. A match is positive, when the relative melting probability is within the range of 0–25%, employing the function of Equation 4. Thus, this first calculation simply ensures that all candidate probes match with all In-group organisms. This calculation would be largely dispensable, if only a single In-group organism is chosen.\nFigure 1 Scheme of the probe finding algorithm. Details are explained in the text. The next step is to subtract all oligos that match in any of the Out-group organisms. To avoid the comparison of all candidate oligos against all Out-group sequences, we identify first a group of sequences that is closely related to the In-group. For this one requires a rough alignment of all sequences, to calculate percentage similarity between them. Note that this serves only to identify a subgroup of sequences for speeding up the calculations, i.e. mistakes in the alignment are of no concern. The similarity calculator in the program extracts this related group of sequences by a simple percentage identity calculation across the given alignment. All sequences that are at least 90% similar to the In-group are used as Related-group. This percentage can be set as a program variable and should be set such that the Related-group does not become more than 5–10% of all sequences.\nThe sequences of the Related-group are again converted into a fragment list as above, duplicates are removed and all candidate oligos are matched with this list. Now only those oligos are retained, which have a melting probability of at least 75% (the exact percentage values are program variables). The majority of oligos is removed in this step. The remaining candidate oligos are then matched against the remaining sequences in the Out-group with the same cut-off criterion.\nThis stepwise selection scheme allows to significantly speed up the calculations even for very large datasets, but still ensures that all oligo-nucleotides of the desired length were directly or indirectly matched against all possible other oligos in the database.","divisions":[{"label":"title","span":{"begin":0,"end":13}},{"label":"p","span":{"begin":14,"end":1577}},{"label":"figure","span":{"begin":1578,"end":1661}},{"label":"label","span":{"begin":1578,"end":1586}},{"label":"caption","span":{"begin":1588,"end":1661}},{"label":"p","span":{"begin":1588,"end":1661}},{"label":"p","span":{"begin":1662,"end":2548}},{"label":"p","span":{"begin":2549,"end":3026}}],"tracks":[]}
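The stepwise selection described in the "text" field above can be illustrated with a minimal Python sketch. It is not the authors' program. All function and parameter names (`fragments`, `toy_melting_probability`, `find_probes`, `match_max`, `reject_below`) are hypothetical, and because Equation 4 is not part of this excerpt, the scoring function is a crude stand-in that reports the percentage of mismatched positions instead of a real melting probability. The split of the database into Related-group and Out-group is assumed to have been done already.

```python
from typing import Callable, Dict, List


def fragments(seq: str, k: int) -> List[str]:
    """Return every length-k window of seq, shifting by one nucleotide."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_melting_probability(a: str, b: str) -> float:
    """Hypothetical stand-in for Equation 4: percent of mismatched positions."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return 100.0 * mismatches / len(a)


def find_probes(
    in_group: Dict[str, str],
    related: Dict[str, str],
    out_group: Dict[str, str],
    k: int,
    score: Callable[[str, str], float] = toy_melting_probability,
    match_max: float = 25.0,      # a match is positive at 0-25%
    reject_below: float = 75.0,   # oligos are kept only at >= 75%
) -> List[str]:
    # First list: all In-group fragments, duplicates removed, order kept.
    candidates = list(dict.fromkeys(
        f for s in in_group.values() for f in fragments(s, k)))

    # Second list: one fragment list per In-group organism.
    per_organism = [fragments(s, k) for s in in_group.values()]

    # Keep oligos that match at least one window in every In-group organism.
    candidates = [
        o for o in candidates
        if all(any(score(o, f) <= match_max for f in frags)
               for frags in per_organism)
    ]

    # Subtract oligos that come too close to any Related-group fragment,
    # then repeat with the remaining Out-group sequences.
    for group in (related, out_group):
        group_frags = {f for s in group.values() for f in fragments(s, k)}
        candidates = [
            o for o in candidates
            if all(score(o, f) >= reject_below for f in group_frags)
        ]
    return candidates
```

Because the Related-group is filtered first, most candidates are discarded before the larger Out-group is scanned, which is the speed-up the text describes. In practice the `score` argument would be replaced by an implementation of the paper's Equation 4.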