PMC:4067983 / 7357-11287
Annotations
2_test
{"project":"2_test","denotations":[{"id":"24629096-12045153-8224319","span":{"begin":204,"end":206},"obj":"12045153"},{"id":"24629096-15652477-8224320","span":{"begin":532,"end":533},"obj":"15652477"},{"id":"24629096-15383676-8224321","span":{"begin":2640,"end":2641},"obj":"15383676"}],"text":"Implementation\nThe implementation of mrSNP is presented in Figure 1. All the 3’ UTR sequences and phastCons scores of each nucleotide are downloaded from the UCSC Database using the Genome Table Browser [22]. Each chromosome is stored in a single file, where each sequence has information including gene name and 3’UTR sequence coordinates. All available miRNAs are downloaded from the miRBase database and clustered according to their conservation across species using the information obtained from the TargetScan prediction tool [3,23]. The software accepts input SNPs with the related information containing the organism, the assembly according to which the mapping is done, the chromosome on which the SNP is located, the position of the SNP in the given chromosome, and the SNP alleles. Once this information is provided, the software searches for the sequence where the SNP is located. If the SNP is not located in a denoted 3’ UTR sequence, no further calculation is done and the software reports the SNP as “not in 3’UTR”. If the SNP is found in a 3’UTR, the 79 base pairs (bp) of sequence that contains the SNP at the center is returned at this step. This length (79 bp) was chosen based on the observation that the typical maximum size of a miRNA is 25 bp and a maximum 15 bp loop is allowed in the binding. Therefore, we allow a miRNA binding site to have a maximum length of 40 bp. If a SNP is to affect miRNA binding, it will be located in the miRNA’s binding site whose start/end nucleotide can be at most 39 bp away from the SNP. Therefore, a 79 bp sequence (40 bp + 39 bp) is the minimal sequence to use for calculating potential miRNA binding differences. 
Once this sequence is obtained, it is duplicated and each SNP allele is substituted in the correct position. After this point, for each miRNA stored we check the existence of a minimum of 6 consecutive Watson-Crick (W-C) matches starting from the second position of the miRNA seed region.\nFigure 1 The workflow of mrSNP software. The remainder of the approach is adapted from [9]. A sequence with 6 (7, 8, or 9) consecutive matches is called a 6mer (7mer etc.). We allow a single G:U wobble for 7, 8 and 9mer sequences. If no instances satisfy matching criteria, the miRNA and the sequence couple are not investigated further, and we conclude that the miRNA does not bind to this region. Moreover, if the sequence has at least 7 Watson-Crick matches in the seed region, it is considered as a miRNA binding site immediately. For weaker bindings such as 6mers, or 7, 8 and 9mer sequences containing a single G:U wobble, we calculate the binding energy with RNAhybrid [5]. RNAhybrid runs a dynamic programming algorithm that finds the suboptimal binding energy between two sequences. For 6mers and 7mers, we say that the miRNA binds to a sequence if its binding energy is higher than 74% of the maximum binding energy; for 8mers and 9mers, the threshold is 60%. The numbers and methods used are adapted from [9]. For a given SNP-miRNA pair, these steps are followed for both SNP sequences. If one of them satisfies the binding criteria while the other does not, we report this as a binding difference.\nIn the literature, many prediction tools apply a post-processing step to reduce the false positive rate of the binding predictions. This is performed using the conservation of the target site across different species. If the target site is conserved across species, binding is considered more likely. 
Although mrSNP does not filter out the results with this post-processing method, it calculates the conservation score (CS) of the seed region using the phastCons scores provided by the UCSC database. For each prediction, CS is obtained as the average phastCons score of the nucleotides in the seed region. Then, it reports the probabilistic CS of the seed region as well as the conservation of the miRNA across species."}
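The window extraction, allele substitution, 6mer seed check, and conservation score described in the annotated text can be sketched as follows. This is a minimal illustration and not the mrSNP code: all function names are hypothetical, sequences are assumed to be given 5' to 3' in the RNA alphabet (A, C, G, U), coordinates are assumed to be 0-based, and the G:U wobble and RNAhybrid energy steps are left out.

```python
FLANK = 39  # bases kept on each side of the SNP: 39 + 1 + 39 = 79 bp


def snp_window(utr, snp_index, flank=FLANK):
    """Return (window, offset): up to 79 bp centered on the SNP, and the
    SNP's index inside that window (the window is clipped at UTR ends)."""
    start = max(0, snp_index - flank)
    end = min(len(utr), snp_index + flank + 1)
    return utr[start:end], snp_index - start


def substitute_alleles(window, offset, alleles):
    """Duplicate the window once per allele, placing the allele at the SNP."""
    return {a: window[:offset] + a + window[offset + 1:] for a in alleles}


_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, length=6):
    """Target-side sequence that pairs Watson-Crick with miRNA positions
    2..(length + 1); it is the reverse complement of that miRNA stretch."""
    seed = mirna[1:1 + length]
    return seed.translate(_COMPLEMENT)[::-1]


def has_seed_match(target, mirna, length=6):
    """True if the target holds `length` consecutive W-C matches to the seed."""
    return seed_site(mirna, length) in target


def conservation_score(phastcons, start, end):
    """Average phastCons score over the half-open range [start, end)."""
    scores = phastcons[start:end]
    return sum(scores) / len(scores)


# Usage: a toy UTR whose let-7a seed site is disrupted by one SNP allele.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"                # let-7a, 5' to 3'
utr = "A" * 45 + "UACCUC" + "A" * 49            # seed site at indices 45..50
window, offset = snp_window(utr, 47)
variants = substitute_alleles(window, offset, ["C", "G"])
binding = {a: has_seed_match(s, mirna) for a, s in variants.items()}
binding_difference = len(set(binding.values())) > 1
```

Reporting a difference only when the two alleles disagree mirrors the rule stated in the text; a fuller version would pass 6mer and wobble-containing matches on to an energy calculation before deciding.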