PMC:4331676 / 15490-17536 JSONTXT


Support vector machine

The support vector machine (SVM) is a machine learning algorithm based on the statistical learning theory of Vapnik (1998) [66]. It uses a non-linear transformation to map the input data into a high-dimensional feature space, where linear classification is performed. Training an SVM is equivalent to solving the quadratic optimization problem

(5)  \min_{w, b, \xi_i} \; \frac{1}{2}\, w \cdot w + C \sum_{i} \xi_i

(6)  \text{s.t.} \; y_i \left( \phi(x_i) \cdot w + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, m,

where x_i is a feature vector labeled by y_i ∈ {-1, +1} and C, called the cost, is the penalty parameter of the error term. This model, known as the soft-margin SVM, can tolerate noise in the data; it classifies an example with the separating hyperplane f(x) = \phi(x) \cdot w + b = 0. Solving the model with the Lagrange multiplier method yields w = \sum_j \alpha_j y_j \phi(x_j), and hence w \cdot \phi(x_i) = \sum_j \alpha_j y_j \phi(x_j) \cdot \phi(x_i) = \sum_j \alpha_j y_j K(x_j, x_i), where K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) is the kernel function. This provides an efficient way to train an SVM without using the non-linear transformation explicitly. The application of SVMs to bioinformatics problems has been widely explored [15,67-69]. Here, the publicly available LIBSVM, with the radial basis function (RBF) as the kernel function, is employed as the SVM implementation. The RBF kernel is defined as

(7)  K(x_i, x_j) = \exp\left( -\gamma \, \| x_i - x_j \|^2 \right)

In this study, the kernel parameter γ and the penalty parameter C were optimized by a grid search with 5-fold cross validation on the sequences of the benchmark dataset, and the jackknife test is taken as the evaluation method to calculate the evaluation criteria.
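The grid-based parameter search described above can be sketched as follows. This is a minimal illustration using scikit-learn's SVC (which wraps LIBSVM) on synthetic stand-in data; the actual study used its own benchmark sequence features, and the grid ranges shown here are assumptions, not the paper's.

```python
# Sketch of tuning the RBF-SVM cost C and kernel width gamma by
# grid search with 5-fold cross validation, as described in the text.
# Synthetic data stands in for the benchmark dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Exponentially spaced grid over C and gamma (illustrative ranges).
param_grid = {"C": [2**k for k in range(-5, 6, 2)],
              "gamma": [2**k for k in range(-7, 2, 2)]}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Each (C, γ) pair is scored by the average accuracy over the 5 folds, and the best pair is then used to train the final model.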
For a dataset with N sequences, one sequence at a time is taken out as the test sequence and the remaining N - 1 sequences are used as the training set. This process is repeated until every sequence in the dataset has been tested exactly once, and the average performance over all N rounds is taken as the final result.
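The jackknife procedure above is exactly leave-one-out cross validation, which can be sketched as below. Synthetic data again stands in for the benchmark dataset, and the fixed C and γ values are placeholders for the grid-search optima.

```python
# Minimal sketch of the jackknife (leave-one-out) test: each sample is
# held out once, the SVM is trained on the rest, and accuracy is
# averaged over all N rounds.
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=60, n_features=10, random_state=1)

scores = cross_val_score(SVC(kernel="rbf", C=1.0, gamma="scale"),
                         X, y, cv=LeaveOneOut())
print(len(scores))     # N rounds, one per held-out sample
print(scores.mean())   # average performance over all rounds
```

Because every sample is tested exactly once, the jackknife result is deterministic for a given dataset, unlike a randomly partitioned k-fold split.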