PMC:4996396 / 18382-22369

{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/4996396","sourcedb":"PMC","sourceid":"4996396","source_url":"https://www.ncbi.nlm.nih.gov/pmc/4996396","text":"2.5. Data Processing\nStatistical analysis of the microarray data was performed using the R-based package BRB-ArrayTools, developed by Richard Simon [35]. Background-corrected median fluorescence values were loaded into the program and were log2-transformed. Furthermore, an average over replicate spots was calculated since all proteins were printed in duplicates resulting in a microarray comprising approximately 32,000 features. The intensity minimum threshold was set to a value of 100. If a feature was flagged in at least 50% of the arrays, it was excluded from analysis. \nData was normalized using different approaches and was further subjected to different statistical analyses. One method was quantile normalization, which equals distribution of values among different arrays. The second approach was data transformation using “Combating batch effects when combining batches of gene expression microarray data” (ComBat) in order to adjust data for batch effects [46]. For the application of ComBat, unnormalized data was used and missing values, which were due to flagging of low-quality features, were imputed. This was done using the package ‘impute’ for imputation of missing microarray data in R [47]. The third approach was distance weighted discrimination (DWD) which adjusts for systematic microarray data biases [32]. By means of using DWD, unnormalized data is loaded into the software but only two batches can be merged at one time. Therefore, a stepwise approach was applied for data adjustment. According to the PCA plot of the unnormalized data (Figure 3), two main groups of batches can be observed in the raw data. Therefore, the runs of each group were merged first before merging the two main groups in the last step. A similar approach was used by Benito and colleagues [32]. 
PCA [48] and PVCA [49] were used to investigate batch effects and underlying sources for variation.\nFurther statistical analysis was performed with BRB-ArrayTools. Mainly class prediction with complete leave-one-out cross-validation (LOOCV), which includes the steps of feature selection and classification model development, and class comparison were applied to different data subsets, which were pre-processed by different methods. Analysis was performed for all lung cancer cases versus all controls as well as for distinct histological types of lung cancer and their matched controls. Class comparison analysis was performed with a significance threshold of p \u003c 0.001, resulting in a list of significant differentially reactive antigens when comparing two sample groups. These lists were generated with unnormalized, quantile normalized, DWD-adjusted, and ComBat-adjusted data. Besides analysis of the sample groups, including all lung cancer samples versus all controls and the groups comprising only the histological subtype of lung cancer versus statistically matched controls, class comparison analysis was applied on the data of each histological subtype derived from single processing batches (run 1 to run 4). Moreover, cross-run class comparison was performed for each histological entity. The resulting lists of significant antigens were compared to each other for overlaps. \nClass prediction analysis was performed on the same data set including the analysis of all lung cancer cases versus all controls with quantile normalization and ComBat adjustment, as well as the analysis of distinct histological entities derived from multiple runs versus matched controls using quantile normalization, ComBat, and DWD adjustment. BRB-ArrayTools provides this supervised machine learning method with LOOCV. As feature selection methods, the greedy-pairs method [46] and the recursive feature elimination (RFE) method [50] were used. 
Methods used for model development were diagonal linear discriminant analysis, compound covariate predictor, nearest neighbor classification, nearest centroid classification, support vector machines and Bayesian compound covariate predictor [35].","divisions":[{"label":"Title","span":{"begin":0,"end":20}}],"tracks":[{"project":"2_test","denotations":[{"id":"27600218-19455231-69481430","span":{"begin":149,"end":151},"obj":"19455231"},{"id":"27600218-14693816-69481431","span":{"begin":1330,"end":1332},"obj":"14693816"},{"id":"27600218-14693816-69481432","span":{"begin":1798,"end":1800},"obj":"14693816"},{"id":"27600218-19455231-69481433","span":{"begin":3983,"end":3985},"obj":"19455231"}],"attributes":[{"subj":"27600218-19455231-69481430","pred":"source","obj":"2_test"},{"subj":"27600218-14693816-69481431","pred":"source","obj":"2_test"},{"subj":"27600218-14693816-69481432","pred":"source","obj":"2_test"},{"subj":"27600218-19455231-69481433","pred":"source","obj":"2_test"}]}],"config":{"attribute types":[{"pred":"source","value type":"selection","values":[{"id":"2_test","color":"#9493ec","default":true}]}]}}
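
The "text" field above describes quantile normalization as a step that equalizes the distribution of values across arrays. The following is a minimal Python sketch of that general technique, not the BRB-ArrayTools implementation used in the paper. The function name `quantile_normalize` and the toy intensity values are hypothetical, and ties are broken by feature order rather than averaged, which is a simplifying assumption.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of feature values per array).

    Hypothetical helper for illustration only; assumes no missing values
    and that every array has the same number of features.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of features")

    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]

    normalized = []
    for a in arrays:
        # Feature indices ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Each feature receives the reference value for its rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy log2-scale intensities for two arrays with three features each.
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
result = quantile_normalize(toy)
```

After normalization every array contains the same set of values, and only the assignment of those values to features differs between arrays. That is the sense in which the distributions are made equal.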