3. Results In this study, samples from 100 lung cancer patients and 100 lung cancer-free controls were processed on the 16k protein microarray comprising 5449 recombinantly expressed human proteins derived from 15,417 cDNA clones. These proteins served as potential antigens and therefore capture molecules for tumor-specific autoantibodies from plasma. Purified IgG samples isolated from plasma were used for array processing. Preliminary results, obtained during development and optimization of the 16k protein microarray, showed that binding of non-IgG serum proteins to the microarray surface is influencing the specificity and reproducibility of the assay [51]. As depicted in Figure 1, across all isolated IgG samples the median concentration of IgG isolated from cancer cases (14.1 mg mL−1 ± 0.47) is significantly different from the median concentration of IgG isolated from lung cancer-free controls (10.9 mg mL−1 ± 0.33) (p < 0.0001 using unpaired t-test). To identify a set of antigens representative for the lung cancer-specific humoral immune response in cancer patients, intensity data gained from array images were analyzed using BRB-ArrayTools. The workflow of this study is shown in Figure 2. Figure 1 Boxplot of immunoglobulin G (IgG) concentrations from plasma of 100 controls (10.9 mg mL−1 ± 0.33) compared to 100 lung cancer cases (14.1 mg mL−1 ± 0.47). IgG concentrations of cases and controls are significantly different (unpaired t-test, p < 0.0001). Significance is indicated by asterisks (*** for p-value <0.001). Figure 2 (a) Study workflow including the production of 16k protein microarrays, array processing with 200 clinical samples, data pre-processing, data visualization, and statistical analysis of microarray data using BRB-ArrayTools. (b) Illustrated microarray processing workflow. The sample set (n = 200) was processed in six distinct runs, each including up to 36 samples. Cases and controls included in one run were statistically matched for sex, age, and smoking-habits. In the first four runs, each run included only one histological cancer subtype matched with hospital-based controls. Run 5 and 6 included cases from two subtypes matched with selected controls. To check the quality of the spotting process, each slide was scanned before sample processing and slides showing irregularities were excluded from sample analysis. Furthermore, we used an anti-Penta-His Alexa Fluor 532-conjugated antibody (Qiagen, Hilden, Germany) for processing of slides derived from different printing batches to investigate reproducibility of spotting. Correlation of log2 intensities of the detected His-tag proteins is in average 0.93 when comparing slides of different printing batches. A preliminary test, which included samples from three different individuals, was performed using purified IgG from the same IgG purification preparation. Each specimen was processed in three replicates on the same day and in three replicates on the consecutive day. Correlation between the replicates was calculated. Pearson’s correlation coefficient for intra-experiment replicates was in average 0.96 and for inter-experiment replicates 0.95. Furthermore, the intra-experiment median coefficient of variation (CV) ranges from 0.20% to 0.63% and the inter-experiment median CV ranges from 0.53% to 1.00% (Supplementary Figure S3). CV values were calculated for each protein of the array. Additionally to this preliminary experiment, one run of samples had to be repeated due to quality irregularities of the 16k slides. The best 17 slides from these repeatedly processed slides were used to investigate the reproducibility of the samples when using a different IgG purification preparation and different batches of spotted microarrays. Although the samples were processed on slides with a lower quality, resulting Pearson’s correlation coefficients range from 0.79 to 0.93 for log2-transformed raw data with an average of r = 0.86, and between 0.96 and 0.99 with an average of r = 0.98 upon removal of batch effects with ComBat. For further statistical analysis the replicates of samples were not included. On the 16k protein microarray, each protein was printed in duplicates. Duplicate spots show a very high correlation of signals within an array. We calculated the correlation between duplicate spots within 38 arrays and obtained an average Pearson’s correlation coefficient of 0.98. This high correlation indicates that duplicate spots are sufficient to ensure reproducibility of the measured signals. As positive controls on the 16k protein microarray different concentrations of E. coli lysate (0.3 mg mL−1, 0.4 mg mL−1, and 0.5 mg mL−1) were used. Fluorescent intensities of these control spots increase in a linear manner with increasing concentration, resulting in an average coefficient of determination of 0.97 over all processed samples. With respect to array processing with purified IgG in different concentrations, it was shown that fluorescent intensity increases in a linear manner with increasing IgG concentration [51]. microarrays-04-00162-t002_Table 2 Table 2 Experimental design. Each run indicates one batch of samples, which was processed in one day. In total, all 200 samples of the sample set were processed in six distinct runs. Since our protein microarray data is derived from different processing batches, systematic variation in the merged data would affect subsequent data analysis. Therefore, three different normalization or data adjustment strategies were applied on the data set in order to investigate the effect of these methods on the analysis results. After normalization of the data, principal component analysis was used to investigate the properties of the data set. Additionally, class comparison was used to identify significant antigens, which are differentially reactive to autoantibodies of lung cancer patients compared to lung cancer-free controls. 3.1. Principal Component Analysis and Principal Variance Components Analysis High-dimensional microarray data can be visualized by the reduction of dimensions by means of principal component analysis (PCA). The principal components, representing directions of the data with maximal variation, reveal similarities and differences between samples [52]. Figure 3 shows the first three principal components of the present microarray data set. Each sphere represents one sample and its color indicates the batch of sample processing, referred to as “run”. In the unnormalized data (Figure 3a), two groups of samples can be observed. The first three runs cluster into one group, the second group consists of the remaining three runs. One group consists of samples from three different histological types of lung cancer and their statistically matched controls, whereas the second group comprises all four used histological types of lung cancer and their matched controls, meaning that in both groups, three of the four used histological types are present. The PCA plot of the quantile normalized data is depicted in Figure 3b. This normalization method does not change the grouping of the batches compared to unnormalized data. The separation between the distinct runs in each group is even more obvious, especially between run 1, run 2, and run 3. By means of data adjustment with DWD (Figure 3c) and ComBat (Figure 3d), the previously observed groups become dispersed. In particular the ComBat algorithm achieves a more even distribution of the batches, whereas with DWD an accumulation of most samples in the center surrounded by a few samples outside of this center can be observed. Figure 3 Principal component analysis (PCA) plot showing the first three principal components of the microarray data set (a) unnormalized, (b) quantile normalized, (c) ComBat-adjusted, and (d) DWD-adjusted. Each sphere represents one sample; the color of the sphere indicates the different experimental runs. Run 1 to run 6 is represented by the following colors: run 1 = orange, run 2 = cyan, run 3 = dark blue, run 4 = yellow, run 5 = dark green, run 6 = pink. A novel method for estimation of sources for variability in microarray data is principal variance components analysis (PVCA). In this analysis, a threshold for the percentage of variability is defined and the number of principal components representing this percentage is included in the further steps [34]. Here, the threshold was set to 60%. The defined principal components (PCs) are matched to known sources of variation by variance component analysis (VCA). The resulting weighted proportion of variance shows the proportional effect of each source on the data set [28]. The PVCA analysis of the unnormalized data (Figure 4a) shows that the highest contribution (49.5%) to variation is due to the experimental run. The second highest contribution is due to undefined residual effects (38.6%) which represent the variance in the data set which cannot be explained by known factors. The third most weighted variance is contributed by the combination of experimental run and sample type (lung cancer case or control). When using quantile normalization (Figure 4b), the contribution of the experimental run increases to 52.5%. This effect can be reduced to 0% when using ComBat (Figure 4c) and to 0.001% by means of DWD (Figure 4d). The combination of the effects “experimental run” and “sample type” remain similar when using quantile normalization and ComBat, only with DWD a slight increase to 1.11% can be observed. Other factors, like age, sex, and smoking habit, do not have any effects (0.02%–0.06%) independent of the data pre-processing method. The sample type (lung cancer case or cancer-free control) has just a slightly higher effect on variance in each data set (0.11%–0.12%). Figure 4 Principal variance components analysis (PVCA) of the microarray data set (a) unnormalized, (b) quantile normalized, (c) ComBat-adjusted, and (d) DWD-adjusted. Contribution to variance was estimated for the following factors: run = experimental runs (run 1 to run 6), type = sample type (cancer or control), sex = female/male, age = age group (0–56, 56–64, 64–70, 70–100 years), smoking = smoking habit (current, never, former smoker), resid = residual weighted average proportion variance. Single factors were investigated as well as combinations of factors. 3.2. Class Comparison Analysis Class comparison with a significance threshold of p < 0.001 was performed after quantile normalization, DWD, and ComBat adjustment of the data and significant antigens were compared between these methods. The overlap between all analyses for investigation of the used normalization and batch removal methods is summarized in Table 3 (details are given in the Supplementary Figure S4). The highest number of significant antigens is achieved upon ComBat and quantile normalization. Much lower numbers are significant when analyzing DWD-adjusted and unnormalized data. microarrays-04-00162-t003_Table 3 Table 3 Number of significant antigens (p < 0.001) resulting from class comparison analysis. Rows show results of different data pre-processing methods, columns indicate the analyzed sample group (“r” means experimental run). Each histological sample group was processed in a distinct run comprising only samples of one subtype as well as in a mixed run comprising samples of two subtypes and controls. Therefore, class comparison analysis could be performed either run-wise on each histological group, including samples derived from a single run or cross-run analysis was done, including samples of the single run combined with the samples of the respective subtype derived from the mixed run. For each histological subtype, overlaps of the significant antigens derived from class comparison analysis including either data of a single run or data from two combined runs were calculated (Table 4). This enables to investigate if the significant antigens derived from the analysis of a single run remain significant in cross-run analysis. For example, the case of the analysis of the SCLC group, class comparison analysis was done in two ways: including only samples of “run 1”, and combining the samples of “run 1” with the SCLC samples of “run 5”. Then, the overlap of the significant antigens was calculated. This was done using data from each histological group, using all three normalization methods as well as unnormalized data. When comparing the three different normalization strategies, ComBat adjustment yields the highest overlap of significant antigens when comparing analyses from single runs to cross-run analyses, ranging from 25.7% for SqLC to 43.4% for SCLC. With quantile normalization the second highest overlaps have been achieved, ranging from 18.3% for SqLC to 41.6% for SCLC. The lowest number of overlaps was achieved with DWD, which has even lower overlaps for the SCLC (13.7%) and SqLC (9.5%) compared to the unnormalized data (17.8% and 10.0%, respectively). Interestingly, for the histological group of LCLC the normalization method did not influence the overlap at all, ranging from 38.9% to 39.6%. The overlap for AdCa is comparable for all three adjustment methods (30.0% for DWD, 31.7% for quantile normalization, and 38.3% for ComBat). microarrays-04-00162-t004_Table 4 Table 4 Overlap (%) of significant antigens between single-run and cross-run class comparison analysis (p < 0.001). This calculation was done with the same data set in an unnormalized state, quantile-normalized, DWD-adjusted data, and ComBat-adjusted data. SCLC = small cell lung cancer, SqLC = squamous cell lung cancer, LCLC = large cell lung cancer, AdCa = adenocarcinoma. Comparison of congruent significant antigens derived from class comparison analysis (p < 0.001) with different normalization methods revealed that a high proportion of antigens are shared between quantile normalization and ComBat adjustment. This can be observed when analyzing cases and controls derived from one run (exemplified for SCLC in Figure 5a) compared to two processing runs (exemplified for SCLC in Figure 5b). For SCLC, all antigens which are significant in the unnormalized data occur in at least one list of significant antigens of a pre-processed data set. The number of significant antigens in the unnormalized and DWD-adjusted data is very low compared to the quantile-normalized and ComBat-adjusted data. However, a relatively high number of antigens is shared by all four differently pre-processed data sets; 15 antigens in the case of the run-wise analysis and 45 antigens in the case of the cross-run analysis (Figure 5a,b). This can also be observed for the LCLC group, while in the SqLC and the AdCa group the number of overlaps between all four methods is relatively low (see Supplementary Figure S5). Figure 5 Overlap of significant antigens derived from class comparison analysis (p < 0.001) with different normalization methods for the SCLC group. Data was analyzed unnormalized, quantile normalized (QN), DWD-adjusted, and ComBat-adjusted. Significant antigens of SCLC versus controls from (a) single run analysis (“run-wise” analysis) and (b) analysis of two combined experimental runs (“cross-run” analysis). Highest numbers of overlapping antigens are observed between QN and ComBat (318 and 256, respectively). (c) Excerpt of Supplementary Figure S4 showing relative overlaps (% representing the intersections of findings depicted in Figure 5a,b) of all class comparison analyses results between quantile normalization and ComBat adjustment. Comparing relative intersections, the overlaps of significant antigens between quantile normalization and ComBat adjustment ranges from 74.9% to 76.8% for single-run analyses and 42.9% to 62.3% for cross-run analyses (Figure 5c). Considering the overlaps between all adjustment methods as depicted in Supplementary Figure S4, quantile normalization and ComBat adjustment yield the highest values. The sample size plug-in from BRB-ArrayTools was used to calculate the number of samples needed per sample class for identifying antigens that are differentially reactive between controls and histological subtypes. For this calculation, we selected an accepted type 1 error rate (α) of 0.001, a power (1-β) of 0.9 and a mean difference in log2 expression between classes of 0.585 (equates a fold-change of 1.5). An estimated required sample size of 21, 21, 21, and 20 for SCLC, SqLC, LCLC, and AdCa, respectively, was calculated when using the 50th percentile of the variance distribution. Therefore a good statistical power is given when using 25 samples per group. 3.3. Class Prediction Analysis Class prediction analysis was used to construct multivariate predictors to classify arrays into pre-defined classes based on autoantibody profiles among these classes. The emphasis is to develop an accurate multivariate classifier, which is capable of predicting to which class a future sample belongs. Table 5 provides an overview of class prediction results performed using either the whole data set or only samples having the same histology and matched controls. Although DWD showed the lowest performance in class comparison analysis, this pre-processing method was also included for further class prediction besides quantile normalization and ComBat for evaluation of class prediction performance of all methods. Class prediction analysis of DWD-adjusted data resulted in lower accuracy for the distinct histological entities (79% to 92%) of lung cancer, except for AdCa (91%), compared to quantile normalized and ComBat-adjusted data. In the following, class prediction results of quantile normalized and ComBat-adjusted data are outlined. Different feature selection algorithms and best predictor models have been used to compare the performance of different settings. It can be seen that overall sensitivity and specificity values are higher than 80%. Correct classifications rates, representing the correctly classified samples as case or control using the calculated classifier panel are also comparable, ranging from 79% to 85% when applying class prediction on all lung cancer cases versus all controls. Using distinct histological subtypes versus matched controls, correct classification rates from 83% to 98% could be achieved. In general, class prediction performed only with data from one histological subtype yielded higher accuracy (0.83–0.98), sensitivity (0.80–1.00) and specificity (0.83–0.96) values than analyzing all lung cancer samples versus all controls with an accuracy of 79%–85%, sensitivity of 0.76–0.90, and specificity of 0.75–0.85. Furthermore, it can be seen that when performing class prediction analysis for SCLC or AdCa the quantile normalized data yielded a higher accuracy than ComBat data, whereas for LCLC and SqLC ComBat was favorable. Venn diagrams illustrated in Figure 6 represent cross-sectional antigen overlaps calculated between the four histological subtype-specific classifier lists (calculated with class prediction analysis using 100 RFE, Figure 6). Only a minor proportion (zero to six) of antigens does overlap. In general, there are no common antigens among all four histological subtypes. There is almost no difference in the number of overlapping antigenic proteins based on quantile normalized and ComBat adjusted data. microarrays-04-00162-t005_Table 5 Table 5 Performance of classifier panels calculated with class prediction analysis. Different normalization strategies were used and different subsets of samples were analyzed. The analysis of all samples together means that all four histological types of lung cancer were merged to one cancer class. a greedy-pairs algorithm (GP), recursive feature elimination (RFE); b Nearest Neighbor classification (NN), support vector machine(SVM), Compound Covariate Predictor (CCP), Nearest Centroid (NC), Diagonal Linear Discriminant Analysis (DLDA); c correct classification rate. Figure 6 Venn diagram showing classifier overlaps from class prediction results with respect to histological subtypes. Each circle represents a classifier (100 antigens), specific for each histological subtype, found by means of class prediction analysis using recursive feature elimination (see Table 5) as feature selection method. (a) Quantile-normalized data, (b) ComBat-adjusted data. Moreover, the intersection of histological subtype-specific classifiers calculated with class prediction analysis using 100 RFE between ComBat adjusted data and quantile normalized data is approximately 50%. The 100-antigen-classifiers specified by quantile normalization versus ComBat for SCLC, SqLC, LCLC, and AdCa were overlapping by 56, 45, 46, and 52 antigens, respectively. The intersection of classifying antigens for DWD versus quantile normalization accounts for 13, 15, 18, and 16 antigens, and versus ComBat adjustment for 12, 12, 15, and 12 antigens, for SCLC, SqLC, LCLC, and AdCa, respectively. Classifiers are published and covered in the European Patent Application “Lung cancer diagnostic method and means”, Publication Number EP2806274A1 [43]. Two representative classifiers distinguishing between all cancer cases and cancer-free controls are given in the supplementary section (Supplementary Tables S1 and S2).