5. Conclusions
Since conventional diagnostic methods often fail to detect lung cancer at an early, treatable stage, tumor-specific autoantibodies, which form very early in tumor development, represent an attractive alternative for lung cancer diagnosis. In the present study, we performed a protein microarray-based screening to identify tumor autoantibody markers for lung cancer. To establish a valid data pre-processing strategy that yields reliable candidate classifiers, we compared different normalization and data adjustment methods. PCA and PVCA plots showed that only DWD and ComBat, both batch effect removal methods, were able to eliminate non-biological variation from the data set. Further, when single-run analysis was compared with cross-run analysis, class comparison analysis yielded the highest numbers of retained significant antigens for the data sets adjusted with quantile normalization and with ComBat. However, because quantile normalization fails to remove batch effects from the data set, we conclude that ComBat adjustment is the best way to pre-process protein microarray data sets derived from multiple runs. Class prediction analysis following ComBat adjustment yielded a correct classification rate of 85% for all lung cancer entities combined and comparable or higher correct classification rates, ranging from 85% to 98%, for distinct histological lung cancer types (Table 5). To the best of our knowledge, these correct classification rates exceed the accuracy values of current diagnostic methods. Handling a large data set and applying different data manipulation algorithms have again shown that normalization and adjustment methods strongly influence subsequent analysis results; they should therefore never be applied without critical evaluation using different statistical methods or data visualization tools.
Furthermore, the results of this study show that protein microarray-based screening combined with an elaborate experimental design, including samples from each main histological subtype of lung cancer, allows the detection of tumor-specific autoantibody signatures that cover the molecular complexity of lung cancer. However, to meet the urgent clinical demand for biomarkers for the early diagnosis of lung cancer, we encourage the validation of the resulting classifier antigens in larger study populations. The next step will therefore be the validation of candidate markers on targeted arrays, using planar microarrays or the Luminex® microsphere-based system. The data handling and evaluation procedure described here should also be helpful for biomarker stratification in other cancers or complex diseases, especially when disease-specific autoantibody signatures or immune profiles are to be used for biomarker development. Moreover, to ensure appropriate sample numbers and statistical power in microarray analyses, multiple experimental runs may have to be conducted in almost all studies. The sequential approach of first evaluating batch effect removal efficacy and then testing the intersection of significant features from single and multiple experimental runs, e.g., for distinguishing cases from controls, would therefore be a prerequisite for choosing the most reliable candidates from microarrays and other highly multiplexed methods for confirmation and further biomarker development.
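The intersection step of this sequential approach can be illustrated with a minimal Python sketch. It is not the analysis code used in this study. It assumes, for illustration only, that an antigen counts as "retained" when it is significant in at least one single-run analysis and also in the cross-run analysis of the adjusted data. The function names `retained_antigens` and `rank_methods`, the antigen identifiers, and the method labels are hypothetical, and the toy input is not study data.

```python
from typing import Dict, Iterable, List, Set, Tuple


def retained_antigens(single_run_hits: Dict[str, Iterable[str]],
                      cross_run_hits: Iterable[str]) -> Set[str]:
    """Antigens significant in at least one single run AND in the cross-run analysis.

    Assumption: "retained" is defined via the union over single runs;
    a stricter definition (significant in every run) is equally conceivable.
    """
    single_union: Set[str] = set()
    for hits in single_run_hits.values():
        single_union.update(hits)
    return single_union & set(cross_run_hits)


def rank_methods(single_run_hits: Dict[str, Iterable[str]],
                 cross_run_hits_by_method: Dict[str, Iterable[str]]
                 ) -> List[Tuple[str, int]]:
    """Sort adjustment methods by number of retained antigens, highest first."""
    counts = {
        method: len(retained_antigens(single_run_hits, hits))
        for method, hits in cross_run_hits_by_method.items()
    }
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input with hypothetical antigen identifiers and method labels.
single_runs = {"run1": ["AG1", "AG2", "AG3"], "run2": ["AG2", "AG4"]}
cross_runs = {
    "method_a": ["AG1", "AG2", "AG4", "AG9"],
    "method_b": ["AG2", "AG7"],
}
ranking = rank_methods(single_runs, cross_runs)
```

Such a ranking is only meaningful for methods that have already passed the preceding batch effect check, since a method can retain many antigens without removing batch effects, as noted above for quantile normalization.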