Discussion
OSCC is associated with substantial mortality and morbidity [58]. To identify potential biomarkers for the early detection of invasive OSCC, many microarray experiments have been conducted, and the resulting datasets have accumulated rapidly in public databases. However, many of these datasets have sample sizes too small to detect significant genes by statistical analysis. Therefore, this study combined several microarray datasets from a public database to identify significant biomarker candidates.
In microarray data analysis, datasets obtained under different experimental conditions may yield inconsistent information even when the experiments share the same research objective. Moreover, even when datasets are generated on the same platform, their agreement may be affected by technical variation between laboratories. In such cases, it may be necessary to adjust for the differences between datasets and then analyze the combined dataset to obtain more reliable information. Combining datasets is especially useful for OSCC microarray data, because many of the available datasets have sample sizes that are insufficient for analysis [4, 5, 59, 60].
To identify significant genes that distinguish the tumor and normal groups, we obtained two microarray datasets from a public database, GEO. They included 20 and 27 samples, respectively, and in each dataset the group sizes were unbalanced. Combining the two datasets increased the sample size to a level sufficient for statistical analysis, although the groups remained unbalanced. Because the scale of gene expression differed between the datasets, we combined them using the ranks of the gene expression values. In this study, we identified 51 significant genes from the combined dataset; this number would increase or decrease with the significance level (we used 0.005). The 51 selected genes were upregulated in tumor tissues. Many of them have been reported as cancer-related genes in previous studies.
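The rank-based combination described above can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the text: ranks are taken across genes within each sample, tied values receive their average rank, and tumor and normal groups are compared gene by gene with a two-sided Wilcoxon rank-sum test using the normal approximation. The function names and the toy values are hypothetical and serve only as illustration; the test actually used in this study may differ.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_transform(dataset):
    """Replace each sample's expression values by within-sample ranks.

    dataset: list of samples, each a list of expression values that
    follow the same gene order (assumed layout).
    """
    return [average_ranks(sample) for sample in dataset]


def rank_sum_p_value(group_a, group_b):
    """Two-sided rank-sum p-value, normal approximation, no tie correction."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = average_ranks(list(group_a) + list(group_b))
    w = sum(pooled[:n_a])
    mean_w = n_a * (n_a + n_b + 1) / 2
    sd_w = (n_a * n_b * (n_a + n_b + 1) / 12) ** 0.5
    z = (w - mean_w) / sd_w
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_genes(samples, labels, alpha=0.005):
    """Return indices of genes whose tumor-vs-normal p-value is below alpha."""
    tumor = [s for s, lab in zip(samples, labels) if lab == "tumor"]
    normal = [s for s, lab in zip(samples, labels) if lab == "normal"]
    selected = []
    for g in range(len(samples[0])):
        p = rank_sum_p_value([s[g] for s in tumor], [s[g] for s in normal])
        if p < alpha:
            selected.append(g)
    return selected


# Toy data on two different scales (hypothetical values, 3 genes per sample).
dataset_1 = [[5.1, 7.9, 6.0], [4.8, 8.3, 6.2]]             # log-scale values
dataset_2 = [[120.0, 950.0, 300.0], [200.0, 640.0, 90.0]]  # raw intensities

# After ranking, both datasets share one scale and can be concatenated.
combined = rank_transform(dataset_1) + rank_transform(dataset_2)
```

Under these assumptions, calling `select_genes(combined_samples, labels, alpha=0.005)` on the full combined dataset returns the genes that pass the chosen significance level; raising or lowering `alpha` changes how many genes are selected.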
SOD2 is associated with lymph node metastasis in OSCC and may have predictive value for the diagnosis of metastasis [10]. Metastasis is a critical event in OSCC progression. An SOD2 variant has also been associated with increased breast cancer and ovarian cancer risk in previous studies [47, 61]. TopBP1 contains eight BRCT domains (originally identified in BRCA1) and has been proposed as a breast cancer susceptibility gene [18, 62].
Semiquantitative reverse transcription PCR analysis showed that RHEB is upregulated in OSCC [9]. In salivary cancer, survival probability decreased when Skp2 was overexpressed [7]. Overexpression of Skp2 is associated with reduced p27 (KIP1) expression and may play a role in the progression of OSCC [25].
The expression of RCN2 was linearly related to the increase in tumor mass, and its expression was increased in breast cancer [16]. PTPRK has been reported as a candidate gene for colorectal cancer [19], and it is a functional tumor suppressor in Hodgkin lymphoma cells [20]. DMTF1, which resides at 7q21, was shown by aCGH experiments to be amplified in adenocarcinoma of the gastroesophageal junction [21]. FEZ1 is involved in ovarian carcinogenesis, and its reduction or loss could serve as an aid in the clinical management of patients with ovarian carcinoma [22]. It is also a known tumor suppressor gene in breast cancer and gastric cancer [23, 63].
Among the selected genes, the other ovarian cancer-related genes were NMI [27, 28] and FANCI [44], and the breast cancer-related genes were COX11 [42], MELK [33], and FANCI [44]. MELK is also known to be associated with shorter survival in glioblastoma [34].
TTK was associated with progression and metastasis of advanced cervical cancers after radiotherapy [29, 30]. It might also be a promising new target in cancer therapy, since it plays important roles in mitotic progression and the spindle checkpoint [31, 32]. Aurora kinase A (AURKA) was associated with skin tumors [36] and colorectal cancer [37, 38].
Among the selected genes, those reported as OSCC-related in previous studies were STAT1 [14], SKP2 [7, 25], IFI16 [8], RHEB [9], IFI44 [64], SOD2 [10-12], and GREM1 [11]. The remaining genes, which have not yet been shown to be OSCC-related, are candidates that could be confirmed by biological evaluation.
In this study, we identified significant genes related to OSCC from two microarray datasets in a public database. To do so, we transformed the datasets into ranks of gene expression, because their expression scales differed even though they were constructed under the same experimental conditions. This method could be useful when using multiple datasets that were created for the same research purpose. By combining such accumulated datasets, more reliable information can be detected owing to the increased sample size. The approach also saves time and money and avoids repeating experiments.