Methods

Dataset

Two microarray datasets were used in this study, both obtained from a public database, the Gene Expression Omnibus (GEO). The first contained expression profiles of 16 tumors and 4 normal tissues from 16 patients, measured with Affymetrix U133A gene chips (Affymetrix, Santa Clara, CA, USA). The second consisted of expression profiles of 22 tumors and 5 normal tissues. Both datasets were generated on the same platform, Affymetrix U133A. The datasets are summarized in Table 1.

Process for combining datasets

To combine datasets, the expression ratios of each gene are ranked within each dataset, and each rank is matched with the experimental group of the sample it came from. If the experimental groups are homogeneous, samples from the same group will occupy neighboring ranks. The discretization of gene expression is summarized in the following steps [3]:

1. Rank the gene expression ratios within a gene for each dataset.
2. List the samples in rank order, and assign each rank to its corresponding experimental group.
3. Summarize the result of step 2 in the form of a contingency table for each gene.
4. Combine the contingency tables that have been summarized for each dataset.

When three datasets are combined, each is first transformed by rank, after which the datasets can be added together as a single entry, as shown in Table 2.

Identification of significant genes from a combined dataset

After the gene expression ratios were summarized as a contingency table for each gene, as shown in Table 3, a nonparametric statistical method was applied to test for independence between gene expression patterns and experimental groups. The test statistic is calculated for each gene as follows:

\[
\chi^2 = \sum_{i} \sum_{j} \frac{\left(n_{ij} - \hat{E}(n_{ij})\right)^2}{\hat{E}(n_{ij})}, \qquad \hat{E}(n_{ij}) = \frac{r_i \, c_j}{n}
\]

When the sample size is small, generally when Ê(nij) is less than 5, Fisher's exact test is recommended rather than the chi-square test.
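The rank-based steps for building and combining contingency tables, described earlier in this section, can be illustrated with a minimal Python sketch. The function names (rank_table, combine_tables), the toy expression values, and in particular the rule of splitting ranks into two equal-width bins ("low" and "high") are assumptions made for illustration; the text does not specify how ranks are binned, and this is not the authors' implementation.

```python
from collections import Counter

def rank_table(values, groups, n_bins=2):
    """Contingency table (group, rank bin) -> count for one gene in one dataset.

    Assumption: ranks are cut into n_bins equal-width intervals.
    """
    # Step 1: rank the expression ratios (ascending; ties keep input order).
    order = sorted(range(len(values)), key=lambda i: values[i])
    table = Counter()
    for rank, idx in enumerate(order):
        # Step 2: pair each rank with the experimental group of its sample.
        bin_id = rank * n_bins // len(values)
        # Step 3: tally the (group, rank bin) pair into the table.
        table[(groups[idx], bin_id)] += 1
    return table

def combine_tables(tables):
    """Step 4: add the per-dataset tables cell by cell."""
    total = Counter()
    for t in tables:
        total.update(t)
    return total

# Hypothetical expression ratios for one gene in two datasets.
table_a = rank_table([0.2, 0.5, 1.8, 2.1],
                     ["normal", "normal", "tumor", "tumor"])
table_b = rank_table([5.0, 1.0, 3.0, 4.0, 2.0, 6.0],
                     ["tumor", "normal", "tumor", "tumor", "normal", "tumor"])
combined = combine_tables([table_a, table_b])
```

Because only ranks within each dataset enter the table, the two datasets never need to share a common expression scale, which is what allows their tables to be added directly.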
Significant genes can be selected from this type of summarized dataset by an independence test between the phenotypes and gene expression. ci and ri represent the marginal sums of the ith column and ith row, respectively; nij is the number of experiments belonging to Ei and Pj; and n is the total number of experiments.