Methods
The goal of this study was to provide researchers with a tool that classifies candidate genes from HGT-based cancer genome studies as previously reported or novel cancer genes, and that gives insight into the underlying carcinogenic mechanisms through pathway analysis. To implement the cancer gene annotation function of CaGe, we constructed a reported cancer gene database and a cancer annotation database from public cancer genomic databases, and a cancer pathway-gene database by pathway analysis of reported cancer gene sets against canonical pathways. We also constructed a gene ID database, so that gene lists can be submitted in various input formats, and a gene functional annotation database, which gives users functional clues for the annotated candidate genes. We then developed a core retrieval program and web interfaces for the four main functions: cancer gene annotation, cancer pathway annotation, cancer gene browsing, and cancer pathway browsing. The workflow for database construction and data processing in CaGe is summarized in Fig. 1, and the cancer gene annotation page of the CaGe web interface is shown in Fig. 2.

Cancer gene and annotation data
To construct the reported cancer gene database, we used gene sets from the CGC database (released in December 2010) and the CGI database (downloaded in February 2011). To construct the cancer pathway gene database, we assigned cancer pathways based on the statistical significance of the gene overlap between reported cancer gene sets and canonical pathways, assessed with a one-tailed Fisher's exact test. The canonical pathways were taken from public pathway databases, including KEGG (Release 57.0), BioCarta, and Reactome (downloaded in February 2011).
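As an illustration of the overlap test, the following minimal sketch computes a one-tailed Fisher's exact p-value from the upper tail of the hypergeometric distribution, using only the Python standard library. The function names and the choice of background gene universe are assumptions made for this example; the background used in CaGe is not specified here.

```python
from math import comb


def fisher_one_tailed(overlap, set_size, pathway_size, background_size):
    """Return P(X >= overlap) under the hypergeometric null.

    This is the one-tailed (over-representation) Fisher's exact test for a
    2x2 table built from a gene set, a pathway, and a background universe.
    """
    max_overlap = min(set_size, pathway_size)
    total = comb(background_size, set_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, set_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def overlap_p_value(cancer_genes, pathway_genes, background_genes):
    """Hypothetical helper: test a cancer gene set against one pathway."""
    background = set(background_genes)
    # Restrict both sets to the background so the counts are consistent.
    cancer = set(cancer_genes) & background
    pathway = set(pathway_genes) & background
    return fisher_one_tailed(
        len(cancer & pathway), len(cancer), len(pathway), len(background)
    )
```

A pathway would then be assigned as a cancer pathway when its p-value falls below a chosen significance threshold; the threshold itself is left to the caller in this sketch.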
We also created a gene ID database to convert various input identifiers into standard gene symbols. The standard gene symbols were taken from HUGO Gene Nomenclature Committee (HGNC) data (downloaded in February 2011), and the gene IDs, protein IDs, and functional annotations were taken from Entrez Gene (downloaded in February 2011) and UniProt (Release 2011_03).
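The identifier conversion can be sketched as a set of lookup tables keyed by identifier type. The tables below hold only three genes for illustration, and the function names are hypothetical; an actual gene ID database would be loaded from the HGNC, Entrez Gene, and UniProt downloads.

```python
# Illustrative lookup tables (three genes only), standing in for the
# gene ID database built from HGNC, Entrez Gene, and UniProt.
ENTREZ_TO_SYMBOL = {"7157": "TP53", "672": "BRCA1", "3845": "KRAS"}
UNIPROT_TO_SYMBOL = {"P04637": "TP53", "P38398": "BRCA1", "P01116": "KRAS"}
APPROVED_SYMBOLS = {"TP53", "BRCA1", "KRAS"}


def to_symbol(identifier):
    """Map one identifier to a standard gene symbol, or None if unknown."""
    token = identifier.strip()
    if token.isdigit():  # purely numeric tokens are treated as Entrez Gene IDs
        return ENTREZ_TO_SYMBOL.get(token)
    if token in UNIPROT_TO_SYMBOL:
        return UNIPROT_TO_SYMBOL[token]
    if token in APPROVED_SYMBOLS:
        return token
    return None


def convert_gene_list(identifiers):
    """Return (unique symbols in input order, identifiers that did not map)."""
    symbols, unmapped = [], []
    for identifier in identifiers:
        symbol = to_symbol(identifier)
        if symbol is None:
            unmapped.append(identifier)
        elif symbol not in symbols:
            symbols.append(symbol)
    return symbols, unmapped
```

Returning the unmapped identifiers separately lets a caller report them to the user rather than dropping them silently.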

Input types
CaGe accepts lists of gene symbols, Entrez Gene IDs, or UniProt accessions, as well as output files from prediction tools that evaluate the functional effects of mutations on proteins, such as Sorting Intolerant From Tolerant (SIFT) [10] and PolyPhen [11]. CaGe parses the output of these tools to extract the genes carrying somatic mutations evaluated as functionally damaging, classifies them into previously reported and novel candidate cancer genes, and conducts pathway analyses to give insight into the underlying carcinogenic mechanisms. Thus, CaGe processes HGT-based data directly, without additional data conversion.
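The parsing step can be illustrated with a minimal sketch. The tab-delimited layout, the column names (`gene`, `variant`, `prediction`), and the set of labels treated as damaging are all assumptions made for this example; they are not the actual SIFT or PolyPhen output formats, and the example rows are invented.

```python
import csv
import io

# Assumed labels that count as "functionally damaging" in this sketch.
DAMAGING_LABELS = {"damaging", "deleterious", "probably damaging", "possibly damaging"}

# Invented example input in a hypothetical tab-delimited layout.
EXAMPLE_OUTPUT = (
    "gene\tvariant\tprediction\n"
    "TP53\tR175H\tDAMAGING\n"
    "KRAS\tG12D\tprobably damaging\n"
    "TTN\tA100V\tTOLERATED\n"
    "TP53\tR248Q\tDAMAGING\n"
)


def damaging_genes(handle, gene_column="gene", prediction_column="prediction"):
    """Return unique genes with at least one damaging call, in file order."""
    genes = []
    for row in csv.DictReader(handle, delimiter="\t"):
        label = (row.get(prediction_column) or "").strip().lower()
        gene = (row.get(gene_column) or "").strip()
        if gene and label in DAMAGING_LABELS and gene not in genes:
            genes.append(gene)
    return genes


if __name__ == "__main__":
    print(damaging_genes(io.StringIO(EXAMPLE_OUTPUT)))
```

Each prediction tool would need its own column mapping and label set; passing the column names as parameters keeps that variation out of the extraction logic.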

Cancer gene and pathway annotation workflow
After acquiring user input, CaGe converts the input gene IDs into standard gene symbols, finds known cancer genes and pathways, links cancer-related annotations to the matched genes, and outputs the results as tables or text files through the web interface (Fig. 3). CaGe also identifies over-represented cancer-related or other biological pathways in the input gene list by performing a one-tailed Fisher's exact test.
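The classification step of this workflow can be sketched as follows. The reported gene set here is a three-gene stand-in for the database built from CGC and CGI, and the function names and output columns are hypothetical.

```python
# Small stand-in for the reported cancer gene database (assumption).
REPORTED_CANCER_GENES = {"TP53", "KRAS", "BRCA1"}


def classify_candidates(candidate_symbols, reported_cancer_genes):
    """Split standard gene symbols into reported and novel candidates."""
    reported, novel = [], []
    for symbol in candidate_symbols:
        if symbol in reported_cancer_genes:
            reported.append(symbol)
        else:
            novel.append(symbol)
    return {"reported": reported, "novel": novel}


def to_tab_delimited(classification):
    """Render the classification as tab-delimited text, reported genes first."""
    lines = ["gene\tcategory"]
    for category in ("reported", "novel"):
        for symbol in classification[category]:
            lines.append(f"{symbol}\t{category}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    result = classify_candidates(["TP53", "TTN", "KRAS"], REPORTED_CANCER_GENES)
    print(to_tab_delimited(result), end="")
```

In a fuller pipeline, the reported genes would additionally be joined with their cancer-related annotations before output.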
Users who run more than one annotation job can manage their tasks through the CaGe interface. The internal job database of CaGe keeps the results accessible until the jobs are deleted by the user or by a scheduled cleaning process. The IP addresses of client computers are used for secured job management without a login process. Completed annotation jobs are listed in the job table, and users can display selected results on the annotation result page or download them as tab-delimited text files for further analyses. The annotations on the result page link to gene and cancer-related information in the CaGe database or in external public databases. In addition to cancer gene and pathway annotation, CaGe provides a browsing function so that users can search cancer-related annotations for genes and pathways without an input list. The cancer gene and pathway search flows are connected to each other by crosslinks on the cancer gene information page and the pathway information page.
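The job management described above can be illustrated with a small in-memory sketch in which jobs are keyed to the client IP address and removed either on request or by a periodic cleaning pass. The class name, the method names, and the seven-day retention period are assumptions made for this example; the retention period used by CaGe is not stated.

```python
import time
import uuid


class JobStore:
    """Hypothetical in-memory job table keyed by job ID and client IP."""

    def __init__(self, max_age_seconds=7 * 24 * 3600):  # assumed retention
        self.max_age_seconds = max_age_seconds
        self._jobs = {}  # job_id -> {"ip": ..., "created": ..., "result": ...}

    def submit(self, client_ip, result, now=None):
        """Store a finished job and return its ID."""
        job_id = uuid.uuid4().hex
        created = time.time() if now is None else now
        self._jobs[job_id] = {"ip": client_ip, "created": created, "result": result}
        return job_id

    def list_jobs(self, client_ip):
        """Return the IDs of the jobs submitted from this IP address."""
        return [jid for jid, job in self._jobs.items() if job["ip"] == client_ip]

    def delete(self, client_ip, job_id):
        """Delete a job only if it belongs to the requesting IP address."""
        job = self._jobs.get(job_id)
        if job is not None and job["ip"] == client_ip:
            del self._jobs[job_id]
            return True
        return False

    def clean(self, now=None):
        """Scheduled cleaning: drop jobs older than the retention period."""
        now = time.time() if now is None else now
        expired = [
            jid for jid, job in self._jobs.items()
            if now - job["created"] > self.max_age_seconds
        ]
        for jid in expired:
            del self._jobs[jid]
        return len(expired)
```

A production system would persist the table in a database and call `clean` from a scheduler; the `now` parameter exists only to make the sketch easy to test.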