Sequence acquisition and CST identification

A list of human genes involved either in the pathogenesis of monogenic human disorders or in the predisposition to multifactorial diseases was obtained by screening the GeneCards (15) and Online Mendelian Inheritance in Man (OMIM) (16) databases. We then searched the human Ensembl database (assembly release NCBI34) to retrieve the human genomic sequences spanning the selected transcripts, together with 250 kb of additional flanking sequence on each side. The flanking sequence was shortened when known genes were annotated close to the disease gene, but a minimum of 20 kb was retained in all cases. The Ensembl database was also used as the source of the corresponding murine sequences. Orthologous gene annotation was used, when available, to find the mouse counterparts; when more than one orthologous gene was found, sequences were selected manually on the basis of overall sequence conservation and relationships with neighboring sequences. The size of each mouse sequence was set according to the length of the corresponding human sequence. In total, 1088 human genomic sequences were compared with their orthologous murine genomic sequences (the full list is available online). Overall, 193 million bp of human genomic sequence were analyzed, corresponding to about 7% of the human genome.

Human and mouse genomic sequences, prefiltered to mask all known repeated sequences, were compared using the BLASTZ program (17). Sequences showing at least 70% identity over a region of at least 100 bp were selected and further analyzed to eliminate redundancies, leading to the identification of 66 495 repeat-free, non-overlapping human and mouse CST pairs. The CSTs corresponded to or overlapped known human exon sequences in about 32% of cases (n = 21 139), while they were located in intronic or intergenic regions in the remaining 68% of cases (n = 45 356) (Table 1).
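
The two numerical rules described above (flanking sequence of up to 250 kb with a 20 kb minimum, and selection of aligned regions with at least 70% identity over at least 100 bp) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the names `flank_length`, `Alignment` and `select_csts` are hypothetical, the alignments are assumed to have already been parsed from BLASTZ output into human coordinates, and the greedy "keep the longest, drop anything overlapping it" step is only one possible way to remove redundancy, since the text does not specify how redundancies were eliminated.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_FLANK = 250_000   # bp of flanking sequence taken on each side by default
MIN_FLANK = 20_000    # bp always retained, even next to a neighboring gene
MIN_IDENTITY = 70.0   # percent identity threshold for a CST
MIN_LENGTH = 100      # minimum aligned length (bp) for a CST


def flank_length(distance_to_nearest_gene: Optional[int] = None) -> int:
    """Flank size (bp) on one side of a disease gene.

    Assumption: the flank is clipped at the nearest annotated gene,
    but never below MIN_FLANK and never above MAX_FLANK.
    """
    if distance_to_nearest_gene is None:
        return MAX_FLANK
    return max(MIN_FLANK, min(MAX_FLANK, distance_to_nearest_gene))


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one human-mouse aligned block."""
    start: int        # human coordinate, 0-based, inclusive
    end: int          # human coordinate, exclusive
    identity: float   # percent identity over the block

    @property
    def length(self) -> int:
        return self.end - self.start


def select_csts(alignments: List[Alignment]) -> List[Alignment]:
    """Apply the identity/length thresholds, then remove overlaps.

    Overlap removal here is a simple greedy choice (longest block wins);
    it stands in for the unspecified redundancy-elimination step.
    """
    passing = [
        a for a in alignments
        if a.identity >= MIN_IDENTITY and a.length >= MIN_LENGTH
    ]
    kept: List[Alignment] = []
    for a in sorted(passing, key=lambda x: (-x.length, x.start)):
        # Keep a block only if it does not overlap any block already kept.
        if all(a.end <= k.start or a.start >= k.end for k in kept):
            kept.append(a)
    return sorted(kept, key=lambda x: x.start)
```

With such a function, a block of 80 bp at 95% identity would be discarded for length, a 200 bp block at 65% identity would be discarded for identity, and of two overlapping blocks that both pass the thresholds only the longer one would be retained.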