TOOLS

Generic Genome Browser (GBrowse)
GBrowse (19) is used to visualize genetic and genomic data (Figure 3). Genomic data are extracted from the Ensembl and UCSC genome databases. The Ensembl database is downloaded after each Ensembl release, and the Ensembl API is used to extract the genome features of interest. These are converted into General Feature Format (GFF) and loaded into the GBrowse database. From UCSC, certain data types, notably the UCSC mRNA and EST homologies, are downloaded, converted into GFF and loaded into the GBrowse database. Currently, 32 data tracks are available. Efforts are underway to integrate statistical tools such as selection of tag SNPs (20) and display of D′/R² plots for an interval of interest.
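The conversion step described above can be illustrated with a minimal sketch. It assumes a nine-column, tab-delimited GFF line and a hypothetical `Feature` record. The text does not state the exact GFF dialect, source labels or attribute conventions used for the GBrowse database, so the field choices here are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    """Hypothetical in-memory record for one extracted genome feature."""
    seqid: str                      # chromosome or contig name
    source: str                     # label for where the feature came from
    type: str                       # feature type, e.g. "gene"
    start: int                      # 1-based, inclusive
    end: int                        # 1-based, inclusive
    strand: int                     # +1, -1, or 0 for unstranded
    name: str                       # display name
    score: Optional[float] = None   # optional numeric score


def to_gff_line(f: Feature) -> str:
    """Format one feature as a nine-column, tab-delimited GFF line."""
    if f.start < 1 or f.start > f.end:
        raise ValueError("GFF coordinates must satisfy 1 <= start <= end")
    strand = {1: "+", -1: "-"}.get(f.strand, ".")
    score = "." if f.score is None else f"{f.score:g}"
    columns = [
        f.seqid, f.source, f.type,
        str(f.start), str(f.end),
        score, strand,
        ".",                        # phase: not applicable in this sketch
        f"Name={f.name}",           # assumed attribute convention
    ]
    return "\t".join(columns)
```

For example, `to_gff_line(Feature("chr6", "ensembl", "gene", 100, 200, -1, "GENE_X"))` yields a single line whose columns GBrowse-style loaders can split on tabs; the coordinates and name in this call are placeholders, not real annotation.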
An alternative approach to integrating the Ensembl and UCSC data would be to use the Distributed Annotation System (DAS) (21). However, the current specification of DAS allows only a limited glyph set and does not, for instance, allow graphs to be represented.
We make extensive use of the plugin capability provided by GBrowse. A plugin is used to visualize the UCSC dataset of regulatory potential scores (22). This is a very large dataset, which we prefer not to store in our main GBrowse database. Instead, we import it into a separate database and use a plugin to connect GBrowse to the data. Similar plugins are used to visualize Fugu net scores and repeat density plots. We expect to add more plugins as we integrate additional data tracks that do not fit the built-in GBrowse model.
Another plugin facilitates genome annotation. The plugin uses BLAT (23) to align an mRNA sequence to the genome and converts the result into a GFF file. The user can then upload the file and view the annotation in GBrowse. To add the annotation to the permanent database, the user can email the GFF file to T1DBase; the file is then manually verified and loaded into the database.
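As an illustration of the alignment-to-GFF step, the sketch below converts one line of BLAT's tab-delimited PSL output into one GFF line per aligned block. It is not the plugin's actual code: the function name, the default `source` and `feature_type` labels and the `Name=` attribute are assumptions. It relies on the PSL conventions that target starts are 0-based and that, for nucleotide alignments, block starts are given on the target's plus strand.

```python
def psl_to_gff(psl_line, source="BLAT", feature_type="exon"):
    """Convert one 21-column PSL alignment line into GFF lines, one per block.

    The source and feature_type defaults are illustrative labels only.
    """
    cols = psl_line.rstrip("\n").split("\t")
    if len(cols) != 21:
        raise ValueError("expected 21 tab-separated PSL columns")

    strand = cols[8]     # orientation of the query relative to the target
    q_name = cols[9]     # query (mRNA) name
    t_name = cols[13]    # target (chromosome) name

    # blockSizes and tStarts are comma-separated lists with a trailing comma.
    block_sizes = [int(x) for x in cols[18].split(",") if x]
    t_starts = [int(x) for x in cols[20].split(",") if x]

    gff_lines = []
    for size, t_start in zip(block_sizes, t_starts):
        gff_lines.append("\t".join([
            t_name, source, feature_type,
            str(t_start + 1),        # PSL is 0-based; GFF is 1-based
            str(t_start + size),     # half-open end equals inclusive end
            ".", strand, ".",
            f"Name={q_name}",        # assumed attribute convention
        ]))
    return gff_lines
```

Emitting one line per block keeps the gapped structure of the alignment visible in the browser; grouping the blocks under a parent feature would depend on the GFF dialect in use.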
We also use plugins to allow users to export selected data tracks to a file.
The T1DBase GBrowse provides the T1D research community with a rich genomic data environment by integrating the UCSC and Ensembl genomes and user-contributed data.

Search
T1DBase offers a site-wide search capability that works across the multiple datasets present on the site. A technical subtlety is that different kinds of data require different search strategies, which the software carries out behind the scenes. Genes are an important special case: the software can search for genes based on a variety of identifiers, including gene names, symbols, LocusLink IDs and UniGene IDs.
The search system is built on the open-source Plucene package, a Perl port of the widely used Lucene package (24) (http://www.onjava.com).
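One way to picture the per-identifier search strategies for genes is a small classifier that routes a query string by its shape. This is a hypothetical sketch, not the T1DBase implementation: the function name and category labels are invented for illustration, and it assumes only that UniGene cluster IDs carry a species prefix such as `Hs.`, `Mm.` or `Rn.` and that LocusLink IDs are plain integers.

```python
import re

# Assumed identifier shapes; anything else is treated as a symbol or name.
_UNIGENE_ID = re.compile(r"^(Hs|Mm|Rn)\.\d+$")
_NUMERIC_ID = re.compile(r"^\d+$")


def classify_gene_query(query):
    """Guess which kind of gene identifier a query string is."""
    q = query.strip()
    if _UNIGENE_ID.match(q):
        return "unigene_id"
    if _NUMERIC_ID.match(q):
        return "locuslink_id"
    return "symbol_or_name"
```

A search front end could use the returned label to choose between an exact-match lookup for identifiers and a full-text query for names and symbols.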

Connect-the-Dots
Connect-the-Dots connects identifiers for genes and other entities based on information extracted from multiple data sources. It provides methods for parsing data sources to extract identifiers and connections among identifiers, and loading this information into an internal database. Users can query the database to connect identifiers from any number of sources by following paths composed of the parsed connections. For example, to find literature citations about genes of interest on an Affymetrix chip, a query can connect Affymetrix probeset identifiers to LocusLink identifiers using information from Affymetrix's annotation files and connect the LocusLink identifiers to PubMed identifiers using information in NCBI's LocusLink files. Longer and more complex paths are also possible. Queries are expressed in a special-purpose query language and are translated into SQL by the software.
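The idea of following a path of connections and translating it into SQL can be sketched as follows. The single `connections` table, its column names and the function `path_query_sql` are assumptions made for illustration; the actual Connect-the-Dots schema and query language are not described in the text. All identifiers in the demonstration data are placeholders.

```python
import sqlite3


def path_query_sql(path):
    """Build a self-join query that follows a path of identifier types.

    path is a list such as ["affy", "locuslink", "pubmed"]; each adjacent
    pair of types becomes one hop over the (hypothetical) connections table.
    """
    if len(path) < 2:
        raise ValueError("a path needs at least two identifier types")
    hops = len(path) - 1
    joins = ["connections AS c0"]
    conditions = []
    for i in range(hops):
        conditions.append(f"c{i}.src_type = ? AND c{i}.dst_type = ?")
        if i > 0:
            # Chain each hop to the previous one through the shared identifier.
            joins.append(
                f"JOIN connections AS c{i} ON c{i}.src_id = c{i - 1}.dst_id"
            )
    sql = (
        f"SELECT DISTINCT c0.src_id, c{hops - 1}.dst_id "
        f"FROM {' '.join(joins)} "
        f"WHERE {' AND '.join(conditions)} "
        "ORDER BY 1, 2"
    )
    params = [t for i in range(hops) for t in (path[i], path[i + 1])]
    return sql, params


def demo():
    """Run the three-type path from the text on placeholder data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE connections "
        "(src_type TEXT, src_id TEXT, dst_type TEXT, dst_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO connections VALUES (?, ?, ?, ?)",
        [
            ("affy", "probe_A", "locuslink", "LL_1"),
            ("affy", "probe_B", "locuslink", "LL_2"),
            ("locuslink", "LL_1", "pubmed", "PMID_X"),
            ("locuslink", "LL_1", "pubmed", "PMID_Y"),
        ],
    )
    sql, params = path_query_sql(["affy", "locuslink", "pubmed"])
    return conn.execute(sql, params).fetchall()
```

In this sketch each additional identifier type in the path adds one self-join, which is how longer paths would be expressed; a production system would also need indexes on the identifier columns to keep such joins practical at tens of millions of rows.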
The system can be used interactively over the Web, or as a batch resource to create specialized translation tables for specific purposes. Many of the translation tables used internally by T1DBase are constructed in this manner.
The current Connect-the-Dots database has information from LocusLink, UniGene (human, mouse and rat), OMIM, IPI, UniProt, HomoloGene, DoTS, several Affymetrix chips, and human and mouse PancChips (pancreas/islet-specific microarrays). The database contains 20 million unique identifiers and 42 million connections extracted from 2 million data source entries.

Cytoscape
Cytoscape (25) is a tool for visualizing and analyzing biological networks, defined broadly to include any collection of interacting bio-molecules. A common use of the software is to display networks of protein–protein and protein–DNA interactions, but it can also be used to display gene networks. A key feature is that Cytoscape can analyze networks in combination with gene expression data, e.g. to discover sub-networks with correlated expression, and annotation data such as Gene Ontology, e.g. to associate sub-networks with biological functions.
Cytoscape can be launched directly from T1DBase, although at present this works only on two demonstration networks. Work is underway to connect Cytoscape to human protein interaction data from HPRD (26), microarray gene expression data from the Beta Cell Gene Expression Bank and other sources, and annotations suggesting association with T1D susceptibility.

GESTALT
GESTALT (27) is a workbench for genome annotation that combines automated and manual analysis with an emphasis on rich graphical display of the analysis results. GESTALT can execute a variety of external analysis programs (e.g. for gene recognition) as well as internal analyses (e.g. for compositional complexity analysis). The results are stored in an internal database and can later be retrieved and displayed.
GESTALT analyses have been carried out on most T1D human candidate regions, and the results can be inspected on T1DBase. Several new genes were found through this analysis. For operational reasons, users are not allowed to run their own GESTALT analyses on our website, but can do so on the public GESTALT server at http://db.systemsbiology.net/gestalt/.