

FlyBase: genes and gene models

Abstract

FlyBase (http://flybase.org) is the primary repository of genetic and molecular data for the insect family Drosophilidae. For the most extensively studied species, Drosophila melanogaster, a wide range of data are presented in integrated formats. Data types include mutant phenotypes, molecular characterization of mutant alleles and aberrations, cytological maps, wild-type expression patterns, anatomical images, transgenic constructs and insertions, sequence-level gene models and molecular classification of gene product functions. There is a growing body of data for other Drosophila species; this is expected to increase dramatically over the next year, with the completion of draft-quality genomic sequences of an additional 11 Drosophila species.

SCOPE OF FLYBASE

FlyBase includes information about the structure and function of genes and gene products of the Drosophila genome (1). Although the primary species represented is that workhorse of classic genetics, Drosophila melanogaster, the database currently includes records for genes of more than 400 other Drosophila species, and will house genomic information for the 11 additional species included in the Drosophila comparative genomics sequencing effort. Phenotypic and genetic interaction information about mutants, and wild-type gene and enhancer-trap expression patterns, are linked to strains in the Drosophila Stock Centers, from which extensive collections of mutant and wild-type strains are available. Mutant phenotypes (2) and gene expression patterns are described using controlled vocabularies, including anatomical terms linked to illustrations in the Anatomy section of FlyBase. Data concerning chromosome aberrations, natural transposons, genetically engineered constructs and transgene insertions are presented with hyperlinks to affected genes and resulting mutant alleles.
An overview of the classes of data found in FlyBase may be seen on the homepage (http://flybase.org; for further description see Supplementary Figure 1). Features recently added to FlyBase include an External Database Links section in Gene reports, expanded Batch query options and an extensive Drosophila Resources compilation (http://flybase.bio.indiana.edu/allied-data/resources.html), which provides a comprehensive list of links to both network resources (e.g. sequence analysis tools) and material resources (e.g. clone and microarray suppliers) external to the FlyBase project.

Data are compiled by curators and annotators from sources including the scientific literature, large-scale genome sequencing projects and online resources such as the GenBank (NCBI)/EMBL/DDBJ nucleotide sequence databases and the UniProt (3) protein database. FlyBase curators work with curators of other databases, such as the Gene Ontology (GO) consortium (4), to ensure consistency of annotation across databases. The D.melanogaster genome annotation, Release 4.0 at the time of writing (5–7), has been enhanced by hand curation of all gene models (8,9), including integration of error reports submitted by the user community. Table 1 shows a snapshot of FlyBase content as of September 2004. The remainder of this paper will focus on genes and gene models in FlyBase.

THE GENE REPORT

FlyBase provides several gene report formats, which differ in how much of the data is shown on the initial web page; the default is the Synopsis format. The Synopsis report for the maleless (mle) gene is shown in Figure 1. The Synopsis report displays commonly accessed gene information fields, an Available reports side panel that gives easy access to the other report formats, and a text Summary generated automatically from the underlying data.
The Abridged report format displays a wider range of information in the initial display than the Synopsis format, but collapses many of the details, such as individual Allele reports, into links in tables. The Full report format is the most comprehensive initial display. FlyBase also offers Subsection reports selected by data type, for example, alleles of that gene, references that discuss the gene and sequences in the DNA and protein data banks that correspond to the gene. Links to these and other subreports are listed in the Subsections panel of the Synopsis report. Recent additions include the Gene Ontology subreport, the Genetic Interactions subreport and the Constructs & Insertions subreport.

Gene reports now include an External Database Links section (http://flybase.bio.indiana.edu/allied-data/extdb/ExternalLinks.htm). This section houses links to databases external to FlyBase, to ease access to information about the gene that falls outside the scope of FlyBase data curation. The databases currently listed in this section include the BDGP In Situ Gene Expression Database (10), Drosophila melanogaster Exon Database (http://proline.bic.nus.edu.sg/dedb), PANTHER Protein Classification (11,12), Fly GRID Interaction Data (13), Hybrigenics PIMRider interactions (14), Interactive Fly (15), Yale Developmental Gene Expression (16) and NCBI's Gene Expression Omnibus (17). Not all genes have an entry in all these databases. The number of external links in place via this facility exceeds 76 500.

THE GENE ANNOTATION REPORT

Detailed information about the annotated transcripts and other sequence-level data for a particular gene is found in the Annotation Report. This may be accessed from the Gene Report page via the ‘Genome Annotation’ link, or by a direct query using the ‘Gene Annotations’ option in the homepage search box.
The Annotation Query Form (http://flybase.bio.indiana.edu/annot/fbannquery.hform) allows queries based on location, gene class, peptide length, mapped expressed sequence tags (ESTs) or cDNAs, GO terms, or terms within annotation comments. An example of an Annotation Report is shown in Figure 2. Notable features include a graphic representation of the transcript structures aligned with supporting evidence, information about each transcript and protein product, links to sequence data and information about other data mapped experimentally to the genomic sequence, such as point mutations, aberration breakpoints, rescue fragments and experimentally defined regulatory regions. Accompanying comments describe any unusual characteristics of the gene model, such as an atypical splice donor or acceptor, a non-AUG translation start, or a dicistronic transcript. At the top of the report is a link to the peptide analysis that includes a graphic display of homologous proteins and known InterPro (3) protein motifs.

GENE REGION MAPS: GBROWSE AND APOLLO

A molecular map of the region surrounding a gene may be accessed through the Gene Region Map (GBrowse) link on either the Gene Report page or the Gene Annotation Report. GBrowse (18) is a configurable genome viewer that allows the presentation of both molecularly mapped and cytologically mapped data (http://www.gmod.org/ggb/gbrowse.shtml; see Supplementary Figure 2). Annotations or larger genomic regions may also be viewed using the interactive viewing and editing tool, Apollo (19). Apollo is available for Windows, Mac OS X or Unix systems and may be downloaded from the Apollo site (http://www.fruitfly.org/annot/apollo).

BULK DATA DOWNLOADS

FlyBase offers a variety of routes for bulk data retrieval; a recent addition is the Batch Download Reports by ID facility shown in Figure 3. This tool allows the user to query the genes dataset for many records at once, by valid symbol or by FlyBase identification number.
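As an illustration of preparing input for a batch query of this kind, the following minimal sketch splits a pasted list into FlyBase gene identifiers and gene symbols. It is not FlyBase code: the function name `partition_batch_query` is hypothetical, the identifier pattern (`FBgn` followed by seven digits) is an assumption not stated in this paper, and the identifier used in the usage note is a placeholder.

```python
import re

# Assumed identifier shape: "FBgn" followed by seven digits.
# This pattern is an assumption for illustration, not taken from the paper.
FBGN_RE = re.compile(r"^FBgn\d{7}$")


def partition_batch_query(raw):
    """Split a whitespace- or comma-separated query into (ids, symbols).

    Duplicate tokens are dropped and first-seen order is preserved.
    Anything that does not look like a gene identifier is treated as a
    candidate gene symbol; whether it is a valid symbol is not checked here.
    """
    ids, symbols = [], []
    seen = set()
    for token in re.split(r"[\s,]+", raw.strip()):
        if not token or token in seen:
            continue
        seen.add(token)
        if FBGN_RE.match(token):
            ids.append(token)
        else:
            symbols.append(token)
    return ids, symbols
```

For example, `partition_batch_query("mle FBgn0000001, mle")` separates the placeholder identifier from the symbol `mle` and drops the repeated symbol. Validation of symbols is left to the query tool itself.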
Users can select the output type they wish to retrieve (HTML/Text, Spreadsheet or Database format). For HTML/Text outputs, the user can choose the Report Content (Synopsis, Abridged, Full, Summary, Alleles, Sequences, Reviews or References). For HTML/Text or Spreadsheet outputs, it is possible to filter output by field, using the ‘Select fields’ function. A related tool, Batch Download Sequences by ID, allows querying for sequences for many genes simultaneously. The sequence types that can be retrieved are Gene Region, Transcript, Translation, 3′-untranslated region (3′-UTR) and 5′-UTR. Both Batch Download forms can be accessed from the Genes data directory or from the Genome Annotation and Sequences page.

In addition to bulk queries performed over the web interface, FlyBase data files are available for download by FTP from several of our mirror sites, in text, acode or XML format. Protocols are described in the FlyBase Reference Manual section D (http://flybase.org/docs/lk/refman/refman-D.html).

D.MELANOGASTER GENOME RELEASES

The genomic sequence of D.melanogaster continues to be refined and expanded (http://flybase.bio.indiana.edu/annot/release3.html); the Berkeley Drosophila Genome Project has made public Release 4.0 of the genome sequence (http://www.fruitfly.org/annot/release4.html), and is currently finishing Release 5.0. FlyBase makes regular corrections and additions to the gene model annotations based on new data submissions to the sequence databases, user error reports and literature curation. We anticipate that comparative genomic analyses will play an increasing role in annotation assessment and improvement. Annotation updates are indicated by decimal numbers appended to the release number: e.g. Release 4.0 and Release 4.1. The heterochromatic portion of the genome is being analyzed by members of the Drosophila Heterochromatin Genome Project (http://www.dhgp.org); the heterochromatin annotations are accessible through FlyBase.
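The release numbering described in this section can be sketched in a few lines of code. The example below assumes that a label has the form "Release N.M", that N identifies the genome sequence release and that M is an integer counter of annotation updates for that sequence; the names `Release`, `parse_release` and `is_annotation_update` are hypothetical and are not part of any FlyBase tool.

```python
import re
from typing import NamedTuple


class Release(NamedTuple):
    sequence: int    # genome sequence release, e.g. 4 in "Release 4.1"
    annotation: int  # annotation update appended as a decimal, e.g. 1


# Assumed label shape: the word "Release", then two integers joined by a dot.
_RELEASE_RE = re.compile(r"^Release\s+(\d+)\.(\d+)$")


def parse_release(label):
    """Parse a label such as 'Release 4.1' into its two components."""
    match = _RELEASE_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a release label: {label!r}")
    return Release(int(match.group(1)), int(match.group(2)))


def is_annotation_update(older, newer):
    """True if `newer` re-annotates the same sequence release as `older`."""
    a, b = parse_release(older), parse_release(newer)
    return a.sequence == b.sequence and b.annotation > a.annotation
```

Under these assumptions, `is_annotation_update("Release 4.0", "Release 4.1")` holds, whereas a move from Release 4.0 to Release 5.0 is a new sequence release rather than an annotation update.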
ADDITIONAL DROSOPHILA GENOMES

The National Human Genome Research Institute (NHGRI) has recognized the importance of comparative genomic analysis for the annotation of D.melanogaster and for understanding how genomes evolved. To this end, the major NHGRI-funded sequencing centers are sequencing 11 additional species of Drosophila (pseudoobscura, yakuba, simulans, virilis, ananassae, erecta, willistoni, grimshawi, mojavensis, persimilis and sechellia; status of projects reported at http://genome.gov/page.cfm?pageID=10002154). The genome sequences, annotations, syntenic relationships and other data from these genome projects will be incorporated into FlyBase, consistent with FlyBase's long-term commitment to maintaining genomic and genetic data on the family Drosophilidae.

THE CHADO DATABASE SCHEMA

FlyBase has been operating since 1992 and is now in the process of developing and populating a new database structure, an integrated implementation of the chado generic genome database schema (http://www.gmod.org/schema/). The initial design of the chado schema was undertaken by FlyBase developers at Harvard and Berkeley to fully integrate the finished D.melanogaster genome sequence and annotation with the vast body of Drosophila genetic and phenotypic data produced over the last 100 years. The chado schema is an open software project and is being developed in cooperation with the GMOD initiative (http://www.gmod.org).

REFERENCING FLYBASE

We suggest FlyBase be referenced in publications by citing this publication and the FlyBase web address (http://flybase.org).

SUPPLEMENTARY MATERIAL

Supplementary Material is available at NAR Online.
