PMC:15028 / 2003-4979

    2_test

    {"project":"2_test","denotations":[{"id":"11178259-10089201-44563383","span":{"begin":2523,"end":2524},"obj":"10089201"},{"id":"11178259-9789099-44563384","span":{"begin":2525,"end":2526},"obj":"9789099"},{"id":"11178259-10869029-44563385","span":{"begin":2530,"end":2532},"obj":"10869029"},{"id":"11178259-9847154-44563386","span":{"begin":2536,"end":2538},"obj":"9847154"}],"text":"Background\nThe EMBL (European Molecular Biology Laboratory) Nucleotide Sequence Database (often referred to as the EMBL database) [1] is hosted at the European Bioinformatics Institute (EBI). It is a comprehensive database of DNA and RNA sequences that are directly submitted from researchers and genome sequencing groups, and collected from the scientific literature and patent applications. It is produced in an international collaboration with GenBank (NCBI, Bethesda, USA) and DDBJ (the DNA Data Bank of Japan, CIB, Mishima, Japan). Each of the three collaborating groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged daily. The amount of sequence data is growing exponentially.\nAs our scientific understanding deepens, the complexity of the related information increases as well. As a result, the structure of the data also keeps changing. The EMBL database is managed and maintained using the relational database management system (DBMS) Oracle. It contains over 130 tables and 140 relationships, having around 80 Gigabytes (Gb) of data comprising nearly 10 million objects of primary data and millions of sub-objects called 'features'. Traditionally, the sequences and related information, which have been collected over a long period of time, are made available in flat-file format via ftp, CD-ROM, www tools, and so on. The queries through tools such as SRS (Sequence Retrieval System, a network browser for databanks in molecular biology) [2] also return data in flat-file format. 
However, flat files have a number of shortcomings: the format may not be described formally; it is difficult to represent complex data and relationships, the meaningful units of information ('objects') are not represented or handled well; it is hard to retrieve objects separately; assembly of objects into bigger aggregates is difficult; elaborate parsing is often required; and so on. In general, the current availability of the resources is not matched by a flexible environment to meet individual researchers' needs.\nAn industry standard, the Object Management Group's (OMG) common object request broker architecture (CORBA), provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications [3,4,5,6]. Its independence from programming languages, computing platforms and network protocols provides a solution for developing new applications for querying and distributing biological data [7,8,9,10,11,12,13], which can also be integrated into existing systems. Here we present a CORBA infrastructure developed at EMBL-EBI and show that the CORBA interfaces to the EMBL database address some of the limitations of the flat-file format and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop application programs (for example, for sequence analysis or data mining)."}

    Colil

    {"project":"Colil","denotations":[{"id":"T1","span":{"begin":2523,"end":2524},"obj":"10089201"},{"id":"T2","span":{"begin":2525,"end":2526},"obj":"9789099"},{"id":"T3","span":{"begin":2530,"end":2532},"obj":"10869029"},{"id":"T4","span":{"begin":2536,"end":2538},"obj":"9847154"}],"namespaces":[{"prefix":"_base","uri":"http://pubannotation.org/docs/sourcedb/PubMed/sourceid/"}],"text":"Background\nThe EMBL (European Molecular Biology Laboratory) Nucleotide Sequence Database (often referred to as the EMBL database) [1] is hosted at the European Bioinformatics Institute (EBI). It is a comprehensive database of DNA and RNA sequences that are directly submitted from researchers and genome sequencing groups, and collected from the scientific literature and patent applications. It is produced in an international collaboration with GenBank (NCBI, Bethesda, USA) and DDBJ (the DNA Data Bank of Japan, CIB, Mishima, Japan). Each of the three collaborating groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged daily. The amount of sequence data is growing exponentially.\nAs our scientific understanding deepens, the complexity of the related information increases as well. As a result, the structure of the data also keeps changing. The EMBL database is managed and maintained using the relational database management system (DBMS) Oracle. It contains over 130 tables and 140 relationships, having around 80 Gigabytes (Gb) of data comprising nearly 10 million objects of primary data and millions of sub-objects called 'features'. Traditionally, the sequences and related information, which have been collected over a long period of time, are made available in flat-file format via ftp, CD-ROM, www tools, and so on. The queries through tools such as SRS (Sequence Retrieval System, a network browser for databanks in molecular biology) [2] also return data in flat-file format. 
However, flat files have a number of shortcomings: the format may not be described formally; it is difficult to represent complex data and relationships, the meaningful units of information ('objects') are not represented or handled well; it is hard to retrieve objects separately; assembly of objects into bigger aggregates is difficult; elaborate parsing is often required; and so on. In general, the current availability of the resources is not matched by a flexible environment to meet individual researchers' needs.\nAn industry standard, the Object Management Group's (OMG) common object request broker architecture (CORBA), provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications [3,4,5,6]. Its independence from programming languages, computing platforms and network protocols provides a solution for developing new applications for querying and distributing biological data [7,8,9,10,11,12,13], which can also be integrated into existing systems. Here we present a CORBA infrastructure developed at EMBL-EBI and show that the CORBA interfaces to the EMBL database address some of the limitations of the flat-file format and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop application programs (for example, for sequence analysis or data mining)."}