Slide 1
NCBI's Entrez System
Presented by Alex E Lash, MD
National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
Bethesda, Maryland
Slide 2
Paris, 1830; photographs of Georges Cuvier (1769-1832) and Etienne Geoffroy St. Hilaire (1772-1844)
Slide 3
1830: ''Form vs Function'' Debate
Cuvier
Geoffroy
Slide 4
Photograph of Charles Darwin
1859: Darwin on Geoffroy
''Geoffroy St. Hilaire has insisted strongly on the high importance of relative connexion in homologous organs; the parts may change to almost any extent in form and size, and yet they always remain connected together in the same order.''
Slide 5
''Pre-hypothesis'' Biological Information Collection
Collect Data leads to Characterize Data leads to Relate Data (where discovery takes place, because patterns are perceived and hypotheses form)
Cuvier & Geoffroy both got to Relating Data, through different reasoning
A modern example:
Sequencing a gene leads to annotation of that gene into coding and non-coding regions, and then to cross-comparison, in which the sequence is compared to every other sequence
Slide 6
Today vs. 1830
Biotechnological developments have increased the size, scope and speed of ''pre-hypothesis'' biological information collection.
Collection: overwhelming amount and variety of records
Slide 7
National Center for Biotechnology Information
Slide 8
NCBI Web Hits Per Day: a graph
Hits have risen from 2 million to 25 million between January 1998 and January 2002
Slide 9
Entrez Hits Per Day: a graph
Hits are cyclical during the week and steady between 5 and 6 million over the course of 6 months during 2001.
Slide 10
What is Entrez?
Entrez is a scalable and flexible database and interface system constructed and maintained at NCBI.
Each Entrez database contains records with pre-specified fields, contains indices on each field, and comes with an interface allowing field-specific, boolean queries.
PubMed is an Entrez database. OMIM is an Entrez database. GenBank nucleotide sequence records are contained in Entrez Nucleotide.
Links can be specified between records within the same Entrez database (intra-database links), or between records in different Entrez databases (inter-database links).
Links can be obvious (e.g., identifier matching) or non-obvious (e.g., sequence similarity). Non-obvious links generally require examination of the full record and some computation.
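The field-indexed, boolean query model described on this slide can be sketched in a few lines of Python. This is a toy illustration only, not NCBI's implementation: the class name `ToyEntrezDatabase`, the field names, and the sample records are all invented for the example. Each field gets its own inverted index, and boolean operators map onto set operations over the matching UIDs.

```python
from collections import defaultdict


class ToyEntrezDatabase:
    """Hypothetical in-memory database with one inverted index per field."""

    def __init__(self, fields):
        self.fields = tuple(fields)
        self.records = {}  # UID -> record (dict of field -> text)
        # index[field][term] -> set of UIDs whose field contains the term
        self.index = {field: defaultdict(set) for field in self.fields}

    def add(self, uid, record):
        """Store a record and index every whitespace-separated term per field."""
        self.records[uid] = record
        for field in self.fields:
            for term in str(record.get(field, "")).lower().split():
                self.index[field][term].add(uid)

    def search(self, field, term):
        """Field-specific lookup: return the set of UIDs matching one term."""
        return set(self.index[field].get(term.lower(), set()))


# Invented sample data, for illustration only.
db = ToyEntrezDatabase(["title", "organism"])
db.add(1, {"title": "Hox gene clusters", "organism": "mouse"})
db.add(2, {"title": "Hox gene expression", "organism": "human"})
db.add(3, {"title": "Globin gene family", "organism": "mouse"})

# Boolean queries expressed as set operations on UID sets.
hox_and_mouse = db.search("title", "hox") & db.search("organism", "mouse")
hox_or_globin = db.search("title", "hox") | db.search("title", "globin")
gene_not_mouse = db.search("title", "gene") - db.search("organism", "mouse")
```

Because every lookup returns a set of UIDs, AND, OR and NOT reduce to intersection, union and difference, which is one simple way an index on each field can support field-specific boolean queries.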
Slide 11
Architecture
Queries go to the Query Processor-Display System, which returns a display
The Query Processor-Display System consults three stores: Index Terms (search field name, term, UID), Records (UID, display field name, content), and Links (database name, UID, etc.)
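The flow on this slide can be sketched as a single function that touches the three stores in turn. Everything here is a hypothetical stand-in: the dictionary layouts, the function name `run_query`, and the sample UIDs are assumptions made for illustration, not the actual Entrez data structures.

```python
# Hypothetical stores, named after the three boxes on the slide.
# Index Terms: (search field name, term) -> list of UIDs
INDEX_TERMS = {
    ("title", "hox"): [101, 102],
    ("organism", "mouse"): [101],
}
# Records: UID -> {display field name: content}
RECORDS = {
    101: {"title": "Hox gene clusters", "organism": "mouse"},
    102: {"title": "Hox gene expression", "organism": "human"},
}
# Links: (database name, UID) -> list of (database name, UID)
LINKS = {
    ("nucleotide", 101): [("pubmed", 9001)],
}


def run_query(database, field, term, display_field):
    """Resolve one fielded query into display rows.

    Step 1: look up the term in the index to get UIDs.
    Step 2: fetch each record and pick the requested display field.
    Step 3: attach any links stored for that (database, UID) pair.
    """
    rows = []
    for uid in INDEX_TERMS.get((field, term.lower()), []):
        content = RECORDS[uid][display_field]
        links = LINKS.get((database, uid), [])
        rows.append({"uid": uid, "display": content, "links": links})
    return rows


rows = run_query("nucleotide", "title", "Hox", "title")
for row in rows:
    print(row["uid"], row["display"], row["links"])
```

Keeping the index, the records, and the links in separate stores means a query only needs the index to find UIDs, and the fuller record and link data are fetched only for the UIDs that are actually displayed.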
Slide 12
Entrez database statistics
15 Entrez databases
>38 million records
>140 million indexed terms
>6.7 billion intra- and inter-database links
Slides 13 through 19
A series of Entrez searches is shown
Slide 20
New Entrez Databases
6 new databases in the last year
5 new databases on the way
Slide 21
Gensat shows slides of tissue pathology
Slide 22
Current Query Scheme
Database selection is made; a query is placed and records are found, with links
Slide 23
Global Query Scheme
A query is placed, a summary is made across databases, then a database is selected and its records and links displayed
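The difference from the current scheme is the order of steps: the query runs first, across every database, and the database is chosen afterwards from a summary of hit counts. A minimal sketch of that order, assuming invented database contents and hypothetical helper names (`global_query`, `summarize`):

```python
def global_query(databases, term):
    """Run one term against every database; return {name: sorted UIDs}."""
    wanted = term.lower()
    hits = {}
    for name, records in databases.items():
        hits[name] = sorted(
            uid for uid, text in records.items()
            if wanted in text.lower().split()
        )
    return hits


def summarize(hits):
    """Collapse per-database hits into the cross-database count summary."""
    return {name: len(uids) for name, uids in hits.items()}


# Invented sample contents, for illustration only.
DATABASES = {
    "pubmed": {1: "Hox gene clusters in mouse", 2: "Globin gene family"},
    "nucleotide": {10: "Mus musculus Hox cluster sequence"},
    "omim": {20: "Thalassemia and globin variants"},
}

hits = global_query(DATABASES, "hox")   # step 1: query placed once
summary = summarize(hits)               # step 2: summary across databases
selected = hits["nucleotide"]           # step 3: database selected afterwards
print(summary)
print(selected)
```

The summary step lets a user see which databases hold matching records before committing to one, instead of having to guess the right database up front.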
Slide 24
Entrez Global Query
Slide 25
NCBI Web Site
http://www.ncbi.nlm.nih.gov
New Entrez Databases
Entrez Gensat
Entrez Global Query