NCBI and the Entrez System

Slide 1
NCBI's Entrez System
Presented by Alex E Lash, MD
National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
Bethesda, Maryland

Slide 2
Paris, 1830; photographs of Georges Cuvier (1769-1832) and Etienne Geoffroy St. Hilaire (1772-1844)

Slide 3
1830: ''Form vs Function'' Debate
Cuvier

  • ''form follows function''
  • anatomic similarities among vertebrates were due to similar function
  • ''If there are resemblances between the organs..., it is only insofar as there are resemblances between their functions.''

Geoffroy

  • ''function follows form''
  • vertebrates were modifications of a single archetype
  • ''There is, philosophically speaking, only a single animal.''

Slide 4
Photograph of Charles Darwin
1859: Darwin on Geoffroy
''Geoffroy St. Hilaire has insisted strongly on the high importance of relative connexion in homologous organs; the parts may change to almost any extent in form and size, and yet they always remain connected together in the same order.''

Slide 5
''Pre-hypothesis'' Biological Information Collection
Collect Data leads to Characterize Data leads to Relate Data (where discovery takes place, because patterns are perceived and hypotheses form)
Cuvier & Geoffroy both got to Relating Data, through different reasoning
A modern example:
Sequencing a gene (collect) leads to annotating that gene into coding and non-coding regions (characterize), and then to cross-comparison, in which the sequence is compared to every other sequence (relate)

Slide 6
Today vs. 1830
Biotechnological developments have increased size, scope and speed of ''pre-hypothesis'' biological information collection.
Collection: overwhelming amount and variety of records

GenBank contains >19 million sequence records and >20 billion bases, and has doubled in size in the last 16 months
Characterization: increased scope and detail of fields in records
Relation: increased possibility of intra- and inter-database record to record links

Slide 7
National Center for Biotechnology Information

  • Created by Public Law 100-607 in 1988 as part of National Library of Medicine at NIH to:
      • Create automated systems for knowledge about molecular biology, biochemistry, and genetics.
      • Perform research into advanced methods of analyzing and interpreting molecular biology data.
      • Enable biotechnology researchers and medical care personnel to use the systems and methods developed.
  • Builders and providers of GenBank, Entrez, BLAST, and PubMed. Online systems host more than 2 million users per month.
  • Center for basic research and training in computational biology.

Slide 8
NCBI Web Hits Per Day: a graph
Hits have risen from 2 million to 25 million between January 1998 and January 2002

Slide 9
Entrez Hits Per Day: a graph
Hits are cyclical during the week and steady between 5 and 6 million over the course of 6 months during 2001.

Slide 10
What is Entrez?
Entrez is a scalable and flexible database and interface system constructed and maintained at NCBI.
Each Entrez database contains records with pre-specified fields, contains indices on each field, and comes with an interface allowing field-specific, boolean queries.
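
The description above (records with pre-specified fields, an index on each field, and field-specific boolean queries) can be illustrated with a minimal sketch. It is a toy model, not NCBI's implementation: the class name FieldIndexedDB, its methods, and the sample records are all hypothetical, and terms are simply lowercased, whitespace-separated words.

```python
from collections import defaultdict


class FieldIndexedDB:
    """Toy database: records with fixed fields and one inverted index per field."""

    def __init__(self, fields):
        self.fields = tuple(fields)
        self.records = {}  # UID -> {field: text}
        # field name -> term -> set of UIDs containing that term in that field
        self.index = {f: defaultdict(set) for f in self.fields}

    def add(self, uid, **values):
        unknown = set(values) - set(self.fields)
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        self.records[uid] = values
        for field, text in values.items():
            for term in text.lower().split():
                self.index[field][term].add(uid)

    def term(self, field, term):
        """Return the set of UIDs whose `field` contains `term`."""
        return set(self.index[field].get(term.lower(), set()))


# Made-up sample records, for illustration only.
db = FieldIndexedDB(["title", "organism"])
db.add(1, title="hemoglobin beta chain", organism="homo sapiens")
db.add(2, title="hemoglobin alpha chain", organism="mus musculus")
db.add(3, title="insulin precursor", organism="homo sapiens")

# Boolean operators map onto set operations over the per-field indices.
and_hits = db.term("title", "hemoglobin") & db.term("organism", "homo")
or_hits = db.term("title", "insulin") | db.term("organism", "mus")
not_hits = db.term("organism", "homo") - db.term("title", "hemoglobin")
```

Because every field has its own index, a query never has to scan the records themselves; it only combines sets of UIDs.
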
PubMed is an Entrez database. OMIM is an Entrez database. GenBank nucleotide sequence records are contained in Entrez Nucleotide.
Links can be specified between records within the same Entrez database (intra-database links), or between records in different Entrez databases (inter-database links).
Links can be obvious (eg, identifier matching) or non-obvious (eg, sequence similarity). Non-obvious links generally require examination of the full record and some computation.
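
The two kinds of links can be contrasted in a short sketch. The identifier-matching function only compares one field; the similarity function has to read the full sequence and compute a score. The k-mer overlap score used here is a simple stand-in chosen for brevity, not the similarity method Entrez uses, and all function names, identifiers, and sequences are made up.

```python
def identifier_links(source, target, key):
    """Obvious links: connect records whose `key` field holds the same identifier."""
    by_id = {}
    for uid, rec in target.items():
        by_id.setdefault(rec[key], []).append(uid)
    return {uid: by_id.get(rec[key], []) for uid, rec in source.items()}


def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity_links(seqs, threshold=0.5, k=3):
    """Non-obvious links: connect sequences whose k-mer sets overlap enough.

    The score is the Jaccard index of the two k-mer sets (a stand-in measure).
    """
    sets = {uid: kmers(s, k) for uid, s in seqs.items()}
    links = {uid: [] for uid in seqs}
    uids = sorted(seqs)
    for i, a in enumerate(uids):
        for b in uids[i + 1:]:
            union = sets[a] | sets[b]
            score = len(sets[a] & sets[b]) / len(union) if union else 0.0
            if score >= threshold:
                links[a].append(b)
                links[b].append(a)
    return links


# Hypothetical records: two databases sharing a made-up "gene_id" field.
articles = {101: {"gene_id": "G1"}, 102: {"gene_id": "G2"}}
sequences = {"S1": {"gene_id": "G1"}, "S2": {"gene_id": "G1"}, "S3": {"gene_id": "G3"}}
id_links = identifier_links(articles, sequences, "gene_id")

# Hypothetical sequences: S1 and S2 differ by one base, S3 is unrelated.
seqs = {"S1": "ATGGCCATTG", "S2": "ATGGCCATTC", "S3": "TTTTTTTTTT"}
sim_links = similarity_links(seqs)
```

Identifier links cost one dictionary lookup per record, while similarity links require an all-against-all comparison, which is why such links are typically computed ahead of time rather than at query time.
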

Slide 11
Architecture
Queries go to the Query Processor-Display System, which returns a display.
The Query Processor-Display System consults three stores: Index Terms (search field name, term, UID), Records (UID, display field name, content), and Links (database name, UID, etc.).
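
A minimal sketch of that flow, assuming each of Index Terms, Records, and Links can be modeled as a plain dictionary. The store layouts, the function name process_query, and every value below are hypothetical; the slide does not specify how the stores are implemented.

```python
# Index Terms: (search field name, term) -> UIDs
INDEX_TERMS = {
    ("title", "hemoglobin"): [1, 2],
    ("title", "insulin"): [3],
}

# Records: UID -> {display field name: content}
RECORDS = {
    1: {"title": "hemoglobin beta chain"},
    2: {"title": "hemoglobin alpha chain"},
    3: {"title": "insulin precursor"},
}

# Links: UID -> [(linked database name, linked UID)]
LINKS = {
    1: [("pubmed", 9001)],
    2: [("pubmed", 9002), ("structure", 77)],
}


def process_query(field, term, display_field):
    """Resolve a term to UIDs, then fetch display content and links for each UID."""
    lines = []
    for uid in INDEX_TERMS.get((field, term), []):
        content = RECORDS[uid][display_field]
        targets = ", ".join(f"{db}:{t}" for db, t in LINKS.get(uid, []))
        lines.append(f"{uid}\t{content}\t{targets}")
    return lines


display = process_query("title", "hemoglobin", "title")
```

The UID is the only value shared by all three stores, so it acts as the join key between searching, display, and linking.
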

Slide 12
Entrez database statistics
15 Entrez databases
>38 million records
>140 million indexed terms
>6.7 billion intra- and inter-database links

Slides 13 through 19
A series of Entrez searches is shown

Slide 20
New Entrez Databases

6 new databases in the last year

  1. Books: online books
  2. GEO: high-throughput gene expression and microarray datasets
  3. 3D Domains: structural protein domains from Entrez Structure
  4. UniSTS: markers and mapping data
  5. CDD: conserved protein domains
  6. SNP: single nucleotide polymorphisms

5 new databases on the way

  1. UniGene: clusters of sequence-similar transcripts
  2. Gene: a derivation of LocusLink and Genomes
  3. SKY/CGH: spectral karyotyping/comparative genomic hybridization
  4. Site Search: search the NCBI web and ftp sites
  5. Gensat: in situ gene expression in the nervous system of the mouse

Slide 21
Gensat shows slides of tissue pathology

Slide 22
Current Query Scheme
Database selection is made; a query is placed and records are found, with links

Slide 23
Global Query Scheme
A query is placed, a summary is made across databases, then a database is selected and its records and links displayed
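
The global scheme can be sketched as follows: one query runs against every database, a per-database summary of hit counts is produced, and only then is a database chosen for display. This is an illustrative sketch only; the function names are hypothetical, the record text is made up, and matching is a plain whole-word comparison.

```python
def global_query(databases, term):
    """Run one term against every database; return {database name: matching UIDs}."""
    term = term.lower()
    return {
        name: [uid for uid, text in records.items() if term in text.lower().split()]
        for name, records in databases.items()
    }


def summarize(hits):
    """Cross-database summary: number of matching records per database."""
    return {name: len(uids) for name, uids in hits.items()}


# Made-up record text under three database names taken from the slides.
databases = {
    "pubmed": {1: "BRCA1 mutations in breast cancer", 2: "insulin signaling"},
    "nucleotide": {10: "Homo sapiens BRCA1 mRNA", 11: "Mus musculus Brca1 mRNA"},
    "omim": {20: "insulin resistance"},
}

hits = global_query(databases, "brca1")
summary = summarize(hits)

# The user then selects one database from the summary and views its records.
selected = hits["nucleotide"]
```

Under the current scheme the user must pick a database before knowing whether it holds anything relevant; the summary step moves that choice after the counts are known.
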

Slide 24
Entrez Global Query

Slide 25
NCBI Web Site
http://www.ncbi.nlm.nih.gov

New Entrez Databases
Entrez Gensat

Entrez Global Query