National Cancer Institute
Cancer Imaging Program

Image Archive Resources

This page contains links to the various components and interests of the NCI Cancer Imaging Program Image Archive Committee and its related websites. There are special sections on image archive formats and standardization:

National Cancer Imaging Archive 1

General References on Biomedical Image Archives 2

Image Archive Technology 3

Image Archive Standards 4

Image Archive Applications - General 5

Image Archive Applications - Clinical Trials 6

Imaging in Clinical Trials 7

Clinical Trials Image Archive Technology 8

NIH Information Standards 9

Information Standards from Other Federal Agencies 10

XML and DICOM 11

Biological Databases 12

Implementation of Biological Databases 13

Cancer Image Archives 14

General References on Biomedical Image Archives

NCI Image Archive Management Workshop Report - August 2000 15
Summary of the National Cancer Institute (NCI) workshop 16 entitled "Image Archive Management," held August 28-29, 2000, at the Natcher Conference Center on the National Institutes of Health (NIH) campus. The purpose of the workshop was to solicit expert input for the planned development of an archival system that would make imaging databases readily accessible to the broad scientific community. This PDF file is the workshop report published in Academic Radiology.

BIRN - Biomedical Informatics Research Network 17
The BIRN is a National Center for Research Resources (NCRR 18) initiative aimed at creating a test bed to address neuroscience researchers' need to access and analyze data at a variety of levels of aggregation, located at diverse sites throughout the country. The BIRN test bed will bring together hardware and develop the software necessary for a scalable network of databases and computational resources. Issues of user authentication, data integrity, security, and data ownership will also be addressed. The BIRN initiative has created consortia of biomedical technology and clinical research centers that are working together to address two fundamental biomedical research issues: 1) integrating data across modalities and scales; and 2) merging difficult-to-acquire data with heterogeneous collection attributes from multiple research sites. These initiatives are only the first steps in creating the infrastructure to support the synergistic approaches needed to solve challenging biomedical problems.

Bristol Biomedical Image Archive 19
The Bristol Biomedical Image Archive is an online collection of about 8,500 medical, dental, and veterinary images for use in teaching and learning. All the images have been donated by academics working in the biomedical fields in different countries. The archive is hosted at the Institute for Learning and Research Technology, University of Bristol, UK.

BioImage Database 20
A European initiative for a database of multidimensional biological images. The BioImage Database project, funded by the European Union, is a collaboration between eight European groups. Its aim is to provide the general scientific community with a flexible and searchable database of multi-dimensional biological images.

Open Archives Initiative 21
The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. The Open Archives Initiative has its roots in an effort to enhance access to e-print archives as a means of increasing the availability of scholarly communication. The fundamental technological framework and standards that are developing to support this work are, however, independent of both the type of content offered and the economic mechanisms surrounding that content, and promise to have much broader relevance in opening up access to a range of digital materials. As a result, the Open Archives Initiative is currently an organization and an effort explicitly in transition, and is committed to exploring and enabling this new and broader range of applications. As we gain greater knowledge of the scope of applicability of the underlying technology and standards being developed, and begin to understand the structure and culture of the various adopter communities, we expect that we will have to make continued evolutionary changes to both the mission and organization of the Open Archives Initiative.
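
The Initiative's best-known technical output is the OAI Protocol for Metadata Harvesting (OAI-PMH), in which a harvester sends plain HTTP requests to a repository's base URL with a `verb` argument naming the operation. The sketch below only builds such request URLs, assuming OAI-PMH version 2.0; the repository address is hypothetical, and a real harvester would also need to handle resumption tokens, error responses, and parsing of the returned XML. The example asks for `oai_dc` (unqualified Dublin Core), the metadata format every OAI-PMH repository is required to support.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH version 2.0.
OAI_PMH_VERBS = frozenset({
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
})


def build_oai_pmh_url(base_url: str, verb: str, **arguments: str) -> str:
    """Return an OAI-PMH GET request URL for one verb and its arguments."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    # urlencode keeps dict insertion order, so "verb" always comes first.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Hypothetical repository endpoint, used only for illustration.
    print(build_oai_pmh_url(
        "https://repository.example.org/oai",
        "ListRecords",
        metadataPrefix="oai_dc",
    ))
    # prints https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

Rejecting unknown verbs locally is a design choice for the sketch; a repository would otherwise answer with a `badVerb` error.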

Image Archive Technology

David Clunie's Medical Image Format Site 22
This Web site includes FAQs on medical image data, applications, and formats. It was assembled and is sponsored by a key expert and advocate from the DICOM community.

Very Large Data Base Endowment Inc. 23
The Very Large Data Base Endowment Inc. (VLDB Endowment) is a non-profit organization incorporated in the United States for the sole purpose of promoting and exchanging scholarly work in databases and related fields throughout the world. The site provides the contents of the VLDB Journal, the annual VLDB Conference schedule, and many PDF files of extended abstracts.

Microsoft Research - Scalable Servers 24
Microsoft is exploring techniques to build large servers as arrays of commodity processors, disks, and interconnects: Scalable Networks and Platforms (SNAP). The resulting computer cluster should be as easy to program, manage, and use as a single system. In addition, by using spare modules and redundant storage, the cluster should mask component failures and so provide highly available services. This work combines the expertise of the NT clusters group, which helps define the requirements for clusters, with that of the SQL Server database team, which adds fault tolerance, scalability, and parallelism to SQL Server.

TerraServer™ 25
Demonstrations featured a one-node terabyte geo-spatial database server (the TerraServer™) and a 45-node cluster processing a billion transactions per day. SAP + SQL + NT-Cluster failover demos, a 50 GB mail store, a 50k-user POP3 mail server, a 100-million-hits-per-day web server, and 64-bit addressing in SQL Server were also shown. The TerraServer started as a joint research project between Aerial Images, Inc., Microsoft, the USGS, and Compaq. The TerraServer concept grew out of the convergence of two needs: Aerial Images, Inc. wanted to sell imagery online, and Microsoft Research needed a large database to demonstrate the capabilities of its new database software.

Teradata Corp. 26
Teradata, a division of NCR Corporation, offers powerful analytical solutions that help businesses drive growth. Teradata solutions include the Teradata warehouse, along with analytical applications for customer relationship management, operations/financial management, business performance management and e-business.

Archive Builders Corp. 27
Archive Builders assists organizations with their plans for document management, document imaging systems, and digital libraries. One service that has proven valuable is reviewing and discussing the document management plans drawn up by organizations considering a system installation. They offer onsite systems analysis, requirements planning, and assistance in writing system specifications. Whitepapers and presentation materials used in the document management class taught by Archive Builders are available free for download.

MySQL - Open Source Relational Database 28
MySQL is billed as the world's most popular open source database, designed for speed, power, and precision in mission-critical, heavy-load use. MySQL AB is the company that develops, supports, and markets the MySQL database server globally. Its mission is to make superior data management available and affordable for all, and to contribute to building the mission-critical, high-volume systems and products of tomorrow. The product is available at no cost under the GNU General Public License (GPL), and it is sold under a commercial license to those who do not wish to be bound by the terms of the GPL. Separating the core server from the table handler makes it possible to run MySQL either under strict transaction control or with very fast, transactionless disk access, whichever is most appropriate for the situation.

Image Archive Standards

DICOM - Digital Imaging and Communications in Medicine 29
The DICOM Standards Committee exists to create and maintain international standards for the communication of biomedical diagnostic and therapeutic information in disciplines that use digital images and associated data. The goals of DICOM are to achieve compatibility and to improve workflow efficiency between imaging systems and other information systems in healthcare environments worldwide. DICOM is a cooperative standard: connectivity works because vendors cooperate in testing via scheduled public demonstrations, over the Internet, and during private test sessions. Every major diagnostic medical imaging vendor in the world has incorporated the standard into its product design, and most are actively participating in the enhancement of the standard. Most of the professional societies throughout the world have supported and are participating in the enhancement of the standard as well. DICOM is used, or will soon be used, by virtually every medical profession that uses images within the healthcare industry, including cardiology, dentistry, endoscopy, mammography, ophthalmology, orthopedics, pathology, pediatrics, radiation therapy, radiology, and surgery. DICOM is even used in veterinary medical imaging applications.
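
As one concrete example of what the standard fixes, a DICOM file stored in the Part 10 media file format begins with a 128-byte preamble followed by the four ASCII characters `DICM`. The sketch below checks only that signature. It is a quick sanity test under that assumption, not a DICOM parser: it reads none of the data elements and will reject datasets written without the Part 10 wrapper.

```python
PREAMBLE_LENGTH = 128   # bytes; often all zeros when unused
DICOM_PREFIX = b"DICM"  # magic bytes that follow the preamble
HEADER_LENGTH = PREAMBLE_LENGTH + len(DICOM_PREFIX)


def has_dicom_part10_signature(data: bytes) -> bool:
    """True if the bytes carry the Part 10 preamble-plus-"DICM" signature."""
    return (
        len(data) >= HEADER_LENGTH
        and data[PREAMBLE_LENGTH:HEADER_LENGTH] == DICOM_PREFIX
    )


def file_has_dicom_part10_signature(path: str) -> bool:
    """Read just the first 132 bytes of a file and test the signature."""
    with open(path, "rb") as handle:
        return has_dicom_part10_signature(handle.read(HEADER_LENGTH))


if __name__ == "__main__":
    sample = bytes(PREAMBLE_LENGTH) + DICOM_PREFIX + b"\x02\x00"
    print(has_dicom_part10_signature(sample))   # True
    print(has_dicom_part10_signature(b"DICM"))  # False: no preamble
```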

Health Level Seven (HL7) 30
Founded in 1987, Health Level Seven, Inc. is a not-for-profit, ANSI-accredited standards developing organization that provides standards for the exchange, management, and integration of data that supports clinical patient care and the management, delivery, and evaluation of healthcare services. Its 2,200 members represent over 500 corporations, including 90 percent of the largest information systems vendors serving healthcare. HL7's endeavors are supported in part by its benefactors: CAP Gemini Ernst & Young U.S. LLC, Eclipsys Corporation, Eli Lilly & Company, IDX Systems Corporation, Johnson & Johnson, McKesson Information Solutions, Microsoft Corporation, Philips Medical Systems, Quest Diagnostics Inc., Siemens Medical Solutions Health Services, Sun Microsystems, and the U.S. Department of Veterans Affairs.

NCI External Standards Review - Report 31
NCI External Standards Review - Appendices 32
The National Cancer Institute (NCI) is supporting a broad initiative to develop standard tools and practices that include controlled vocabularies, common data elements (CDEs), and logical models of entities within and across life science domains. In the health information arena, a number of standards have been developed to define the way organizations record disease types, identify health care provider information, and specify patient information. An NCI External Data Standards Review has been completed that outlines current standards developed by organizations outside of the NCI. This review is intended to be a starting place for consideration of external data standards to be adopted by the NCI for use in the collection, storage, and reporting of information.

NCI Center for Bioinformatics (NCICB) 33
The NCI Center for Bioinformatics (NCICB) provides biomedical informatics support and integration capabilities to the cancer research community. We work with both intramural and extramural groups to develop Initiative-Specific Modules. These modules are connected through intelligent interfaces, coordinated through an NCI Core Module and deployed through open source tools and systems. The NCICB also serves as a focal point for cancer research informatics planning worldwide. We work with research organizations, biomedical informatics groups and standards bodies to facilitate the identification and adoption of information exchange standards, thus connecting research information sources wherever they may reside.

Cancer Bioinformatics Infrastructure Objects (caBIO) 34
caBIO is an ongoing effort to model the domains of cancer research. The caBIO domain objects simulate the behavior of actual components in biomedicine such as genes, chromosomes, sequences, agents, trials, and ontologies. They provide access to a variety of data sources including GenBank, UniGene, LocusLink, Ensembl, GoldenPath (through DAS), and NCICB's CGAP (Cancer Genome Anatomy Project) data repositories. The current object model was designed through the interaction of domain experts and IT professionals, using an iterative software development approach to accommodate new requirements for modeling genomic information. Details of each object were identified during domain analysis and include information provided by domain experts as well as industry standards. caBIO is an "open source" software project.

CODATA - Committee on Data for Science and Technology 35
CODATA, the Committee on Data for Science and Technology, is an interdisciplinary Scientific Committee of the International Council for Science (ICSU). CODATA was established in 1966, and its secretariat is located in Paris, France. CODATA seeks: 1) improvement of the quality and accessibility of data, as well as the methods by which data are acquired, managed, analyzed, and evaluated, with a particular emphasis on developing countries; 2) facilitation of international cooperation among those collecting, organizing, and using data; 3) promotion of an increased awareness in the scientific and technical community of the importance of these activities; and 4) consideration of data access and intellectual property issues.

SANE - Scanner Access Now Easy Image Data Format 36
SANE stands for "Scanner Access Now Easy" and is an application programming interface (API) that provides standardized access to any raster image scanner hardware (flatbed scanners, hand-held scanners, video and still cameras, frame grabbers, etc.). The SANE API is in the public domain, and its discussion and development are open to everybody. SANE is a universal scanner interface. The value of such a universal interface is that it allows writing just one driver per image acquisition device rather than one driver for each combination of device and application. So, if you have three applications and four devices, traditionally you would have had to write 12 different programs. With SANE, this number is reduced to seven: the three applications plus the four drivers. Of course, the savings get even bigger as more drivers and applications are added.
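
The driver count in the paragraph above generalizes: with A applications and D devices, writing a separate driver for every application and device pair takes A × D programs, while a shared API needs only A + D. A tiny sketch of that arithmetic (the function names are illustrative only):

```python
def programs_without_shared_api(applications: int, devices: int) -> int:
    # Every application needs its own driver for every device.
    return applications * devices


def programs_with_shared_api(applications: int, devices: int) -> int:
    # Applications and device drivers each talk only to the shared API.
    return applications + devices


if __name__ == "__main__":
    for apps, devs in [(3, 4), (10, 20)]:
        print(apps, devs,
              programs_without_shared_api(apps, devs),
              programs_with_shared_api(apps, devs))
    # prints "3 4 12 7" and then "10 20 200 30"
```

The 3-by-4 case reproduces the 12-versus-7 figures in the text; at 10 applications and 20 devices the gap grows to 200 versus 30.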

Cancer Informatics: Essential Technologies for Clinical Trials 37
"Cancer Informatics: Essential Technologies for Clinical Trials" is a book published in January 2002 that describes the National Cancer Institute's vision of a Cancer Informatics Infrastructure (CII). By drawing on the best that the Internet and information technology have to offer, the CII is intended to facilitate clinical trials for everyone involved, from the patient to the many health professionals who take part in cancer trials.

NIH Data Sharing Policy 38
NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers. Investigators submitting an NIH application will be required to include a plan for data sharing or to state why data sharing is not possible. This is an extension of NIH policy on sharing research resources.

Open Source Health Care Resources 39
Open source refers to software that comes with the source code in a form that customers can modify for their own needs and resell or give away to others under the same terms.

CTSim 40
CTSim is an open source computed tomography simulator. It simulates the process of transmitting X-rays through phantom objects. These X-ray data are called projections. CTSim reconstructs the original phantom image from the projections using a variety of algorithms. Additionally, CTSim has a wide array of image analysis and image processing functions.
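
The projection-and-reconstruction cycle described above can be illustrated with a deliberately tiny sketch. It is not CTSim code and does not use CTSim's API: it assumes an ideal parallel beam, takes projections at only 0° and 90° (ray sums down columns and across rows), converts a transmitted intensity into a line integral with the Beer-Lambert law, and reconstructs by plain unfiltered backprojection. Real simulators and scanners use many projection angles and better algorithms (for example, filtered backprojection), so the result here is only a blurred estimate.

```python
import math

Image = list[list[float]]


def intensity_to_line_integral(incident: float, transmitted: float) -> float:
    """Beer-Lambert: I = I0 * exp(-integral of mu), so integral = -ln(I / I0)."""
    return -math.log(transmitted / incident)


def project_0_and_90(phantom: Image) -> tuple[list[float], list[float]]:
    """Ray sums for vertical rays (one per column) and horizontal rays (one per row)."""
    columns = [sum(row[c] for row in phantom) for c in range(len(phantom[0]))]
    rows = [sum(row) for row in phantom]
    return columns, rows


def backproject(columns: list[float], rows: list[float]) -> Image:
    """Unfiltered backprojection: smear each ray sum back along its own ray."""
    return [[columns[c] + rows[r] for c in range(len(columns))]
            for r in range(len(rows))]


if __name__ == "__main__":
    # 5 x 5 phantom with a single attenuating pixel at the centre.
    phantom = [[0.0] * 5 for _ in range(5)]
    phantom[2][2] = 1.0
    columns, rows = project_0_and_90(phantom)
    for line in backproject(columns, rows):
        print(line)
    print(intensity_to_line_integral(1.0, math.exp(-1.0)))  # approximately 1.0
```

The reconstruction peaks at the true pixel (value 2.0) but also shows a cross of streaks (value 1.0) along both ray directions, which is exactly the artifact that more angles and filtering are meant to suppress.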

Image Archive Applications - General

USF Digital Mammography Database 41
The Digital Database for Screening Mammography (DDSM) is a resource for use by the mammographic image analysis research community. The primary purpose of the database is to facilitate sound research in the development of computer algorithms to aid in screening. Secondary purposes of the database may include the development of algorithms to aid in the diagnosis and the development of teaching or training aids. The database contains approximately 2,500 studies. Each study includes two images of each breast, along with some associated patient information (age at time of study, ACR breast density rating, subtlety rating for abnormalities, ACR keyword description of abnormalities) and image information (scanner, spatial resolution). Images containing suspicious areas have associated pixel-level "ground truth" information about the locations and types of suspicious regions. Also provided is software both for accessing the mammogram and truth images and for calculating performance figures for automated image analysis algorithms.
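
The description does not say which performance figures the DDSM software computes. As an assumption, the sketch below shows two that are commonly reported for screening algorithms, sensitivity and specificity, derived from case-level counts of correct and incorrect calls. The `ScreeningCounts` structure is hypothetical, and the counts in the example are invented for illustration, not DDSM results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Case-level outcome counts for one algorithm (hypothetical structure)."""
    true_positives: int   # abnormal cases the algorithm flagged
    false_negatives: int  # abnormal cases it missed
    true_negatives: int   # normal cases it correctly passed
    false_positives: int  # normal cases it wrongly flagged


def sensitivity(counts: ScreeningCounts) -> float:
    """Fraction of abnormal cases that were flagged."""
    return counts.true_positives / (counts.true_positives + counts.false_negatives)


def specificity(counts: ScreeningCounts) -> float:
    """Fraction of normal cases that were not flagged."""
    return counts.true_negatives / (counts.true_negatives + counts.false_positives)


if __name__ == "__main__":
    # Invented counts, not DDSM results.
    example = ScreeningCounts(true_positives=45, false_negatives=5,
                              true_negatives=80, false_positives=20)
    print(sensitivity(example), specificity(example))  # 0.9 0.8
```

Pixel-level "ground truth" such as DDSM provides also allows region-level scoring (whether a flagged area actually overlaps a marked abnormality), which this case-level sketch does not attempt.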

Mouse Brain Library 42
The MBL consists of high-resolution images and databases of brains from many genetically-characterized strains of mice. There are numerous uses of the MBL, but the developers' mission is to systematically map and characterize genes that modulate architecture of the mammalian CNS (for a complete description of projects refer to the P20 Human Brain Project Award: Informatics Center for Mouse Neurogenetics 43). MBL databases also include detailed information on genomes of many strains of mice. The collection now consists of images from approximately 800 brains and numerical data from just over 8000 mice. MBL can be searched for cases by strain, age, sex, or body or brain weight. Images of the slide collection are available at a series of resolutions. The base resolution is 24.5 +/- 0.5 micrometer per pixel in the XY plane with a 150 micrometer interval between sections (300 micrometer on each slide, 2 slides per case). Significantly higher resolution images of single sections (4.5 micrometer/pixel) have been acquired for over a hundred cases marked with a blue "hi-res" button. They are now collecting 1 micrometer/pixel images for specific parts of the brain - at present, the neocortex, hippocampus, and the dorsal lateral geniculate nucleus. Very high resolution images (<0.2 micrometer/pixel) are available for C57BL/6J using the iScope, a web-controlled microscope equipped with DIC optics.

RadiologyInfo 44
RadiologyInfo is designed to answer patient questions related to many radiologic procedures and therapies. It includes images from diagnostic radiology, interventional radiology and radiation therapy, has an alphabetical procedures list and galleries of images. There is access to descriptive material for radiologists to use in their waiting rooms.

CMU Computer Vision Test Images 45
The Computer Vision Homepage was established at Carnegie Mellon University in 1994 to provide a central location for World Wide Web links relating to computer vision research. The emphasis of the Computer Vision Homepage is on computer vision research rather than on commercial products. A comprehensive set of links to publicly accessible Web sites with computer vision test images is offered.

ECVNet Image Data Bases List 46
This page contains pointers to sites offering public access to image collections via the Internet. There you can find color and grayscale still images, medical images, textures, sequences, stereo pairs, range images, etc.

MedPix™ Medical Image Database 47
MedPix™ is a fully web-enabled and cross-platform database, integrating images and textual information. The primary "target audience" includes resident and practicing physicians, medical students, graduate nursing students and other post-graduate trainees. The material is organized by disease category, disease location (organ system), and by patient profiles. The database can be searched through multiple internal text search engines. In addition, search formulations can be sent directly to PubMed, or to other outside search engines. Registered users may browse the image database through a "slide sorter" module. Contributed content may be copyrighted by the original author/contributor, and is used with permission.

Gastrolab Endoscopy Pictures Archive 48
This Web site is an image library that will eventually contain pictures of every disease that causes visible changes in the digestive system. Most of the endoscopic pictures were taken with Olympus videoendoscopes. The picture quality in this library is not as good as in the original pictures, because the original quality would have made transmission times too long. The library also illustrates typical X-ray findings in gastroenterologic diseases. This website is provided as a free service by The Wasa Workgroup on Intestinal Disorders, GASTROLAB, Vasa, Finland.

The Stanford Visible Female 49
The Stanford Visible Female is an Academic Project sponsored by the Division of Anatomy and SUMMIT. Central to the project is a series of 95 photographed cryosections of a reproductive-age female cadaveric pelvis acquired in 1993. From these cross-sectional data, several research projects have arisen. These range from 2D imaging correlations with independent MR data to 3D models developed for anatomically accurate surgical simulation.

Visible Human Project 50
The Visible Human Project is creating complete, anatomically detailed, three-dimensional representations of the male and female human body. The current phase of the project is collecting transverse CT, MRI, and cryosection images of representative male and female cadavers at one-millimeter intervals. The site includes an extensive collection of links to projects based on the Visible Human data.

NIDCR - Craniofacial and Skeletal Diseases Branch 51
The NIDCR imaging Web page will allow the NIH research and clinical community to collaborate on imaging studies through the Internet. All authorized users on the NIH campus and abroad will be able to display and review the studies posted on the imaging Web page with the NIH-developed imaging tool Medical Image Processing, Analysis and Visualization (MIPAV) 52.

Image Archive Applications - Clinical Trials

CDISC - Clinical Data Interchange Standards Consortium 53
CDISC is an open, multidisciplinary, non-profit organization committed to the development of industry standards to support the electronic acquisition, exchange, submission and archiving of clinical trials data and metadata for medical and biopharmaceutical product development. The mission of CDISC is to lead the development of global, vendor-neutral, platform independent standards to improve data quality and accelerate product development.

FDA Electronic Submissions and Review 54
The Food and Drug Administration (FDA) 55 regulates drugs 56, biologics 57, and medical devices 58. The FDA Center for Drug Evaluation and Research (CDER) Electronic Regulatory Submissions and Review (ERSR) Web page provides information about the electronic submission of regulatory information to the Center and its review by CDER staff.

QARC - Quality Assurance Review Center 59
The Quality Assurance Review Center (QARC) is a Global Data and Review Center, providing Radiotherapy Quality Assurance and Diagnostic Imaging data management programs for several NCI-supported Cooperative Groups and international pharmaceutical companies. QARC is a research program within the University of Massachusetts Medical School. It is an established research resource for clinical investigators around the world.

RCET - Resource Center for Emerging Technologies 60
The Resource Center for Emerging Technologies (RCET) at the University of Florida (UF) provides advanced technical resources necessary to support radiotherapy. The use of medical informatics is expected to facilitate education, collaboration, and peer review, as well as provide an environment in which clinical investigators can receive, share, and analyze voluminous multimodality clinical data.

Image Guided Therapy Center at Washington University 61
The Image-Guided Therapy Center (ITC) (formerly known as the 3DQA Center) WWW server at Washington University School of Medicine in St. Louis, Missouri supports image-based 3D conformal radiotherapy (CRT) multi-institutional trials.

ACRIN - American College of Radiology Imaging Network 62
The American College of Radiology Imaging Network (ACRIN) is a National Cancer Institute-funded cooperative group. ACRIN's overarching goal is - through clinical trials of diagnostic imaging and image-guided therapeutic technologies - to generate information that will lengthen and improve the quality of the lives of cancer patients.

Medical Image Repository 63
At the NIH Center for Information Technology (CIT) 64, in collaboration with NINDS 65, the High Performance Computing and Informatics Office (HPCIO) 66 has developed a Web-based medical image archive system for the archive of imaging and clinical data from the Suburban Hospital study. This archive system provides secure Web interfaces for clinical data entry, data upload, database query, and data download. HPCIO is currently developing a separate archive system for the GAIN study.

Imaging in Clinical Trials

Advanced Technology Consortium (ATC) 67
The Advanced Technology Consortium (ATC) was created to facilitate the conduct of National Cancer Institute sponsored advanced technology radiation therapy clinical trials that require digital data submissions while maintaining patient confidentiality. This effort includes radiation therapy quality assurance, image and radiation therapy digital data management, and clinical research and developmental efforts.

Clinical Data Interchange Standards Consortium (CDISC) 68
CDISC is an open, multidisciplinary, non-profit organization committed to the development of industry standards to support the electronic acquisition, exchange, submission and archiving of clinical trials data and metadata for medical and biopharmaceutical product development. The mission of CDISC is to lead the development of global, vendor-neutral, platform independent standards to improve data quality and accelerate product development in our industry.

Multidisciplinary Approach to Data Standards for Clinical Development 69
This article by Rebecca Kush, Ph.D., president of CDISC, originally appeared in Applied Clinical Trials, Volume 11, Number 4, pages 35-44, April 2002. It describes a common interchange standard for clinical data that will save time, effort, and money for everyone involved, and CDISC continues to develop new functional models to prove it.

Clinical Trials Image Archive Technology

CardioNow, Inc. 70
CardioNow's service, which is specifically designed to handle the large file sizes (greater than 200 megabytes) associated with DICOM cardiology images, enables study investigators to send complete trial images from their cath lab to the angiographic core lab in near real-time. All cases sent via the CardioNow network are transmitted and archived in native DICOM so the original image quality is preserved. Furthermore, cases associated with clinical trials are coded and anonymized to protect patient confidentiality. By facilitating the secure, electronic transmission of cases, CardioNow eliminates the unnecessary delays and expenses associated with copying, shipping and storing CDs and cine films.
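
CardioNow's own procedure is not described here. As a hypothetical sketch of what "coded and anonymized" can mean, the function below drops a few identifying attributes (named by their standard DICOM keywords) from a header represented as a plain dictionary, and replaces the patient ID with a keyed hash so that the same patient always receives the same trial code. The attribute list is only a small illustrative subset; the DICOM standard's confidentiality profiles (PS3.15) cover far more attributes. The secret key, the site prefix, and the 12-character code length are all assumptions.

```python
import hashlib
import hmac

# Small illustrative subset of identifying attributes (DICOM keywords).
REMOVED_KEYWORDS = (
    "PatientName",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
)


def code_and_anonymize(header: dict[str, str], secret_key: bytes,
                       site_prefix: str) -> dict[str, str]:
    """Return a copy of the header with identifiers removed and the ID coded."""
    coded = {k: v for k, v in header.items() if k not in REMOVED_KEYWORDS}
    original_id = header.get("PatientID", "")
    # Keyed hash: the same ID always maps to the same code, and codes
    # cannot be recomputed by anyone who lacks the key.
    digest = hmac.new(secret_key, original_id.encode("utf-8"),
                      hashlib.sha256).hexdigest()[:12]
    coded["PatientID"] = f"{site_prefix}-{digest}"
    return coded


if __name__ == "__main__":
    header = {"PatientName": "DOE^JANE", "PatientID": "MRN12345",
              "PatientBirthDate": "19600101", "Modality": "XA"}
    print(code_and_anonymize(header, b"hypothetical-trial-key", "SITE01"))
```

Using a keyed hash rather than a plain hash is the main design choice: patient IDs are short and guessable, so an unkeyed hash could be reversed by trying candidate IDs.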

Advanced Technology Consortium (ATC) 67
The ATC was created to facilitate the conduct of National Cancer Institute sponsored advanced technology radiation therapy clinical trials that require digital data submissions while maintaining patient confidentiality. This effort includes radiation therapy quality assurance, image and radiation therapy digital data management, and clinical research and developmental efforts. We strongly believe that advanced medical informatics can facilitate education, collaboration, and peer review, as well as provide an environment in which clinical investigators can receive, share, and analyze volumetric multimodality treatment planning and verification (TPV) digital data. Our ultimate goal is to improve the standards of care in the management of cancer by improving the quality of clinical trials medicine.

Image-guided Therapy QA Center (ITC) 71
The Image-guided Therapy QA Center (ITC) WWW server at Washington University School of Medicine in St. Louis, Missouri, a member of the NCI Advanced-Technology QA Consortium (ATC) 67, provides software tools and data management services to support quality assurance for advanced-technology Radiation Oncology clinical trials.

NIH Information Standards

Secure One HHS 72
Secure One HHS develops policies, procedures, and guidance to serve as a foundation for HHS' information security program. These documents implement relevant Federal laws, regulations, standards, and guidelines that provide a basis for the information security program at the Department. As Secure One HHS evolves, these documents will be subject to review and revision. Reviews and updates will take place at least annually, or when changes occur that identify the need to revise and improve Secure One HHS.

cancer Biomedical Informatics Grid (caBIG) 73
To expedite the cancer research community's access to key bioinformatics platforms, the NCI plans to deploy an integrating biomedical informatics infrastructure: the cancer Biomedical Informatics Grid (caBIG). In partnership with the cancer research community, the NCI is creating a common, extensible informatics platform that integrates diverse data types and supports interoperable analytic tools. This platform will allow research groups to tap into the rich collection of emerging cancer research data while supporting their individual investigations.

Cancer Data Standards Repository (caDSR) 74
One of the problems confronting the biomedical data management community is the panoply of ways that similar or identical concepts are described. Such inconsistency in data descriptors (metadata) makes it nearly impossible to aggregate and manage even modest-sized data sets in order to be able to ask basic questions. The NCI, together with partners in the research community, develops common data elements (CDEs) that are used as metadata descriptors for NCI-sponsored research and for the caCORE applications. The caCORE objects are represented by UML Models. The UML Model is used to facilitate a semi-automated load from caCORE UML into ISO/IEC 11179 Administered Components. This is discussed in more detail in the Application Developers 75 section. The caDSR is a database and tool set that the NCI and its partners use to create, edit and deploy the CDEs.

ISO/IEC 11179, Information Technology - Metadata Registries (MDR) 76
ISO 11179 is a standard for describing data elements used in databases and documents that specifies basic aspects of data element composition, including metadata. The standard applies to the formulation of data element representations and meaning as shared among people and machines; it does not apply to the physical representation of data as bits and bytes at the machine level. This standard is used as the basis for the NCI Common Data Elements.
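
In the ISO/IEC 11179 model, a data element pairs a data element concept (an object class plus a property, such as a patient's sex) with a value domain that says how values are represented, for example as an enumerated list of permissible values. The sketch below models just that pairing. The class and field names are illustrative assumptions, not the caDSR schema or any actual NCI CDE, and the sketch omits the identification, registration, and versioning metadata the standard also covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str    # the thing described, e.g. "Patient"
    property_name: str   # the characteristic, e.g. "Sex"


@dataclass(frozen=True)
class ValueDomain:
    datatype: str
    permissible_values: tuple[str, ...] = ()  # empty tuple: non-enumerated

    def accepts(self, value: str) -> bool:
        """Check enumerated domains only; datatype checks are left out here."""
        return not self.permissible_values or value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    value_domain: ValueDomain

    @property
    def name(self) -> str:
        return f"{self.concept.object_class} {self.concept.property_name}"


if __name__ == "__main__":
    # Hypothetical element, not an actual NCI CDE.
    patient_sex = DataElement(
        DataElementConcept("Patient", "Sex"),
        ValueDomain("string", ("Male", "Female", "Unknown")),
    )
    print(patient_sex.name)                          # Patient Sex
    print(patient_sex.value_domain.accepts("Male"))  # True
    print(patient_sex.value_domain.accepts("M"))     # False
```

Separating the concept from the value domain is what lets two databases that code sex as "M"/"F" and "Male"/"Female" be recognized as recording the same concept with different representations.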

Dublin Core Metadata Initiative 77
The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI's activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.
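
The initiative's best-known product is the Dublin Core Metadata Element Set: fifteen generic, optional, repeatable elements (title, creator, subject, and so on) identified by the XML namespace `http://purl.org/dc/elements/1.1/`. The sketch below serializes a simple record with Python's standard library. The `metadata` wrapper element and the sample values are assumptions for illustration, since the element set itself does not prescribe a container.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def dublin_core_xml(fields: dict[str, list[str]]) -> str:
    """Serialize repeatable Dublin Core elements inside a wrapper element."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical container element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_xml({
        "title": ["Example image collection"],
        "type": ["Image"],
        "format": ["application/dicom"],
    }))
```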

NLM Communications Engineering Branch 78
Projects in the Communications Engineering Branch focus on R&D in image engineering: the capture, storage, processing, online retrieval, transmission, and display of both biomedical documents (mainly journals) and medical imagery. The data repositories available from the NLM Communications Engineering Branch have been collected from a variety of sources. This collection contains digitized versions of radiographs and rare manuscripts. Data repositories include the National Health and Nutrition Examination Surveys (NHANES) 79 with collateral data and x-ray images.

Information Standards from Other Federal Agencies

DARPA Information Processing Technology Office 80
DARPA IPTO aims to create information processing technology for a new generation of intelligent systems, transforming the nation's infrastructure to enhance global stability. IPTO has a four-part mission: 1) create transformational information technologies to anticipate and meet national security imperatives; 2) validate these technologies with prototypes of real national security solutions; 3) lead, stimulate, and complement commercial technology; and 4) transition technologies to national security users through partnerships with other DARPA offices, industry, the armed services, and government agencies.

CDC Health Information and Surveillance Systems Board 81
The Centers for Disease Control and Prevention (CDC) Health Information and Surveillance Systems Board (HISSB) Web site lists organizations and resources related to the development of health information standards. These include Coordinators/Promoters of Standards Development, Standards Development Organizations, and Classification/Nomenclature Systems.

Public Health Data Standards Consortium 82
In November 1998, the National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention (CDC), in conjunction with the Agency for Healthcare Research and Quality (AHRQ) and the National Committee on Vital and Health Statistics (NCVHS), convened a workshop to examine the implications of the Health Insurance Portability and Accountability Act of 1996 (HIPAA) for the practice of public health and health services research. The workshop, "The Implications of HIPAA's Administrative Simplification Provisions for Public Health and Health Services Research," brought together 85 leaders in health statistics, research, and informatics to examine the challenges and opportunities presented by HIPAA. It led to the creation of the Public Health Data Standards Consortium, officially established in January 1999, which serves as a mechanism for ongoing representation of public health and health services research interests in HIPAA implementation and other data standards-setting processes.

NASA ESAD Scientific Data Purchase Program 83
The Scientific Data Purchase (SDP) is a demonstration program developed in response to the President's Space Policy, which directed NASA to purchase remote sensing data from the private sector. Initiated in fiscal year 1997, the SDP was funded under the Earth Science Enterprise (ESE) Program 84 to provide scientific data to the ESE science community. The $50 million program is an opportunity to advance global-systems research, to strengthen the U.S. economy through the development of remote sensing technologies, and to test a new way of doing business. The SDP is managed by the NASA Earth Science Applications Directorate (ESAD) at the John C. Stennis Space Center in Mississippi.

NASA EOSDIS Core System Information for Scientists (ECS Info) 85
The Earth Observing System Data and Information System (EOSDIS) 86 is designed to archive unprecedented amounts of Earth observing data from a wide range of instruments collecting information over decades. Its diverse user community can search, retrieve, and analyze any of these observations, also over a period of decades. EOS data products need descriptive information, or metadata, to enable users and data providers to locate and use the information. Over several years, numerous teams of scientists, computer scientists, and information engineers have collaborated to develop the data model, including its metadata attributes and their organization, to meet these needs. A catalog of EOSDIS-related information 87 has been prepared.

Astronomy Digital Image Library (ADIL) 88
ADIL collects astronomical, research-quality images and makes them available to the astronomical community and the general public. Patrons access the Library through the World Wide Web to search for and browse images. Once images are located in the Library, users may download them to their local machines in FITS format for further analysis. The Library provides a number of benefits not only to those looking for images, but also to those who add images to the Library's growing collection.
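Because ADIL distributes images in FITS format, a short sketch of how a FITS header is laid out may help. A FITS header is a sequence of 80-character ASCII "cards" (keyword in columns 1-8, the value indicator `= ` in columns 9-10, then the value and an optional `/` comment), terminated by an `END` card and padded to a multiple of 2880 bytes. The Python below builds and reads such a header. It is a simplified illustration that ignores long-string continuation, quoted strings containing `/`, and the data unit itself; a maintained library such as Astropy's `astropy.io.fits` is the better choice for real files.

```python
CARD_LEN = 80
BLOCK_LEN = 2880


def make_header(cards):
    """Build a FITS-style header from (keyword, value-string) pairs."""
    lines = []
    for keyword, value in cards:
        # Keyword left-justified in 8 columns, "= ", value right-justified to column 30.
        lines.append(f"{keyword:<8}= {value:>20}".ljust(CARD_LEN))
    lines.append("END".ljust(CARD_LEN))
    text = "".join(lines)
    padded_len = -(-len(text) // BLOCK_LEN) * BLOCK_LEN  # round up to whole blocks
    return text.ljust(padded_len).encode("ascii")


def read_header(data):
    """Return {keyword: raw value string} for every value card before END."""
    header = {}
    for offset in range(0, len(data), CARD_LEN):
        card = data[offset:offset + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            return header
        if card[8:10] == "= ":
            # Drop any trailing "/ comment" (simplification: breaks on '/' inside strings).
            header[keyword] = card[10:].split("/", 1)[0].strip()
    raise ValueError("no END card found")


raw = make_header([("SIMPLE", "T"), ("BITPIX", "16"), ("NAXIS", "2"),
                   ("NAXIS1", "1024"), ("NAXIS2", "1024")])
print(len(raw), read_header(raw))
```

The six cards occupy 480 bytes, so the header is padded with spaces to one full 2880-byte block.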

XML and DICOM

What is XML? 89
A markup language is a mechanism to identify structures in a document. The XML specification defines a standard way to add markup to documents. To appreciate XML, it is important to understand why it was created: XML was created so that richly structured documents could be used over the web. The existing alternatives, HTML and SGML, were not practical for this purpose. This is the first part of a technical introduction to XML 90.

Transcoding DICOM to XML
Supplement 23 to DICOM (Digital Imaging and Communications in Medicine), Structured Reporting, is a specification that supports a semantically rich representation of image and waveform content, enabling experts to share image and related patient information. DICOM SR supports the representation of textual and coded data linked to images and waveforms. Nevertheless, the medical information technology community needs models that work as bridges between the DICOM relational model and open object-oriented technologies. The authors assert that representations of the DICOM Structured Reporting standard, using object-oriented modeling languages such as the Unified Modeling Language, can provide a high-level reference view of the semantically rich framework of DICOM and its complex structures. They have produced an object-oriented model to represent the DICOM SR standard and have derived XML-exchangeable representations of this model using World Wide Web Consortium specifications. They expect the model to benefit developers and system architects who are interested in developing applications that are compliant with the DICOM SR specification. [from Abstract] A distributed database course project on the exchange of DICOM-compatible medical images using XML was also carried out at the University of Waterloo.
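To give a feel for what an XML-exchangeable representation of an SR document involves, the sketch below models a DICOM SR content tree (content items carrying a relationship type, a value type, a concept name, and nested children) and serializes it recursively to XML with Python's standard library. The element and attribute names and the example report content are hypothetical; this is neither the authors' UML-derived model nor any XML encoding defined by DICOM itself, and real SR concept names are coded entries rather than plain text.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import xml.etree.ElementTree as ET


@dataclass
class ContentItem:
    relationship: Optional[str]   # relation to parent, e.g. CONTAINS, INFERRED FROM
    value_type: str               # e.g. CONTAINER, TEXT, CODE, NUM
    concept_name: str             # simplified: plain text instead of a coded concept
    value: Optional[str] = None
    children: List["ContentItem"] = field(default_factory=list)


def to_xml(item: ContentItem) -> ET.Element:
    """Recursively convert a content item and its children to an XML element."""
    elem = ET.Element("ContentItem", valueType=item.value_type, concept=item.concept_name)
    if item.relationship:
        elem.set("relationship", item.relationship)
    if item.value is not None:
        elem.text = item.value
    for child in item.children:
        elem.append(to_xml(child))
    return elem


# Hypothetical report: a root CONTAINER holding a TEXT finding and a NUM measurement.
report = ContentItem(None, "CONTAINER", "Imaging Report", children=[
    ContentItem("CONTAINS", "TEXT", "Finding", "Polyp in sigmoid colon"),
    ContentItem("CONTAINS", "NUM", "Diameter (mm)", "8"),
])
print(ET.tostring(to_xml(report), encoding="unicode"))
```

Because the tree is walked recursively, arbitrarily deep SR structures map to equally deep XML nesting without special cases.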

XML for Molecular Biology 91
A list of XML resources compiled by Paul Gordon that may be of use to the bioinformatician.

Object Management Group (OMG) 92
The OMG was formed to create a component-based software marketplace by hastening the introduction of standardized object software. The OMG's charter includes the establishment of industry guidelines and detailed object management specifications to provide a common framework for application development. Conformance to these specifications will make it possible to develop a heterogeneous computing environment across all major hardware platforms and operating systems. The nearly 800 member companies of the Object Management Group produce and maintain a suite of specifications that support distributed, heterogeneous software development projects from analysis and design through coding, deployment, runtime, and maintenance.

XML - CORBA - DICOM
An exploration of how Digital Imaging and Communications in Medicine (DICOM) could be expressed using the Common Object Request Broker Architecture (CORBA) and the Extensible Markup Language (XML).

CDISC Operational Data Model (ODM) 93
The final version 1.1 Specification for the Operational Data Model (ODM) was released by CDISC on May 9, 2002. The XML-based Operational Data Model "provides a format for representing the study metadata, study data, and administrative data associated with a clinical trial. It represents only the data that would be transferred among different software systems during a trial, or archived after a trial. It need not represent any information internal to a single system, for example, information about how the data would be stored in a particular database." The version 1.1 release includes the text of the specification, with XML DTDs and supporting documentation. ODM v1.1 Final "represents the culmination of more than three years of effort by a multi-disciplinary team of pharmaceutical and biotechnology sponsors and technology vendors; the development team believes the CDISC 1.1 DTD is now ready for widespread adoption among sponsors, vendors and CROs to facilitate the interchange of clinical trial data."
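For orientation, the fragment below shows the nesting that ODM uses for clinical data (ClinicalData, SubjectData, StudyEventData, FormData, ItemGroupData, ItemData), together with a small Python function that flattens it into rows. The OIDs and values are hypothetical, the namespace declaration and the study metadata section are omitted for brevity, and the fragment is an illustration rather than a validated ODM 1.1 document.

```python
import xml.etree.ElementTree as ET

ODM_FRAGMENT = """
<ODM FileOID="F.1" FileType="Snapshot" CreationDateTime="2002-05-09T00:00:00">
  <ClinicalData StudyOID="ST.001" MetaDataVersionOID="MDV.1">
    <SubjectData SubjectKey="1001">
      <StudyEventData StudyEventOID="SE.BASELINE">
        <FormData FormOID="F.VITALS">
          <ItemGroupData ItemGroupOID="IG.VITALS">
            <ItemData ItemOID="IT.WEIGHT" Value="72"/>
            <ItemData ItemOID="IT.HEIGHT" Value="178"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>
"""


def flatten_items(odm_xml: str):
    """Yield (subject, study event, form, item, value) tuples from ClinicalData."""
    root = ET.fromstring(odm_xml.strip())
    for subject in root.iter("SubjectData"):
        for event in subject.iter("StudyEventData"):
            for form in event.iter("FormData"):
                for item in form.iter("ItemData"):
                    yield (subject.get("SubjectKey"), event.get("StudyEventOID"),
                           form.get("FormOID"), item.get("ItemOID"), item.get("Value"))


for row in flatten_items(ODM_FRAGMENT):
    print(row)
```

Flattening to one row per item value is a common first step when loading exchanged trial data into a relational table or an analysis package.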

DARPA Agent Markup Language (DAML) 94
The World Wide Web (WWW) contains a large amount of information and is expanding rapidly. Most of that information is currently represented using the Hypertext Markup Language (HTML), which is designed to let web developers display information in a way that humans can view in web browsers. While HTML allows us to visualize the information on the web, it provides little capability to describe the information in ways that help software programs find or interpret it. The World Wide Web Consortium (W3C) 95 has developed the Extensible Markup Language (XML) 96, which allows information to be described more accurately using tags. For example, the word Algol on a web site might refer to a computer language, a star, or an oceanographic research ship; enclosing the word in a descriptive metadata tag makes its meaning unambiguous. However, XML has a limited capability to describe the relationships (schemas or ontologies) among objects. Ontologies provide a very powerful way to describe objects and their relationships to other objects. The DAML language is being developed as an extension to XML and the Resource Description Framework (RDF) 97. The latest release of the language (DAML+OIL 98) provides a rich set of constructs with which to create ontologies and to mark up information so that it is machine readable and understandable.
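The Algol example can be made concrete. In the sketch below, the same word appears under three different tags, and a program tells the meanings apart by reading the tag rather than guessing from the text. The tag names are invented for this illustration and come from no published vocabulary; note also that XML by itself says nothing about how the three tags relate to one another, which is the gap that ontology languages such as DAML+OIL were meant to fill.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names; the text content is identical in every element.
PAGE = """
<page>
  <programmingLanguage>Algol</programmingLanguage>
  <star>Algol</star>
  <researchShip>Algol</researchShip>
</page>
"""

root = ET.fromstring(PAGE.strip())
# Only the tag distinguishes the three meanings of "Algol".
meanings = {elem.tag: elem.text for elem in root}
ships = [elem.text for elem in root.iter("researchShip")]
print(meanings)
print(ships)
```

A search restricted to `researchShip` elements returns only the ship, something a plain-text search for "Algol" on an HTML page cannot do.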

XML Multimedia Radiology Report
The clinical display of radiologic information as an interactive multimedia report is accomplished using a multimedia report model based on Extensible Markup Language (XML), rather than a traditional workstation model. XML does not replace existing standards (e.g., Digital Imaging and Communications in Medicine [DICOM], Transmission Control Protocol/Internet Protocol [TCP/IP]). Instead, it provides a powerful framework that is used in combination with existing standards to allow system designers to modify display characteristics based on user need. The application of XML to the clinical display of radiologic information is described. [from Abstract]

Review/Tutorial on Standards for Radiology Networks
Medical communication standards (HL7, DICOM, and, in the near future, the migration toward XML) support interoperability between IT subsystems and pave the way to patient information systems with access to unified and complete electronic medical records (EMR). Furthermore, with standardized communication techniques such as CORBAmed 99 [PDF File], an object-oriented design of healthcare applications will be possible in the near future. [from Abstract]

MIMOS: A framework for exchanging medical image processing results
DICOM presently supports structured reporting of image studies, but does not accommodate semantics in the image handling domain. This can impede the exchange and the interpretation of processing results. To overcome this limitation, a framework based on a formal grammar was developed, with documents encoded using XML. [from Abstract]

Biological Databases

Molecular Biology Database Collection 100
The Molecular Biology Database Collection is an online resource listing key databases of value to the biological community. The Collection is intended to bring high-quality databases available throughout the world to fellow scientists' attention, rather than simply to list every available database. As such, this up-to-date listing is intended to serve as the starting point for finding specialized databases that may be of use in biological research. The databases included in this Collection add value to the underlying data through curation, new data connections, or other innovative approaches.

Nucleic Acids Research - Special Database Issue (2002) 101
The 2002 Database Issue of Nucleic Acids Research is the ninth in a series dedicated to factual biological databases. These databases have become an essential resource for working biologists and the aim of this compilation is to provide descriptions of the most important of these databases and especially to introduce newly compiled databases that provide specialist information in the biological area. In the issue (Jan 2002), there are descriptions of 2112 databases.

Implementation of Biological Databases

SIDB - Scientific Image Data Base (SIDB) 102
A web-driven open source database for 2-D and 3-D images specifically designed for (confocal) microscopy units, but applicable wherever groups of users collaborate with images.

OpenHealth™ -- Open source software in health care 103
Electronic medical records and networks are the solutions to the technical issues around coordinating the work of diverse health care professionals caring for a single person across multiple sites. Open source software has potential to overcome some of the obstacles now being encountered in this transition: 1) Open source reference implementations of medical record standards could speed their adoption and increase interoperability in practice. The differences in adoption between TCP/IP and ISO network protocols illustrate the importance of reference implementations. 2) Open source software could reduce the issue of "Who pays?" in community health networks by eliminating per user and per site license costs and unbundling implementation and support charges.

ASN.1 - Abstract Syntax Notation One 104
ASN.1, or Abstract Syntax Notation One, is a data representation notation standardized by the International Organization for Standardization (ISO) and used to achieve interoperability between platforms. The National Center for Biotechnology Information (NCBI) 105 uses ASN.1 for the storage and retrieval of data such as nucleotide and protein sequences, structures, genomes, and MEDLINE records. It permits computers and software systems of all types to reliably exchange both the data structure and content. The NCBI Software Development ToolKit (known as the 'NCBI Toolbox 106') is a set of software and data exchange specifications used by NCBI to produce portable, modular software for molecular biology. The software in the Toolbox is primarily designed to read ASN.1 format records. It is freely available to the public and can be used in its own right or as a foundation for building tools with similar properties.
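ASN.1 itself only describes data structures; values are written out using separate encoding rules, of which the Basic Encoding Rules (BER) are a widely used example. BER writes every value as a tag-length-value triple. The Python sketch below hand-encodes a hypothetical two-field record (an INTEGER taxonomy identifier and a VisibleString name) that way, using definite lengths and the shortest two's-complement integer contents. The record layout is invented for this illustration and is not one of NCBI's ASN.1 specifications; the NCBI Toolbox is the appropriate tool for real NCBI data.

```python
def encode_length(n: int) -> bytes:
    """Definite-length form: one byte below 128, otherwise 0x80|count + length bytes."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, content: bytes) -> bytes:
    """Concatenate identifier octet, length octets, and contents."""
    return bytes([tag]) + encode_length(len(content)) + content


def encode_integer(value: int) -> bytes:
    """INTEGER (tag 0x02) as the shortest two's-complement big-endian contents."""
    magnitude = value if value >= 0 else ~value
    length = magnitude.bit_length() // 8 + 1
    return tlv(0x02, value.to_bytes(length, "big", signed=True))


def encode_visible_string(text: str) -> bytes:
    """VisibleString (universal tag 26, i.e. 0x1A) holding printable ASCII."""
    return tlv(0x1A, text.encode("ascii"))


def encode_sequence(*members: bytes) -> bytes:
    """SEQUENCE (constructed, 0x30) wrapping already-encoded members."""
    return tlv(0x30, b"".join(members))


# Hypothetical record: SEQUENCE { tax-id INTEGER, name VisibleString }
record = encode_sequence(encode_integer(9606), encode_visible_string("Homo sapiens"))
print(record.hex(" "))
```

Because every value carries its own tag and length, a reader on any platform can walk the bytes without knowing the machine's word size or byte order, which is the interoperability property described above.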

VISIM: Information Retrieval and Exploration in Large Medical Image Collections 107
Visual information systems in medicine (VISIM) are emerging that can retrieve items from large collections of images and explore the connections between them to discover new insights, confirm hypotheses, or search for similar findings. The advance of these systems lies at the crossroads of computer vision, man-machine interaction, and image database technology, and raises many novel issues that need to be addressed. This one-day workshop was held in Utrecht, the Netherlands, on October 18, 2001.

Digital Library Technologies (DLT) 108
The Digital Library Technologies group at the National Center for Supercomputing Applications (NCSA) 109 carries out a continuing effort to develop components of a new infrastructure for building large-scale digital libraries of distributed, heterogeneous digital information objects. These components enable digital information to be used across and within communities by providing user-configurable tools for formatting, translating, publishing, indexing, and searching data and metadata. The tools are designed to interoperate using standard protocols such as the Open Archives Initiative 21 Protocol for Metadata Harvesting and the ISO-standard Z39.50 110 protocol for information retrieval. The DLT group is also addressing the challenges of community development of data standards and data use practices.
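As a small illustration of the Open Archives Initiative Protocol for Metadata Harvesting mentioned above, the Python sketch below builds OAI-PMH 2.0 `ListRecords` requests for unqualified Dublin Core (`metadataPrefix=oai_dc`) and extracts record identifiers and the `resumptionToken` from a response; a non-empty token signals that more results can be fetched. The repository base URL and the trimmed sample response are hypothetical, and no network call is made.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """First request names the metadata format; follow-ups send only the token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return record identifiers and the resumption token (None on the last page)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    return ids, (token.text if token is not None and token.text else None)


# Hypothetical, trimmed response (real headers also carry a datestamp, etc.).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:img-1</identifier></header></record>
    <record><header><identifier>oai:example.org:img-2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))
print(parse_list_records(SAMPLE))
```

A harvester would loop, passing each returned token back through `list_records_url`, until the parser returns `None` for the token.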

PEIPA - the Pilot European Image Processing Archive 111
PEIPA is an archive of material relating to the processing of images, with an emphasis on image analysis and computer vision. The archive is supported by the British Machine Vision Association 112, the University of Essex 113, and the EU-funded project Performance Characterization in Computer Vision.

Cancer Image Archives

Virtual Colonoscopy Image Database 1
The NCI, in collaboration with the Walter Reed Army Medical Center Virtual Colonoscopy Center and the National Library of Medicine, offers a Virtual Colonoscopy image database, complete with associated findings, as a network-downloadable resource for computer-aided diagnosis (CAD) researchers and developers. The database provides 52 complete cases (26 with polyps) consisting of DICOM-compliant 3D CT data, several 2D images, the pathology reports, the virtual colonoscopy reports, and the optical colonoscopy reports along with the optical colonoscopy video. The database is available on the NCIA web site.

National Biomedical Imaging Archive 1
The NCI, in collaboration with medical researchers in a number of imaging fields, offers the National Biomedical Imaging Archive (NBIA) as an image repository to foster rapid dissemination of information to the scientific community and the public. You may browse, download, and use the data for non-commercial, scientific, and educational purposes. However, some documents or portions of documents may have been contributed by private institutions or organizations, and other parties may retain all rights to publish or reproduce them. These documents may be protected under United States and foreign copyright laws, which can restrict their commercial use. In addition, some of the data may be the subject of patent applications or issued patents, and you may need to obtain a license for commercial use. NCI does not warrant or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information in this archive.

National Digital Medical Archive (NDMA) 114
i3 Archive is the commercial continuation of an effort that began as the National Digital Mammography Archive (NDMA), a collaboration among the University of Pennsylvania Medical Center (including the National Scalable Cluster Project, NSCP), the University of Chicago Department of Radiology, the University of North Carolina at Chapel Hill School of Medicine, the Department of Radiology (Breast Imaging) of Sunnybrook and Women's College Health Sciences Centre of the University of Toronto, and the Advanced Computing Technologies Division of BWXT Y-12 L.L.C. in Oak Ridge, Tennessee. This was a Next Generation Internet (NGI) Initiative project sponsored by the National Library of Medicine. NDMA developed a test bed to demonstrate the feasibility of a national breast imaging archive and network infrastructure [PDF File] to support digital mammography using NGI technologies.

Virtual Cancer Image Data Warehouse
At the National Cancer Center (Tokyo, Japan), more than 100 virtual cancer images were created from the CT or MR data of individual patients with cancer (Cancer Edutainment Virtual Reality Theater: CEVRT). These images can be used to help explain procedures and findings to patients, to obtain informed consent, to simulate surgery, and to estimate cancer invasion into surrounding organs. A web-based object-oriented database was created to access these cancer images and to register medical images at international research sites via the Internet. [from Abstract]

Mammographic Image Analysis Society - Mammographic Database 115
The original MIAS Database (digitized at a 50 micron pixel edge) has been reduced to a 200 micron pixel edge and clipped or padded so that every image is 1024 x 1024 pixels. There are 322 cases. Reference: J Suckling et al (1994) "The Mammographic Image Analysis Society Digital Mammogram Database" Excerpta Medica. International Congress Series 1069, pp 375-378.
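The size reduction described for the MIAS images can be illustrated arithmetically: going from a 50 micron to a 200 micron pixel edge is a factor of 200 / 50 = 4 along each axis, so each output pixel covers a 4 x 4 block of 16 original pixels. The pure-Python sketch below shows one way to do this, by averaging each block and then center-cropping or padding to a fixed square size. The block averaging, the centering, and the zero fill are assumptions made for the illustration; the description above does not say how the MIAS images were actually resampled or positioned.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Image = List[List[float]]


def block_average(image: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks; partial edge blocks are dropped."""
    rows = len(image) // factor
    cols = len(image[0]) // factor if image else 0
    out: Image = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [image[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def center_fit(items: Sequence[T], size: int, make_pad: Callable[[], T]) -> List[T]:
    """Center-crop a sequence to `size`, or pad it evenly on both ends."""
    n = len(items)
    if n >= size:
        start = (n - size) // 2
        return list(items[start:start + size])
    before = (size - n) // 2
    after = size - n - before
    return [make_pad() for _ in range(before)] + list(items) + [make_pad() for _ in range(after)]


def clip_or_pad(image: Image, size: int, fill: float = 0.0) -> Image:
    """Make an image exactly size x size: fit each row, then fit the list of rows."""
    rows = [center_fit(row, size, lambda: fill) for row in image]
    return center_fit(rows, size, lambda: [fill] * size)


# 50 um -> 200 um pixel edge is a factor of 4 along each axis.
FACTOR = 200 // 50
tiny = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
reduced = block_average(tiny, FACTOR)   # 8 x 8 -> 2 x 2
print(reduced)
print(clip_or_pad(reduced, 4))
```

For a real mammogram the same two calls would be made with `size=1024`; an array library such as NumPy would be far faster than nested Python lists, but the arithmetic is identical.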

USF Digital Database for Screening Mammography (DDSM) 41
The Digital Database for Screening Mammography (DDSM) is a resource for use by the mammographic image analysis research community. The primary purpose of the database is to facilitate sound research in the development of computer algorithms to aid in screening. Secondary purposes of the database may include the development of algorithms to aid in the diagnosis and the development of teaching or training aids. The database contains approximately 2,500 studies. Each study includes two images of each breast, along with some associated patient information (age at time of study, ACR breast density rating, subtlety rating for abnormalities, ACR keyword description of abnormalities) and image information (scanner, spatial resolution...). Images containing suspicious areas have associated pixel-level "ground truth" information about the locations and types of suspicious regions. Also provided is software both for accessing the mammogram and truth images and for calculating performance figures for automated image analysis algorithms.
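The pixel-level ground truth is what allows an algorithm's output to be scored. As a hedged example of the kind of performance figures such software can report, the sketch below computes lesion-level sensitivity and the mean number of false-positive marks per image from per-image detection counts. The input format and the notion of a "hit" (an algorithm mark that falls on a ground-truth region) are assumptions for this illustration, not the scoring rules of the DDSM software.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImageResult:
    lesions: int          # ground-truth suspicious regions in the image
    lesions_found: int    # regions hit by at least one algorithm mark
    false_marks: int      # algorithm marks that hit no ground-truth region


def sensitivity_and_fp_rate(results: List[ImageResult]) -> Tuple[float, float]:
    """Return (lesion-level sensitivity, mean false positives per image).

    Either figure is reported as 0.0 when its denominator is zero.
    """
    total_lesions = sum(r.lesions for r in results)
    found = sum(r.lesions_found for r in results)
    sensitivity = found / total_lesions if total_lesions else 0.0
    fp_per_image = sum(r.false_marks for r in results) / len(results) if results else 0.0
    return sensitivity, fp_per_image


# Hypothetical results for four images.
results = [ImageResult(1, 1, 0), ImageResult(2, 1, 3), ImageResult(0, 0, 1), ImageResult(1, 0, 0)]
print(sensitivity_and_fp_rate(results))
```

Sweeping an algorithm's detection threshold and recomputing both numbers at each setting traces out a free-response operating characteristic (FROC) curve.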

UCSF Neuroimaging Data Warehouse
An image data warehouse infrastructure containing a broad array of biomedical imaging and clinical data is built on top of a picture archiving and communication system (PACS) environment and applies an iterative object-oriented analysis and design (OOAD) approach and recognized data interface and design standards. The implementation is based on a Java CORBA (Common Object Request Broker Architecture) and Web-based architecture that separates the graphical user interface presentation, data warehouse business services, data staging area, and backend source systems into distinct software layers. [from Abstract]



Table of Links

1. http://ncia.nci.nih.gov
2. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page2
3. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page3
4. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page4
5. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page5
6. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page6
7. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page7
8. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page8
9. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page9
10. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page10
11. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page11
12. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page12
13. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page13
14. http://dev1.cancer.gov/programsandresources/InformationSystems/ImageArchiveResources/page14
15. http://download.journals.elsevierhealth.com/pdfs/journals/1076-6332/PIIS1076633203806992.pdf
16. http://dev1.cancer.gov/reportsandpublications/reportsandpresentations/ImageArchiveManagementWorkshop
17. http://www.nbirn.net
18. http://www.ncrr.nih.gov
19. http://www.brisbio.ac.uk
20. http://www.bioimage.org
21. http://www.openarchives.org
22. http://dclunie.com
23. http://www.vldb.org
24. http://research.microsoft.com/barc/Scaleable
25. http://www.terraserver.com
26. http://www.teradata.com/t/go.aspx/index.html?id=22723
27. http://www.archivebuilders.com
28. http://www.mysql.com
29. http://medical.nema.org
30. http://www.hl7.org
31. ftp://ftp1.nci.nih.gov/pub/cacore/ExternalStds/NCI_ExtStdsReview.pdf
32. ftp://ftp1.nci.nih.gov/pub/cacore/ExternalStds/NCI_ExtStdsReviewAppend.pdf
33. http://ncicb.nci.nih.gov/NCICB
34. http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/caBIO
35. http://www.codata.org
36. http://www.sane-project.org
37. http://www.springer.com/west/home?SGWID=4-102-22-2278824-0&changeHeader=true
38. http://grants1.nih.gov/grants/policy/data_sharing
39. http://www.minoru-development.com/en/healthlinks.html
40. http://www.ctsim.org
41. http://marathon.csee.usf.edu/Mammography/Database.html
42. http://www.mbl.org
43. http://www.nervenet.org/iscope/info.html
44. http://www.radiologyinfo.org/index.cfm?bhcp=1
45. http://www-2.cs.cmu.edu/~cil/vision.html
46. http://zeus.ics.forth.gr/forth/ics/cvrl/proj/ecvnet/imagedb
47. http://rad.usuhs.mil/synapse/cow.html
48. http://www.gastrolab.net/pawelcom.htm
49. http://summit.stanford.edu/ourwork/PROJECTS/LUCY/lucywebsite/home.html
50. http://www.nlm.nih.gov/research/visible/visible_human.html
51. http://csdb.nidr.nih.gov/csdb/frame_images.htm
52. http://csdb.nidr.nih.gov/csdb/mipav.htm
53. http://www.cdisc.org
54. http://www.fda.gov/cder/regulatory/ersr/default.htm
55. http://www.fda.gov
56. http://www.fda.gov/cder
57. http://www.fda.gov/cber
58. http://www.fda.gov/cdrh
59. http://www.qarc.org
60. http://rcetsystem.org
61. http://rtog3dqa.wustl.edu
62. http://www.acrin.org
63. http://hpcio.cit.nih.gov/Repository.htm
64. http://cit.nih.gov/home.asp
65. http://www.ninds.nih.gov
66. http://hpcio.cit.nih.gov
67. http://atc.wustl.edu
68. http://www.cdisc.org/index.html
69. http://www.cdisc.org/pdf/ACT5856e.pdf
70. http://www.cardionow.com
71. http://itc.wustl.edu
72. http://intranet.hhs.gov/infosec/policies_guides.html
73. http://cabig.nci.nih.gov
74. http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/cadsr
75. http://ncicb.nci.nih.gov/core/caDSR/#SOFTWAREDEVELOPERS
76. http://metadata-stds.org/11179
77. http://dublincore.org
78. http://archive.nlm.nih.gov
79. http://archive.nlm.nih.gov/proj/dxpnet/nhanes/nhanes.php
80. http://www.darpa.mil/ipto
81. http://www.cdc.gov/od/hissb/lnk_sdo.htm
82. http://phdatastandards.info
83. http://www.esad.ssc.nasa.gov/datapurchase
84. http://www.earth.nasa.gov
85. http://ecsinfo.gsfc.nasa.gov
86. http://edhs1.gsfc.nasa.gov
87. http://www.cs.utexas.edu/users/vbb/eosdis.html
88. http://imagelib.ncsa.uiuc.edu
89. http://www.xml.com/pub/a/98/10/guide1.html#AEN58
90. http://www.xml.com/pub/a/98/10/guide0.html
91. http://www.visualgenomics.ca/gordonp/xml
92. http://www.omg.org/technology/xml/index.htm
93. http://xml.coverpages.org/ni2002-05-09-a.html
94. http://www.daml.org
95. http://www.w3c.org
96. http://www.w3.org/XML
97. http://www.w3.org/RDF
98. http://www.daml.org/2000/12/daml+oil-index
99. http://www.omg.org/attachments/pdf/CORBAmed_bro.pdf
100. http://nar.oupjournals.org/cgi/content/abstract/30/1/1
101. http://nar.oupjournals.org/content/vol30/issue1
102. http://sourceforge.net/projects/sidb
103. http://www.minoru-development.com/en/healthcare.html
104. http://www.ncbi.nlm.nih.gov/Sitemap/Summary/asn1.html
105. http://www.ncbi.nlm.nih.gov
106. http://www.ncbi.nlm.nih.gov/IEB/ToolBox
107. http://www.science.uva.nl/research/isis/VISIM
108. http://dlt.ncsa.uiuc.edu
109. http://ncsa.uiuc.edu
110. http://www.loc.gov/z3950/agency
111. http://peipa.essex.ac.uk
112. http://www.bmva.ac.uk
113. http://www.essex.ac.uk
114. http://www.i3archive.com
115. http://peipa.essex.ac.uk/info/mias.html