Volume 1 Supplement 1
Modeling biomedical experimental processes with OBI
- Ryan R Brinkman1,
- Mélanie Courtot1,
- Dirk Derom2,
- Jennifer M Fostel3,
- Yongqun He4,
- Phillip Lord5,
- James Malone6,
- Helen Parkinson6,
- Bjoern Peters7,
- Philippe Rocca-Serra6,
- Alan Ruttenberg8,
- Susanna-Assunta Sansone6,
- Larisa N Soldatova9,
- Christian J Stoeckert Jr.10,
- Jessica A Turner11,
- Jie Zheng10 and
- the OBI consortium
© Soldatova et al; licensee BioMed Central Ltd. 2010
Published: 22 June 2010
Experimental descriptions are typically stored as free text without using standardized terminology, creating challenges in comparison, reproduction and analysis. These difficulties impose limitations on data exchange and information retrieval.
The Ontology for Biomedical Investigations (OBI), developed as a global, cross-community effort, provides a resource that represents biomedical investigations in an explicit and integrative framework. Here we detail three real-world applications of OBI, provide detailed modeling information and explain how to use OBI.
We demonstrate how OBI can be applied to different biomedical investigations to both facilitate interpretation of the experimental process and increase the computational processing and integration within the Semantic Web. The logical definitions of the entities involved allow computers to unambiguously understand and integrate different biological experimental processes and their relevant components.
OBI is available at http://purl.obolibrary.org/obo/obi/2009-11-02/obi.owl
Biomedical investigations use empirical approaches to investigate causal relationships among a large range of variables. The wide range of possible investigations presents a number of challenges when building tools to describe experimental processes. There are varying levels of complexity and granularity and a wide range of material and equipment is used. Furthermore, the use of varying terminology by different communities makes data integration problematic when representing and integrating biomedical investigations across different fields of study. The use of ontologies has been successful in biological data integration and representation [1, 2] and there have been multiple efforts to develop ontologies aimed at providing clearer semantics for data (GO, FuGO, MGED, EXPO, LABORS, MSI ontology) [3–8]. Work in the transcriptomics, proteomics and metabolomics communities has proceeded in parallel, producing ontologies with overlapping scopes.
Though each focuses on particular types of experimental processes, many terms, such as investigation and assay, are common to all. Merging common aspects of these formalisms is useful because it provides a mechanism by which terms can be used and understood by all, reducing ambiguity and the difficulties associated with post-hoc attempts to integrate data. The practice of consolidating representations is endorsed by organizations such as the OBO Foundry, which requires all member ontologies to define a term only once among them (orthogonality). OBO Foundry members use a common set of relations from the Relations Ontology and the upper level Basic Formal Ontology (BFO) in order to facilitate cross-ontology consistency and to support automated reasoning. OBO ontologies adhere to common naming conventions in order to make them easier to learn and understand.
The Ontology for Biomedical Investigations (OBI) addresses the need for a cross-disciplinary, integrated ontology for the detailed description of biological and clinical investigations. OBI is collaboratively developed by representatives from 19 biomedical communities from around the globe and has been submitted as a candidate for the OBO Foundry. It uses other OBO ontologies wherever possible. OBI defines a set of broadly applicable terms that span biomedical and technological domains, for example, assay (the planned process of producing data about something), as well as domain-specific terms relevant to smaller areas of study, for example, T cell epitope recognition assay, used by the IEDB database to describe experimental data extracted from articles investigating immune epitopes.
OBI represents all phases of experimental processes, and the entities involved in preparing for, executing, and interpreting those processes e.g., study designs, protocols, instrumentation, biological material, collected data and analyses performed on that data. OBI also represents roles and functions used in biomedical investigations. OBI therefore supports consistent annotation of biomedical experimental processes regardless of the field of study. OBI is expressed in OWL, a W3C ontology language developed for the Semantic Web. The development of OBI is driven by specific use cases of experiments. In this paper, the OBI release of 2009-11-02 is applied to three exemplar use cases, originating from three communities: 1) neuroscience, 2) vaccine protection, and 3) functional genomics. The OBI release of 2009-11-02 is available at http://purl.obolibrary.org/obo/obi/2009-11-02/obi.owl.
In what follows, italics are used to refer to a specific term within OBI where appropriate. OBI defines an investigation as a process with several parts, including planning an overall study design, executing the designed study, and documenting the results. An investigation typically includes interpreting data to draw conclusions.
Biomedical experimental processes involve numerous sub-processes, which in turn involve experimental materials such as whole organisms, organ sections and cell cultures. These experimental materials are represented as subclasses of the BFO class material entity, which OBI uses as the basis for defining physical things. Material entity is an independent continuant, that is, a continuant that is a bearer of quality and realizable entity(s), in which other entities inhere and which itself cannot inhere in anything. Material entities are spatially extended, their identity is independent of that of other entities, and they persist through time; examples are organism, test tube, and centrifuge. Material entities can bear roles, which are typically socially defined and are realized in the context of a process, e.g. study subject role, host role, specimen role, patient role. They can also bear functions, which result from design or evolution and depend on physical structure, e.g. measure function, separation function and environment control function. The function is considered to inhere in the material entity and be realized by the role that material entity plays in a process.
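The distinction between roles and functions described above can be sketched in a few lines of Python. This is an illustrative data model only, not OBI's OWL representation: the class names `MaterialEntity` and `Process`, their fields, and the pairing of centrifuge with separation function are assumptions introduced for the example, while the term labels themselves come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class MaterialEntity:
    """Hypothetical stand-in for a BFO material entity."""
    label: str
    roles: set = field(default_factory=set)      # context dependent, socially defined
    functions: set = field(default_factory=set)  # follow from physical structure


@dataclass
class Process:
    """Hypothetical stand-in for a process that realizes roles and functions."""
    label: str
    realized: list = field(default_factory=list)

    def realize(self, bearer: MaterialEntity, realizable: str) -> None:
        # A realizable entity can only be realized if it inheres in the bearer.
        if realizable not in bearer.roles | bearer.functions:
            raise ValueError(f"{realizable!r} does not inhere in {bearer.label!r}")
        self.realized.append((bearer.label, realizable))


# Assumed pairings, chosen only to illustrate the pattern.
centrifuge = MaterialEntity("centrifuge", functions={"separation function"})
organism = MaterialEntity("organism", roles={"study subject role"})

assay = Process("assay")
assay.realize(centrifuge, "separation function")
assay.realize(organism, "study subject role")
print(assay.realized)
```

The check inside `realize` mirrors the statement that a role or function inheres in a material entity before any process realizes it; a real OWL model would express this with restrictions rather than runtime checks.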
To assess the completeness of the OBI release of 2009-11-02 and to demonstrate the use of OBI for annotation, we present three representative use cases. These demonstrate how to model entities and relations between entities involved in experimental processes using OBI. The first use case models a neuroscience experiment described in a journal article and shows how logical definitions are constructed using parts of external ontologies imported into OBI. The second use case details how OBI is used to model vaccine studies; the third describes an investigation run by a Robot Scientist which fully automatically designs and executes functional genomics experiments.
Use case 1: neuroscience investigation
Stimulating monkey with a light source: this process is an example of presentation of stimulus. Its participants are a Japanese macaque monkey as the subject and a light source as the stimulus, and it takes place during the measuring neural activity in the caudate nucleus assay.
Measuring neural activity in the caudate nucleus: this process is a subclass of the process extracellular electrophysiology recording. It unfolds in the caudate nucleus that is part of a Macaca fuscata, of which the Japanese macaque monkey is an example. The anatomical term caudate nucleus is imported from the Neuroscience Information Framework standardized (NIFSTD) ontology and used in the logical definition of the assay.
The light on the tangent screen here is a light source used to present the stimulus to the study subject. The function of the microelectrode, part of the single unit recorder (an example of processed material), is realized in the measuring neural activity in the caudate nucleus process. The process measuring neural activity in the caudate nucleus has the specified input a neuron and the specified output a neuronal spike train datum.
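The class-level statements of this use case can be written down as subject, relation, object triples. The sketch below does this in plain Python; the tuple encoding, the underscore spelling of the relation names and the helper `objects` are assumptions made for illustration, and the real definitions are OWL axioms in OBI.

```python
# Statements from use case 1 as (subject, relation, object) triples.
ASSAY = "measuring neural activity in the caudate nucleus"

TRIPLES = [
    (ASSAY, "is_a", "extracellular electrophysiology recording"),
    (ASSAY, "unfolds_in", "caudate nucleus"),
    (ASSAY, "has_specified_input", "neuron"),
    (ASSAY, "has_specified_output", "neuronal spike train datum"),
    ("caudate nucleus", "part_of", "Macaca fuscata"),
    ("microelectrode", "part_of", "single unit recorder"),
    ("single unit recorder", "is_a", "processed material"),
    ("stimulating monkey with a light source", "is_a", "presentation of stimulus"),
]


def objects(subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


print(objects(ASSAY, "has_specified_input"))
print(objects(ASSAY, "unfolds_in"))
```

Keeping the anatomical term as a plain label hides the fact that it is imported from NIFSTD; in the ontology it would carry its original identifier.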
Use case 2: vaccine protection investigation
A vaccination is a kind of administering substance in vivo process that realizes a material to be added role, borne by a vaccine (e.g., VacX), as well as a target of material addition role, borne by an organism that also bears a host role (e.g., mouse). The term vaccination is imported from the Vaccine Ontology (VO, http://www.violinet.org/vaccineontology). An injection function that inheres in a syringe (a processed material) is realized by the vaccination process.
A pathogen challenge is also a kind of administering substance in vivo process. It realizes a number of roles: a pathogen role and a material to be added role borne by the challenge organism (e.g., Influenza Virus), and a target of material addition role and a host role borne by another organism (e.g., mouse). An injection function that inheres in a syringe is realized by the pathogen challenge process.
A survival assessment is an assay that measures the survival rate (occurrence of death events) in one or more organisms that are monitored over time. The survival assessment is a protection efficiency assay that has specified input a number of organisms (e.g., mouse) and has specified output a survival rate, in this case a measurement datum that records that 75% of mice survived the pathogen challenge.
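As a rough illustration of the survival assessment, the following Python sketch takes a set of monitored organisms as specified input and produces a survival rate measurement datum as specified output. The names `MeasurementDatum` and `survival_assessment` are hypothetical, and the number of mice is invented so that the result matches the 75% figure given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementDatum:
    """Hypothetical stand-in for an OBI measurement datum."""
    label: str
    value: float
    unit: str


def survival_assessment(outcomes):
    """Compute a survival rate from {organism id: alive at end of monitoring}."""
    if not outcomes:
        raise ValueError("a survival assessment needs at least one organism as input")
    survivors = sum(1 for alive in outcomes.values() if alive)
    rate = 100.0 * survivors / len(outcomes)
    return MeasurementDatum("survival rate", rate, "percent")


# Invented cohort: 8 challenged mice, of which mouse-1 and mouse-2 died.
mice = {f"mouse-{i}": i > 2 for i in range(1, 9)}

datum = survival_assessment(mice)
print(datum)
```

The death events themselves are reduced to a boolean per organism here; a fuller model would record each event and its time point.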
Use case 3: an automated functional genomics investigation
Adam’s planning yields a plan specification that has an objective specification to test inferred statements, each of which is modeled as a hypothesis textual entity. Each statement is about whether a particular metabolite will affect yeast strain growth. Adam’s plan specifies an assay to test these hypotheses: to grow yeast with and without addition of the metabolite.
The planned process of the automated investigation of the enzyme EC.6.1.39 is an instance of the class hypothesis driven investigation with the objective to test the hypothesis specified in the planning process (we represent here only a single hypothesis textual entity which serves as a design pattern). The result is whether the metabolite affected the growth of the yeast strain (see Figure 3, optical density reading box). The upper growth curve (drawn in red) shows the growth rate with the addition of the metabolite, and the lower curve (drawn in blue) shows the growth rate with no metabolite. So, the addition of the metabolite affects the growth rate of the yeast. The results interpretation is modeled as a conclusion textual entity that states that the hypothesis inferred by Adam has been confirmed and the robot can update its background knowledge.
The investigation process has several assays that provide data used to test the hypotheses. The assay has specified inputs the metabolite and the yeast strain specified in the hypothesis, and the specified output is a data set consisting of several optical density measurements. The yeast bears the evaluant role, and the metabolite the nutrient role.
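The pattern of hypothesis, assay and conclusion can be sketched as follows. Everything concrete in this example is an assumption: the strain and metabolite names, the optical density values and the threshold rule in `conclude` are placeholders, and the comparison of mean readings is not the analysis Adam performs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GrowthAssay:
    """Hypothetical record of one assay: inputs, roles and the output data set."""
    yeast_strain: str
    metabolite: str
    od_with: list      # optical density readings, metabolite added
    od_without: list   # optical density readings, no metabolite

    def roles(self):
        # Role assignment as described in the text.
        return {self.yeast_strain: "evaluant role", self.metabolite: "nutrient role"}


def conclude(hypothesis_text, assay, min_difference=0.1):
    """Placeholder decision rule: compare mean optical densities."""
    difference = mean(assay.od_with) - mean(assay.od_without)
    verdict = "confirmed" if abs(difference) >= min_difference else "not confirmed"
    return f"{verdict}: {hypothesis_text}"


assay = GrowthAssay(
    yeast_strain="strain-X",          # invented label
    metabolite="metabolite-Y",        # invented label
    od_with=[0.10, 0.35, 0.80, 1.20],     # invented readings
    od_without=[0.10, 0.20, 0.40, 0.60],  # invented readings
)

conclusion = conclude("metabolite-Y affects growth of strain-X", assay)
print(assay.roles())
print(conclusion)
```

The returned string plays the part of the conclusion textual entity; in the ontology the hypothesis and the conclusion are separate information entities linked to the investigation.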
Table 1. Ontology classes used in the three use cases (note: instances are not included). The sources and term IDs columns are not reproduced here; the class and parent class labels that appear in the table are:
- administering substance in vivo
- conclusion textual entity
- directive information entity
- extracellular electrophysiology recording
- hypothesis driven investigation
- hypothesis textual entity
- material to be added role
- measuring neural activity in the caudate nucleus
- presentation of stimulus
- spike train datum
- study subject role
- target of material addition role
Relations used in the three use cases (table body not recovered; the one surviving cell lists use cases 1, 2, 3).
In the neuroscience investigation use case, constructing logical definitions of the experimental process prompted us to ask questions of domain experts, because details we wished to capture were not explicit in the publication. For example, was the location of the micro-electrode extra- or intra-cellular? Were all spike train data recorded from the caudate nucleus? How does a spike train relate to the GO biological process regulation of action potential [GO:0001508]? Based on the answers, we augmented OBI’s existing assays and imported several terms from external ontologies, for example NIFSTD. When we needed relations that were not yet present in OBI, rather than define them ourselves we used relations from ro_proposed (http://obofoundry.org/cgi-bin/detail.cgi?ro_proposed). For example, unfolds in specifies that an occurrent (process) happens in a certain location (here, the assay of spike trains unfolds in the caudate nucleus). Finally, we used the NCBI taxonomy to describe the species involved in this experiment. Re-use of external resources serves two purposes. First, because domain experts have already devoted time to defining terms in these external ontologies, we save substantial effort by not replicating that work. Second, by re-using existing resources that others already use, we improve the potential for future data integration by making it unnecessary to map between different identifiers denoting the same entity.
In developing the neuroscience use case, we found it challenging to choose an appropriate level of detail. We decided not to include instances of the classes, but instead to focus on adding classes that can be re-used for other use cases and communities. It is our intention that our analysis and the classes we defined serve as design patterns for other neuroscience assays. OBI is intended to support whatever level of detail (granularity) a use case requires, from molecular-level experiments to higher-level biomedical investigations, according to the needs of the user community.
In the second use case, the vaccine protection investigation includes three processes. The processes vaccination and pathogen challenge are disjoint subclasses of administering substance in vivo. The process survival assessment is a type of assay (Table 1). We found that all of these processes, as well as all other entities described in the use case, could be represented using OBI idioms. Syringe is a processed material that participates in different processes. Entities such as vaccine are types of material entity. Host role, pathogen role, and material to be added role are types of role.
That OBI can be used to represent experimental processes for different applications and domains is appealing because it suggests that we can better leverage the work we each do. For the domain of vaccine investigation, approximately 400 vaccines have been manually curated and stored in the Vaccine Investigation and Online Information Network (VIOLIN; http://www.violinet.org) vaccine database system. Currently, the vaccine protection experimental data in VIOLIN is stored in plain text and can be difficult to interpret. The lack of a common ontology for representing this data has prevented optimal use of the VIOLIN vaccine data. We plan to apply the representation described in this paper to that data in order to enable advanced querying both within the data and across data from other biomedical communities that represent their data using OBI. As an example, consider that a vaccine candidate against Alzheimer disease may induce specific changes in the brains of transgenic mice or human patients (http://www.ncbi.nlm.nih.gov/pubmed/12379846). Enabling queries across the domains of vaccinology and neuroscience would therefore be useful in conducting such research.
The representations of the investigations run by Adam were stored as instances of the defined classes in a relational database. Accurate and complete recording of all experimental processes involved in the investigations allows efficient re-use of the data and results for other investigations with different objectives. OBI’s approach to representation suggests new possibilities for automated investigations, which are desirable because they offer high-throughput mechanisms not only for data generation, but also for hypothesis generation and analysis of results. Because using terminology from a wide range of biology is a central part of OBI’s methodology, it is reasonable to expect that the reach of such an approach can be extended. For example, DNA microarray experiments may also be performed using Robot Scientists in order to generate and test hypotheses regarding the transcript expression level in brain or other tissues, and knowledge encapsulated in the Gene Ontology or other ontologies could be applied to interpreting the results of such experiments.
Here we provide three real-world use cases as examples of how to represent experimental processes with OBI. Experience such as this helps validate OBI’s current design choices and shows how to extend it in domain-specific ways. It also generates competency questions that allow us to identify parts of OBI that are insufficiently expressive and to identify external resources that can be used to extend OBI’s coverage. We found that a major technical challenge is the requirement to import terms from other ontologies to construct logical definitions: due to its broad scope, OBI spans multiple existing ontological resources. Importing these resources in full carries a significant cost, as reasoning becomes slower and the ontology becomes harder to navigate. To solve this problem the OBI consortium developed the MIREOT mechanism, which preserves the namespaces of imported terms and allows their direct use in OBI. We also hope that technologies such as views and modules, as well as improvements to existing reasoners, will address these issues. OBI will be further developed to expand the coverage and depth of biomedical investigations, and the use cases presented here helped us in testing version 1.0 of the ontology.
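The idea behind MIREOT can be illustrated with a small Python sketch. As we understand the MIREOT guideline, the minimum information recorded for an external term is the URI of the source ontology, the URI of the term itself, and the URI of the direct superclass under which it is placed in the target ontology. The record type `MireotRecord`, the function `import_term`, the dictionary used as a toy target ontology and the `example.org` parent URI are all assumptions of this sketch; the GO identifier is the one mentioned in the discussion of use case 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MireotRecord:
    """Minimum information kept for one externally defined term."""
    source_ontology_uri: str
    source_term_uri: str
    target_superclass_uri: str


def import_term(target_ontology: dict, record: MireotRecord) -> None:
    """Add the external term under its original URI, below the chosen parent."""
    if record.target_superclass_uri not in target_ontology:
        raise KeyError(f"unknown parent class: {record.target_superclass_uri}")
    # The term keeps its own URI, so its namespace is preserved.
    target_ontology[record.source_term_uri] = {
        "subclass_of": record.target_superclass_uri,
        "imported_from": record.source_ontology_uri,
    }


EX = "http://example.org/target#"  # placeholder namespace, not an OBI URI
target = {EX + "process": {"subclass_of": None, "imported_from": None}}

record = MireotRecord(
    source_ontology_uri="http://purl.obolibrary.org/obo/go.owl",
    source_term_uri="http://purl.obolibrary.org/obo/GO_0001508",
    target_superclass_uri=EX + "process",
)
import_term(target, record)
print(sorted(target))
```

Only the single referenced term enters the target, which is the point of the mechanism: the rest of the source ontology is never loaded, so reasoning and navigation are not burdened by it.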
List of abbreviations used
IAO: Information Artifact Ontology
LABORS: LABoratory Ontology for Robot Scientists
MGED: Microarray Gene Expression Society
MIREOT: Minimum Information to Reference an External Ontology Term
MSI: Metabolomics Standards Initiative
NCBI: National Center for Biotechnology Information
NIFSTD: Neuroscience Information Framework standardized
OBI: Ontology for Biomedical Investigations
OBO: Open Biomedical Ontologies
VIOLIN: Vaccine Investigation and Online Information Network
This work is partially supported by grant funding from the National Institute of Biomedical Imaging and Bioengineering, National Human Genome Institute, NIH R01EB005034, NIH P41 HG003619, NIH U54-DA-021519, NIH-NIAID R01AI081062, HHSN26620040006C, the Intramural Research Program of the NIH and NIEHS (HHSN273200700046UEC), NIMH (NIH) R01MH084812-01A1, NCRR (NIH) the Bio-Informatics Research Network Coordinating Center (U24 RR025736-01), EMERALD project (LSHG-CT-2006-037686), EC FELICS, MUGEN, BBSRC (BB/E025080/1, BB/C008200/1, BB/G000638/1), RC UK, CARMEN project EPSRC (EP/E002331/1), NERC-NEBC, EU IP CarcinoGenomics (PL 037712), EU NoE NuGO (NoE 503630), the Michael Smith Foundation for Health Research, and the Public Health Agency of Canada / Canadian Institutes of Health Research Influenza Research Network (PCIRN).
We thank Midori Harris, David Hill, Jane Lomax and Maryann Martone for discussions on the neuroscience use case. We thank Ross D. King for the discussions on the automated functional genomics use case.
We thank the OBI Consortium members Jeff Grethe, Daniel Rubin, Bill Bug, Stefan Wiemann, Tina Hernandez-Boussard, Richard Scheuermann, Richard Bruskiewich, Frank Gibson, Norman Morrison, Dawn Field, Tanya Gray, Eric Deutsch, Daniel Schober, Luisa Montecchi, Chris Taylor, Trish Whetzel, John Westbrook, Gilberto Fragoso, Joe White, Mervi Heiskanen, Liju Fan, Helen Causton, Allyson Lister, Kevin Clancy, Cristian Cocos, Jay Greenbaum, Pierre Grenon, Chris Mungall, Matthew Pocock, Holger Stenzhorn, Lawrence Hunter, Monnie Mc Gee, Barry Smith, Robert Stevens and Elisabetta Manduchi for their contributions to OBI.
This article has been published as part of Journal of Biomedical Semantics Volume 1 Supplement 1, 2010: Proceedings of the Bio-Ontologies Special Interest Group Meeting 2009: Knowledge in Biology. The full contents of the supplement are available online at http://www.jbiomedsem.com/supplements/1/S1.
- The Gene Ontology Consortium: The Gene Ontology project in 2008. Nucleic Acids Research. 2008, 36: D440-4. 10.1093/nar/gkm883.
- Matos P, Ennis M, Darsow M: ChEBI - Chemical Entities of Biological Interest. Nucleic Acids Research. 2006, Database Summary: 646.
- The Gene Ontology Consortium: Gene Ontology: Tool for the Unification of Biology. Nature Genetics. 2000, 25: 25-29. 10.1038/75556.
- Whetzel PL, Brinkman RR: Development of FuGO: an ontology for functional genomics investigations. Omics. 2006, 10 (2): 199-204. 10.1089/omi.2006.10.199.
- Whetzel PL, Parkinson HE, Causton HC: The MGED Ontology: a resource for semantics-based description of microarray experiments. Bioinformatics. 2006, 22: 866-873. 10.1093/bioinformatics/btl005.
- Soldatova LN, King RD: An Ontology of Scientific Experiments. J R Soc Interface. 2006, 3: 795-803. 10.1098/rsif.2006.0134.
- King RD, Rowland J, Oliver SG: The Automation of Science. Science. 2009, 324: 85-89. 10.1126/science.1165620.
- Sansone S, Schober D, Atherton HJ: Metabolomics Standards Initiative - Ontology Working Group. Work in Progress. Metabolomics. 2007, 3 (3): 249-256. 10.1007/s11306-007-0069-z.
- Smith B, Ashburner M, Rosse C: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology. 2007, 25: 1251-1255. 10.1038/nbt1346.
- Smith B, Ceusters W, Klagges B: Relations in Biomedical Ontologies. Genome Biology. 2005, 6: R46. 10.1186/gb-2005-6-5-r46.
- Grenon P, Smith B, Goldberg L: Biodynamic Ontology: Applying BFO in the Biomedical Domain. Ontologies in Medicine. 2004, IOS Press, 20-32.
- Schober D, Smith B, Lewis S: Naming Conventions for OBO Foundry Ontology engineering. BMC Bioinformatics. 2009, 10: 125. 10.1186/1471-2105-10-125.
- Peters B, Sidney J, Bourne P: The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol. 2005, 3 (3): e91. 10.1371/journal.pbio.0030091.
- Lauwereyns J, Watanabe K, Coe B, Hikosaka O: A neural correlate of response bias in monkey caudate nucleus. Nature. 2002, 418: 413-417. 10.1038/nature00892.
- Bug W, Ascoli GA: The NIFSTD and BIRNLex Vocabularies: Building Comprehensive Ontologies for Neuroscience. Neuroinformatics. 2008, 6: 175-194. 10.1007/s12021-008-9032-z.
- Sayers EW, Barrett T, Benson DA: Database resources of the National Center for Biotechnology Information. Nucleic Acids Research. 2009, 37: D5-15. 10.1093/nar/gkn741.
- Xiang Z, Todd Th, Ku KP: VIOLIN: vaccine investigation and online information network. Nucleic Acids Research. 2008, 36: D923-D928. 10.1093/nar/gkm1039.
- Soldatova LN, Clare A, Sparkes A, King RD: An ontology for a Robot Scientist. Bioinformatics. 2006, 22 (14): e464-e471. 10.1093/bioinformatics/btl207.
- Courtot M, Gibson F, Lister A: MIREOT: the Minimum Information to Reference an External Ontology Term. In Proc. ICBO'09. 2009.
- Detwiler LT, Brinkley JF: Custom views of reference ontologies. In Proc. AMIA Annu Symp. 2006, 909. PMID: 17238528.
- Parsia B, Sattler U, Schneider T: Mechanisms for Importing Modules, Owl Experiences and Directions. In Proc. OWLED. 2009, [http://www.webont.org/owled/2009/papers/owled2009_submission_10.pdf]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.