The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery
- Michel Dumontier1, 4,
- Christopher JO Baker2,
- Joachim Baran3,
- Alison Callahan4,
- Leonid Chepelev4,
- José Cruz-Toledo4,
- Nicholas R Del Rio5,
- Geraint Duck6,
- Laura I Furlong7,
- Nichealla Keath4,
- Dana Klassen8,
- James P McCusker9,
- Núria Queralt-Rosinach7,
- Matthias Samwald10,
- Natalia Villanueva-Rosales5,
- Mark D Wilkinson11 and
- Robert Hoehndorf12
© Dumontier et al.; licensee BioMed Central Ltd. 2014
Received: 2 July 2013
Accepted: 2 February 2014
Published: 6 March 2014
The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level composed of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a Creative Commons Attribution license. See website for further information: http://sio.semanticscience.org.
Biomedical research is poised to enter an era of unprecedented large scale data analysis powered by hundreds of public biological databases and hundreds of millions of patient records. There is a real and urgent need to explore effective methods for biomedical data integration and knowledge management [1, 2]. Semantic-based technologies, such as ontologies, offer a proven method to exploit expert-based knowledge in the analysis of large datasets through terminological reasoning such as correspondence, classification, query answering and consistency checking [3–5].
The Semantic Web effort, as pursued under the auspices of the World Wide Web Consortium (W3C), provides a set of standards to facilitate the representation, publication, linking, querying and discovery of heterogeneous knowledge using web infrastructure. In particular, the Resource Description Framework (RDF) enables triple-based assertions about resources using web-friendly identifiers, RDF Schema (RDFS) offers vocabulary to create terminological hierarchies, and the Web Ontology Language (OWL) assists in the construction and interpretation of ontologies as sophisticated logic-based expressions to more precisely capture the meaning of types and relations between entities. With dozens of high value datasets now available in RDF and hundreds of biological ontologies expressed using OWL, there is a tantalizing opportunity to use these resources in knowledge discovery. Biomedical researchers have made use of Semantic Web technologies to uncover curation errors in systems biology models, find putative disease-causing genes, identify aberrant pathways, and uncover alternative drug therapies based on mechanism of action, among others. These knowledge-based applications use automated reasoning over a coherent knowledge base often crafted from multiple and different underlying representations. Ontology-design patterns offer a simple way to guide users towards a uniform representation of knowledge [15–17].
With the goal of facilitating knowledge discovery through simple but effective ontology-based data integration, we developed the Semanticscience Integrated Ontology (SIO). SIO offers classes and relations to describe and relate objects, processes and their attributes, with specific extensions in the biomedical domain. Its relations cover aspects of spatial and temporal qualitative reasoning, including location, containment, overlap, parthood and topology; participation and agency; linguistic and symbolic representation; and comparative and other information-oriented relations. Using straightforward mappings, we report on the substantial benefits afforded by SIO in the retrieval of RDF-based linked data and automatic composition of OWL-described semantic web services. Although SIO development is driven by needs in the biomedical domain, we show that SIO can be applied to a broader set of domains.
This paper is organized as follows: we first describe the current state of the SIO OWL implementation, and then we describe ontological foundations and essential relations in mereotopology, participation and reference. We then present three uses of SIO in knowledge representation and outline its use in the integration of data and web services. We finish with a brief comparison with related work. As a matter of convention, we use ‘single quotes’ to indicate labels, boldface to indicate classes, and italics to indicate relations.
The semanticscience integrated ontology
Processes and participation
SIO includes an OWL2 property chain [‘realizes’ o ‘is role of’ → ‘has participant’], which enables an OWL2 DL reasoner to infer that entities having the realized role are also participants in the process.
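The effect of this property chain can be illustrated outside of an OWL reasoner. The following is a minimal Python sketch, not SIO's implementation: it treats statements as (subject, predicate, object) tuples and applies the chain once. The individuals (`phosphorylation_1`, `enzyme_role_1`, `kinase_1`) and the helper `apply_property_chain` are hypothetical names introduced only for illustration.

```python
# Minimal sketch of the property chain
#   realizes o is role of -> has participant
# Triples are (subject, predicate, object) tuples; all individuals are hypothetical.

def apply_property_chain(triples, first, second, conclusion):
    """Return new triples (s, conclusion, o) for every s -first-> x -second-> o."""
    # Index the second property by its subject so the join is a dictionary lookup.
    second_index = {}
    for s, p, o in triples:
        if p == second:
            second_index.setdefault(s, set()).add(o)

    new_triples = set()
    for s, p, x in triples:
        if p == first:
            for o in second_index.get(x, ()):
                candidate = (s, conclusion, o)
                if candidate not in triples:
                    new_triples.add(candidate)
    return new_triples


triples = {
    ("phosphorylation_1", "realizes", "enzyme_role_1"),
    ("enzyme_role_1", "is role of", "kinase_1"),
}

inferred_triples = apply_property_chain(
    triples, "realizes", "is role of", "has participant"
)
print(inferred_triples)
```

Here the process that realizes the role is inferred to have the role bearer as a participant, which is the entailment an OWL2 DL reasoner would draw from the chain axiom.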
Referential relations in SIO are used to indicate what an object refers to or the nature of the mention of one entity by another (Figure 3C). At the top level, ‘refers to’ enables this basic mention, while ‘references’ is a relation where one entity mentions another, ‘describes’ is a relation where one entity provides a detailed account of another, and ‘represents’ is a relation where one entity is a sign, symbol or model for another. ‘describes’ is further partitioned into ‘is about’ where one entity provides information about another while ‘specifies’ contains specific information that can be used as evaluation criteria to determine the degree of conformance. ‘references’ is further subdivided into ‘cites’ as a relation to refer to by way of example, authority or proof, and ‘has evidence’ which is a relation between a proposition and something that demonstrates the truth of the assertion. ‘has evidence’ has three sub-properties (‘is supported by’, ‘is disputed by’, ‘is refuted by’) which can articulate the type of evidence that one entity offers another. Finally, ‘represents’ is subdivided into ‘denotes’ which is a relation between an entity and what it is a sign or indication of, or what it specifically means, and ‘is model of’ which indicates that an artifact is a model or representation of another.
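The hierarchy described in this paragraph can be written down as a simple parent map. The sketch below is illustrative only and assumes, as the paragraph suggests, that ‘references’, ‘describes’ and ‘represents’ are direct sub-properties of ‘refers to’; the helper `super_relations` is a hypothetical name.

```python
# Sub-property hierarchy of SIO's referential relations as described in the text.
# Each key is a relation label; its value is the parent relation.
PARENT = {
    "references": "refers to",
    "describes": "refers to",
    "represents": "refers to",
    "is about": "describes",
    "specifies": "describes",
    "cites": "references",
    "has evidence": "references",
    "is supported by": "has evidence",
    "is disputed by": "has evidence",
    "is refuted by": "has evidence",
    "denotes": "represents",
    "is model of": "represents",
}


def super_relations(relation):
    """Return the ancestors of a relation, nearest first, up to 'refers to'."""
    chain = []
    while relation in PARENT:
        relation = PARENT[relation]
        chain.append(relation)
    return chain


print(super_relations("is supported by"))
```

Under this reading, a statement made with ‘is supported by’ also counts as a ‘has evidence’, ‘references’ and ‘refers to’ statement, so a query at any of those levels would retrieve it.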
In this section we detail three use cases that outline how SIO can be used to represent biomedical knowledge, scientific experiments, and measurements.
In this use case, we describe the various parts and relationships within a scientific investigation. A scientific ‘experiment’ (Figure 9) is a ‘procedure’ that aims to support, dispute or refute a well formulated ‘hypothesis’ by ‘analysis’ of ‘data’ obtained through ‘observation’ and/or ‘measurement’. Experiments usually involve:
• the development of a research ‘plan’ which includes, but is not limited to:
○ the formulation of a ‘hypothesis’
○ the formulation of aims and ‘objectives’
○ the formulation of a ‘study design’
• the execution of the research plan which includes, but is not limited to:
○ the ‘selection, preparation or collection’ of a ‘sample’
○ the ‘collection of data’ through ‘observation’, ‘assay’ or ‘measurement’
○ the ‘analysis’ of ‘data’
○ the preparation of an investigational ‘report’
Figure 9 illustrates a pattern to express the relationship among a research plan, study design, experiment and its parts (e.g. sample preparation, measurement, analysis). Temporal parts are linked to the whole using SIO’s ‘has proper part’ relation, while temporal ordering is achieved with SIO’s ‘precedes’ relation.
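As a rough illustration of this pattern, the sketch below records the temporal parts of one experiment with ‘has proper part’ and orders them using ‘precedes’ assertions. It is a plain-Python approximation under stated assumptions: the instance names (`experiment_1`, `sample_preparation_1`, and so on) and the function `ordered_parts` are hypothetical and do not come from SIO.

```python
# Hypothetical experiment instance with three temporal parts ('has proper part').
has_proper_part = {
    "experiment_1": ["analysis_1", "sample_preparation_1", "measurement_1"],
}

# 'precedes' assertions between the parts, as (earlier, later) pairs.
precedes = [
    ("sample_preparation_1", "measurement_1"),
    ("measurement_1", "analysis_1"),
]


def ordered_parts(whole):
    """Order the proper parts of `whole` consistently with 'precedes'."""
    parts = set(has_proper_part[whole])
    successors = {part: [] for part in parts}
    indegree = {part: 0 for part in parts}
    for earlier, later in precedes:
        if earlier in parts and later in parts:
            successors[earlier].append(later)
            indegree[later] += 1

    # Standard topological sort: start from parts that nothing precedes.
    ready = sorted(part for part in parts if indegree[part] == 0)
    ordered = []
    while ready:
        current = ready.pop(0)
        ordered.append(current)
        for following in successors[current]:
            indegree[following] -= 1
            if indegree[following] == 0:
                ready.append(following)

    if len(ordered) != len(parts):
        raise ValueError("'precedes' assertions contain a cycle")
    return ordered


print(ordered_parts("experiment_1"))
```

Keeping parthood and ordering as separate relations means the parts can be asserted in any order; the sequence is recovered from the ‘precedes’ statements alone.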
A ‘description’ provides detailed information ‘about’ some ‘entity’ (‘object’, ‘process’ or ‘attribute’), a ‘hypothesis’ is a proposed explanation of some phenomena, and an ‘objective’ is a description of a desired outcome. A description that ‘specifies’ a set of actions to be executed is an ‘action specification’; examples include ‘plans’, ‘study designs’, recipes and ‘protocols’. A plan should clearly identify (‘specify’) one or more ‘objectives’, and optionally specify a ‘hypothesis’ or ‘study design’ as ‘attributes’. A plan, like any action-based specification, ‘is manifested as’ a ‘process’. An objective ‘is realized in’ an experiment if and only if its outcomes are fully apparent. Data generated from the experiment may also serve as ‘evidence for’ the hypothesis, and more specifically may be found to be ‘in support of’, ‘in dispute of’, or ‘in refutation of’ the hypothesis. The Ontology for Biomedical Investigations (OBI) features more specific assays, material and data processing techniques.
Measurements and measurement values
Semantic data integration and question answering
This query is possible because the BioModels type for biochemical reaction has been mapped as a subclass of SIO’s ‘biochemical reaction’. Similarly, the BioModels predicate for ‘is identical to’ has been mapped as a sub-property of SIO’s ‘is identical to’.
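The role of these mappings can be sketched with a small amount of Python. This is an illustrative approximation of subclass and sub-property entailment, not the query or identifiers used in the paper: `biomodels:Reaction`, `biomodels:is`, the `sio:` prefixed names, and the two instance names are all hypothetical placeholders.

```python
# Hypothetical identifiers: "biomodels:Reaction" and "biomodels:is" stand in for the
# source-specific type and predicate; the "sio:" names stand in for SIO terms.
SUBCLASS_OF = {"biomodels:Reaction": "sio:biochemical reaction"}
SUBPROPERTY_OF = {"biomodels:is": "sio:is identical to"}

data = [
    ("model_reaction_1", "rdf:type", "biomodels:Reaction"),
    ("model_reaction_1", "biomodels:is", "reference_reaction_1"),
]


def expand(term, mapping):
    """Return the term followed by all of its ancestors under the mapping."""
    terms = [term]
    while terms[-1] in mapping:
        terms.append(mapping[terms[-1]])
    return terms


def materialize(triples):
    """Add the triples entailed by the subclass and sub-property mappings."""
    entailed = set()
    for s, p, o in triples:
        if p == "rdf:type":
            for cls in expand(o, SUBCLASS_OF):
                entailed.add((s, p, cls))
        else:
            for prop in expand(p, SUBPROPERTY_OF):
                entailed.add((s, prop, o))
    return entailed


def query(triples, predicate, obj=None):
    """Return sorted (subject, object) pairs matching the predicate and optional object."""
    return sorted(
        (s, o) for s, p, o in triples
        if p == predicate and (obj is None or o == obj)
    )


kb = materialize(data)
print(query(kb, "rdf:type", "sio:biochemical reaction"))
print(query(kb, "sio:is identical to"))
```

A query phrased only in SIO terms retrieves the source-specific data because the mappings entail the more general statements; without them, both queries would return nothing.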
Semantic Web service interoperability
Top 10 classes and relations used in SADI services registered at http://sadiframework.org
Classes: Deoxyribonucleic acid sequence; Ribonucleic acid sequence
Relations: Is attribute of; Is part of; Is derived from; Is similar to
SIO is also being used in the Earth Life and Semantic Web (ELSEWeb) project to streamline the flow of heterogeneous geospatial data in order to ease the task of creating multi-source models of species distribution. ELSEWeb translates a family of industry standard XML geospatial metadata (e.g., OGC WCS, FGDC, CF) into RDF that is based on constructs defined by SIO and the Extensible Observation Ontology (OBOE). Geospatial satellite data is automatically discovered, transformed, and integrated with species distribution model services using the ELSEWebData ontology. The alignment of SIO, OBOE, and ELSEWebData allows geospatial data to be queried and integrated with data from both the biological and environmental communities, providing a wider spectrum of modeling potential.
The OBO Foundry is a collaborative effort to construct a set of orthogonal interoperable Open Biomedical Ontologies (OBO). OBO Foundry ontologies use the Basic Formal Ontology (BFO) as an upper level ontology for domain independent types and the Relation Ontology (RO) as a source of domain-independent relations. The BFO is a small (36 class) ontology that is intentionally limited by its realist philosophy to classes with at least one known instance and whose instances only exist in real space and time [36, 37]. In contrast, SIO simplifies the declaration and characterization of hypothetical, theorized or virtual entities (simply by virtue of having such a quality) and is thus more broadly applicable to situations of interest to the health care and life sciences, including the presumed existence of underlying agents in medical disease or the existence of entities or attributes that are computationally predicted. SIO allows processes to have characterizing attributes, whereas the BFO does not. The RO initially comprised 8 domain-independent relations (e.g. has part) and has since been expanded to 160 relations, although these do not include all relations used in all OBO ontologies. OBO Foundry’s approach to building an interoperable set of ontologies can be contrasted with that of SIO, where instead of coordinating needs and duplication across dozens of ontologies, SIO serves as a single point of interoperability capable of addressing needs that go beyond its current scope. In order to foster semantic interoperability between SIO and BFO + RO, we have mapped 9 BFO classes and 24 RO relations to SIO (mapping available at ).
BioTop is an upper level ontology for biology and medicine that features 390 classes and 82 object properties. The class top-level is characterized by a flattened set of basic categories (material object, immaterial object, information object, process, quality, role, condition, disposition, time, value region) while the object hierarchy provides type-specific relations around physical, processual and abstract nature (e.g. has physical part, has processual part, has abstract part). BioTop includes relatively sophisticated formalization for selected terms, for example pathological disposition is defined as "disposition that ('inheres in' some ('bearer of' some (canonicity and ('quality located' some 'noncanonical value region'))))", where SIO would simply express it as a 'biological disposition' that ('is attribute of' some ('entity' that 'has attribute' some 'pathological quality')). BioTop has been used to provide a number of ontology design patterns [42, 43] and to identify semantic type errors in the UMLS network.
The Translational Medicine Ontology (TMO) is a unifying ontology for chemical, genomic and proteomic data with disease, treatment, and electronic health records. The TMO acted as a central schema that mapped basic types to dozens of bio-ontologies and linked open data. The utility of the TMO was demonstrated by answering a series of questions pertaining to diagnosis, prescription, drug mechanism of action, alternative therapeutics, and biomarkers. As SIO emerged from considerations in the TMO effort, SIO can be seen as the supported successor to TMO.
The Semanticscience Integrated Ontology (SIO) is an ontology of basic types and relations to capture a wide span of knowledge through a set of emerging domain-specific patterns using RDF/OWL. SIO has emerged to support the demands of the bioinformatics community, with a special emphasis on biological knowledge representation as well as ontology, data and service interoperability.
The SIO homepage is http://sio.semanticscience.org. SIO is freely available under a Creative Commons Attribution license at http://semanticscience.org/ontology/sio.owl. Version 1.0 of SIO is available as Additional file 1. The base namespace for SIO entities (classes, properties) is http://semanticscience.org/resource/. SIO entities are identified using resolvable HTTP URIs, initially formulated with an alphanumeric identifier, e.g. http://semanticscience.org/resource/SIO_000001, but alternatively accessible using a label-based identifier, e.g. http://semanticscience.org/resource/is-related-to. These and other generated subsets are available from http://goo.gl/0LgN8.
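The two URI forms described above can be sketched as follows. This is a hedged illustration based only on the two example URIs given in the text: the six-digit zero padding and the rule of replacing spaces with hyphens are assumptions generalized from those examples, and the function names are hypothetical.

```python
# Base namespace for SIO entities, as stated in the text.
BASE_NAMESPACE = "http://semanticscience.org/resource/"


def identifier_uri(number):
    """Build an alphanumeric SIO URI; six-digit zero padding is assumed from SIO_000001."""
    return f"{BASE_NAMESPACE}SIO_{number:06d}"


def label_uri(label):
    """Build a label-based URI by joining the words of the label with hyphens (assumed rule)."""
    return BASE_NAMESPACE + "-".join(label.split())


print(identifier_uri(1))
print(label_uri("is related to"))
```

Both calls reproduce the example URIs quoted in the paragraph; whether every SIO label maps to its URI by this simple rule is not stated in the source.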
This work was funded, in part, by NSERC Discovery Grant to MD, Ontario Early Researcher Award to MD and CANARIE NEP-2 grant to MD, CB and MW. This work has received support from the IMI Joint Undertaking under grant agreement no. 115191, Open PHACTS, resources of which comprise financial contribution from the EU FP7 (FP7/2007-2013) and EFPIA companies’ in kind contribution; and Instituto de Salud Carlos III FEDER [CP10/00524]. The Research Programme on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB). ELSEWeb is funded by NASA ACCESS grant NNX12AF49A (UTEP) and used resources from Cyber-ShARE Center of Excellence supported by NSF grant HRD-1242122. We would like to thank the following individuals for thoughtful discussions and contributions on and off the mailing list: Jerven Bolleman, Kevin Cohen, Melanie Courtot, Simon Jupp, Jin-Dong Kim, James Malone, Luke McCarthy, Chris Mungall, David Osumi-Sutherland, Alexandre Riazanov and Robert Stevens.
- Gardner SP: Ontologies and semantic data integration. Drug Discov Today. 2005, 10 (14): 1001-1007. 10.1016/S1359-6446(05)03504-X.
- Goble C, Stevens R: State of the nation in data integration for bioinformatics. J Biomed Inform. 2008, 41 (5): 687-693. 10.1016/j.jbi.2008.01.008.
- Bodenreider O, Stevens R: Bio-ontologies: current trends and future directions. Brief Bioinform. 2006, 7 (3): 256-274. 10.1093/bib/bbl027.
- Noy NF: Semantic integration: a survey of ontology-based approaches. SIGMOD Rec. 2004, 33 (4): 65-70. 10.1145/1041410.1041421.
- Wache H, Voegele T, Visser U, Stuckenschmidt H, Schuster G, Neumann H, Hübner S: Ontology-Based Integration of Information-a Survey of Existing Approaches. IJCAI-01 Workshop: Ontologies and Information Sharing, Vol. 2001. 2001, 108-117.
- Shadbolt N, Hall W, Berners-Lee T: The Semantic Web revisited. IEEE Intell Syst. 2006, 21 (3): 96-101. 10.1109/MIS.2006.62.
- Resource Description Framework. 2004, cited November 25, 2013; Available from: http://www.w3.org/tr/rdf-concepts/
- RDF Vocabulary Description Language 1.0: RDF Schema. 2004, Available from: http://www.w3.org/TR/rdf-schema/
- Hitzler P, Krötzsch M, Parsia B, Patel-Schneider PF, Rudolph S: OWL 2 Web Ontology Language Primer. 2009, cited 2011; Available from: http://www.w3.org/TR/owl2-primer/
- Hoehndorf R, Dumontier M, Gennari JH, Wimalaratne S, de Bono B, Cook DL, Gkoutos GV: Integrating systems biology models and biomedical ontologies. BMC Syst Biol. 2011, 5: 124. 10.1186/1752-0509-5-124.
- Hoehndorf R, Schofield PN, Gkoutos GV: PhenomeNET: a whole-phenome approach to disease gene discovery. Nucleic Acids Res. 2011, 39 (18): e119.
- Hoehndorf R, Dumontier M, Gkoutos GV: Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics. Bioinformatics. 2012, 28 (16): 2169-2175. 10.1093/bioinformatics/bts350.
- Luciano JS, Andersson B, Batchelor C, Bodenreider O, Clark T, Denney CK, Domarew C, Gambet T, Harland L, Jentzsch A, Kashyap V, Kos P, Kozlovsky J, Lebo T, Marshall SM, McCusker JP, McGuinness DL, Ogbuji C, Pichler E, Powers RL, Prud'hommeaux E, Samwald M, Schriml L, Tonellato PJ, Whetzel PL, Zhao J, Stephens S, Dumontier M: The translational medicine ontology and knowledge base: driving personalized medicine by bridging the gap between bench and bedside. J Biomed Semantics. 2011, 2 (Suppl 2): S1.
- Sahoo SS, Bodenreider O, Rutter JL, Skinner KJ, Sheth AP: An ontology-driven semantic mashup of gene and biological pathway information: application to the domain of nicotine dependence. J Biomed Inform. 2008, 41 (5): 752-765. 10.1016/j.jbi.2008.02.006.
- Gangemi A: Ontology design patterns for semantic web content. The Semantic Web–ISWC 2005. 2005, Berlin Heidelberg: Springer, 262-276.
- Egaña M, Rector A, Stevens R, Antezana E: Applying ontology design patterns in bio-ontologies. Knowledge Engineering: Practice and Patterns. 2008, Berlin Heidelberg: Springer, 7-16.
- Aranguren ME, Antezana E, Kuiper M, Stevens R: Ontology design patterns for bio-ontologies: a case study on the cell cycle ontology. BMC Bioinformatics. 2008, 9 (Suppl 5): S1. 10.1186/1471-2105-9-S5-S1.
- Brinkman RR, Courtot M, Derom D, Fostel JM, He Y, Lord P, Malone J, Parkinson H, Peters B, Rocca-Serra P, Ruttenberg A, Sansone SA, Soldatova LN, Stoeckert CJ, Turner JA, Zheng J, O.B.I. consortium: Modeling biomedical experimental processes with OBI. J Biomed Semantics. 2010, 1 (Suppl 1): S7. 10.1186/2041-1480-1-S1-S7.
- Belleau F, Nolin MA, Tourigny N, Rigault P, Morissette J: Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J Biomed Inform. 2008, 41 (5): 706-716. 10.1016/j.jbi.2008.03.004.
- Callahan A, Cruz-Toledo J, Dumontier M: Ontology-based querying with Bio2RDF's linked open data. J Biomed Semantics. 2013, 4 (Suppl 1): S1. 10.1186/2041-1480-4-S1-S1.
- Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y, Eisner R, Guo AC, Wishart DS: DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res. 2011, 39 (Database issue): D1035-D1041.
- Whirl-Carrillo M, McDonagh EM, Hebert JM, Gong L, Sangkuhl K, Thorn CF, Altman RB, Klein TE: Pharmacogenomics knowledge for personalized medicine. Clin Pharmacol Ther. 2012, 92 (4): 414-417. 10.1038/clpt.2012.96.
- Chelliah V, Laibe C, Le Novere N: BioModels database: a repository of mathematical models of biological processes. Methods Mol Biol. 2013, 1021: 189-199. 10.1007/978-1-62703-450-0_10.
- Wilkinson MD, Vandervalk B, McCarthy L: The semantic automated discovery and integration (SADI) Web service design-pattern, API and reference implementation. J Biomed Semantics. 2011, 2 (1): 8. 10.1186/2041-1480-2-8.
- Wilkinson MD, McCarthy L, Vandervalk B, Withers D, Kawas E, Samadian S: SADI, SHARE, and the in silico scientific method. BMC Bioinformatics. 2010, 11 (Suppl 12): S7. 10.1186/1471-2105-11-S12-S7.
- Chepelev LL, Riazanov A, Kouznetsov A, Low HS, Dumontier M, Baker CJ: Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics. BMC Bioinformatics. 2011, 12: 303. 10.1186/1471-2105-12-303.
- Vandervalk B, McCarthy EL, Cruz-Toledo J, Klein A, Baker CJ, Dumontier M, Wilkinson MD: The SADI personal health lens: a Web browser-based system for identifying personally relevant drug interactions. JMIR Res Protoc. 2013, 2 (1): e14. 10.2196/resprot.2315.
- BLASTN P. dulcis SADI web service. 2013, Available from: http://sadiframework.org/services/blast/Prunus+dulcis
- Del Rio N, Villanueva-Rosales N, Pennington D, Benedict K, Stewart A, Grady C: Elseweb meets sadi: Supporting data-to-model integration for biodiversity forecasting. Discovery Informatics Symposium. 2013
- Madin J, Bowers S, Schildhauer M, Krivov S, Pennington D, Villa F: An ontology for describing and synthesizing ecological observation data. Ecol Informat. 2007, 2 (3): 279-296. 10.1016/j.ecoinf.2007.05.004.
- Mons B, van Haagen H, Chichester C, den Dunnen JT, van Ommen G, van Mulligen E, Singh B, Hooft R, Roos M, Hammond J: The value of data. Nat Genet. 2011, 43 (4): 281-283. 10.1038/ng0411-281.
- Patrinos GP, Cooper DN, van Mulligen E, Gkantouna V, Tzimas G, Tatum Z, Schultes E, Roos M, Mons B: Microattribution and nanopublication as means to incentivize the placement of human genome variation data into the public domain. Hum Mutat. 2012, 33 (11): 1503-1512. 10.1002/humu.22144.
- Kuhn T, Barbano PE, Nagy ML, Krauthammer M: Broadening the scope of nanopublications. The Semantic Web: Semantics and Big Data. 2013, Berlin Heidelberg: Springer, 487-501.
- van Haagen HH, AC't Hoen P, Bovo AB, de Morrée A, van Mulligen EM, Chichester C, Kors JA, den Dunnen JT, van Ommen G-JB, van der Maarel SM: Novel protein-protein interactions inferred from literature context. PLoS One. 2009, 4 (11): e7894. 10.1371/journal.pone.0007894.
- Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007, 25 (11): 1251-1255. 10.1038/nbt1346.
- Smith B, Ceusters W: Ontological realism: a methodology for coordinated evolution of scientific ontologies. Appl Ontol. 2010, 5 (3–4): 139-188.
- Formal Ontology in Information Systems, Proceedings of the Sixth International Conference, FOIS 2010, Toronto, Canada, May 11-14, 2010. Edited by: Antony G, Riichiro M. 2010, IOS Press, 387-399. Frontiers in Artificial Intelligence and Applications ISBN 978-1-60750-534-1
- Lord P, Stevens R: Adding a little reality to building ontologies for biology. PLoS One. 2010, 5 (9): e12258. 10.1371/journal.pone.0012258.
- Smith B, Ceusters W, Klagges B, Kohler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C: Relations in biomedical ontologies. Genome Biol. 2005, 6 (5): R46. 10.1186/gb-2005-6-5-r46.
- Mungall C, Dumontier M: SIO-RO mapping. 2013, Available from: http://purl.obolibrary.org/obo/ro/bridge/sio-ro-bridge.owl
- Stenzhorn H, Beisswanger E, Schulz S: Towards a top-domain ontology for linking biomedical ontologies. Stud Health Technol Inform. 2007, 129 (Pt 2): 1225-1229.
- Schulz S, Spackman K, James A, Cocos C, Boeker M: Scalable representations of diseases in biomedical ontologies. J Biomed Semantics. 2011, 2 (Suppl 2): S6. 10.1186/2041-1480-2-S2-S6.
- Seddig-Raufie D, Jansen L, Schober D, Boeker M, Grewe N, Schulz S: Proposed actions are no actions: re-modeling an ontology design pattern with a realist top-level ontology. J Biomed Semantics. 2012, 3 (Suppl 2): S2. 10.1186/2041-1480-3-S2-S2.
- Schulz S, Beisswanger E, van den Hoek L, Bodenreider O, van Mulligen EM: Alignment of the UMLS semantic network with BioTop: methodology and assessment. Bioinformatics. 2009, 25 (12): i69-i76. 10.1093/bioinformatics/btp194.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.