Dione: An OWL representation of ICD-10-CM for classifying patients’ diseases
© The Author(s) 2016
Received: 1 June 2016
Accepted: 21 September 2016
Published: 13 October 2016
Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) has been designed as a standard clinical terminology for annotating Electronic Health Records (EHRs). Textual information from EHRs is used to classify patients’ diseases into an International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) category (usually by an expert). Improving the accuracy of classification is the main purpose of using ontologies and OWL representations at the core of classification systems. In the last few years some ontologies and OWL representations for representing ICD-10-CM categories have been developed. However, they were not designed to be the basis for an automatic classification tool, nor do they model ICD-10-CM inclusion terms as Web Ontology Language (OWL) axioms, which is what enables automatic classification. In this context we have developed Dione, an OWL representation of ICD-10-CM.
Dione is the first OWL representation of ICD-10-CM that is logically consistent and whose axioms define the ICD-10-CM inclusion terms by means of a methodology based on SNOMED CT/ICD-10-CM mappings. The ICD-10-CM exclusions are handled with these mappings. Dione currently contains 391,669 classes, 391,720 entity annotation axioms and 11,795 owl:equivalentClass axioms, which have been constructed using 104,646 relationships extracted from the SNOMED CT/ICD-10-CM and BioPortal mappings and included in Dione using the owl:intersectionOf and the owl:someValuesFrom statements. The resulting OWL representation has been classified and its consistency tested with the ELK reasoner. We have also taken three clinical records from the Virgen de la Victoria Hospital (Málaga, Spain) which have been manually annotated using SNOMED CT. These annotations have been included as instances to be classified by the reasoner. The classified instances show that Dione could be a promising ICD-10-CM OWL representation to support the classification of patients’ diseases.
Dione is a first step towards the automatic classification of patients’ diseases by using SNOMED CT annotations embedded in Electronic Health Records (EHRs). The purpose of Dione is to standardise and formalise a medical terminology, thereby enabling new kinds of tools and new sets of functionalities to be developed. This in turn assists health specialists by providing classified information from EHRs and enables the automatic annotation of patients’ diseases with ICD-10-CM codes.
Keywords: ICD-10-CM, SNOMED CT, Ontologies, Automatic classification
The International Classification of Diseases, 10th Revision (ICD-10)  is a standard diagnostic tool for health management, epidemiology and clinical purposes. ICD-10 comprises Chapters I to XXII which cover diseases, a variety of signs and symptoms, abnormal findings, complaints, social circumstances and external causes of injuries and diseases. ICD-10-CM corresponds to the tenth version, clinical modifications, which is the current ICD version. This medical classification standard, maintained and published by the World Health Organisation (WHO) is used to classify diseases and health problems that have been recorded on death certificates and in other records. The accuracy of this classification is a very important issue because it is used, for example, to set capitation rates and allocate resources to medical centers. It is also used by medical and health services researchers to determine the case fatality and morbidity rates. Furthermore, ICD-10-CM has been mandatory in the U.S. and therefore in regular and routine use since October 1, 2015.
The concept of ontologies has been widely used in numerous real-world application domains, from Health Care and Life Science to Finance and Government. The majority of current ontologies are expressed in the well-known Web Ontology Language (OWL). Semantic reasoners such as Pellet, ELK, KAON2 and RacerPro are all widely used to develop ontology-based automatic classification systems. Improving the accuracy of classification is the main purpose of using ontologies and OWL representations as the basis for a classification system. The literature review refers to several OWL ontologies for representing ICD-10-CM categories that have been developed. However, they were never intended to be the basis for an automatic classification tool, nor do they model ICD-10-CM inclusion terms as OWL axioms, which is what enables this type of automatic classification.
SNOMED CT/ICD-10-CM alignments (mappings) have been established and described by the Unified Medical Language System (UMLS). These mappings have a cardinality of many to many. This means that one SNOMED CT concept can be mapped to many ICD-10-CM target categories and vice versa. When a SNOMED CT concept is not mapped to an ICD-10-CM category it means that it is not classifiable or is awaiting editorial review. One or more SNOMED CT source concepts can also be mapped to the same ICD-10-CM target category. In addition to these mappings provided by UMLS, BioPortal’s users have also defined SNOMED CT/ICD-10-CM mappings. The BioPortal mappings have the same cardinality as the mappings provided by UMLS.
The current situation is that EHRs are annotated using SNOMED CT concepts and textual information from these records is then used to classify the patients’ diseases into an ICD-10-CM category (usually by an expert). Interestingly, SNOMED CT/ICD-10-CM mappings can be exploited to define the ICD-10-CM inclusion terms based on the SNOMED standard definition of patients’ medical evidence, affected part of the body and symptoms and their relationships, connecting the standard annotations in EHRs with an ICD-10-CM category.
The main hypothesis of the work presented here is: (H1) It is possible to code, as OWL axioms, the ICD-10-CM inclusion terms obtained from SNOMED CT/ICD-10-CM mappings and use these OWL axioms to build an OWL representation of the ICD-10-CM diseases. As a result (H2) we obtain a useful OWL representation, which can be used as the basis for a semantic classification system. This system will then use the set of SNOMED CT concepts and relationships between SNOMED CT concepts, taken from EHRs, as input to automatically classify patients’ diseases into an ICD-10-CM category. The objective of using this OWL representation is to improve the accuracy of the manual annotation. The accuracy of the classification is particularly relevant at a time when diagnosis codes, such as the ICD-10-CM codes, can significantly affect the total funding that a hospital may receive for patients admitted. For example, in the United States, diagnosis-related groups (DRGs) based on ICD codes are the basis for hospital reimbursement for acute-care stays of Medicare beneficiaries. In addition, health services researchers use the ICD codes to study risk-adjusted, cross-sectional, and temporal variations in access to care, quality of care, costs of care, and effectiveness of care.
This has motivated us to develop an OWL representation to help find an automated approach to classify patients’ diseases in a medical context. Inclusion terms for each ICD-10-CM category are formalised as OWL axioms by exploiting SNOMED CT/ICD-10-CM mappings. These mappings allow SNOMED CT concepts and relationships to be used to define the inclusion terms. The resulting OWL representation, called Dione (footnote 1), which includes the ICD-10-CM Chapters I to XIV, is as complete as the available SNOMED CT/ICD-10-CM mappings allow. The exclusions proposed by ICD-10-CM are already handled by the mappings.
We have defined an algorithm which, starting from an ICD-10-CM category code, is able to obtain its corresponding SNOMED CT concept and all its relationships.
The relationships obtained are considered to be the inclusion terms for the ICD-10-CM category. Therefore, we have also defined an algorithm for translating these relationships to OWL axioms.
To test H1 we have developed Dione, an OWL representation of ICD-10-CM, specifically designed to classify patients’ diseases by exploiting OWL’s (Description Logic) reasoning capabilities. Dione represents the ICD-10-CM categories as classes and the inclusion terms as OWL axioms related to the class by means of the owl:equivalentClass statement. Classes representing ICD-10-CM categories as well as classes representing SNOMED CT concepts are organised as several hierarchies.
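The two algorithms described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the dictionaries stand in for the relational database, the function names inclusion_terms and to_axiom are hypothetical, and the pairing of relationships with source concepts is invented for the example (only the identifiers themselves come from the I10 example given later in the paper).

```python
# Toy stand-ins for the mapping and relationship tables (illustrative data only).
# ICD-10-CM code -> SNOMED CT concept ids mapped to it.
ICD_TO_SNOMED = {
    "I10": ["62275004", "449759005"],
}
# SNOMED CT source concept -> list of (relationship, target concept id).
# The assignment of relationships to source concepts is an assumption.
SNOMED_RELATIONSHIPS = {
    "62275004": [("hasDefinitionalManifestation", "24184005")],
    "449759005": [("associatedWith", "38341003")],
}


def inclusion_terms(icd_code):
    """Collect the (relationship, target) pairs of every concept mapped to icd_code."""
    terms = []
    for concept in ICD_TO_SNOMED.get(icd_code, []):
        for pair in SNOMED_RELATIONSHIPS.get(concept, []):
            if pair not in terms:  # keep first-seen order, drop duplicates
                terms.append(pair)
    return terms


def to_axiom(icd_code, terms):
    """Render inclusion terms as a description-logic style equivalence, or None."""
    if not terms:
        return None  # no mapping or no translatable relationship
    body = " ⊓ ".join(f"∃{prop}.{target}" for prop, target in terms)
    return f"{icd_code} ≡ {body}"


print(to_axiom("I10", inclusion_terms("I10")))
```

A category without any mapping simply yields no axiom, which mirrors the situation in which a class cannot be defined from the available mappings.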
To test H2, Dione’s consistency has been checked and information from real clinical records has been classified to show Dione’s applicability through three clinical use cases from the Virgen de la Victoria Hospital (Málaga, Spain).
There have been some attempts made to construct OWL models using biomedical classifications like SNOMED CT and ICD-10. For ICD, the work developed by  was the first attempt to model the ICD-9 ontology, an older version of the current ICD-10. In , the authors proposed the first formal representation of the ICD-10 based on three logical layers of the GALEN Core Reference Model (CRM) terminology system . They used a description logic-like language called GRAIL  which allows classes to be inferred with the semantics of role propagation and links a more detailed description of a diagnosis to a more abstract class. The ICD-10 ontology presented in  contains only ICD-10 categories and their definitions. The hierarchical relationship of the ICD-10 is not represented and the ICD-10 category definitions are limited to three concepts defined by a multi-axial conceptual system that includes the anatomy, the morphology and the etiology. However, it has to be said that the methodology adopted by the authors to formalise the ICD-10 has some limitations: first, only two ICD-10 chapters are represented; second, not all the ICD terms are represented using GALEN and finally, the ontology was not loaded into an OWL reasoner and therefore, the formal consistency was neither checked nor classified. Given these problems, the authors presented a DOLCE-based formal representation . DOLCE is a descriptive upper-level ontology designed for ontology cleaning and interoperability. In this formal representation of the ICD-10, anatomical entities were taken from the Foundational Model of Anatomy (FMA) , morphological abnormalities and procedures were taken from SNOMED CT, the organisms used were from the biological taxonomy and the chemical objects were taken from the International Union of Pure and Applied Chemistry nomenclature (IUPAC). Despite these improvements over the previous version of the GALEN-based ICD-10 representation, some problems have yet to be solved. 
For example, not elsewhere classified diseases are modeled as logical exclusions of elsewhere classified ICD categories from the appropriate parent concepts. This solution does not provide any information for a system which aims to automatically classify a patient’s disease d. The doctor should assert or the system should infer that d is an instance of the negation of a class. Due to the OWA (Open World Assumption) semantics of OWL, if d is not an instance of class C, the reasoner cannot infer that d is an instance of ¬C. Furthermore, the ontology has not been checked or classified by a reasoner.
The last approach to represent ICD-10 in OWL was developed in . In this study, an ontology was created based on two super-classes, the icd10:Entry and the icd10:Modifier, which contain ICD-10 codes from the WHO and the German Institute for Medical Documentation and Information (German: Deutsches Institut für Medizinische Dokumentation und Information), respectively. The general structure of the ICD-10 ontology includes Chapters I to XXI; classes are represented by a URL which consists of a namespace and the ICD-10 code name, and their relations are established with owl:subClassOf axioms. The ICD-10 exclusions are handled with the owl:disjointWith axiom. This approach does not provide information as to the class in which a disease should be classified. If the same disease is classified in both classes, the reasoner infers that the ontology is inconsistent, but is unable to distinguish the correct class. Furthermore, to solve the problem of exclusions that are shared with multiple exclusions, the authors proposed the inclusion of icd10:hasExcludes, which links to an icd10:ICDdescription (with rdf:type and rdf:label predicates) which has an icd10:concernsClass property. As icd10:concernsClass can involve other ICD-10 categories, the ontology requires OWL-Full expressivity. The inclusions are modelled in the same way as the exclusions. The OWL-Full properties present in the ontology invalidate it for use by reasoners and thus, it is not possible to check the ontology’s consistency or classify it.
According to the literature review, there have been several attempts to model ICD-10 in OWL. However, these studies have various weaknesses which can be summarised as follows: 1) some difficulties in correctly handling inclusions and exclusions in an OWL representation of ICD-10; 2) the lack of a validation process using an OWL reasoner to check the consistency of the ontology. This step is very important in ontology development and testing because an ontology can be used by OWL reasoners without human supervision. If an ontology is inconsistent, the reasoning may lead to erroneous conclusions; 3) no application of these OWL representations to real clinical use cases to show how they can support a clinician in decision making and 4) the reviewed work uses ICD-9 and/or ICD-10. In this paper, we have worked with ICD-10-CM. Although the intention is to replace ICD-9-CM with ICD-10-CM, it has been reported that reasons such as the complexity of ICD-10-CM and the costs of migrating from one system to the other explain why the ICD-9-CM version is still in use. The Centers for Disease Control and Prevention (CDC) encourages the use of the ICD-10-CM version because of the improvements that it has over ICD-9-CM and ICD-10. These improvements include the addition of information relevant to ambulatory and managed care encounters; expanded injury codes; the creation of combination diagnosis and symptom codes to reduce the number of codes needed to fully describe a condition; the addition of sixth and seventh characters; incorporation of common fourth and fifth digit subclassifications; laterality and greater specificity in code assignment.
Formalising the ICD-10-CM categories in OWL
For the construction of Dione, we focused on three basic issues that are critical when modelling an ontology or an OWL representation, and are specified in the “Ontology 101 development process” methodology . First, a selection of the concepts used to cover the objectives to be accomplished in the health domain; second, the organisation of all concepts in a hierarchy and third, a semantic formalisation of these concepts using a knowledge representation language such as the description logic (DL) formalism.
In order to select the terms for each concept, an XML file containing the ICD-10-CM categories in the English version was downloaded from the Centers for Disease Control and Prevention (CDC) website that stores all the ICD versions. ICD-10-CM consists of “chapters” that are sub-divided into homogeneous blocks of three-character categories (a capital letter and two arabic numerals). These categories are sub-divided into four-character categories (a capital letter and three arabic numerals) and these are further divided into five-character categories. The file with the ICD-10-CM categories was parsed to output a tree with parent and child nodes (Additional file 1). The upper-level and lower-level nodes of the tree generated from the XML file correspond to the ICD-10-CM upper and lower levels, which involve blocks of three-, four- and five-character categories. For the semantic formalisation, the hierarchy tree from the output file, which includes Chapters I to XIV, was encoded in OWL using the OWL API library. Using the information from ICD-10-CM about blocks and their categories, the OWL hierarchy was modelled establishing the Diseases category as super-class because Chapters I to XIV are related to diseases. All Dione classes are identified by a URI, which consists of a namespace and a special term which corresponds to the name of each ICD-10-CM category code.
The Diseases category includes the following ICD-10-CM categories: A00-B99 (Certain infectious and parasitic diseases), C00-D49 (Neoplasms), D50-D89 (Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism), E00-E89 (Endocrine, nutritional and metabolic diseases), F01-F99 (Mental, Behavioural and Neurodevelopmental disorders), G00-G99 (Diseases of the nervous system), H00-H59 (Diseases of the eye and adnexa), H60-H95 (Diseases of the ear and mastoid process), I00-I99 (Diseases of the circulatory system), J00-J99 (Diseases of the respiratory system), K00-K95 (Diseases of the digestive system), L00-L99 (Diseases of the skin and subcutaneous tissue), M00-M99 (Diseases of the musculoskeletal system and connective tissue) and N00-N99 (Diseases of the genitourinary system).
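The parsing step described above can be illustrated with a short sketch. It assumes a simplified XML layout (nested diag elements carrying name and desc children inside a section), which may not match the CDC file exactly; the namespace http://example.org/dione# and the helper names are hypothetical, the chapter level is omitted for brevity, and replacing the dot in a code with an underscore follows the E73_0 style of class name used in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace; the real Dione namespace is not stated here.
NAMESPACE = "http://example.org/dione#"

# Simplified sample in an assumed layout (not the full CDC schema).
SAMPLE = """
<chapter>
  <section id="I10-I15">
    <diag>
      <name>I10</name>
      <desc>Essential (primary) hypertension</desc>
    </diag>
    <diag>
      <name>I11</name>
      <desc>Hypertensive heart disease</desc>
      <diag>
        <name>I11.0</name>
        <desc>Hypertensive heart disease with heart failure</desc>
      </diag>
    </diag>
  </section>
</chapter>
"""


def class_uri(code):
    """Build a class URI from a category code, e.g. I11.0 -> ...#I11_0."""
    return NAMESPACE + code.replace(".", "_")


def collect(element, parent_code, edges):
    """Recursively record (child URI, parent URI) pairs for nested categories."""
    for diag in element.findall("diag"):
        code = diag.findtext("name")
        edges.append((class_uri(code), class_uri(parent_code)))
        collect(diag, code, edges)


def hierarchy(xml_text):
    """Return subclass edges, with every block placed under the Diseases super-class."""
    root = ET.fromstring(xml_text.strip())
    edges = []
    for section in root.findall("section"):
        block = section.get("id")
        edges.append((class_uri(block), class_uri("Diseases")))
        collect(section, block, edges)
    return edges


for child, parent in hierarchy(SAMPLE):
    print(child, "subClassOf", parent)
```

Each resulting pair corresponds to one subclass axiom in the OWL hierarchy; in the paper this encoding is done with the OWL API rather than with plain tuples.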
Inclusion of the OWL axioms
Table: Examples of SNOMED CT relationships. SNOMED CT concepts related to other SNOMED CT concepts through the causative-agent and associated-morphology relationships
SNOMED CT concept | SNOMED CT relationship | SNOMED CT concept
Cholera - non-O1 group vibrio, disorder | Causative agent | Vibrio cholerae, non-O1, an organism
Bullous pyoderma, disorder | Associated morphology | Chronic superficial ulcer, a morphologic abnormality
Once the Dione properties had been identified, the relationships between two SNOMED CT concepts (one of which maps to an ICD-10-CM code included in the Dione ICD-10-CM hierarchy) were extracted and modelled by means of the owl:equivalentClass restriction of Dione classes and the owl:intersectionOf statement in order to model the ICD-10-CM inclusion terms. Each ICD-10-CM category was defined with the following elements: the SNOMED CT relationship (the object property in Dione), the owl:someValuesFrom restriction and the SNOMED CT concept. A good use case to illustrate this is the I10 (Essential (primary) hypertension) ICD-10-CM category:
“Hypertensive episode (disorder)” (62275004)
“Complication of systemic hypertensive disorder (disorder)” (449759005)
I10 ≡∃ hasDefinitionalManifestation.24184005 ⊓ ∃ associatedWith.38341003
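The axiom above can also be written in OWL 2 functional-style syntax. The sketch below renders it as a string; the helper names and the default ":" prefix are assumptions made for illustration, and the code does not use the OWL API with which Dione was actually built.

```python
def some_values_from(prop, filler, prefix=":"):
    """Render an existential restriction (owl:someValuesFrom) on an object property."""
    return f"ObjectSomeValuesFrom({prefix}{prop} {prefix}{filler})"


def equivalent_class_axiom(icd_code, restrictions, prefix=":"):
    """Render an EquivalentClasses axiom for one ICD-10-CM category.

    restrictions is a list of (object property, SNOMED CT concept id) pairs.
    """
    parts = [some_values_from(p, f, prefix) for p, f in restrictions]
    if not parts:
        raise ValueError("at least one restriction is required")
    # ObjectIntersectionOf takes two or more operands, so a single
    # restriction is used directly as the class expression.
    if len(parts) == 1:
        body = parts[0]
    else:
        body = "ObjectIntersectionOf(" + " ".join(parts) + ")"
    return f"EquivalentClasses({prefix}{icd_code} {body})"


axiom = equivalent_class_axiom(
    "I10",
    [("hasDefinitionalManifestation", "24184005"), ("associatedWith", "38341003")],
)
print(axiom)
```

Keeping the single-restriction case separate avoids emitting an intersection with one operand, which the OWL 2 structural specification does not allow.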
SNOMED CT relationships that were used to define the Dione classes
Dione object property
SNOMED CT relationship
Number of uses in Dione
The part of the body affected by a condition
Represents a sequence of events where a clinical finding occurs after another
Represents a clinically relevant association between concepts without either asserting or excluding a causal or sequential relationship between the two
Identifies the direct causative agent of a disease (e.g., an organism)
Relates a clinical finding directly to a cause such as another clinical finding or a procedure
Links concepts in the situation with explicit context hierarchy to their related clinical finding
Specifies the morphologic changes seen at the tissue or cellular level that are characteristic features of a disease
Has definitional manifestation
Links disorders to the manifestations (observations) that define them
Refers to a specific period of life during which a condition first presents
Provides information about the underlying pathological process for a disorder, but only when the results of that process are not structural and cannot be represented by the associated morphology relationship
Refers to the entity being evaluated or interpreted, when an evaluation, interpretation or judgment is intrinsic to the meaning of a concept
Represents a sequence of events where a clinical finding occurs after another clinical finding or a procedure
Completion with BioPortal mappings
Once Dione had been developed, we determined the number of classes defined with the relationships from the SNOMED CT/ICD-10-CM mappings provided by UMLS (Additional file 5). Those classes that had neither an inherited nor a non-inherited axiom (in the case of a superclass of the ICD-10-CM disease branch of Dione) were defined using the SNOMED CT/ICD-10-CM mappings from the BioPortal website. To do this, the new mappings between the SNOMED CT concepts and the ICD-10-CM categories were extracted from the BioPortal API and filtered to avoid duplicating the SNOMED CT/ICD-10-CM mappings from NIH. Following the methodology described in the previous subsections, the new mappings were stored in the database. The new OWL statements for the inclusion terms of the ICD-10-CM categories without any axioms were generated and included in Dione. Thus, we obtained the most complete Dione version possible with the resources available, given that some classes remained undefined (statistics are presented in the “Results” section).
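The filtering step can be sketched as a simple set operation. The function name new_mappings and all identifiers in the sample data are placeholders, not real SNOMED CT or ICD-10-CM codes, and the sketch assumes that each mapping is a (SNOMED CT concept, ICD-10-CM category) pair.

```python
def new_mappings(umls_pairs, bioportal_pairs, defined_categories):
    """Select BioPortal mappings that add information.

    A pair is kept only if its ICD-10-CM category has no axiom yet and the
    same pair is not already present in the UMLS mappings.
    """
    seen = set(umls_pairs)
    kept = []
    for pair in bioportal_pairs:
        _snomed_id, icd_code = pair
        if icd_code in defined_categories:
            continue  # category already defined from the UMLS mappings
        if pair in seen:
            continue  # duplicate of a UMLS mapping or of an earlier pair
        seen.add(pair)
        kept.append(pair)
    return kept


# Placeholder data: S* are SNOMED CT concepts, X* are ICD-10-CM categories.
umls = [("S1", "X01"), ("S2", "X02")]
bioportal = [("S1", "X01"), ("S3", "X03"), ("S3", "X03"), ("S4", "X02"), ("S2", "X04")]
defined = {"X01", "X02"}

print(new_mappings(umls, bioportal, defined))
```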
Completing the definition of Dione classes
As we have mentioned, we could not find SNOMED CT/ICD-10-CM mappings for all ICD-10-CM categories. This means that we could not include OWL statements that model inclusion terms, for all Dione classes. In some cases where we did find a mapping, the SNOMED CT concept which the ICD-10-CM category was mapped to did not have relationships that could be translated into OWL statements.
K58 (Irritable bowel syndrome) ≡∃ affects.113276009 (Intestinal structure)
E73_0 (Congenital lactase deficiency) ≡∃ affects.113276009 (Intestinal structure) ⊓ ∃ hasOccurrence.255399007 (Congenital)
K58 (Irritable bowel syndrome) ≡
∃affects.113276009 (Intestinal structure) ⊓
∃affects.71854001 (Colon structure)
With this new definition E73_0 is not a subclass of K58.
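The effect of the two K58 definitions can be checked with a small structural test. This is a deliberately simplified sketch: each class is treated as a conjunction of existential restrictions, one defined class is subsumed by another when it contains every restriction of the other, and the SNOMED CT concept hierarchy is ignored. A reasoner such as ELK performs the full inference.

```python
# Each definition is a set of (object property, SNOMED CT concept id) restrictions.
K58_OLD = frozenset({("affects", "113276009")})            # Intestinal structure
E73_0 = frozenset({("affects", "113276009"),               # Intestinal structure
                   ("hasOccurrence", "255399007")})         # Congenital
K58_NEW = frozenset({("affects", "113276009"),             # Intestinal structure
                     ("affects", "71854001")})              # Colon structure


def is_subclass(sub_definition, super_definition):
    """True if every restriction of the super-class also appears in the sub-class."""
    return super_definition <= sub_definition


print(is_subclass(E73_0, K58_OLD))  # unwanted subsumption with the first definition
print(is_subclass(E73_0, K58_NEW))  # subsumption removed by the refined definition
```

With the first definition the check succeeds, because everything that holds for K58 also holds for E73_0; adding the colon restriction to K58 breaks that containment.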
Dione consistency and classification
For Dione classification, we used the ELK reasoner with the OWL API (Additional file 8). After some attempts to apply Dione classification with reasoner systems such as Fact++ , Hermit , Pellet , TrOWL , RacerPro  and CEL , it was found that the ELK reasoner  was the only reasoner able to classify Dione while simultaneously checking that Dione was consistent. Fact++, Pellet, RacerPro and CEL failed due to an out-of-memory error (heap space set to 12 GB). TrOWL and Hermit failed due to a timeout after 48 h. The experiments were performed on a PC Intel(R) Core (TM) i7-2600 CPU with 3.39 GHz and 16 GB of RAM and took 2781 s.
Results and discussion
Level of completion of Dione
Dione has been built based on the ICD-10-CM terms (2014 release) provided by the CDC and SNOMED CT terms from UMLS (March 2013 release). Dione contains 391,669 classes, 391,720 entity annotation axioms and 11,795 owl:equivalentClass axioms which were constructed with 104,646 relationships extracted from the SNOMED CT/ICD-10-CM and BioPortal mappings and included in Dione using the owl:intersectionOf and the owl:someValuesFrom constructs.
Validation of Dione axioms
The Dione axioms have been included using the SNOMED CT/ICD-10-CM mappings, which have been constructed by a collaborative community of trained terminology specialists (closely following the methodology of the SNOMED CT to ICD-10 Crossmap project). These final mappings are published only if they have been established as identical by a group of experts and pass a final review. Where SNOMED CT/ICD-10-CM mappings are not available for a certain ICD-10-CM category, BioPortal provides SNOMED CT/ICD-10-CM mappings that have been previously inferred by the BioPortal algorithm and/or included (and validated) by the BioPortal user community [33, 34]. The BioPortal SNOMED CT/ICD-10-CM mappings that are equal to the UMLS mappings have been removed to avoid duplicate ICD-10-CM inclusions.
As the percentage of Dione classes with axioms is 93.3 %, it is worth noting that we could have manually completed the mappings to generate the axioms for defining those classes which do not have any axiom, either inherited from parent classes or defined. However, we prefer to release the first version of Dione using only those mappings that are available and widely accepted by the scientific community. We have called this current version Dione V0.933. As new mappings are created, new axioms will be used to complete the current version of Dione.
Applicability of Dione in clinical use cases
As Dione is logically consistent, we have used it together with the ELK reasoner to classify clinical records. Clinical record information is codified by means of the Dione object property assertions which use SNOMED CT concepts. The objective of these use cases is to show how Dione can assist health specialists by providing ICD-10-CM classified information. We have chosen three clinical records from the Virgen de la Victoria Hospital (Málaga, Spain).
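How a record annotated with object property assertions ends up in an ICD-10-CM category can be sketched as follows. The sketch is an assumption-laden stand-in for the reasoner: the class definitions reuse the I10 and K58 examples from this paper, the single is-a link between colon structure and intestinal structure is a toy hierarchy assumed for illustration, and the sample records are invented rather than taken from the hospital data.

```python
# Toy fragment of the concept hierarchy (child -> parent); assumed for illustration.
IS_A = {"71854001": "113276009"}  # Colon structure is-a Intestinal structure

# Class definitions as sets of (object property, SNOMED CT concept id) restrictions.
DEFINITIONS = {
    "I10": {("hasDefinitionalManifestation", "24184005"), ("associatedWith", "38341003")},
    "K58": {("affects", "113276009"), ("affects", "71854001")},
}


def ancestors_or_self(concept):
    """Return the concept together with all of its ancestors in IS_A."""
    found = {concept}
    while concept in IS_A:
        concept = IS_A[concept]
        found.add(concept)
    return found


def classify(assertions):
    """Return the categories whose every restriction is satisfied by the record.

    An assertion (p, c) satisfies a restriction (p, d) when c is d or a
    descendant of d, which is why assertions are expanded with ancestors.
    """
    expanded = {(prop, anc) for prop, c in assertions for anc in ancestors_or_self(c)}
    return sorted(code for code, definition in DEFINITIONS.items() if definition <= expanded)


record_a = [("hasDefinitionalManifestation", "24184005"), ("associatedWith", "38341003")]
record_b = [("affects", "71854001")]
record_c = [("affects", "113276009")]

for record in (record_a, record_b, record_c):
    print(classify(record))
```

The third record is left unclassified because an annotation at the level of the intestinal structure is too general to satisfy the colon restriction in the K58 definition.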
Advantages of formalising the ICD-10-CM categories in OWL
There are two types of SNOMED CT/ICD-10-CM mappings: one-to-one and one-to-many mappings. This means that not every SNOMED CT concept can be mapped to a single ICD-10-CM category with an identical meaning. Rather, it may be mapped to more than one ICD-10-CM category, each with a different meaning.
Using the mappings on their own does not allow direct reasoning based on the hierarchical relationships between ICD-10-CM categories established by owl:subClassOf axioms, on the restrictions included in owl:equivalentClass axioms to define the ICD-10-CM categories, or on the type of object properties that are established in the OWL model.
For these reasons, we have built an OWL hierarchy with ICD-10-CM categories and used the SNOMED CT/ICD-10-CM mappings to model the ICD-10-CM inclusion terms. According to the applicability of Dione to real clinical use cases demonstrated in the Results section, this approach provides users with a direct OWL reasoning over a set of instances from one or more actual problems (Electronic Health Records) proposed by the physician to infer new relationships and provide a new approach to the classification problem.
Comparison with other OWL ICD models
Translating a clinical terminology into an OWL model can introduce semantic inconsistencies. According to the reviewed literature (Background section), there is a lack of consistency checking in the proposed ICD-10-CM formal representations and therefore, ABox and TBox classifications (footnote 3) have not been done. In this approach, Dione has been validated by the ELK reasoner, which was found to be the only reasoner able to classify it, after testing all reasoners that have been successful in classifying large and widely-used real-world ontologies like SNOMED CT. The ELK reasoner reports that Dione is consistent and performs TBox and ABox classifications. In order to carry out the ABox classification of Dione, a set of instances from clinical use cases taken from the Virgen de la Victoria Hospital (Málaga, Spain) has been included in Dione and classified into ICD-10-CM categories, as is fully explained in the Results section.
In the Background section, we highlighted some limitations of existing work in the literature. In this section, we discuss the formal representation in OWL of ICD-10-CM proposed by  given that it is an ICD-10-CM representation that has improved upon other approaches. In this approach, the authors created an ICD-10-CM hierarchy with owl:subClassOf, handling the ICD-10-CM exclusions with owl:disjointWith axioms. We consider that such an approach is limited given that only using owl:disjointWith could result in a lack of information, because if the same disease is classified in two disjoint classes, the reasoner infers that the ontology is inconsistent and is unable to distinguish the correct class. Furthermore, the exclusions that are shared with multiple exclusions are modelled with OWL-Full, making it impossible to validate and classify the ontology with a reasoner. The inclusion terms are modelled in the same way as the exclusions. In our case, the semantic Disease hierarchy of Dione is constructed with owl:subClassOf axioms (using the ICD-10-CM concepts which have not been used in other approaches in the literature) and the inclusion terms are modelled from the information extracted from SNOMED CT/ICD-10-CM mappings. The approach adopted avoids the need for OWL-Full features. As mentioned, in the case of dealing with the exclusions, the mappings include an exhaustive mapping of the low-level descendants of those SNOMED CT concepts that could lead to a different ICD-10-CM category given ICD-10-CM exclusions and other rules.
This paper has presented the implementation process of Dione, an OWL representation of ICD-10-CM, which uses SNOMED CT/ICD-10-CM mappings to formalise the ICD-10-CM disease categories and their inclusion terms. The main hypothesis guiding us is: (H1) It is possible to code, as OWL axioms, the ICD-10-CM inclusion terms obtained from SNOMED CT/ICD-10-CM mappings and use these OWL axioms to build an OWL representation of the ICD-10-CM diseases. Therefore, we have used an automatic process to build a hierarchy tree with ICD-10-CM disease categories and their axioms using the owl:equivalentClass axiom. The main objective of our approach has been to build a model that can be used by a reasoner. Therefore, we have also shown that Dione is consistent and a TBox classification has been carried out by the ELK reasoner. It is worth noting that automating the construction of the ICD-10-CM disease categories is important given that new mappings are continuously being added to complete Dione, whose initial version is released with this paper. In its current version, we have not been able to find validated SNOMED CT/ICD-10-CM mappings for each ICD-10-CM category and so could not obtain correct results in several cases. Therefore, our first objective is to complete the definition of all classes. We plan to add more mappings from other ontologies which will relate Dione with more biological information from different areas that involve personalised medicine. BioPortal provides ontologies mapped to ICD-10-CM categories such as the National Drug Data File (with 2,857 ICD-10-CM mappings), OMIM (with 3,921 ICD-10-CM mappings), the Human Phenotype Ontology (with 1,370 ICD-10-CM mappings) and the Regulation of Transcription Ontology (with 61 ICD-10-CM mappings) [41, 42]. Using the approach presented in this paper, the mappings can be extracted from BioPortal and stored in a database.
The axioms to be included in Dione using owl:equivalentClass can be defined with an “affects” object property or with the axioms that are defined in the ICD-10-CM mapped class of the target ontology. It may also be possible to obtain information from an expert (i.e. a doctor) to complete the definition of some classes.
As a secondary hypothesis, we (H2) obtain a useful OWL representation which can be used as the basis for a semantic classification system. This enables a set of SNOMED CT concepts and their relationships, taken from EHRs, to be used as input to automatically classify patients’ diseases into an ICD-10-CM category. This has been tested by including in Dione object property assertions from the clinical records of the Virgen de la Victoria Hospital (Málaga, Spain), which have been classified into ICD-10-CM categories, showing the applicability of the OWL representation. After completing Dione, we plan to measure the accuracy of the classification and to study new ways to improve it. As the development of Dione is ongoing, further work will include looking at how Dione reasoning can assist experts in providing classified information for ICD-10-CM disease categories from actual use cases of patient records. Dione should be supported in the future by a semi-automatic EHR annotation tool, probably based on text mining and natural language processing techniques. Our intention is to use an end-user interface where the information processed can be displayed in such a way as to make it easily understandable to specialists in the field. This application could provide functionalities which allow users to make specific OWL queries such as: “Find a disease which affects a body structure (finding site) like ‘Systemic circulatory system structure (body structure)’ AND hasDefinitionalManifestation ‘Finding of increased blood pressure (finding)’”. These kinds of queries can be used to retrieve an ICD-10-CM disease category that has the same definition in Dione and also to automatically generate a list of possible ICD-10-CM categories in which the instances from real clinical records can be classified. Finally, we will study how Dione could be combined with other techniques such as SWRL (Semantic Web Rule Language) rules and probabilistic databases in order to develop a diagnostic assistance tool.
1 Dione is available at http://www.khaos.uma.es/dione
2 According to the semantics of OWL, this represents an anonymous class defined by a restriction on the object property hasDefinitionalManifestation: at least one value of hasDefinitionalManifestation must be an instance of 24184005 “Finding of increased blood pressure (finding)”.
3 TBox and ABox are the terminological and assertional components, respectively. TBox statements describe the set of Dione classes and properties, and ABox statements describe the instances associated with those classes and properties.
CDC: Centers for disease control and prevention
CRM: Core reference model
EHR: Electronic health record
FMA: Foundational model of anatomy
ICD-10: International classification of diseases, tenth revision
ICD-10-CM: International classification of diseases, tenth revision, clinical modification
IUPAC: International union of pure and applied chemistry
NIH: National institutes of health
OWL: Web ontology language
SNOMED CT: Systematized nomenclature of medicine - clinical terms
UMLS: Unified medical language system
WHO: World health organisation
The authors would like to thank the Admissions and Medical Documentation Service at the Virgen de la Victoria Hospital (Málaga, Spain) for providing us with the necessary clinical reports to demonstrate the viability of Dione.
This work was partially funded by grants TIN2014-58304-R (Ministerio de Economía y Competitividad), P11-TIC-7529 and P12-TIC-1519 (Plan Andaluz de Investigación, Desarrollo e Innovación).
MMRG designed the methodology for creating the Dione inclusion terms as OWL axioms from the SNOMED CT/ICD-10-CM mappings, implemented the relational database containing the SNOMED CT concepts and relationships and designed the algorithms for implementing Dione. MJGG contributed to the design of the algorithms, implemented the algorithms and the OWL representation and validated Dione. MMRG and MJGG designed and implemented the use cases for testing the applicability of Dione and wrote the manuscript. Finally, JFAM is the director of the research group. He devised and supervised the work and collaborated in writing the manuscript. All authors discussed, read and approved both Dione and the manuscript.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.