Towards an ontological representation of morbidity and mortality in Description Logics
- Filipe Santana†1,
- Fred Freitas†1,
- Roberta Fernandes†1,
- Zulma Medeiros†2, 3 and
- Daniel Schober†4
© Santana et al.; licensee BioMed Central Ltd. 2012
Published: 21 September 2012
Despite the high coverage of biomedical ontologies, very few sound definitions of death can be found. Nevertheless, this concept has its relevance in epidemiology, such as for data integration within mortality notification systems. We here introduce an ontological representation of the complex biological qualities and processes that inhere in organisms transitioning from life to death. We further characterize them by causal processes and their temporal borders.
Several representational difficulties were faced, mainly regarding kinds of processes with blurred or fiat borders that change their type in a continuous rather than discrete mode. Examples of such hard-to-grasp concepts are life, death and their relationships with injuries and diseases. We illustrate an iterative optimization of definitions within four versions of the ontology, so as to stress the typical problems encountered in representing complex biological processes. We point out possible solutions for representing concepts related to biological life cycles while preserving the identity of participating individuals, e.g. of a patient in transition from life to death. This solution, however, required the use of extended description logics not yet supported by tools. We also focus on the interdependencies between definitions, i.e. the need to change further parts of the model when one part is changed.
The axiomatic definition of mortality we introduce allows the description of biological processes related to the transition from healthy to diseased or injured, and up to a final death state. Exploiting such definitions embedded into descriptions of pathogen transmission by arthropod vectors, the complete sequence of infection and disease processes can be described, from the inoculation of a pathogen by a vector until the death of an individual, while preserving the identity of the patient.
With the growing need to cope with large-scale biomedical data, researchers have been relying on ontologies to ensure a shared and computer-interpretable meaning of linguistic terms describing such data, fostering intelligent information integration and interoperability. Indeed, more than 250 ontologies (as of December 26th, 2011) are available in the BioPortal ontology library.
Despite many efforts devoted to the development of genomics and metabolomics ontologies, often motivated by the prototypical Gene Ontology, few also focus on patient- and disease-centered data. Such data are required, for instance, in epidemiology to study the dynamics of diseases and to define health policies for epidemiological surveillance. Morbidity databases, such as the National Morbidity Notification Information System in Brazil (SINAN), are used as the main sources for epidemiological disease surveillance, prevention and control.
Furthermore, mortality databases describing the cause of death are of interest to the World Health Organization (WHO), as they enable the production of local or global health-related statistics. In Brazil, these data are stored in the national Brazilian Mortality System (SIM) and grouped by the primary causes of death.
If the goal is to leverage synergies resulting from querying and comparing the two databases at the same time, ontologies can play an important role by providing the common communication channel needed to ensure semantic interoperability. Rendering the separately maintained mortality and morbidity databases accessible via a common ontology allows for synergistic data exploitation, e.g. contextual enrichment, consistency checks and reasoning at the schema and data levels.
The purpose of the current study is to ontologically formalize foundational disease processes and other lifecycle related processes as occurring in the mentioned data sources to expand the Neglected Tropical Disease Ontology (NTDO) [7, 8] and ultimately to use NTDO for integrated querying of the Brazilian mortality and morbidity databases.
Ontologies, from a formal point of view, intend to describe a consensus on the nature of entities in a given scientific domain, independently of the linguistic variation of the terms used in human communication. Accordingly, formal ontologies are expressed by means of a formal semantics, such as Description Logics (DL), nowadays generally using the Web Ontology Language (OWL), the exchange syntax recommended by the World Wide Web Consortium (W3C).
When trying to integrate heterogeneous databases such as SIM and SINAN, many interesting problems arose. For instance, we observed that the identifiers in both databases do not follow strict rules that would prevent misidentification and facilitate data integration. This syntactic problem is usually addressed by algorithms that compute cumulative evidence from other fields of the records to decide on a match, i.e. they compare data other than the proper identifier of an individual (e.g. mother's name and birth date, among others).
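The cumulative-evidence matching mentioned above can be illustrated with a minimal sketch in the style of the probabilistic record linkage method implemented by tools such as Reclink. Each compared field contributes a log-likelihood weight, and the weights are summed into a match score. The field names, the m/u probabilities and the threshold below are hypothetical values chosen for illustration, not parameters of SIM, SINAN or Reclink.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical per-field probabilities (not taken from SIM, SINAN or Reclink):
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
FIELD_PROBS: Dict[str, Tuple[float, float]] = {
    "mother_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "municipality": (0.90, 0.05),
}


def _normalize(value: Optional[str]) -> Optional[str]:
    """Collapse whitespace and lower-case a field to ignore trivial formatting."""
    return None if value is None else " ".join(value.split()).lower()


def field_weight(field: str, a: Optional[str], b: Optional[str]) -> float:
    """Log2 likelihood-ratio weight of one field comparison (0 if a value is missing)."""
    a, b = _normalize(a), _normalize(b)
    if a is None or b is None:
        return 0.0
    m, u = FIELD_PROBS[field]
    if a == b:
        return math.log2(m / u)          # agreement is evidence for a match
    return math.log2((1 - m) / (1 - u))  # disagreement is evidence against


def match_score(rec_a: Dict[str, str], rec_b: Dict[str, str]) -> float:
    """Cumulative evidence that two records refer to the same individual."""
    return sum(field_weight(f, rec_a.get(f), rec_b.get(f)) for f in FIELD_PROBS)


def is_match(rec_a: Dict[str, str], rec_b: Dict[str, str], threshold: float = 10.0) -> bool:
    """Link two records when the evidence reaches a (hypothetical) threshold."""
    return match_score(rec_a, rec_b) >= threshold
```

For example, a SINAN-like record `{"mother_name": "Maria da Silva", "birth_date": "1970-03-02", "municipality": "Recife"}` and a SIM-like record differing only in the capitalization and spacing of the mother's name would be linked, since all three fields agree after normalization. Missing fields contribute no evidence rather than counting as agreement.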
However, a more interesting semantic integration problem occurred while querying the two databases together: an individual may die due to a certain disease, but instead of the trigger event that ultimately led to death, e.g. a particular disorder like Chagas disease, a secondary cause, i.e. something else related to the disease, such as a heart attack, is reported as the primary cause and stored in the database. Such a heart complication in a Chagas patient is a frequent secondary effect of the primary cause, the Chagas infection, and this causal link should be tracked in the databases in order to prevent erroneous epidemiological measures.
The above requirement leads us to expand NTDO with classes allowing for such granular distinctions and for a sound ontological representation of death. Many subtle aspects hamper a precise definition in this case, namely a) the conditions under which an individual is considered dead, and b) the ontological problem of preserving the identity of an individual when transitioning from a living to a dead organism in different stages of disintegration.
In order to support the integration and verification of morbidity and mortality data in the SINAN and SIM databases, we here present an ontological representation of death. As an example of iterative modeling, we outline four successive versions for representing mortality and discuss representational problems or the complexity of reasoning arising from each. We conclude by briefly describing what can be done with the mortality representation and the next steps of the NTDO project.
NTDO leverages classes and relations provided by the upper-level ontology BioTop, specializing it downwards to the required leaf node granularity. Additional classes and relations for representing time intervals and their boundaries were imported from the General Formal Ontology (GFO) [12, 13].
NTDO was based on established ontology construction guidelines, which suggest untangling asserted graphs into disjoint orthogonal axes and letting a DL reasoner maintain the tangled polyhierarchy. The naming conventions proposed by Schober et al. were applied consistently.
Representation language and semantics
where Δ^I is the domain of the interpretation.
For instance, the class of humans who share their first name with one of their children (e.g. fathers who gave a son their own first name) can be described as Human ⊓ (hasFirstName ≐ hasChild ∘ hasFirstName). It means that the first name of the child must be the same as the first name of the parent.
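To make the agreement operator concrete, the following sketch evaluates it over a small finite interpretation. It assumes the existential reading used for feature agreements in the DL literature: x belongs to (u₁ ≐ u₂)^I if some y is reached from x through both role chains; for chains of functional roles this amounts to both chains having the same value. The individuals and the toy interpretation are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def compose(r: Set[Pair], s: Set[Pair]) -> Set[Pair]:
    """Relational composition r ∘ s = {(x, z) | (x, y) ∈ r and (y, z) ∈ s}."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


def chain(roles: Dict[str, Set[Pair]], names: List[str]) -> Set[Pair]:
    """Interpretation of the role chain names[0] ∘ names[1] ∘ ..."""
    result = roles[names[0]]
    for name in names[1:]:
        result = compose(result, roles[name])
    return result


def agreement(roles: Dict[str, Set[Pair]], u1: List[str], u2: List[str]) -> Set[str]:
    """(u1 ≐ u2)^I under the existential reading: individuals x such that
    some y is reachable from x through both chains."""
    return {x for (x, _) in chain(roles, u1) & chain(roles, u2)}


# Hypothetical toy interpretation; individuals are invented for illustration.
human = {"joao", "joao_jr", "pedro", "ana"}
roles = {
    "hasFirstName": {("joao", "n_joao"), ("joao_jr", "n_joao"),
                     ("pedro", "n_pedro"), ("ana", "n_ana")},
    "hasChild": {("joao", "joao_jr"), ("pedro", "ana")},
}

# Human ⊓ (hasFirstName ≐ hasChild ∘ hasFirstName)
named_after = human & agreement(roles, ["hasFirstName"], ["hasChild", "hasFirstName"])
```

Here `named_after` evaluates to `{"joao"}`: João's child carries his first name, whereas Pedro's child Ana does not.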
The mortality representation within NTDO was edited with the ontology editor Protégé 4.1, using the embedded reasoner HermiT for classification in most steps. Inference could not be performed over agreements, since neither OWL 2 nor, consequently, HermiT supports them.
As for the knowledge sources, apart from the literature review, other relevant sources were the morbidity and mortality systems themselves [4, 5]. To some extent, we grounded our definitions on the way death cases are reported to the SIM. This system reports cases objectively by "primary cause of death" (e.g. a disease or an injury which triggered a chain of pathological events and led to death) and "other related causes of death" (e.g. another disease or injury related to the death). Deaths are always identified by a forensic medicine service or by the physician who was treating the subject of care for the disease or injury leading to the death.
As morbidity and mortality databases contain homologous entries referring to the same subject of care, the primary cause of death in the mortality database can be correlated with the disease entry of the same subject of care in the morbidity database, postulating a causal relationship: the recorded primary cause of death might actually be a secondary symptom of the progressing disease.
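The correlation described above can be sketched as a simple check over already-linked records: if the certified primary cause is a known secondary effect of a disease registered for the same subject of care, the registered disease is proposed as the candidate primary cause. The record classes, field names and the complication table are hypothetical simplifications of SIM and SINAN entries, and the sketch assumes that subject identifiers have already been reconciled (e.g. by record linkage).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical lookup: secondary effects that may be reported instead of the disease.
KNOWN_COMPLICATIONS: Dict[str, Set[str]] = {
    "chagas_disease": {"heart_failure", "cardiac_arrhythmia"},
}


@dataclass
class MorbidityRecord:   # simplified SINAN-like entry (hypothetical fields)
    subject_id: str
    disease: str


@dataclass
class MortalityRecord:   # simplified SIM-like entry (hypothetical fields)
    subject_id: str
    primary_cause: str


def suggest_primary_cause(death: MortalityRecord,
                          morbidity: List[MorbidityRecord]) -> Optional[str]:
    """Return the registered disease if the certified primary cause is a known
    complication of it (a candidate for correction); otherwise None."""
    for rec in morbidity:
        if rec.subject_id != death.subject_id:
            continue
        if death.primary_cause in KNOWN_COMPLICATIONS.get(rec.disease, set()):
            return rec.disease
    return None
```

For a subject registered with Chagas disease in the morbidity data whose death record names heart failure as primary cause, the function suggests `"chagas_disease"`; the decision to correct the record would still rest with a domain expert.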
In this section, we describe our ontological representation of mortality. It assumes a disease or injury to be the primary cause of death, and it needs to describe both temporally extended and instantaneous processes in health-care-relevant life cycle stages, starting with the transmission of a pathogen, continuing with the disease as a pathological process, and finally ending in the process of dying. In the next subsections, we provide DL definitions for all important parts of our mortality model, and discuss the complex issues encountered and how they were solved in our model.
Representing injury and death
Although this is a simplification introduced only to ease the handling of statistics, mortality registries worldwide mainly store the primary cause of death. In our ontology we follow this simplification, although extending it to accommodate multiple causes would be quite straightforward and would not add computational costs for inference.
At a given moment, an individual organism can acquire a certain disease, e.g. dengue fever, which may cause premature death, depending on the circumstances. In medical terms, the causal effect of a disease is a function of the physiological state of the individual. There can be a causal link from a disease and its symptoms to a later death process.
Assuming all data are available, it should be possible to describe and trace the sequence of causally induced, and at times overlapping, pathological processes which affect the life of that organism from birth to death. Some of them may damage the organism's overall physiological state to such an extent that they directly initiate a process of physiological death, leading to death itself. This sequence of processes is sometimes evidenced by the records of an individual when the cause of death was previously registered in a morbidity system, i.e. when the primary cause of death was already known.
Our definition of the 'Birth' process is based on the description of "live birth" provided by the Brazilian Institute of Geography and Statistics (IBGE). It corresponds to the complete expulsion or extraction from the maternal body of a product of conception which, after separation from the maternal body, breathes or exhibits some other vital sign, e.g. heartbeat, voluntary muscle contraction or umbilical cord contraction, regardless of whether the cord has been cut and whether the placenta was expelled. Conversely, "death" as a state means the absence of brain functions and the cessation of all biological functions inherent to the human body.
However, there are major difficulties related to the accurate representation of the processes that make an individual die:
- Complexity is an issue, as causation can at times be quite indirect, with many unknown factors, such as comorbidities and interlaced parallel influences, ultimately converging into a death process;
- Another issue is relating sequences of processes to time, with a precise description of where and when each process took place, when it started and where its boundaries are.
However, this exact information is probably not essential if the aim of the proposed model is to deal with mortality data. What is usually known and found in the databases is the typical sequence of signs and symptoms of a disease, whose time constraints (e.g. during tuberculosis, a cough with secretion follows a pulmonary infection) can be checked in morbidity and mortality notifications. When registering a death in a mortality notification database, viz. SIM, a physician certifies one underlying primary cause of death and sometimes one or more secondary causes. The ontology should support both descriptions.
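Checking such time constraints against notifications can be sketched as follows: each disease is associated with pairs of events that should occur in a given order, and dated notifications are tested against them. The constraint table, event names and dates are hypothetical; in the ontology such orderings would be stated axiomatically rather than in a lookup table.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical precedence constraints per disease: (earlier, later) event pairs.
TYPICAL_ORDER: Dict[str, List[Tuple[str, str]]] = {
    "tuberculosis": [("pulmonary_infection", "productive_cough")],
}


def order_violations(disease: str, events: Dict[str, date]) -> List[Tuple[str, str]]:
    """Return the constraints contradicted by the notified dates.
    Pairs involving an event that was not notified are skipped."""
    violations = []
    for earlier, later in TYPICAL_ORDER.get(disease, []):
        if earlier in events and later in events and events[earlier] > events[later]:
            violations.append((earlier, later))
    return violations
```

A notification dating the productive cough before the pulmonary infection would be returned as a violation, flagging the record for review.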
We also assume the notion of instantaneous process available in BioTop to be equivalent to the notion of event provided in GFO; such an event makes the subject of care exhibit a certain behavior, which is linked, causally or not, to some processes.
Representational challenges of the mortality model
Next, we present the main challenges related to the representation and the logical axioms characterizing and solving these challenges. To allow the reader to follow our lines of reasoning we explain four successive versions of definitions for the core entities, demonstrating our iterative optimization approach and the evolution of the model to a final proposal.
The two major challenges encountered in creating a coherent representation for a mortality process were the preservation of the identity of related individuals by setting cardinalities, and the rendering of the resulting ontology in a decidable DL. Each of these items is discussed in the consecutive versions until we arrive at a satisfactory model.
Version 1: Introducing the death representation
indicating that a death event is an instantaneous process (i.e. it happens at the very moment when the person dies) in which a dead organism is a participant. It also states that one or more biological processes (e.g. a disease or an injury) are part of the death event, and that the death event is not temporally extended.
This definition lacks precision regarding how to preserve identity between the living and the dead organism, as the living individual is not specified in the axiomatic description. According to the class definition, there is no guarantee that the living and the dead body are identical, since the patients of instances of DeathEvent and BiologicalDeathProcess may not be the same.
Also, the axiom expresses no cardinality constraint, which gives rise to different interpretations, such as the possibility of more than one individual dying in the same death process. Besides, subscribing to the idea that a living organism is eventually transformed into a dead one causes further representational problems. First, our imported top level, BioTop, restricts its organism hierarchy to living ones, requiring additional class expressions to refer to dead organisms (e.g. using the relation transformationOf). As a consequence, a dead human is no longer a human, although possessing human organs, features, etc.
Besides losing its "humanity", the organism loses its identity too, since any classification of living beings is rigid, i.e., once an individual is an instance of a rigid class, it ceases to be an instance only when it no longer exists. Even if we assume that this description corresponds to a phased sortal, i.e. an entity which changes phases (from "living" to "dead"), it is not clear whether identity is preserved, e.g. whether the ashes of a dead organism should be identified with the dead person.
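The identity and cardinality gaps of Version 1 can be made concrete with a counter-model. The sketch below checks a hypothetical ABox against existential constraints paraphrased from the prose description of the Version 1 axiom (an instantaneous process with some dead organism as patient and some biological process as part), not against the axiom itself. The ABox satisfies these constraints even though the death event has two dead patients and shares no patient with its biological death process.

```python
from typing import Dict, Set, Tuple

# Hypothetical ABox: class and role names follow the text, individuals are invented.
types: Dict[str, Set[str]] = {
    "death1": {"DeathEvent", "InstantaneousProcess"},
    "proc1": {"BiologicalDeathProcess", "BiologicalProcess"},
    "corpse_a": {"DeadOrganism"},
    "corpse_b": {"DeadOrganism"},
    "person_c": {"LivingOrganism"},
}
has_patient: Set[Tuple[str, str]] = {
    ("death1", "corpse_a"), ("death1", "corpse_b"), ("proc1", "person_c"),
}
has_part: Set[Tuple[str, str]] = {("death1", "proc1")}


def fillers(role: Set[Tuple[str, str]], x: str) -> Set[str]:
    """All y with (x, y) in the role."""
    return {y for (s, y) in role if s == x}


def satisfies_version1(event: str) -> bool:
    """Paraphrased Version 1 constraints: only existential, no cardinality."""
    return ("InstantaneousProcess" in types[event]
            and any("DeadOrganism" in types[p] for p in fillers(has_patient, event))
            and any("BiologicalProcess" in types[q] for q in fillers(has_part, event)))


ok = satisfies_version1("death1")            # True: constraints are met
event_patients = fillers(has_patient, "death1")   # two dead organisms
process_patients = fillers(has_patient, "proc1")  # an unrelated living organism
```

Because existential restrictions only demand that some suitable filler exists, nothing forces the two patient sets to coincide or to contain a single individual.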
Version 2: Representing death in the temporal axis
For the sake of clarity, we show here the definitions of Chronoid and its time boundaries in
When chronoids follow each other in sequence, the right time boundary of the preceding chronoid must be contiguous with the left time boundary of the subsequent one, the coincidence representing both the beginning of the new chronoid and the end of the previous one.
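The contiguity requirement can be sketched with a minimal interval type. This is a simplification under stated assumptions: plain numbers stand in for time points, and the GFO notion of coinciding boundaries is approximated as equality of their time positions. The class name mirrors gfo:Chronoid, but the Python code is purely illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chronoid:
    """A time interval delimited by a left and a right time boundary.
    Plain numbers stand in for time points (an assumption of this sketch)."""
    left: float
    right: float

    def __post_init__(self) -> None:
        if not self.left < self.right:
            raise ValueError("a chronoid must have positive temporal extension")


def is_contiguous_sequence(chronoids: List[Chronoid]) -> bool:
    """True if each right boundary coincides with the next left boundary."""
    return all(a.right == b.left for a, b in zip(chronoids, chronoids[1:]))
```

For instance, `[Chronoid(0, 5), Chronoid(5, 9)]` forms a contiguous sequence, whereas a gap between 5 and 6 would not. Instantaneous processes, which lack temporal extension, are deliberately not representable as chronoids here.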
Following the GFO and BioTop perspectives, LivingOrganism is represented as a MaterialObject. In order to define that a LivingOrganism can die, we need to specify that its existence is delimited, which is described in GFO.
stating that a living organism exists in only one time interval (its lifespan).
Additionally, processes are projected (gfo:projectsTo) to Chronoids, i.e., they exist in the time interval represented by a Chronoid. To establish correspondences between GFO and BioTop and avoid mismatches in NTDO, the class gfo:Process must be mapped to the class biotop:Process.
On the one hand, the ontological problems with the existence of DeadOrganisms are solved, including the identity problem, as instances of LivingOrganism are formed at a certain time point (gfo:LeftTimeBoundary) and destroyed at another (gfo:RightTimeBoundary). On the other hand, by definition the relationship biotop:hasPatient allows more than one element in the range, which can lead to the erroneous interpretation that a process of death by injury or disease happens to several people simultaneously.
Moreover, the model still contains three further identity problems: (a) the one between the patients of DeathEvent and BiologicalDeathProcess; (b) the definitions stated up to that point neither include the moment of death nor synchronize it with the end of the BiologicalDeathProcess that led to it; and (c) the same applies to the dying LivingOrganism, whose RightTimeBoundary should coincide with both the DeathEvent and the end of the BiologicalDeathProcess converging into it. Indeed, DeathEvent is exactly the last temporal part of a BiologicalDeathProcess; this is also an issue of coherence, since the opposite (a BiologicalDeathProcess being part of a DeathEvent) would mean that an instantaneous process has as part a process extended over a time interval.
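Problems (a) to (c) can be phrased as checks on instance data. The sketch below uses hypothetical Python classes named after the ontology classes, with plain numbers as time points. It only illustrates what a satisfactory model must guarantee; the ontology itself has to enforce these conditions axiomatically rather than procedurally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LivingOrganism:
    name: str
    left: float   # time of the left time boundary (beginning of life)
    right: float  # time of the right time boundary (end of life)


@dataclass
class BiologicalDeathProcess:
    patient: LivingOrganism
    start: float
    end: float


@dataclass
class DeathEvent:
    patient: LivingOrganism
    instant: float


def open_problems(proc: BiologicalDeathProcess, event: DeathEvent) -> List[str]:
    """Report which of the problems (a)-(c) a pair of instances exhibits."""
    problems = []
    # (a) identity, not mere equality of attribute values, must be preserved
    if proc.patient is not event.patient:
        problems.append("a")
    # (b) the death instant must coincide with the end of the death process
    if event.instant != proc.end:
        problems.append("b")
    # (c) the organism's right time boundary must coincide with both
    if not (event.patient.right == event.instant == proc.end):
        problems.append("c")
    return problems
```

With one organism `p = LivingOrganism("p", 0, 80)`, the pair `BiologicalDeathProcess(p, 75, 80)` and `DeathEvent(p, 80)` exhibits none of the problems; pairing the same process with a death event of another organism at another instant exhibits all three.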
Version 3: Introducing the agreement operator to enable identity
The representation of instantaneous processes allows us to render the RightTimeBoundary of a BiologicalDeathProcess synchronous with the DeathEvent. For this purpose, the class biotop:InstantaneousProcess was used, as a process that happens at the end of a preceding process, so as to form a process sequence connecting the end of one process with the beginning of the next one, using the DL agreement operator (≐).
This operator is used in chains of properties to indicate that the instances to be described are connected. It is worth stressing the difference between the two operators, ≐ and =. The former represents a coincidence in the value of two properties, or, in other words, a reference to the very same object, while the latter defines a formation rule for a property, which is usually based on property chains, as in the case above. In our ontology, we need, for instance, to establish that a certain process ends exactly when another starts; this is denoted by an agreement.
Despite not being the focus of the current work, which is about deaths caused by diseases, it is necessary to distinguish pathological processes, structures, and dispositions. Disorders are caused by an accident, a lesion, or a fracture and can lead to a disease. Thus, disorders follow injuries.
It describes which deceased organism is its patient, and which process is the primary cause of death. The agreement conditions are the most important ones. They ensure that the death occurs exactly when the BiologicalDeathProcess is finished (ntdo:hasInstant ≐ ntdo:precededBy ∘ gfo:hasRightTimeBoundary) and that the deceased person is the same one who participated in the injury event that led to the death, thus retaining the identity of the subject of care (the last condition).
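The first agreement condition can be checked on a small hypothetical ABox, which also shows why an OWL 2 property chain is not a substitute. In OWL 2 a chain can only be stated as a sub-property axiom, so it adds the composed value without excluding other values; and a property that is the super-property of a chain is non-simple and therefore may not be declared functional. The individuals below are invented; the role names follow the text.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def compose(r: Set[Pair], s: Set[Pair]) -> Set[Pair]:
    """r ∘ s: pairs (x, z) such that (x, y) ∈ r and (y, z) ∈ s for some y."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


def agrees(x: str, r: Set[Pair], chain_rel: Set[Pair]) -> bool:
    """x ∈ (r ≐ chain)^I: some filler of x via r is also reached via the chain."""
    return any((x, y) in chain_rel for (s, y) in r if s == x)


# Hypothetical ABox; role names follow the text, individuals are invented.
preceded_by = {("death1", "bdp1")}              # ntdo:precededBy
has_right_time_boundary = {("bdp1", "t_end")}   # gfo:hasRightTimeBoundary
chain_rel = compose(preceded_by, has_right_time_boundary)

has_instant_ok = {("death1", "t_end")}          # death at the end of the process
has_instant_bad = {("death1", "t_later")}       # death at an unrelated instant

# A chain *inclusion* only adds the composed pair; it does not remove the
# unrelated value, so death1 would end up with two instants.
inferred = has_instant_bad | chain_rel
```

The agreement accepts `has_instant_ok` and rejects `has_instant_bad`, whereas the chain-inclusion reading silently leaves `death1` with both `t_later` and `t_end`.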
This axiom addresses the processes that occur after an injury or disease and prior to death. As for the representation of participants (also described in DeathEvent), there is a need to identify the existence of one or more processes, even if imperceptible or only indirectly related. From the epidemiological point of view, these can only be completely defined a posteriori, since a previous cause (illness/injury) can only be linked to the primary cause of death in a post mortem analysis (by autopsy, for instance) or through the statement of the physician who was caring for the patient until the time of death. In the present ontology, from the axioms described so far, it is possible to assume a causal sequence of facts for an organism: illness/injury → biological death process → death.
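The causal sequence illness/injury → biological death process → death can be sketched as a walk backwards from a death event along a "preceded by" link, followed by a check that the resulting chain instantiates the three-step pattern. The individuals, the lookup tables and the class name DiseaseProcess are hypothetical; InjuryProcess, BiologicalDeathProcess and DeathEvent follow the text. The first element of a matching chain plays the role of the primary cause of death.

```python
from typing import Dict, List, Set

# Hypothetical data: each process points to the process that directly preceded it.
PRECEDED_BY: Dict[str, str] = {"death1": "bdp1", "bdp1": "injury1"}
KIND: Dict[str, str] = {
    "death1": "DeathEvent",
    "bdp1": "BiologicalDeathProcess",
    "injury1": "InjuryProcess",
}
# The pattern illness/injury -> biological death process -> death;
# "DiseaseProcess" is a hypothetical class name.
PATTERN: List[Set[str]] = [
    {"DiseaseProcess", "InjuryProcess"},
    {"BiologicalDeathProcess"},
    {"DeathEvent"},
]


def causal_chain(event: str) -> List[str]:
    """Walk back from a death event and return the chain in temporal order."""
    chain = [event]
    while chain[-1] in PRECEDED_BY:
        previous = PRECEDED_BY[chain[-1]]
        if previous in chain:
            raise ValueError(f"cycle detected at {previous}")
        chain.append(previous)
    return chain[::-1]


def fits_pattern(chain: List[str]) -> bool:
    """True if the chain instantiates the three-step causal pattern."""
    return len(chain) == len(PATTERN) and all(
        KIND.get(step) in allowed for step, allowed in zip(chain, PATTERN))
```

For the data above, `causal_chain("death1")` yields `["injury1", "bdp1", "death1"]`, which fits the pattern with `injury1` as primary cause. As the text notes, in practice this link is established a posteriori, e.g. by autopsy or physician statement.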
The axioms formulated up to now mention only causal relationships (e.g. InjuryProcess or DeathEvent). However, this notion of causality, which is necessary for the representation, is based on the observer of the process, i.e. the physician who certified the cause of death. Taking as an example a death record in a mortality notification database, viz. SIM, a physician certifies the underlying primary cause of death and sometimes secondary ones.
In this ontology, this fact is supported by ntdo:BiologicalDeathProcess, because this class allows for the inclusion of more than one cause, and it may be extended in ntdo:DeathEvent; here, however, we only take the primary cause into account, i.e. the defining cause of death (which may not be the real one).
The presented model solved the identity problem; nevertheless, a hidden problem related not to the representation but to reasoning remains: if agreements are not built over property chains of functional properties, inference becomes undecidable.
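The decidability condition concerns functionality declared for the properties in the terminology, which no data check can replace. As a modest, hedged illustration, a hypothetical helper can at least flag roles in a chain whose assertions already contradict functionality, i.e. individuals with more than one filler. The ABox below is invented; a death event preceded by two processes is exactly what allowing several causes would produce.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def is_functional(role: Set[Pair]) -> bool:
    """True if no subject has more than one filler for this role."""
    counts = Counter(subject for (subject, _) in role)
    return all(n <= 1 for n in counts.values())


def non_functional_roles(roles: Dict[str, Set[Pair]], chain: Iterable[str]) -> List[str]:
    """Names of the roles in a chain that behave non-functionally in the data."""
    return [name for name in chain if not is_functional(roles[name])]


# Hypothetical ABox in which a death event has two preceding processes.
roles = {
    "precededBy": {("death1", "bdp1"), ("death1", "bdp2")},
    "hasRightTimeBoundary": {("bdp1", "t1"), ("bdp2", "t2")},
}
offending = non_functional_roles(roles, ["precededBy", "hasRightTimeBoundary"])
```

Here `offending` is `["precededBy"]`, signalling that an agreement over `precededBy ∘ hasRightTimeBoundary` would not be built over functional properties for these data.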
Another subtle aspect is that biological death processes may occur due to injury and unknown causes, apart from diseases.
Version 4: Ensuring identity with transitive object properties
This definition has the advantage of explicitly stressing that the dead patient and the participant of a BiologicalDeathProcess, i.e. of two consecutive and linked processes, have the same identity.
Since no ontology on mortality is available, we compare our work with efforts that discuss mortality epistemologically. Although a related work about an ontology of death by Thomasma lists related terms and provides some connections among them, it does not provide a sound or formal definition of death. The approach to model a death event as subclass of an instantaneous process, which is applied here, is also present in . For him, death can only possibly be identified by another person.
Currently, we are elaborating use cases that match morbidity and mortality databases. The ontology is being used for checking whether the notified data are correct against the constraints imposed by the complex axioms (such as the impossibility of a certain disease occurring in some areas) and for rectifying wrong data (such as symptoms of a disease mistakenly recorded as primary causes of death instead of the disease itself).
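The first kind of check, a disease notified in an area where it cannot occur, can be sketched as a lookup against a constraint table. The disease and area identifiers are hypothetical placeholders; in the ontology such constraints would be expressed by axioms and detected by a reasoner rather than by a hand-written table.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical constraint table: areas where each disease can plausibly occur.
POSSIBLE_AREAS: Dict[str, Set[str]] = {
    "disease_x": {"area_1", "area_2"},
}


@dataclass
class Notification:   # simplified notification entry (hypothetical fields)
    record_id: str
    disease: str
    area: str


def implausible(notifications: List[Notification]) -> List[str]:
    """Return ids of notifications whose disease is not expected in the notified area.
    Diseases without a constraint entry are not checked."""
    return [n.record_id for n in notifications
            if n.disease in POSSIBLE_AREAS and n.area not in POSSIBLE_AREAS[n.disease]]
```

Of three notifications of `disease_x` in `area_1`, `disease_x` in `area_9` and an unconstrained `disease_y` in `area_9`, only the second would be flagged.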
In the current work, we represented complex processes, characterized by temporal marks, causality and identity preservation of the participating individuals, within the context of an objective and explicit representation of organisms transitioning from life to death. Several representational difficulties were faced, mainly regarding the complexity of the represented entities.
Our iteratively optimized models, exemplified here by four versions of the ontology, aim at stressing the typical problems - such as preserving identity, asserting correct cardinalities and agreements among relations - encountered in representing complex biological events in description logic, as well as pointing out solutions.
The NTDO in its current status allows the description of the processes related to diseases and injuries, including their evolution that ultimately can lead to death. Using it together with other parts of NTDO, such as the description of pathogen transmission by arthropod vectors, the sequence of processes can be identified, from the inoculation of a pathogen by a vector until the death of an individual.
Unfortunately, the capabilities of reasoners could not be exploited, since the representation language they handle, which is also the one chosen for NTDO (i.e. OWL 2), does not allow the use of agreements.
Nevertheless, the ontology, with the current addition of mortality-related contents, may serve many different purposes, such as supporting tutoring systems and serving as a shared vocabulary in data integration solutions. The use in data integration solutions seems particularly promising, as mortality and morbidity databases contain erroneous and/or incomplete entries.
This work was sponsored by the German DFG grant JA 1904/2-1, DFG SCHU 2515/1-1 GoodOD (Good Ontology Design) and German Ministry of Education and Research (BMBF)-IB mobility project BRA 09/006. Publication was supported by the German Research Foundation (DFG) within the funding programme Open Access Publishing. We thank Stefan Schulz, Medical University of Graz, for critically reviewing the manuscript.
This article has been published as part of Journal of Biomedical Semantics Volume 3 Supplement 2, 2012: Proceedings of Ontologies in Biomedicine and Life Sciences (OBML 2011). The full contents of the supplement are available online at http://www.jbiomedsem.com/supplements/3/S2.
- Baader F, Calvanese D, Mcguinness DL, Nardi D, Patel-Schneider P: The Description Logics Handbook. 2007, Cambridge: Cambridge University Press, 624-2
- Noy NF, Shah NH, Whetzel PL, Dai B, Dorf M, Griffith N, Jonquet C, Rubin DL, Storey M-A, Chute CG, Musen M: BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res. 2009, 37: W170-3. 10.1093/nar/gkp440.
- The Gene Ontology Consortium: Gene Ontology: tool for the unification of biology. Nature Genetics. 2000, 25: 25-29. 10.1038/75556.
- Sistema de Informação de Agravos de Notificação (SINAN). [http://portal.saude.gov.br/portal/saude/visualizar_texto.cfm?idtxt=21383]
- Sistema de Informação de Mortalidade (SIM). [http://portal.saude.gov.br/portal/saude/visualizar_texto.cfm?idtxt=21377]
- Bodenreider O, Mitchell JA, McCray AT: Biomedical ontologies. Pacific Symposium on Biocomputing. 2005, 78: 76-8.
- Santana F, Schober D, Medeiros Z, Freitas F, Schulz S: Ontology patterns for tabular representations of biomedical knowledge on neglected tropical diseases. Bioinformatics. 2011, 27: i349-i356. 10.1093/bioinformatics/btr226.
- Neglected Tropical Disease Ontology. [http://www.cin.ufpe.br/~ntdo/]
- OWL 2 Web Ontology Language Document Overview. [http://www.w3.org/TR/2009/REC-owl2-overview-20091027/]
- Camargo KR, Coeli CM: Reclink: an application for database linkage implementing the probabilistic record linkage method. Cadernos de saúde pública/Ministério da Saúde, Fundação Oswaldo Cruz, Escola Nacional de Saúde Pública. 2000, 16: 439-47.
- Beisswanger E, Schulz S, Stenzhorn H, Hahn U: BioTop: An Upper Domain Ontology for the Life Sciences. 2008, Applied Ontology, 3: 205-212.
- Heller B, Herre H: Ontological Categories in GOL. 2004, Axiomathes, 14: 57-76.
- Herre H, Heller B, Burek P, Hoehndorf R, Loebe F, Michalek H: General Formal Ontology (GFO): A Foundational Ontology Integrating Objects and Processes. Part I: Basic Principles. 2007, Leipzig, 85-
- Rector AL: Modularisation of domain ontologies implemented in description logics and related formalisms including OWL. Proceedings of the International Conference on Knowledge Capture - KCAP'03. New York, NY, USA: ACM Press. 2003, 121-
- Schober D, Smith B, Lewis SE, Kusnierczyk W, Lomax J, Mungall C, Taylor CF, Rocca-Serra P, Sansone S-A: Survey-based naming conventions for use in OBO Foundry ontology development. BMC Bioinformatics. 2009, 10: 125-10.1186/1471-2105-10-125.
- Motik B, Shearer R, Horrocks I: Hypertableau Reasoning for Description Logics. Journal of Artificial Intelligence Research. 2009, 36: 165-228.
- Nascido Vivo. [http://www.ibge.gov.br/home/estatistica/populacao/registrocivil/nascido_vivo.shtm]
- Miller FG, Truog RD: Decapitation and the definition of death. Journal of Medical Ethics. 2010, 36: 632-4. 10.1136/jme.2009.035196.
- Guarino N, Welty C: A Formal Ontology of Properties. Proceedings of EKAW-2000: The 12th International Conference on Knowledge Engineering and Knowledge Management. Edited by: Dieng R, Corby O. 2000, Menlo Park: AAAI Press
- Schulz S, Spackman K, James A, Cocos C, Boeker M: Scalable representations of diseases in biomedical ontologies. Journal of Biomedical Semantics. 2011, 2 (Suppl 2): S6-10.1186/2041-1480-2-S2-S6.
- Schmidt-Schauss M: Computational aspects of an order-sorted logic with term declarations. 1989, Lecture no. Berlin; New York: Springer Verlag, 171-
- Thomasma DC: The comatose patient, the ontology of death, and the decision to stop treatment. Theoretical Medicine. 1984, 5: 181-196. 10.1007/BF00489490.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.