
TNM-O: ontology support for staging of malignant tumours



Objectives of this work are to (1) present an ontological framework for the TNM classification system, (2) exemplify this framework by an ontology for colon and rectum tumours, and (3) evaluate this ontology by assigning TNM classes to real world pathology data.


The TNM ontology uses the Foundational Model of Anatomy for anatomical entities and BioTopLite 2 as a domain top-level ontology. General rules for the TNM classification system and the specific TNM classification for colorectal tumours were axiomatised in description logic. Case-based information was collected from tumour documentation practice in the Comprehensive Cancer Centre of a large university hospital. Based on the ontology, a module was developed that classifies pathology data.


TNM was represented as an information artefact, which consists of single representational units. Corresponding to every representational unit, tumours and tumour aggregates were defined. Tumour aggregates consist of the primary tumour and, if present, of infiltrated regional lymph nodes and distant metastases. TNM codes depend on the location and certain qualities of the primary tumour (T), the infiltrated regional lymph nodes (N) and the existence of distant metastases (M). Tumour data from clinical and pathological documentation were successfully classified with the ontology.


A first version of the TNM Ontology represents the TNM system for the description of the anatomical extent of malignant tumours. The present work demonstrates its representational power and completeness as well as its applicability for classification of instance data.


Clinical and pathological staging of malignant tumours is one of the most important procedures in the diagnosis of cancer for prognosis assessment and treatment planning. The staging procedure compiles several clinical and pathological parameters such as the location and the size of the primary tumour, the location and the number of the infiltrated regional lymph nodes, and the existence of distant metastases.

A prerequisite for an evidence-based cancer treatment is a correct and unambiguous cancer diagnosis. Interdisciplinary expert groups, e.g. from clinical medicine, imaging, and pathology, have been working in close cooperation to establish criteria for precise tumour diagnoses [1]. One of the most challenging tasks in clinical oncology is to correctly classify and code clinical findings, using a multitude of available coding systems.

By far the most important coding system for tumour staging is the Tumour-Node-Metastasis (TNM) classification [2] for malignant tumours, published by the Union for International Cancer Control (UICC)¹. Besides a growing number of reliable biomarkers, TNM classification and staging provide the most important information for therapy planning in patients with colorectal cancer [3–5] and other solid tumours (e.g. cancer of the head and neck [6] or breast tumours [7]), with the exception of cancers of the central nervous system. In addition, the TNM classification system is important in cancer research for a correct description and classification of the anatomical extent of a given tumour. This is relevant not only for cancer epidemiology but also for fundamental tumour research (e.g. the dataset descriptions for researchers of the Surveillance, Epidemiology, and End Results Program (SEER) of the National Cancer Institute² and predefined results using TNM-stratified data³).

The TNM coding procedure requires advanced skills, encompassing both experience in tumour documentation and in-depth domain knowledge. The classification criteria for the different primary tumour locations differ as much as the underlying diseases do. As a consequence, even expert coders and physicians for one organ system may encounter difficulties in correctly applying or interpreting TNM in a different organ system. Several combinations of tumour findings are difficult to encode because of ambiguous or overlapping criteria (non-disjoint definitions) or non-exhaustive definitions, which often result in cases where no TNM code, or more than one TNM code, is applicable to a given tumour state. A variety of problems with TNM coding have been described for different tumour locations. The main issues that arise in TNM coding practice derive from overly complex definitions of the underlying medical situation, which result in interpretation problems even for experts [8–10]. The in-depth domain knowledge required, together with the specific competences needed for TNM coding, results in poor coding completeness and quality, especially for clinical staging in outpatients [11, 12]. Given the importance of TNM staging for the individual patient, deviation rates of about 20 % for clinical coding and 10 % for pathological coding can be considered very high [13].

The complexity of TNM is mainly due to its development as an evolutionary process [14], which has constantly incorporated a huge amount of new scientific insight into tumour prognosis and into the dependency of therapeutic effects on tumour stage. Controlled by medical experts, TNM's underlying structure has become more and more complex over the years. Experts in different fields of oncology have demanded a change in TNM maintenance to address the increasing complexity, the detachment from clinical practice, and the resources needed for documentation [15, 16]. Therefore, standardisation of tumour classification and staging is an urgent requirement for improving tumour documentation in primary documentation, clinical studies, and cancer registries [11, 17–20].

Despite its importance and formal precision, to the knowledge of the authors, no formal representation of the complete TNM is available so far. Formal, i.e. computable, representations would have several advantages over TNM's current publication as a textbook. An initial attempt to represent the staging of lung tumours and glioma tumours was not continued [21, 22]. More recently, a description logic (DL) based approach was presented [23].

One of the major requirements a formal representation of TNM could satisfy is the automatic classification of instance data obtained from clinical databases or mined from textual reports [24–26]. Subsequently, instance data classification could inform higher-order processes such as clinical documentation systems. Instance data on pathological or clinical conditions are collected during routine health care processes in pathology or other clinical information systems. Users could be supported by automatic encoding of instance data to TNM in real time or in spatially and temporally distributed settings (e.g. in tumour documentation). For intelligent documentation systems in clinical oncology and pathology, a TNM ontology could be deployed as part of the knowledge base supporting the coding of tumour-related findings and the interpretation of TNM codes. In such systems, a TNM ontology could enable automated reasoning based on description logics, which could promptly detect logical inconsistencies and complexity-related coding problems in databases and textual reports. In integrated clinical decision support systems (DSS), TNM could be deployed to inform users about guideline-conformant treatment [27]. A further advantage of a formal approach would be enhanced support for the development and refinement of TNM. With a taxonomic backbone and axiomatic descriptions, the current complex natural language descriptions could be converted into computable structures. This would help decompose the descriptions into all their defining criteria, which in turn could facilitate the detection of coding errors, inconsistencies, and ambiguities in definitions [28, 29].

Description logics is the method of choice for a formalization of TNM [30]. Advanced retrieval and querying tools would be additional benefits that come with a logical representation following principles of Applied Ontology [31]. For these use cases, a formalised TNM version could constitute a unified source on which a variety of clinical documentation and analysis tools could be based. In addition, such a resource could be mapped to other DL-based clinical ontologies, especially to SNOMED CT.

With this work, we propose to close the gap of a missing formal representation by outlining and prototyping the TNM ontology (TNM-O). Following up on initial attempts in the breast cancer domain [32], the objectives of this work are (1) to present an ontological framework for the TNM classification system, (2) to implement a TNM ontology, describing colon and rectum tumours based on this framework, and (3) to evaluate this ontology using a tool for classifying pathology data.

The TNM classification

The canonical description of the TNM classification based on the anatomic extent of disease (EOD) is published by the UICC and the AJCC [2, 33]. The UICC published the first edition of the TNM coding system in 1968. Since then, the system has undergone several revisions, with the 7th edition published in 2009. The AJCC has recently announced the release of the 8th edition of the TNM classification for the beginning of 2017⁴. The part of the new edition covering lung cancer is already in use, and its important changes address urgent medical requirements [34]. The objectives of the TNM coding system are six-fold: it supports treatment planning, prediction of outcomes (prognosis), evaluation of treatment results, exchange of information between different participants in health care processes, continuing research in malignant diseases, and cancer control [2, 14].

The core TNM classification uses three descriptors: T (tumour), N (metastasis in regional lymph nodes), and M (distant metastasis). The extent of the disease is indicated by integer values or character modifiers: TX (Tumour cannot be assessed), T0 (No evidence of primary tumour), T1-4 (increasing size or local extent), Tis (Carcinoma in situ); NX (Regional lymph nodes cannot be assessed), N0 (No regional lymph node metastasis), N1-3 (Increasing involvement of regional lymph nodes); M0 (No distant metastasis), M1 (Distant metastasis). For some entities, the categories can be further subdivided, as indicated by lower-case letters (e.g. N2a and N2b).
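The code syntax just described can be made concrete with a small parser. The following Python sketch is a hypothetical helper, not part of TNM-O: it assumes single-descriptor codes consisting of an optional c/p modifier, the descriptor T, N or M, a value (X, 0, is, 1-4) and an optional lower-case subdivision, and it ignores all other prefixes and suffixes.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar covering only the symbols listed above:
# optional c/p modifier, descriptor, value, optional subdivision (e.g. the "b" in N2b).
_TNM_RE = re.compile(
    r"^(?P<modifier>[cp]?)(?P<descriptor>[TNM])(?P<value>X|is|[0-4])(?P<sub>[a-c]?)$"
)

# Values admitted per descriptor; MX is kept because it still occurs in practice.
_ALLOWED = {
    "T": {"X", "0", "is", "1", "2", "3", "4"},
    "N": {"X", "0", "1", "2", "3"},
    "M": {"X", "0", "1"},
}


@dataclass(frozen=True)
class TNMCode:
    modifier: Optional[str]     # "c" (clinical), "p" (pathological) or None
    descriptor: str             # "T", "N" or "M"
    value: str                  # "X", "0", "is" or "1".."4"
    subdivision: Optional[str]  # e.g. "b" in pN2b, None if absent


def parse_tnm(code: str) -> TNMCode:
    """Split a single-descriptor TNM code such as 'pN2b' into its parts."""
    match = _TNM_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognised TNM code: {code!r}")
    descriptor, value, sub = match["descriptor"], match["value"], match["sub"]
    if value not in _ALLOWED[descriptor]:
        raise ValueError(f"{descriptor}{value} is not a valid category")
    if sub and value not in {"1", "2", "3", "4"}:
        raise ValueError("subdivisions only refine numbered categories")
    return TNMCode(match["modifier"] or None, descriptor, value, sub or None)


print(parse_tnm("pN2b"))
# TNMCode(modifier='p', descriptor='N', value='2', subdivision='b')
```

Keeping the modifier as a separate field reflects the point made below that cN1 and pN1 are distinct codes even for the same tumour location.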

The specific medical meaning of the different descriptors depends on the localisation of the tumour, designated by the ICD-O localisation code⁵. It is not possible to list here all individual regions addressed by the TNM classification (for a current list see [2]). However, the TNM classification is not available for all body regions or systemic malignancies (e.g. C70-C72 Tumours of the Central Nervous System, C33 Trachea, C42, and C77 Tumours of haematopoietic and lymphoid tissues). For most of these malignancies, the anatomical extent is either not determinable (systemic malignancies, e.g. leukaemia) or the tumours do not metastasise (e.g. CNS tumours). The World Health Organisation (WHO) published the 3rd edition of the International Classification of Diseases for Oncology (ICD-O) in 2003. As an extension of the International Classification of Diseases (ICD-10) [35] for tumour diseases, the ICD-O is a dual classification system for tumour morphology and tumour localisation [36]. ICD-O is widely used in clinical medicine, tumour documentation, and research to encode tumour morphology and tumour localisation.

With an additional modifier, the TNM classification is divided into the pre-treatment clinical (cTNM) and the post-surgical pathological (pTNM) classification. pTNM codes can only be assigned after pathological assessment following surgery, and they are the most important diagnostic item for subsequent (adjuvant) radiotherapy, chemotherapy, or their combination. The results of the clinical assessment have to be clearly distinguished from those of the pathological assessment because of their different meanings and levels of evidence.

Besides the already complex semantics of the main numeric TNM codes, a series of additional symbols exists, which may have widely differing meanings for different tumour locations. Prefixes, suffixes, and certainty factors add to the confusion; e.g. for carcinoma in situ, the suffix “is” has to be used (“Tis”). As TNM allows an “X” wherever information about the clinical or pathological situation is incomplete or inaccurate, incomplete code assignments have become widespread (e.g. MX for “no statement on metastases possible”). In this work, only the classes with the descriptors T, N, and M with the modifiers c and p are represented (for a full list see Table 1).

Table 1 TNM classification descriptors and additional modifiers

pTNM codes are grouped into stages, which are based on the prognosis of the patients. Stages are designated by the Roman numerals I-IV and further subdivided into substages described by the capital letters A-C. TNM staging has been subject to frequent changes during the history of the TNM classification, according to scientific and medical progress [34]. The mapping of the TNM classification for colon and rectum tumours to stages for version 7 is provided in [2, 4].
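The coarse part of such a mapping can be sketched as a lookup. The following Python function is an illustrative assumption based on the general pattern of colorectal stage grouping (stage 0 for Tis N0 M0, I for T1-2 N0 M0, II for T3-4 N0 M0, III for any T with N1-2 and M0, IV for any M1). It deliberately omits the substages A-C, whose exact mapping should be taken from [2, 4], and it is not derived from TNM-O.

```python
def _stem(code: str) -> str:
    """Drop the c/p prefix and any lower-case subdivision: 'pT4a' -> 'T4'."""
    code = code.lstrip("cp")
    return code if code.endswith("is") else code[:2]


def colorectal_stage(t: str, n: str, m: str) -> str:
    """Coarse colorectal stage (Roman numeral only; substages A-C are omitted)."""
    t, n, m = _stem(t), _stem(n), _stem(m)
    if m == "M1":
        return "IV"    # any T, any N, distant metastasis
    if m != "M0":
        raise ValueError("stage requires M0 or M1")
    if n in {"N1", "N2"}:
        return "III"   # any T, regional lymph node involvement, no distant metastasis
    if n != "N0":
        raise ValueError("stage requires N0, N1 or N2")
    if t == "Tis":
        return "0"
    if t in {"T1", "T2"}:
        return "I"
    if t in {"T3", "T4"}:
        return "II"
    raise ValueError(f"no stage defined for {t} N0 M0")


print(colorectal_stage("pT4a", "pN0", "cM0"))  # II
```

Evaluating M before N and N before T mirrors the dominance of the descriptors in stage grouping: a distant metastasis determines stage IV regardless of T and N.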


TNM-O, the TNM ontology presented here, uses the Foundational Model of Anatomy [37] for anatomical entities, together with BioTopLite 2 (BTL2) as a domain top-level ontology [38, 39]. Tailored for the biomedical domain and based on description logics [30], BTL2 provides upper-level types both for general categories like Material object, Process, Information object, Quality etc., as well as constraints on all of them, using a set of sixteen canonical relations, partly derived from the OBO Relation Ontology (RO) [40]. They constrain each category by means of a set of general class axioms. BTL2 also contains other axioms such as relationship chains, existential and value restrictions. Thus, the building of domain ontologies under BTL2 heavily constrains the freedom of the ontology engineer, which is fully intended as it guarantees a higher predictability of the outcomes of the domain ontology production under BTL2.

The design of BTL2 is top-level agnostic and has been influenced both by the Basic Formal Ontology (BFO and BFO2) and the Descriptive Ontology for Linguistic and Social Engineering (DOLCE) which is discussed in more detail in [39]. BTL2 is especially appropriate as domain top-level for TNM-O because it provides a lean, yet exhaustive ontological framework for the representation of clinical documentation artefacts. Moreover, it is fully axiomatised using RO (see above) so that it is interoperable with other ontologies in the biomedical domain.

The development of TNM-O is an ongoing process. For this study, colorectal cancer was chosen as the use case for several reasons. It is the third most common cancer worldwide and accounts for 9 % of all cancer incidence [41, 42], affecting more than one million people in 2002. Treatment of cancer patients and research on the causes of cancer are main goals of worldwide cancer control programs⁶. In prior work, the TNM classification for breast tumours (ICD-O C50) had been formally represented [32]. The selection of breast and colorectal tumours was motivated both by their paramount medical importance and by their complexity in TNM, where both follow non-trivial medical classification principles, especially for the cN and pN classifications. Demonstrating the appropriateness and feasibility of TNM-O for these two tumour locations provides good support for the general applicability of the approach.

The general rules of the TNM classification and the specific TNM classification for tumours of the colon and the rectum (ICD-O topography chapters C18 – C21, for ICD-O morphology codes see Table 2) were represented as described in [2, 43].

Table 2 ICD-O 3 morphology codes for tumours of the colon and the rectum

A tool for classifying individuals (instances) derived from pathology reports was developed using the OWL API (version 4.0.1)⁷ and the HermiT DL reasoner (version 1.3.8)⁸. It classifies breast tumour and colorectal tumour data based on the corresponding TNM ontologies. It either reads tabular input data from files or processes data entered manually via a graphical user interface.

The objective of TNM-O is not to re-design an existing tumour classification into a new system. At the current level of development, TNM-O is the result of an ontological analysis of what has been developed by the medical community over a long period, followed by its translation into a formal language, incorporating ontological principles, in order to improve the development, maintenance, and application of the TNM classification system.

In the following two sections, we describe (1) the TNM classification in detail as foundation of what has to be represented by TNM-O, (2) how the TNM classification artefacts are represented by information artefacts of TNM-O, (3) how these information artefacts are related to the actual tumour entities, and (4) how the patho-anatomical reality of tumour disease is constructed in terms of what is required for the TNM classification.

Design of the TNM-O

The relation between the artefacts of the TNM classification and the actual tumour diseases is denotational: the T code denotes the extent (size, infiltration) of the primary tumour, the N code the extent of regional lymph node metastases, and the M code the existence of distant metastases. For TNM-O, we adopted an approach that is compliant with the Information Artefact Ontology from the OBO Foundry and with recently published work on the aboutness relation [44, 45]. In TNM-O, the coding artefacts of the TNM classification, i.e. the classes of the classification, are represented by subclasses of btl2:InformationObject (RepresentationalArtefact). Information reported on individual patients, e.g. TNM codes in patient records, thus consists of individuals of these classes. Individuals of subclasses of InformationObject are related by btl2:represents to individuals of classes describing the current disease state (AnatomicalStructure). The inverse relation, btl2:isRepresentedBy, connects material or processual entities with the respective TNM artefact.

As the TNM classification is compositional, the individual classes of the three descriptors can be combined independently into a joint code. Classes depend only on the location of the primary tumour and on the additional modifiers c or p: e.g. cN1 for colon cancer has a different meaning than cN1 for breast cancer, and cT1 has a different meaning than pT1 for all locations where these codes are available. This characteristic is preserved in TNM-O. The class RepresentationalUnit is a superclass of organ-specific classes, separated into a clinical and a pathological branch.

To represent anatomical structure, TNM-O uses content from the Foundational Model of Anatomy, restricted to the cancer-related anatomy referred to by the TNM classification. All primary tumour and metastasis individuals are related to individual anatomical entities by the relation btl2:locatedIn, which provides them with an exact topography and extent. The extent of primary tumours can be described not only by their localisation (i.e. occupying space in, or infiltrating through, the layers of an organ) but also by qualities, e.g. tumour size or infiltration patterns. These qualities depend on the localisation of the primary tumour and can differ substantially between localisations.

Whether a lymph node counts as a regional lymph node depends on its proximity to a primary organ. An axillary lymph node is a regional lymph node of the breast but not of the colon. These regional lymph node groups must be defined for all relevant organs. Moreover, the formalisation of infiltrated regional lymph nodes depends on the aggregate of a localised primary tumour together with some metastasis in a regional lymph node of the organ in which the primary tumour is located. Thus, an infiltrated axillary lymph node is a regional lymph node metastasis for a breast tumour, but certainly not for a colon cancer. Distant metastases are, by definition, those metastases in a tumour aggregate that are not located in a regional lymph node of the primary tumour.
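The site dependence of "regional" can be illustrated with a small Python sketch. The lymph node group lists below are hypothetical, abbreviated examples chosen for illustration (the authoritative lists are given in the TNM textbook [2]), and plain set membership stands in for the DL class definitions: a metastasis counts as regional only if it lies in a lymph node group listed for the given primary site, and every other metastasis counts as distant.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical, abbreviated regional lymph node groups per primary site (illustration only).
REGIONAL_NODE_GROUPS: Dict[str, Set[str]] = {
    "breast": {"axillary", "infraclavicular", "internal mammary", "supraclavicular"},
    "colon": {"pericolic", "ileocolic", "right colic", "middle colic",
              "left colic", "inferior mesenteric"},
}


@dataclass(frozen=True)
class Metastasis:
    location: str        # a lymph node group or an organ, e.g. "axillary", "liver"
    in_lymph_node: bool  # True if the metastasis is located in a lymph node


def split_metastases(primary_site: str,
                     metastases: List[Metastasis]) -> Tuple[int, List[str]]:
    """Return (count of regional lymph node metastases, locations of distant ones).

    Assumes one Metastasis entry per infiltrated lymph node.
    """
    regional_groups = REGIONAL_NODE_GROUPS[primary_site]
    regional = 0
    distant: List[str] = []
    for met in metastases:
        if met.in_lymph_node and met.location in regional_groups:
            regional += 1                 # regional relative to this primary site only
        else:
            distant.append(met.location)  # everything else is a distant metastasis
    return regional, distant


def m_category(primary_site: str, metastases: List[Metastasis]) -> str:
    _, distant = split_metastases(primary_site, metastases)
    return "M1" if distant else "M0"


findings = [Metastasis("axillary", True)]
print(m_category("breast", findings), m_category("colon", findings))  # M0 M1
```

The same axillary finding yields M0 for a breast primary and M1 for a colon primary, which is exactly why the ontology defines regional lymph node metastases relative to the tumour aggregate rather than to the lymph node alone.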

Classification of pathology data

We computationally classified into TNM the data describing the extent of 291 colorectal cancer specimens, which had been documented at the Institute of Surgical Pathology, Medical Center – University of Freiburg, using a pathology information system. These data were re-coded as RDF-OWL instance data and classified into classes of TNM-O by an application based on the OWL API using an OWL classifier⁹. Automatic classification was based solely on the axioms defined in the colorectal TNM-O version 7 (TNM-O_colon_7.owl). The complete set of criteria is shown in Table 3.
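Re-coding a tabular row as RDF instance data can be sketched with Python's standard csv module. The column names (case_id, positive_regional_nodes), the example.org namespace and the class and property names below are hypothetical placeholders, not TNM-O's actual IRIs or BTL2 relations; the sketch only shows the shape of the transformation: one individual per tumour aggregate, primary tumour, lymph node and metastasis, leaving TNM class membership to be inferred by a reasoner.

```python
import csv
import io
from typing import Dict, List

# Hypothetical namespace and vocabulary; TNM-O uses its own IRIs and BTL2 relations.
EX = "http://example.org/tnm-demo#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(local: str) -> str:
    return f"<{EX}{local}>"


def row_to_ntriples(row: Dict[str, str]) -> List[str]:
    """Turn one tabular pathology row into N-Triples lines for a tumour aggregate."""
    case = row["case_id"]
    aggregate, tumour = iri(f"aggregate_{case}"), iri(f"tumour_{case}")
    triples = [
        (aggregate, RDF_TYPE, iri("TumourAggregate")),
        (tumour, RDF_TYPE, iri("PrimaryTumour")),
        (aggregate, iri("hasPart"), tumour),
    ]
    # One lymph node individual with one metastasis per positive node, so that a
    # reasoner could infer "metastatic regional lymph node" from the class axioms.
    for i in range(1, int(row["positive_regional_nodes"]) + 1):
        node, met = iri(f"node_{case}_{i}"), iri(f"metastasis_{case}_{i}")
        triples += [
            (node, RDF_TYPE, iri("RegionalLymphNode")),
            (met, RDF_TYPE, iri("Metastasis")),
            (node, iri("hasPart"), met),       # mirrors "LymphNode and hasPart some Metastasis"
            (aggregate, iri("hasPart"), met),  # the metastasis belongs to the tumour aggregate
        ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


table = "case_id,positive_regional_nodes\nA17,2\n"
for row in csv.DictReader(io.StringIO(table)):
    print("\n".join(row_to_ntriples(row)))
```

A case with two positive nodes produces eleven triples; whether the aggregate then falls under an N1b class is left entirely to the ontology's axioms, not to the conversion code.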

Table 3 Criteria of TNM version 7 for colorectal cancers. All TNM codes can be inferred from these criteria. The exact wording of the textual definitions in TNM version 7 differs. The exact count of infiltrated organs in distant metastasis is omitted

For comparison of the ontology-based TNM classification with a manual expert TNM classification, the data were manually classified by a pathologist into TNM version 7.


TNM-O is designed as a modular system of independent ontologies under BTL2. For every organ or organ system based module of the TNM classification system, TNM-O provides a set of specific ontologies. The TNM connecting ontology serves as a hub to import BTL2 as well as the organ and organ system specific TNM ontologies (see Table 4). With the modular architecture only those modules are included that are needed by a tumour-specific application.

Table 4 Modular structure of TNM-O. Codes in clinical documentation and cancer registries follow TNM versions, because the meaning of codes and stages may change between versions. The modular structure is designed to include versions for every available TNM encoded entity (tumour location) so that the intended meaning is preserved according to the version used for coding

The hub TNM Ontology for all tumours can be downloaded from The ontologies for breast tumours and colorectal tumours are named according to Table 4 and can be downloaded from the same site. They need to be loaded in the hub ontology.

Without inclusion of BTL2, the TNM hub ontology has the description logic expressivity \(\mathcal{ALC}\) (for a short introduction to the DL nomenclature see [46], section Description Logic Nomenclature). It consists of 79 axioms (38 of them logical axioms) and 39 classes. It includes 35 subClassOf axioms and one EquivalentTo axiom. Most of the classes are proxy classes to BTL2. Inclusion of BTL2 changes the DL expressivity to \(\mathcal{SRI}\).

The TNM ontology for colorectal tumours has the description logic expressivity \(\mathcal{ALC}\). For TNM version 7.0 (version 6.0 in brackets), it consists of 366 (357) axioms, of which 198 (199) are logical axioms, and 161 (149) classes. It includes 123 (160) subClassOf, 57 (18) EquivalentTo, and 18 (18) DisjointClasses axioms.

Representational units in the TNM-Ontology

The representation of the TNM system is decomposed into the representational units T, N, and M, together with the location of the primary tumour. Thus, for every existing code Tn, Nn, and Mn in combination with a specific organ, there exists one TNM-O:RepresentationalUnit, which is a btl2:InformationObject. For example, every TNM code for colorectal cancer is represented by a separate class. Axioms using the relation btl2:isRepresentedBy introduce the possible TNM values for subclasses of PrimaryTumour or TumourAggregate. This is done by connecting these values via the universal quantifier ONLY (role restriction). In all of these cases, the clause “or (not RepresentationalUnitInTNMClassification)” admits other values that are not TNM representational units. In the remaining text, the namespace of the TNM ontology is omitted for clarity:

TumourOfColonAndRectumWith7OrMoreMetastaticRegionalLymphNodes subClassOf TumourAggregate and btl2:isRepresentedBy only (ColonRectumTNM_pN2b or ColonRectumTNM_N2b or (not RepresentationalUnitInTNMClassification))
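The effect of the “or (not RepresentationalUnitInTNMClassification)” disjunct can be illustrated with a closed-world check in Python. This is only an approximation with hypothetical individual names: under OWL's open-world semantics, a reasoner such as HermiT does not reject unlisted fillers, but infers that any TNM representational unit linked by btl2:isRepresentedBy belongs to one of the admissible classes, and reports an inconsistency only when that conflicts with other axioms (for example disjointness axioms).

```python
from typing import Dict, Iterable, Set

TNM_UNIT = "RepresentationalUnitInTNMClassification"


def satisfies_only_or_not(fillers: Iterable[str],
                          types: Dict[str, Set[str]],
                          allowed: Set[str],
                          excluded: str = TNM_UNIT) -> bool:
    """Closed-world check of: isRepresentedBy only (A or B or (not excluded)).

    fillers: individuals the subject isRepresentedBy.
    types:   asserted classes of each individual (a missing entry means no classes).
    allowed: the admissible classes A, B, ... (e.g. ColonRectumTNM_pN2b).
    """
    for filler in fillers:
        classes = types.get(filler, set())
        if classes & allowed:
            continue      # one of the admissible TNM codes
        if excluded not in classes:
            continue      # not a TNM representational unit at all, e.g. a free-text report
        return False      # a TNM unit, but not one of the admissible codes
    return True           # vacuously true when there are no fillers


types = {
    "code1": {"ColonRectumTNM_pN2b", TNM_UNIT},
    "code2": {"ColonRectumTNM_pN1a", TNM_UNIT},
    "report": {"PathologyReport"},
}
allowed = {"ColonRectumTNM_pN2b", "ColonRectumTNM_N2b"}
print(satisfies_only_or_not(["code1", "report"], types, allowed))  # True
print(satisfies_only_or_not(["code2"], types, allowed))            # False
```

Without the negated disjunct, the narrative pathology report in this example would itself be forced into one of the listed N2b classes, which is why the clause is added to every such axiom.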

Representation of the primary tumour

The primary tumour is represented as PrimaryTumour, a subclass of MalignantAnatomicalStructure. The tumour characteristics relevant for the representational unit T of the TNM classification system are represented as location and qualities of PrimaryTumour. For colorectal tumours, the exact localization of the tumour in the gut wall, the quality of the tumour confinement with respect to neighbouring organs (confined or invasive), the quality of the assessment (no assessment, no evidence or carcinoma in situ), are important:

InvasiveTumourOfSubmucosaOfColonAndRectum EquivalentTo ColonAndRectumTumour and (btl2:isBearerOf some (Confinement and (btl2:projectsOnto some Invasive))) and (btl2:isIncludedIn some SubmucosaOfLargeIntestine)

The specific tumour defined as subclass of PrimaryTumour above is directly related to the corresponding representational unit as introduced in the section above.

InvasiveTumourOfSubmucosaOfColonAndRectum subClassOf btl2:isRepresentedBy some (ColonRectumTNM_T1 or ColonRectumTNM_pT1) and btl2:isRepresentedBy only (ColonRectumTNM_T1 or ColonRectumTNM_pT1 or (not RepresentationalUnitInTNMClassification))
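The chain from the deepest invaded layer to a T category, of which the submucosa/T1 axiom above is one instance, can be sketched as a plain lookup. The mapping below is an illustrative summary of the colorectal T categories of TNM 7 (Tis: lamina propria; T1: submucosa; T2: muscularis propria; T3: subserosa or non-peritonealised pericolic/perirectal tissue; T4a: visceral peritoneum perforated; T4b: other organs invaded) and is not generated from the TNM-O axioms. Unlike the ontology's class names (ColonRectumTNM_T1), this sketch writes clinical codes with an explicit “c”.

```python
# Illustrative mapping of the deepest invaded layer to colorectal T categories (TNM 7).
T_BY_DEEPEST_LAYER = {
    "lamina propria": "Tis",
    "submucosa": "T1",
    "muscularis propria": "T2",
    "subserosa": "T3",
    "pericolic or perirectal tissue": "T3",
}


def colorectal_t(deepest_layer: str,
                 perforates_visceral_peritoneum: bool = False,
                 invades_other_organs: bool = False,
                 pathological: bool = True) -> str:
    """Return a code such as 'pT1' for an invasive tumour of the submucosa."""
    if invades_other_organs:
        category = "T4b"
    elif perforates_visceral_peritoneum:
        category = "T4a"
    else:
        category = T_BY_DEEPEST_LAYER[deepest_layer]  # KeyError for unknown layers
    return ("p" if pathological else "c") + category


print(colorectal_t("submucosa"))  # pT1
```

The ordering of the checks encodes that the T4 criteria override the depth of infiltration within the bowel wall; in the ontology, the same effect is achieved by separate class definitions for each criterion.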

Representation of regional lymph nodes

For many primary tumour locations, the most complex part of the TNM classification is the interpretation of the axis N, which describes the extent to which the primary tumour has infiltrated regional lymph nodes. The anatomy of the lymph nodes draining the colon and rectum was modelled according to clinical anatomical conventions. Metastatic regional lymph nodes can be located precisely by the specific subclass of infiltrated regional lymph node:

MetastaticLymphNodeOfColonAndRectumTumour EquivalentTo LymphNode and (btl2:hasPart some MetastasisOfColonAndRectumTumour)

MetastaticRegionalLymphNodeOfColonAndRectumTumour EquivalentTo MetastaticLymphNodeOfColonAndRectumTumour and ColonAndRectumRegionalLymphNode

To define regional lymph node metastases of colorectal cancers, the aggregate of primary tumour and infiltrated lymph nodes around the colon and rectum (TumourAggregate) has to be considered as one (composite) entity. The representational unit N of the TNM classification of colorectal cancers depends on the count of metastatic regional lymph nodes and the presence of subserosal tumour deposits without regional lymph node metastases. The count of metastatic lymph nodes is represented by subclasses of CardinalityValueRegion:

TumourOfColonAndRectumWith2or3MetastaticRegionalLymphNodes EquivalentTo TumourOfColonAndRectumWith1to3MetastaticRegionalLymphNodes and (btl2:isBearerOf some (Cardinality and (btl2:projectsOnto some Cardinality2or3) and (btl2:projectsOnto only Cardinality2or3)))
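The cardinality value regions used above partition the count of metastatic regional lymph nodes into intervals. As a plain-Python sketch, assuming the colorectal N criteria of TNM 7 (one node: N1a; two or three: N1b; four to six: N2a; seven or more: N2b; tumour deposits without regional lymph node metastasis: N1c), the partition can be written as a chain of interval tests. The function name and signature are hypothetical and only mirror what the EquivalentTo axioms express.

```python
def colorectal_n(metastatic_regional_nodes: int,
                 tumour_deposits: bool = False,
                 pathological: bool = True) -> str:
    """Map the count of metastatic regional lymph nodes to a colorectal N category (TNM 7)."""
    count = metastatic_regional_nodes
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        # Tumour deposits without regional lymph node metastasis give N1c.
        category = "N1c" if tumour_deposits else "N0"
    elif count == 1:
        category = "N1a"
    elif count <= 3:
        category = "N1b"   # the "2 or 3" cardinality region
    elif count <= 6:
        category = "N2a"
    else:
        category = "N2b"   # 7 or more, cf. the pN2b axiom above
    return ("p" if pathological else "c") + category


print([colorectal_n(k) for k in (0, 1, 2, 4, 7)])
# ['pN0', 'pN1a', 'pN1b', 'pN2a', 'pN2b']
```

Because the intervals are disjoint and exhaustive, every count maps to exactly one category; in the ontology the corresponding guarantee comes from defining the cardinality regions as mutually exclusive classes.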

Representation of distant metastases

For the representational unit M of the TNM classification system the existence and number of distant metastases are evaluated. The definition of distant metastases excludes regional lymph nodes as their localisation:

DistantMetastasisOfColonAndRectumTumour EquivalentTo MetastasisOfColonAndRectumTumour and (not (btl2:isIncludedIn some ColonAndRectumRegionalLymphNode))

TumourOfColonAndRectumWithDistantMetastasis EquivalentTo TumourOfColonAndRectumAggregate and (btl2:hasPart some DistantMetastasisOfColonAndRectumTumour)

TumourOfMammaryGlandWithDistantMetastasis subClassOf btl2:isRepresentedBy only (MammaryGlandTNM_M1 or MammaryGlandTNM_pM1 or (not RepresentationalUnitInTNMClassification))

Classification of pathology data

All instance data of the 291 colorectal cancer samples could be classified into classes of the colorectal cancer module of TNM-O. An a posteriori comparison of the automatic classification results with a manual TNM coding, performed by an experienced pathologist on the same findings from the pathology database, showed 100 % agreement. Table 5 shows 15 exemplary rows of tabular instance data with the corresponding manual and automatic classification results. Figures 1 and 2 show an example of an RDF-OWL instance that corresponds to rows 6 and 8 of Table 5. For clarity, the RDF example focuses on TNM N; other details on tumour invasion and distant metastasis were left out. All automatic classification results are based on TNM-O, TNM-O_colorectal_7 and RDF-OWL instance data.

Fig. 1

N1b representational unit of TNM-O for colorectal tumours. Graph of the patho-anatomical structures represented by an N1b representational unit of the TNM-O for colorectal tumours version 7 (TNM-O_colorectal_7.owl). T and M representational units are unspecified

Fig. 2

RDF-OWL instance of a tumour aggregate and corresponding OWL classes. Graph of an RDF instance of a tumour aggregate as created from tabular data according to TNM-O for colorectal tumours version 7 (TNM-O_colorectal_7.owl). RDF instance data are depicted with a purple diamond. RDF instances for the T and M classification are omitted. Instances of this type are classified as TNM N1b

Table 5 TNM relevant tabular data, manual expert TNM classification (subscript P), and ontology-based automatic TNM classification (subscript O)


TNM is a globally accepted system to describe the anatomical extent of malignant tumours [2, 14]. Although TNM is of high importance for tumour staging, to the knowledge of the authors, no comprehensive formal representation of TNM exists so far. With this work, the authors provide a first version of a TNM ontology (TNM-O) and a prototypical implementation of TNM for colorectal cancers. Furthermore, this work shows that TNM-O can be used to classify instance data.

Over time, TNM has developed into a coding system, which had to accommodate both the pragmatics of coding and representational accuracy. The literature on ambiguities and difficulties of TNM in practice is abundant. The discussion of TNM for breast tumours illustrates the dilemma of its maintainers [8, 47, 48]. They had to account for the rapid progression of scientific knowledge on tumours and to keep it usable at the same time: new versions of TNM are already outdated when compared with new scientific insights. On the other hand, TNM has become increasingly complex, with a negative impact on its usability by both expert and non-expert documentation staff and physicians.

Encoding clinical conditions using TNM, as well as selecting the right treatment according to TNM codes, is daily routine in oncology. To assist in these difficult and time-consuming decision processes, several systems have been proposed, usually based on text extraction from pathology reports and machine learning algorithms [24–26]. The accuracy of these approaches was relatively low [24]. Here, we present an ontology that classifies instance data with 100 % accuracy in an experimental setting based on structured data. We hypothesise that DL-based classification using TNM-O could also improve the results of automated information extraction from unstructured data, as done in the above-mentioned approaches. Such systems could also be made available in intelligent documentation systems in the form of embedded decision support systems, which could help to choose the right codes for a clinical condition and/or the right guideline-compliant treatment for a given code (describing a clinical condition). Furthermore, we think that an ontology could improve the curation of TNM itself. Based on a taxonomic and axiomatic description, the detection of coding errors, inconsistencies, and ambiguities in definitions could be facilitated [28, 29]. A formal, description logic based axiomatisation allows the use of specific reasoning tools to check for inconsistencies during the ontology engineering process, which would indicate conflicting axioms. Redundancies or wrong hierarchical dependencies can be detected by checking the inferred class hierarchy after DL classification.

This study is limited in that we present a first version of the TNM Ontology (TNM-O), restricted to mammary gland [32] and colorectal tumours. As these two tumour entities are the most complex and best represented ones in TNM, however, the current version is already sufficiently complete and stable to serve as a blueprint for extending TNM-O to other organ systems.

Due to the nature of the domain and the rich top-level ontology employed, the computational resources needed to classify the ontology are considerable. To alleviate performance issues, TNM-O will be provided as separate modules for different organ systems, so that users can import only the modules of interest into their application context.
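The benefit of a modular distribution rests on the fact that OWL imports are transitive: loading one module also loads everything it imports, directly or indirectly, and a reasoner must then process all of those axioms. The following Python sketch computes such an import closure over a hypothetical module dependency map. The module names (`tnmo-colorectal`, `tnmo-core`, `upper-level`, etc.), their dependencies, and the `import_closure` function are illustrative assumptions, not the actual TNM-O file structure; `upper-level` and the `anatomy-*` entries are placeholders loosely standing for the upper-level ontology and the anatomical entities TNM-O reuses.

```python
"""Import closure over a hypothetical TNM-O module layout."""

# Hypothetical map: module -> modules it imports directly (as with owl:imports).
IMPORTS = {
    "tnmo-colorectal": ["tnmo-core", "anatomy-colorectal"],
    "tnmo-breast": ["tnmo-core", "anatomy-breast"],
    "tnmo-core": ["upper-level"],
    "anatomy-colorectal": ["upper-level"],
    "anatomy-breast": ["upper-level"],
    "upper-level": [],
}


def import_closure(selected, imports):
    """Every module that must be loaded for the selected modules, including transitive imports."""
    loaded, stack = set(), list(selected)
    while stack:
        module = stack.pop()
        if module not in loaded:
            loaded.add(module)
            stack.extend(imports.get(module, []))
    return loaded


if __name__ == "__main__":
    needed = import_closure(["tnmo-colorectal"], IMPORTS)
    print(sorted(needed))
    print("Modules not loaded:", sorted(set(IMPORTS) - needed))
```

Under these assumptions, an application interested only in colorectal tumours would load four modules and skip the breast-specific ones. How much classification time this actually saves would depend on the size of the shared upper-level and anatomy modules, which every organ-specific module still pulls in.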

Future research should evaluate the presented prototype ontology (i) by implementing further tumour locations, and (ii) by systematic application in clinical classification and retrieval scenarios. We will provide the formalisation of TNM for other primary tumour locations in a modular way, so that users can select which parts of TNM-O they would like to use. In this way, we hope to reduce the required computational resources to a minimum.


We presented a first version of an ontology (TNM-O) that represents the TNM tumour classification system. The present work demonstrates its representational power and completeness as well as its applicability for classification of instance data. This work provides a foundation for an exhaustive TNM ontology.












  1. DeVita VT, Lawrence TS, Rosenberg SA, (eds). DeVita, Hellman, and Rosenberg’s Cancer: Principles & Practice of Oncology, 9th edn. Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins; 2011.

  2. Sobin LH, Gospodarowicz MK, Wittekind C. TNM Classification of Malignant Tumours, 7th edn. Chichester, West Sussex; Hoboken: John Wiley & Sons; 2009.

  3. Glimelius B, Tiret E, Cervantes A, Arnold D, on behalf of the ESMO Guidelines Working Group. Rectal cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2013; 24(suppl 6):81–8. doi:10.1093/annonc/mdt240.

  4. Labianca R, Nordlinger B, Beretta GD, Mosconi S, Mandalà M, Cervantes A, Arnold D, on behalf of the ESMO Guidelines Working Group. Early colon cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2013; 24(suppl 6):64–72. doi:10.1093/annonc/mdt354.

  5. Poston GJ, Tait D, O’Connell S, Bennett A, Berendse S. Diagnosis and management of colorectal cancer: summary of NICE guidance. BMJ. 2011; 343:6751. doi:10.1136/bmj.d6751.

  6. Roland NJ, Paleri V, British Association of Otolaryngologists. Head and Neck Cancer: Multidisciplinary Management Guidelines. London: ENT-UK; 2011.

  7. Senkus E, Kyriakides S, Penault-Llorca F, Poortmans P, Thompson A, Zackrisson S, Cardoso F, on behalf of the ESMO Guidelines Working Group. Primary breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2013; 24(suppl 6):vi7–vi23. doi:10.1093/annonc/mdt284.

  8. Güth U, Jane Huang D, Holzgreve W, Wight E, Singer G. T4 breast cancer under closer inspection: A case for revision of the TNM classification. The Breast. 2007; 16(6):625–36. doi:10.1016/j.breast.2007.05.006.

  9. Nagtegaal ID, Marijnen CAM. The future of TNM staging in rectal cancer: The era of neoadjuvant therapy. Curr Colorectal Cancer Rep. 2008; 4(3):147–54. doi:10.1007/s11888-008-0024-z.

  10. Adsay NV, Bagci P, Tajiri T, Oliva I, Ohike N, Balci S, Gonzalez RS, Basturk O, Jang KT, Roa JC. Pathologic staging of pancreatic, ampullary, biliary, and gallbladder cancers: pitfalls and practical limitations of the current AJCC/UICC TNM staging system and opportunities for improvement. Semin Diagn Pathol. 2012; 29(3):127–41. doi:10.1053/j.semdp.2012.08.010.

  11. Abernethy AP, Herndon JE, Wheeler JL, Rowe K, Marcello J, Patwardhan M. Poor Documentation prevents adequate assessment of quality metrics in colorectal cancer. J Oncol Pract. 2009; 5(4):167–74. doi:10.1200/JOP.0942003.

  12. Walters S, Maringe C, Butler J, Brierley JD, Rachet B, Coleman MP. Comparability of stage data in cancer registries in six countries: Lessons from the International Cancer Benchmarking Partnership. Int J Cancer. 2013; 132(3):676–85. doi:10.1002/ijc.27651.

  13. Brierley JD, Catton PA, O’Sullivan B, Dancey JE, Dowling AJ, Irish JC, McGowan TS, Sturgeon JF, Swallow CJ, Rodrigues GB, et al. Accuracy of recorded tumor, node, and metastasis stage in a comprehensive cancer center. J Clin Oncol. 2002; 20(2):413–9.

  14. Webber C, Gospodarowicz M, Sobin LH, Wittekind C, Greene FL, Mason MD, Compton C, Brierley J, Groome PA. Improving the TNM classification: Findings from a 10-year continuous literature review. Int J Cancer. 2014; 135(2):371–8. doi:10.1002/ijc.28683.

  15. Quirke P, Cuvelier C, Ensari A, Glimelius B, Laurberg S, Ortiz H, Piard F, Punt CJ, Glenthoj A, Pennickx F, Seymour M, Valentini V, Williams G, Nagtegaal ID. Evidence-based medicine: the time has come to set standards for staging. J Pathol. 2010; 221(4):357–60. doi:10.1002/path.2720.

  16. Quirke P, Williams GT, Ectors N, Ensari A, Piard F, Nagtegaal I. The future of the TNM staging system in colorectal cancer: time for a debate? Lancet Oncol. 2007; 8(7):651–7. doi:10.1016/S1470-2045(07)70205-X.

  17. Filson CP, Boer B, Curry J, Linsell S, Ye Z, Montie JE, Miller DC. Improvement in clinical TNM staging documentation within a prostate cancer quality improvement collaborative. Urology. 2014; 83(4):781–7. doi:10.1016/j.urology.2013.11.040.

  18. Aumann K, Amann D, Gumpp V, Hauschke D, Kayser G, May AM, Wetterauer U, Werner M. Template-based synoptic reports improve the quality of pathology reports of prostatectomy specimens. Histopathology. 2012; 60(4):634–44. doi:10.1111/j.1365-2559.2011.04119.x.

  19. Compton CC. Key issues in reporting common cancer specimens: problems in pathologic staging of colon cancer. Arch Pathol Lab Med. 2006; 130(3):318–24. doi:10.1043/1543-2165(2006)130[318:KIIRCC]2.0.CO;2.

  20. Nagtegaal ID, Kranenbarg EK, Hermans J, van de Velde CJH, van Krieken JHJM, Pathology Review Committee. Pathology data in the central databases of multicenter randomized trials need to be based on pathology reports and controlled by trained quality managers. J Clin Oncol. 2000; 18(8):1771–9.

  21. Dameron O, Roques É, Rubin D, Marquet G, Burgun A. Grading lung tumors using OWL-DL based reasoning. In: 9th International Protégé Conference - Presentation Abstracts. Stanford, USA: Stanford University: 2006. p. 69.

  22. Marquet G, Dameron O, Saikali S, Mosser J, Burgun A. Grading glioma tumors using OWL-DL and NCI Thesaurus. AMIA Annu Symp Proc. 2007; 2007:508–12.

  23. Massicano F, Sasso A, Amaral-Silva H, Oleynik M, Nobrega C, Patrao DF. An Ontology for TNM Clinical Stage Inference. In: Freitas F, Baiao F, editors. Proceedings of the Brazilian Seminar on Ontologies (ONTOBRAS 2015). Sao Paulo, Brazil: 2015.

  24. Spasić I, Livsey J, Keane JA, Nenadić G. Text mining of cancer-related information: Review of current status and future directions. Int J Med Inf. 2014; 83(9):605–23. doi:10.1016/j.ijmedinf.2014.06.009.

  25. McCowan IA, Moore DC, Nguyen AN, Bowman RV, Clarke BE, Duhig EE, Fry MJ. Collection of Cancer Stage Data by Classifying Free-text Medical Reports. J Am Med Inform Assoc. 2007; 14(6):736–45. doi:10.1197/jamia.M2130.

  26. Nguyen AN, Lawley MJ, Hansen DP, Bowman RV, Clarke BE, Duhig EE, Colquist S. Symbolic rule-based classification of lung cancer stages from free-text pathology reports. J Am Med Inform Assoc. 2010; 17(4):440–5. doi:10.1136/jamia.2010.003707.

  27. Rossille D, Laurent JF, Burgun A. Modelling a decision-support system for oncology using rule-based and case-based reasoning methodologies. Int J Med Inf. 2005; 74(2):299–306. doi:10.1016/j.ijmedinf.2004.06.005.

  28. Ceusters W, Smith B, Kumar A, Dhaen C. Ontology-based error detection in SNOMED-CT®. In: Fieschi M, Coiera E, Li YCJ, editors. Medinfo 2004: Proceedings of the 11th World Congress on Medical Informatics, Pt 1 and 2, vol. 107. Amsterdam: IOS Press: 2004. p. 482–6.

  29. Cornet R, Abu-Hanna A. Description logic-based methods for auditing frame-based medical terminological systems. Artif Intell Med. 2005; 34(3):201–17. doi:10.1016/j.artmed.2005.01.003.

  30. Baader F, Calvanese D, McGuinness DL, Nardi D, Patel-Schneider PF. The Description Logic Handbook: Theory, Implementation and Applications, 2nd edn. Cambridge: Cambridge University Press; 2008.

  31. Smith B. Applied ontology: A new discipline is born. Philos Today. 1998; 12(29):5–6.

  32. Boeker M, Faria R, Schulz S. A Proposal for an Ontology for the Tumor-Node-Metastasis Classification of Malignant Tumors: a Study on Breast Tumors. In: Jansen L, Boeker M, Herre H, Loebe F, editors. Ontologies and Data in Life Sciences (ODLS 2014). Proceedings of the 6th Workshop of the GI Workgroup Ontologies in Biomedicine and Life Sciences (OBML). Volume 1/2014. Leipzig: Universität Leipzig: 2014. [IMISE-REPORTS].

  33. Sobin LH, Wittekind C. TNM Classification of Malignant Tumours, 6th edn. New York: John Wiley & Sons; 2002.

  34. Goldstraw P, Chansky K, Crowley J, Rami-Porta R, Asamura H, Eberhardt WEE, Nicholson AG, Groome P, Mitchell A, Bolejack V, et al. The IASLC Lung Cancer Staging Project: Proposals for Revision of the TNM Stage Groupings in the Forthcoming (Eighth) Edition of the TNM Classification for Lung Cancer. J Thorac Oncol. 2016; 11(1):39–51. doi:10.1016/j.jtho.2015.09.009.

  35. WHO. International Classification of Diseases (ICD). 2016. Accessed 9 May 2016.

  36. WHO. International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3). 2016. Accessed 9 May 2016.

  37. Rosse C, Mejino Jr. JLV. A reference ontology for biomedical informatics: the Foundational Model of Anatomy. J Biomed Inform. 2003; 36(6):478–500. doi:10.1016/j.jbi.2003.11.007.

  38. Beißwanger E, Schulz S, Stenzhorn H, Hahn U. BioTop: An Upper Domain Ontology for the Life Sciences - A Description of its Current Structure, Contents, and Interfaces to OBO Ontologies. Appl Ontol. 2008; 3(4):205–12.

  39. Schulz S, Boeker M. BioTopLite: An Upper Level Ontology for the Life Sciences, Evolution, Design and Application. In: Hornbach M, editor. INFORMATIK 2013. Ontologien in den Lebenswissenschaften. Lecture Notes in Informatics, vol. P-220. Bonn: Gesellschaft für Informatik: 2013. p. 1889–99.

  40. Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C. Relations in biomedical ontologies. Genome Biol. 2005; 6(5):46. doi:10.1186/gb-2005-6-5-r46.

  41. Marmot M, Atinmo T, Byers T, Chen J, Hirohata T, Jackson A, James W, Kolonel L, Kumanyika S, Leitzmann C, Mann J, Powers H, Reddy K, Riboli E, Rivera JA, Schatzkin A, Seidell J, Shuker D, Uauy R, Willett W, Zeisel S. Food, Nutrition, Physical Activity, and the Prevention of Cancer: a Global Perspective. 2007.

  42. Haggar FA, Boushey RP. Colorectal cancer epidemiology: incidence, mortality, survival, and risk factors. Clin Colon Rectal Surg. 2009; 22(4):191–7. doi:10.1055/s-0029-1242458.

  43. Hamilton SR, Aaltonen LA, (eds). World Health Organization Classification of Tumours. Pathology and Genetics of Tumours of the Digestive System. Volume 48. Lyon: IARC Press; 2000. [IARC WHO Classification of Tumours].

  44. Schulz S, Schober D, Daniel C, Jaulent MC. Bridging the semantics gap between terminologies, ontologies, and information models. In: Safran C, Reti S, Marin HF, editors. MEDINFO 2010 - Proceedings of the 13th World Congress on Medical Informatics. Studies in Health Technology and Informatics, vol. 160. Amsterdam: IOS Press: 2010. p. 1000–1004. doi:10.3233/978-1-60750-588-4-1000.

  45. Smith B, Ceusters W. Aboutness: Towards foundations for the information artifact ontology. In: Couto FM, Hastings J, editors. Proceedings of the Sixth International Conference on Biomedical Ontology (ICBO). Lisbon: 2015.

  46. Rudolph S. Foundations of description logics. In: RW’11 Proceedings of the 7th international conference on Reasoning web: semantic technologies for the web of data. Berlin, Heidelberg: Springer: 2011. p. 76–136.

  47. Barr LC, Baum M. Time to abandon TNM staging of breast cancer? The Lancet. 1992; 339(8798):915–7. doi:10.1016/0140-6736(92)90941-U.

  48. Gusterson BA. The new TNM classification and micrometastases. The Breast. 2003; 12(6):387–90. doi:10.1016/S0960-9776(03)00141-3.


The article processing charge was funded by the German Research Foundation (DFG) and the Albert Ludwigs University Freiburg in the funding programme Open Access Publishing.

Authors’ contributions

MB and SS designed the structure of TNM-O. FF implemented TNM-O for colorectal cancer, developed the module structure of TNM-O and curated TNM-O for breast cancer. PB and MB designed the classification study on pathology data for which PB provided the pathology dataset and evaluated the classification results. The manuscript was primarily drafted by MB and SS, and edited and approved for publication by all authors.

Competing interests

The authors declare that they have no competing interests.

Author information

Corresponding author

Correspondence to Martin Boeker.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Boeker, M., França, F., Bronsert, P. et al. TNM-O: ontology support for staging of malignant tumours. J Biomed Semant 7, 64 (2016).
