Enriching a primary health care version of ICD-10 using SNOMED CT mapping

Nyström, Mikael; Vikström, Anna; Nilsson, Gunnar H; Åhlfeldt, Hans; Örman, Håkan

doi:10.1186/2041-1480-1-7

Research
Open access
Published: 17 June 2010

Journal of Biomedical Semantics volume 1, Article number: 7 (2010)

Abstract

Background

In order to satisfy different needs, medical terminology systems must have richer structures. This study examines whether a Swedish primary health care version of the mono-hierarchical ICD-10 (KSH97-P) may obtain a richer structure using category and chapter mappings from KSH97-P to SNOMED CT and SNOMED CT's structure. Manually-built mappings from KSH97-P's categories and chapters to SNOMED CT's concepts are used as a starting point.

Results

The mappings are manually evaluated using computer-produced information and a small number of mappings are updated. A new and poly-hierarchical chapter division of KSH97-P's categories has been created using the category and chapter mappings and SNOMED CT's generic structure. In the new chapter division, most categories are included in their original chapters. A considerable number of concepts are included in other chapters than their original chapters. Most of these inclusions can be explained by ICD-10's design. KSH97-P's categories are also extended with attributes using the category mappings and SNOMED CT's defining attribute relationships. About three-fourths of all concepts receive an attribute of type Finding site and about half of all concepts receive an attribute of type Associated morphology. Other types of attributes are less common.

Conclusions

It is possible to use mappings from KSH97-P to SNOMED CT and SNOMED CT's structure to enrich KSH97-P's mono-hierarchical structure with a poly-hierarchical chapter division and attributes of type Finding site and Associated morphology. The final mappings are available as additional files for this paper.

Background

Medical terminology systems evolution

There are various types of medical terminology systems to satisfy different needs. To satisfy more needs than those met today, both Rossi Mori et al. [1] and Cimino [2] call for medical terminology systems to evolve towards greater flexibility.

Rossi Mori et al. describe three generations of medical terminology systems [1]. The first generation comprises traditional terminology systems [1]. This generation includes controlled vocabularies, nomenclatures, taxonomies and coding systems which satisfy most needs in paper-based information systems. In this generation, systems typically consist of a list of phrases, a list of codes, a coding scheme and a hierarchy. The role of the coding scheme is to map between phrases and codes [1]. Examples of systems in the first generation are ICD-10, KSH97-P and International Classification of Functioning, Disability and Health (ICF).

The second generation comprises compositional systems. These systems have a categorical structure, a cross-thesaurus, a structured list of phrases and a knowledge base of dissections [1]. The categorical structure gives a high-level description of the content, i.e. what kinds of concepts are included and how they relate to each other. This can be seen as a framework of slots for which the cross-thesaurus provides a set of labels to be inserted when the content is modelled. By means of the cross-thesaurus, each element in the structured list of phrases is represented according to the categorical structure; these descriptions constitute the knowledge base of dissections. Examples of systems in the second generation are Nomenclature, Properties and Units (NPU), Logical Observation Identifiers, Names and Codes (LOINC), and SNOMED International [1].

The third generation consists of formal systems. In this generation, the systems have a set of symbols and a set of formal rules to manipulate the symbols, and these sets can be seen as a set of concepts and a set of relations between the concepts [1]. It is possible to represent each concept in a unique canonical form and a non-canonical expression may be automatically converted to a unique canonical form using an engine. An example of a third generation system is GALEN-IN-USE's surgical procedures [1]. SNOMED CT is evolving towards a third generation system.

One problem in the first generation is that reorganisation of categories in the systems to satisfy different purposes is not supported [1]. Reuse of data organised with first generation systems therefore needs human interpretation of the categories and the environment where the data was originally collected. This problem is smaller in the second generation and even smaller in the third generation. In the second generation, categories can be reorganised according to the information in the knowledge base of dissection. In the third generation, formal rules can be used for reorganisation [1].

Cimino enumerates twelve characteristics of the structure and content in medical terminology systems in "Desiderata for controlled medical vocabularies in the twenty-first century" and these characteristics emerge from earlier vocabulary research [2]. Four of these characteristics are relevant in this study:

Poly-hierarchy. Systems need to shift from a strict mono-hierarchy (taxonomy) to a poly-hierarchy [2]. It is impossible to properly represent the real world in a strict mono-hierarchy where each category has only one parent. The categories in the real world can belong to more than one parent [2–5].
Formal definitions. Systems need formal definitions expressed as collections of different kinds of relationships between the concepts [2]. Formal definitions can be used by computers for formal manipulations of the categories which is impossible with unstructured text definitions [2]. One manipulation is to help a user locate a specific category in a terminology system [3]. A similar manipulation is locating where to include new categories in the system's structure [3–5]. Another manipulation is to test whether a pre-coordinated category is equivalent to a set of post-coordinated categories or whether different sets of post-coordinated categories are equivalent [5].
Multiple granularity. Systems need to have concepts of different granularity covering the same area [2]. Different use cases need systems with different granularities depending on the required level of detail of the categories. A multipurpose system therefore needs multiple granularities [2]. One use case is abstracting information in health records to allow compilations of health records' contents. Another use case represents sufficient detail of the information in health records in order to use the information in for example direct patient care, decision support and quality assurance [5].
Multiple consistent views. Systems need to be able to consistently present their content in different views [2]. Some use cases require a simple structure of the system's categories and others need a richer structure. The kind of structure depends on the required level of detail and required type of information of the categories [2, 3]. To present equivalent information independent of the view used, the views need to be consistent [2, 3].

Information reduction using medical terminology systems

Straub et al. have a different opinion than Rossi Mori et al. and Cimino as presented above [6]. They argue that the different kinds of medical terminology systems have different purposes and need to co-exist. Medical terminology systems with fewer categories and a semantic model with more restrictions, such as a hierarchical tree, provide useful information reduction or simplification for cases where a richer medical terminology system provides too much information [6].

Hierarchical trees are disjunctive and unidirectional [6]. Disjunctive means that all categories on one level are mutually exclusive, and unidirectional means that all hierarchical relations go in only one direction. Unfortunately, diseases are not disjunctively and unidirectionally related to each other, and therefore it is not possible to construct a hierarchical tree of diseases based on the diseases' characteristics [6]. If a hierarchical tree is still constructed from disease categories, for which Straub et al. think there are good reasons, the hierarchical tree is artificial. The drawback of artificial hierarchical trees is that their structure is arbitrary. This means that in the construction of the tree, some information is hidden, namely the information on which the hierarchy is not based [6].

Modelling of health problems

ICD-10 [7] is primarily intended for statistical reporting and administrative tasks such as disease monitoring and quality assurance [8]. Although neither based on nor intended as a model of health problems, but pragmatically developed from the admittedly arbitrary structure proposed by William Farr in 1855 [9, 10], the ICD classifications are by far the most used terminology systems in electronic health records [11].

Farr's structure, which is reflected in how diseases are divided into chapters in the ICD-10, groups diseases into five sets [9, 10]:

epidemic diseases
constitutional or general diseases
local diseases arranged by site
developmental diseases
injuries

The presentation of ICD-10 [7, 9] focuses on the role as a member of a family of classifications rather than the internal structure. In ICF, one part of the introduction describes a conceptual framework for the classification [12]; that kind of model does not exist for ICD-10.

While ICD classifications are mono-hierarchical, the International Classification of Primary Care (ICPC) [13], originally published in 1987 and later in a second [14] and a revised second edition [15], is bi-axial, consisting of chapters and components. Here, a patient's reason for encounter, health problems to be taken care of and interventions are classified and coded according to a chapter structure. The chapter structure is based on body systems and problem areas and a set of components specifying the nature of the phenomenon coded such as a complaint, procedure or disease.

The move towards the third generation of terminology systems with formal definitions of disorders has proven to be a challenging task [16]. This is especially valid if diagnostic criteria are to be taken into account as is the case for psychiatric diagnoses in the Diagnostic and Statistical Manual for Mental Disorders (DSM) as well as ICD [17–20]. Version 3 of the Read Codes, a constituent of SNOMED CT, presented a template-based mechanism with attributes and values for basic semantic operations on items [21, 22]. A set of categories describing completeness of definitions was developed as a by-product in the process of disorder definition [21]. However, Version 3 of the Read Codes is still a second generation terminology system [1].

Héja et al. have presented work on formal definitions of the ICD-10 based on the GALEN [23] and DOLCE [24] formalisms with the main objective of providing a knowledge-based coding support tool. They found that although lexical processing [23] as well as existing terminology resources [24] may assist formal representation, ICD categories themselves, owing to the historical development rooted in epidemiological considerations, deviate from what is expected in contemporary ontology engineering [23]. The result is a need to distinguish the meaning of categories from the structure of the classification, which essentially was the underlying rationale in the early modelling work reported by Petersson et al. [25]. Such pitfalls of pragmatic classifications have also been reported in surgery, an area that modelling-wise is usually considered more straightforward than the domain of diseases [26].

Alecu et al. created a grouping of the categories in the World Health Organisation - Adverse Reaction Terminology (WHO-ART) based on mappings between WHO-ART and SNOMED CT in the Unified Medical Language System (UMLS) Metathesaurus [27]. More specifically, they used synonym relations between WHO-ART categories and SNOMED CT concepts, creating synonym relations for 85.9% of all categories.

As pointed out by Rossi Mori et al. [1], and demonstrated by Alecu et al. [27], second and third generation systems can augment first generation systems with easier re-organisation and maintenance and with harmonisation and cross-referencing of different first generation systems. The description of the categorical structure could also be used for systematic comparison of terminology systems such as ICD, ICPC and SNOMED CT. Ingenerf and Giere argue along the same line when they explore the different roles of statistical classifications and formal concept representation systems, deducing the need for co-existence and the former being linked to the latter [28]. The empirical results described above indicate these merely theoretical assertions require considerable thought before they are realised, which is consistent with the finding that little evidence, other than theoretical, exists on the usefulness of SNOMED in clinical practice [29].

Objective

The primary health care terminology system "Klassifikation av sjukdomar och hälsoproblem 1997 Primärvård" (KSH97-P) is based on the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10). The general objective is to explore whether mappings from KSH97-P to SNOMED CT and SNOMED CT's structure can be used to enrich KSH97-P's mono-hierarchical structure. The enrichment thereby hypothetically provides useful multiple views of a disease panorama as coded with a traditional disease classification. The objective contrasts with the related work by Héja et al. [23, 24] presented above, where the objectives were to develop a new formal concept representation system of ICD categories, but is in line with the intentions of Alecu et al. [27]. The results are discussed in relation to improvements of medical terminology systems as presented in the background.

The first specific question is whether SNOMED CT's poly-hierarchical generic structure can be used to add a multiple chapter division to KSH97-P's categories where each category may belong to more than one chapter. The second specific question is whether SNOMED CT's defining attribute relationships can be used to add attributes to KSH97-P's categories.

Methods

A glossary explaining the terms used is included at the end of this paper.

SNOMED CT

SNOMED CT is a clinical terminology intended for clinical documentation and reporting [30]. In other words, SNOMED CT covers both abstraction and representation [5]. It consists of concepts, descriptions and relationships [30].

Here, a concept is a clinical meaning and is identified by a unique number. Associated with each concept are two or more descriptions, which are human readable terms, and information about the terms [30].

Relationships link concepts to each other and are of different relationship types [30]. The generic relationship type Is a relates subtypes to supertypes and is always a defining relationship. All concepts, except for the root concept, have at least one Is a relation to a supertype concept [30]. The other relationship types that are defining relationships are the defining attribute relationships. The defining relationships logically represent a concept by establishing relationships between the concepts [30].

A concept in SNOMED CT can either be fully defined or primitive [30]. A fully defined concept is modelled as described above so that it can be distinguished from all other concepts through its relationships with other concepts. Primitive concepts lack one or more of the relationships needed to fully distinguish them from other concepts [30]. There is also a concept model that controls which types of concepts can be related by which types of relationships [30].

Concepts in SNOMED CT can be retired from active to inactive concepts [30]. Inactive concepts have historical relationships that relate the inactive concepts to active concepts. The historical relationships can be used to point out active concepts that replace inactive concepts [30].

KSH97-P

The Swedish National Board of Health and Welfare has developed a primary health care version of the International Statistical Classification of Diseases and Related Health Problems (ICD-10) [31] (in Swedish, Klassifikation av sjukdomar och hälsoproblem 1997 Primärvård (KSH97-P)). Codes and rubrics (in both Swedish and English) of KSH97-P together with mappings to ICD-10 can be downloaded from their Web site [32].

KSH97-P contains 972 categories which concern diseases and health-related problems common in primary health care [31]. Most categories in KSH97-P correspond to categories in the three or four-character levels in ICD-10. Some categories in KSH97-P correspond to two or more similar categories in ICD-10. Some categories in ICD-10 which are less frequently used in primary health care have been merged with related unspecified categories in ICD-10 to corresponding categories with broader coverage in KSH97-P [31]. Rubrics in KSH97-P are as close to the Swedish translation of ICD-10 as possible [31].

Examples of KSH97-P categories and corresponding ICD-10 categories are [31]:

KSH97-P category A00- Cholera corresponds to the ICD-10 category A00 Cholera.
KSH97-P category J45-P Asthma corresponds to the two ICD-10 categories J45 Asthma and J46 Status asthmaticus.
KSH97-P category H669P Otitis media, unspecified corresponds to the ICD-10 category H66.4 Suppurative otitis media, unspecified, which is more specific than the category in KSH97-P. H669P also corresponds to the ICD-10 category H66.9 Otitis media, unspecified, which is equally specific as the category in KSH97-P.

KSH97-P has the same chapter division as ICD-10. The exceptions are that ICD-10 chapter XX External causes of morbidity and mortality is left out from KSH97-P [31] and chapter XXII Codes for special purposes is left out in both the Swedish version of ICD-10 [33] and KSH97-P [31]. The rubric and number of categories in each chapter are included in Table 1.

Table 1 KSH97-P and KSH97-P mappings

KSH97-P mixes categories related to ICD-10 categories at both the three and four-character levels. Therefore, the National Board of Health and Welfare recommends compiling statistics only on the chapter level or using customised groups of categories [31].

As described above, ICD-10, and thus KSH97-P [31], uses multiple principles for chapter division [7]. Some chapters contain categories related to a specific organ system and other chapters contain diseases with some specific aetiology. There are also chapters containing categories related to pregnancy, childbirth and the puerperium; the perinatal period; symptoms and partially specified cases; and important factors for contact with the health care system [7]. The preface to KSH97-P states these different kinds of chapter divisions may imply practical problems because it is not evident to which chapter a specific disease or condition belongs [31].

In ICD-10, and thus KSH97-P [31], a category can be included in only one chapter [7]. For those categories that could be included in more than one chapter, a decision has been made about which chapter to include the category in. This is demonstrated in ICD-10 by the excludes remarks on the chapter level. An excludes remark means that the categories in the remark could have been included in the chapter, but are instead included in other specified chapters [7]. Table 2 summarises the excludes remarks on the chapter level for three-character level exclusions [7]. The excludes remarks for four-character level exclusions on the chapter level are omitted because they only contain six categories [7].

Table 2 Exclusions of three-character categories in ICD-10

Three-dimensional structure of KSH97-P

To transform KSH97-P from a first generation system to a second generation system, a three-dimensional additional structure was added to KSH97-P in a previous research project [34]. In the three-dimensional structure, each category was categorised according to location, origin and type [25].

Baseline category mapping

A baseline category mapping from KSH97-P's categories to SNOMED CT's concepts is used. The first phase of the mapping process is described in a reliability study where mapping was done by two coders [35]. KSH97-P was randomly divided into three sets of categories, used in three mapping sequences. Mapping was done independently by the coders and mapping rules were developed and agreed upon between the sequences. In the last round, mapping was completed through consensus decisions, following the mapping rules and striving to achieve a result with "completely concordant" mappings for each category. In the mapping, disorder and finding concepts were given priority and there was no use of navigational concepts [35]. The SNOMED CT releases used were those from January and July 2006. In summary, 14 (1%) of the 972 categories in KSH97-P did not have a matched concept in SNOMED CT, 888 (91%) were mapped to one concept, 64 (7%) were mapped to two concepts, and 6 (1%) were mapped to three concepts. Of the 958 mapped categories, 938 (98%) categories were mapped to clinical finding concepts and 20 (2%) categories were mapped to procedural concepts.

Examples of baseline category mappings are:

KSH97-P category A00- Cholera is mapped to the SNOMED CT clinical finding concept Cholera.
KSH97-P category R252 Cramp and spasm is mapped to the SNOMED CT clinical finding concept Cramp and the clinical finding concept Spasm.
KSH97-P category D38- Neoplasm of uncertain or unknown behaviour of middle ear and respiratory and intrathoracic organs is mapped to the SNOMED CT clinical finding concept Neoplasm of intrathoracic organs and the clinical finding concept Neoplasm of middle ear and the clinical finding concept Neoplasm of respiratory tract.
KSH97-P category Z000 General medical examination is mapped to the SNOMED CT procedure concept General examination of patient.

Methods used

A flow chart of the methods used is presented in Figure 1.

Initial chapter mapping

Our study also needs a mapping from KSH97-P's chapters to SNOMED CT's concepts. The initial chapter mapping is therefore constructed during this study by the same persons mentioned above (Vikström et al. [35]).

The chapters are mapped to SNOMED CT's concepts based on the meaning of the chapter's rubric and a general assessment of both the chapter's content in ICD-10, using the international WHO-version of ICD-10 [7], and the subset of categories present in each chapter in KSH97-P. The same rules are used as for the category mapping, and the excludes remarks in ICD-10 are treated as rules that have no counterpart in SNOMED CT. An example of an excludes remark is that certain localized infections should not be included in chapter I Certain infectious and parasitic diseases in ICD-10. Adequate mapping demanded good concordance between the rubric's meaning and the concept's meaning in SNOMED CT. For example, Neoplastic disease is considered a good match for chapter II Neoplasms. A concept could be considered a reasonable match even if it does not have relations to all categories from a certain chapter or has relations to some categories from another chapter. An example is Obesity, which is in chapter IV of ICD-10 but is not related to any of the mapped concepts in SNOMED CT as it is located directly under the Disease concept. The mapping is made to the SNOMED CT release January 2007.

Chapters XVIII Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified and XXI Factors influencing health status and contact with health services are assessed as not possible to map to SNOMED CT's concepts. The combination of different symptoms and abnormal clinical and laboratory findings in chapter XVIII's rubric is not considered to be a clinical concept, but a collection of different phenomena in a rubric that does not map to a concept or post-coordinated expression of manageable size in SNOMED CT. Not elsewhere classified is generally difficult to map to SNOMED CT, because the negation entails that the meaning of all other KSH97-P chapters has to be excluded in the mapped SNOMED CT concepts. Chapter XXI's rubric is likewise considered difficult to interpret as a clinical concept that could be mapped to a concept or post-coordinated expression of manageable size in SNOMED CT. This is especially due to the combination with the many categories in the chapter that describe procedures, not conditions and factors.

In summary, 2 (10%) of the 20 chapters in KSH97-P did not have a matched concept in SNOMED CT, 14 (70%) were mapped to one concept, 1 (5%) was mapped to two concepts, 2 (10%) were mapped to three concepts, and 1 (5%) was mapped to four concepts.

Examples of initial chapter mappings are:

KSH97-P chapter I Certain infectious and parasitic diseases is mapped to the SNOMED CT concept Infectious disease.
KSH97-P chapter XIII Diseases of the musculoskeletal system and connective tissue is mapped to the SNOMED CT concept Disorder of musculoskeletal system and the concept Disorder of connective tissue.

Mappings update

To improve the baseline category mapping and initial chapter mapping, the mappings are converted to the same SNOMED CT release, the category mappings and the chapter mappings are compared manually and statistical chapter mappings are calculated and compared with the manual mappings. All these steps are described below.

Release conversion

To be able to use only one release of SNOMED CT in this study, the baseline category mapping is transformed to SNOMED CT release January 2007 UK Edition. The transformation is done by keeping mappings to active concepts. For mappings to inactive concepts, a manual inspection of SNOMED CT's historical relationships from inactive concepts to active concepts is performed. When the manual inspection shows that the historical relationships replace the inactive concepts with suitable active concepts, then the active concepts are used for the new mapping. When the active concepts are not suitable for the mappings, new mappings are constructed manually.

Examples of updates during the release conversions are:

KSH97-P category E108P Insulin-dependent diabetes mellitus with complications has its map updated from the inactive SNOMED CT concept Type I diabetes mellitus with complication to the active concept Disorder associated with type I diabetes mellitus using the historical relationships.
KSH97-P category M549P Dorsalgia NOS has its map updated from the inactive SNOMED CT concept Back pain to the active concept Backache using the historical relationships.
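
The release conversion can be sketched in code as follows. This is a minimal illustration and not the tooling used in the study: the function name, the dictionary-based data layout and the use of concept names instead of SNOMED CT identifiers are assumptions made for readability. The sketch keeps mappings to active concepts and, for inactive targets, only proposes the active concepts reachable through historical relationships; deciding whether a proposal is suitable remains a manual step, as in the text.

```python
from typing import Dict, List, Set, Tuple


def propose_release_conversion(
    mapping: Dict[str, List[str]],
    active: Set[str],
    historical: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split a category mapping into kept targets and proposals for review.

    mapping    : category code -> mapped concept names (hypothetical layout)
    active     : concepts that are active in the target release
    historical : inactive concept -> concepts named by historical relationships
    """
    kept: Dict[str, List[str]] = {}
    to_review: Dict[str, List[str]] = {}
    for category, targets in mapping.items():
        # Mappings to active concepts are kept unchanged.
        kept[category] = [t for t in targets if t in active]
        for target in targets:
            if target not in active:
                # Candidates only; a human decides whether they are suitable.
                candidates = [c for c in historical.get(target, []) if c in active]
                to_review.setdefault(category, []).extend(candidates)
    return kept, to_review


# Concept names taken from the examples above; the layout is illustrative.
active = {
    "Cholera",
    "Backache",
    "Disorder associated with type I diabetes mellitus",
}
historical = {
    "Back pain": ["Backache"],
    "Type I diabetes mellitus with complication": [
        "Disorder associated with type I diabetes mellitus"
    ],
}
baseline = {
    "A00-": ["Cholera"],
    "M549P": ["Back pain"],
    "E108P": ["Type I diabetes mellitus with complication"],
}

kept, to_review = propose_release_conversion(baseline, active, historical)
```

A category that appears in to_review with an empty candidate list has no usable historical relationship, which corresponds to the case where a new mapping is constructed manually.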

Manual comparison of category and chapter mappings

To check that the category and chapter mappings are not unintentionally mapped to different hierarchies in SNOMED CT, the category and chapter mappings are compared as described below.

SNOMED CT concepts to which any of the chapter mappings maps are collected in a set. For each concept in the set, the concepts' descendants are recognised and added to the set. The categories which do not map to any of the concepts in the set are then manually inspected. During the manual inspection, the categories' mappings are inspected together with relevant chapter mappings and the categories' and chapters' mappings are updated if suitable.
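
The selection of categories for manual inspection can be sketched as below. The hierarchy is a deliberately tiny toy excerpt, the category label OBESITY is a placeholder rather than a real KSH97-P code, and all function names are hypothetical; only the procedure (collect chapter-mapped concepts and their descendants, then list the categories with no mapping into that set) follows the description above.

```python
from typing import Dict, List, Set


def descendants(concept: str, children: Dict[str, List[str]]) -> Set[str]:
    """Return all descendants of a concept in an Is a hierarchy."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def categories_to_inspect(
    category_map: Dict[str, List[str]],
    chapter_map: Dict[str, List[str]],
    children: Dict[str, List[str]],
) -> List[str]:
    """List categories whose mapped concepts all fall outside the chapter set."""
    covered: Set[str] = set()
    for targets in chapter_map.values():
        for concept in targets:
            covered.add(concept)
            covered |= descendants(concept, children)
    return sorted(
        category
        for category, targets in category_map.items()
        if not covered.intersection(targets)
    )


# Toy hierarchy (parent -> children), simplified for illustration only.
children = {
    "Disease": ["Infectious disease", "Disorder of digestive system", "Obesity"],
    "Infectious disease": ["Cholera"],
    "Disorder of digestive system": ["Cholera"],
}
chapter_map = {
    "I": ["Infectious disease"],
    "XI": ["Disorder of digestive system"],
}
category_map = {
    "A00-": ["Cholera"],
    "OBESITY": ["Obesity"],  # placeholder label, not a real KSH97-P code
}

flagged = categories_to_inspect(category_map, chapter_map, children)
```

In this toy data only the placeholder category mapped to Obesity is flagged, because Obesity sits directly under Disease and is not a descendant of any chapter-mapped concept.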

An example of an update during the manual comparison of category and chapter mappings is:

KSH97-P chapter II Neoplasms has its map updated from the SNOMED CT concept Neoplastic disease to the concept Neoplasm and/or hamartoma to better cover the categories in the chapter.

Statistical chapter mapping

A statistical chapter mapping is created for comparison with the manual chapter mapping. The statistical chapter mapping prefers concepts where the descendants are targets of many categories in the same chapter but few categories from other chapters. The creation of the mapping is described below.

The statistical chapter mapping is based on two quantities calculated for each combination of a chapter in KSH97-P and a concept in SNOMED CT (n times m possible instances, where n is the number of chapters and m is the number of concepts):

categories current chapter (c): the number of categories in the current KSH97-P chapter that are mapped to the current SNOMED CT concept or any of its descendants.
categories other chapters (o): the number of categories in other chapters than the current KSH97-P chapter that are mapped to the current SNOMED CT concept or any of its descendants.

For each combination of a chapter in KSH97-P and a concept in SNOMED CT where c > 0, the following score is calculated:

$$\mathrm{score} = c \cdot \frac{c}{c + o}$$

In other words, the calculations above determine the number of "correct" categories weighted with the compactness of "correct" categories in proportion to all categories.

For each chapter in KSH97-P, all SNOMED CT concepts are then ranked. The concept with the highest score is ranked as the best statistical chapter mapping and the concept with the second highest score is ranked as the second best statistical chapter mapping et cetera.
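
The ranking can be illustrated with the following sketch. The score used here, c · c / (c + o), follows the verbal description (the number of "correct" categories weighted by their share of all categories under the concept) and should be treated as an assumption of this sketch. The function names, the toy hierarchy and the category labels prefixed with "toy-" are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def ancestors_or_self(concept: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the concept together with all its ancestors."""
    seen: Set[str] = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def rank_concepts_for_chapter(
    chapter: str,
    category_chapter: Dict[str, str],
    category_map: Dict[str, List[str]],
    parents: Dict[str, List[str]],
) -> List[Tuple[str, float]]:
    """Rank concepts for one chapter by the assumed score c * c / (c + o)."""
    c_count: Dict[str, int] = {}
    o_count: Dict[str, int] = {}
    for category, targets in category_map.items():
        # Every concept that subsumes a target counts this category once.
        subsumers: Set[str] = set()
        for target in targets:
            subsumers |= ancestors_or_self(target, parents)
        bucket = c_count if category_chapter[category] == chapter else o_count
        for concept in subsumers:
            bucket[concept] = bucket.get(concept, 0) + 1
    scored = []
    for concept, c in c_count.items():  # only concepts with c > 0
        o = o_count.get(concept, 0)
        scored.append((concept, c * c / (c + o)))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Toy hierarchy (child -> parents), simplified for illustration only.
parents = {
    "Cholera": ["Infectious disease", "Disorder of digestive system"],
    "Viral disease": ["Infectious disease"],
    "Gastritis": ["Disorder of digestive system"],
    "Infectious disease": ["Disease"],
    "Disorder of digestive system": ["Disease"],
}
category_chapter = {"A00-": "I", "toy-viral": "I", "toy-gastritis": "XI"}
category_map = {
    "A00-": ["Cholera"],
    "toy-viral": ["Viral disease"],
    "toy-gastritis": ["Gastritis"],
}

ranking = rank_concepts_for_chapter("I", category_chapter, category_map, parents)
```

With this toy data, Infectious disease ranks first for chapter I (c = 2, o = 0), ahead of the broader concept Disease (c = 2, o = 1), which shows how the score penalises concepts that also subsume categories from other chapters.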

Examples of statistical chapter mappings are:

For KSH97-P chapter I Certain infectious and parasitic diseases, the best statistical chapter mapping is mapped to the SNOMED CT concept Infectious disease, the second best to Bacterial infectious disease, the third best to Infection by site, the fourth best to Viral disease and the fifth best to Disease due to Gram-negative bacteria.
For KSH97-P chapter XVIII Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified, the best statistical chapter mapping is mapped to the SNOMED CT concept Clinical history and observation findings, the second best to General finding of observation of patient, the third best to Clinical finding, the fourth best to Neurological finding and the fifth best to Finding by method.

Comparison of manual and statistical chapter mappings

A further check that the category and chapter mappings are not unintentionally mapped to multiple hierarchies in SNOMED CT is performed by comparison of the manual chapter mapping and the statistical chapter mapping as described below.

For each chapter, the manual chapter mapping is compared with the statistical chapter mappings. If a highly ranked statistical chapter mapping subsumes more of the concepts mapped from categories in the chapter than the manual mapping and the statistical chapter mapping is in line with the mapping rules, then the manual mapping is updated.

Final mappings

The final mappings, which are the results of the mapping updates described above, are used in the rest of this study. The final category mapping is included as Additional file 1 and the final chapter mapping is included as Additional file 2. A summary of KSH97-P and the final mappings is included in Table 1.

Multiple chapter division

To examine whether the poly-hierarchical Is a relationships of SNOMED CT can be used to replace KSH97-P's mono-hierarchical chapter division with a poly-hierarchical chapter division, KSH97-P's categories are divided into a multiple chapter division using SNOMED CT's Is a relationships. The multiple chapter division is generated using the algorithm described and exemplified below.

For each category, the mapped concepts are extracted together with their ancestors to a mapped set. This creates one mapped set for each mapped KSH97-P category. If one or more chapters are mapped to any of the concepts in the mapped set, the category related to the mapped set is assumed to belong to these chapter(s), regardless of which chapter it originally belongs to. This means that each category may belong to zero, one or more new chapter(s).

In the example below, the multiple chapter division algorithm is applied to the category A00- Cholera. The algorithm is illustrated in Figure 2.

The algorithm begins with an empty mapped set. First, the category A00- Cholera, which is shown as a red ellipse in Figure 2, and its mapping are used to locate the concept(s) the category is mapped to. The algorithm finds that the category A00- Cholera is mapped to the concept Cholera, and the concept Cholera is therefore added to the mapped set. All ancestors of the concept Cholera are then also added to the mapped set. The resulting mapped set consists of the concepts shown in black rectangles in Figure 2.

The algorithm then uses the mapped set to evaluate if any chapter(s) maps to any concept(s) in the mapped set. The algorithm finds that chapter I Certain infectious and parasitic diseases maps to the concept Infectious disease in the mapped set, and chapter XI Diseases of the digestive system maps to the concept Disorder of digestive system in the mapped set. The category A00- Cholera is therefore assumed to belong to the chapters I Certain infectious and parasitic diseases and XI Diseases of the digestive system according to the multiple chapter division. These chapters are shown as green shaded ellipses in Figure 2.
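The multiple chapter division can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions: the function and variable names are hypothetical, and the toy Is a hierarchy omits the intermediate concepts shown in Figure 2, so its links are simplified and are not real SNOMED CT content.

```python
from collections import deque

# Toy Is a hierarchy (child -> parents). Intermediate concepts are omitted,
# so these links are an assumed simplification of SNOMED CT.
IS_A = {
    "Cholera": [
        "Infection due to Vibrio",
        "Intestinal infectious disease due to Gram-negative bacteria",
    ],
    "Infection due to Vibrio": ["Infectious disease"],
    "Intestinal infectious disease due to Gram-negative bacteria": [
        "Infectious disease",
        "Disorder of digestive system",
    ],
    "Infectious disease": ["Disease"],
    "Disorder of digestive system": ["Disease"],
}

CATEGORY_MAP = {"A00- Cholera": ["Cholera"]}
CHAPTER_MAP = {
    "I Certain infectious and parasitic diseases": ["Infectious disease"],
    "XI Diseases of the digestive system": ["Disorder of digestive system"],
}


def mapped_set(category):
    """Collect the concepts a category maps to, plus all their ancestors."""
    seen = set()
    queue = deque(CATEGORY_MAP.get(category, []))
    while queue:
        concept = queue.popleft()
        if concept not in seen:
            seen.add(concept)
            queue.extend(IS_A.get(concept, []))
    return seen


def chapters_for(category):
    """Return every chapter mapped to at least one concept in the mapped set."""
    concepts = mapped_set(category)
    return sorted(
        chapter
        for chapter, targets in CHAPTER_MAP.items()
        if concepts.intersection(targets)
    )


print(chapters_for("A00- Cholera"))
```

A category with no mapping yields an empty mapped set and therefore belongs to zero chapters, which matches the "zero, one or more" outcome described above.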

Additional attributes

To examine whether the defining attribute relationships of SNOMED CT can extend KSH97-P categories with attributes, a list of additional attributes is created. The additional attributes are generated using the algorithm described and exemplified below.

For each category the mapped concepts are extracted together with their ancestors to a mapped set. This creates one mapped set for each mapped KSH97-P concept. (The mapped sets are created in the same way as for the multiple chapter division.) Then all defining attribute relationships from concepts in the mapped set are followed and the target concepts are included in a specific attribute value set for each relationship type. In each attribute value set, the concepts that are ancestors of another concept in the same attribute value set are removed. The remaining concepts in each attribute value set constitute attribute values of the respective attribute types for that category.

In the example below, the additional attributes algorithm is applied to the category A00- Cholera. The algorithm is illustrated in Figure 2.

The algorithm begins with an empty mapped set. First, the category A00- Cholera, which is shown as a red ellipse in Figure 2, and its mapping are used to locate the concept(s) the category is mapped to. The algorithm finds that the category A00- Cholera is mapped to the concept Cholera, and the concept Cholera is therefore added to the mapped set. All ancestors of the concept Cholera are then also added to the mapped set. The resulting mapped set consists of the concepts shown in black rectangles in Figure 2. (This step is the same as in the multiple chapter division algorithm.)

The additional attributes algorithm then follows each defining attribute relationship from the concepts in the mapped set and places the target concepts in separate attribute value sets, one for each attribute type. The target concepts shown as blue shaded rectangles on the left of Figure 2 are included in the value set of type Causative agent. The target concepts shown as blue shaded rectangles in the upper right area of Figure 2 are included in the value set of type Finding site. The target concept Transudate, shown as a blue shaded rectangle in the lower right area of Figure 2, is included in the value set of type Associated morphology. Then, in each value set, the concepts that are supertypes of other concepts in the same value set are removed. In the value set of type Causative agent, only the concept Vibrio cholerae is left. In the value set of type Finding site, only the concept Intestinal structure remains. The value set of type Associated morphology contains only one concept, so it is unchanged. The category A00- Cholera is then assumed to have an attribute of type Causative agent with value Vibrio cholerae, an attribute of type Finding site with value Intestinal structure and an attribute of type Associated morphology with value Transudate.
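The additional attributes algorithm can be sketched as below. The attributes of Cholera are those listed in this paper; the attributes of its ancestors, the concept Genus Vibrio and the links in the toy hierarchy are assumptions made for illustration, and all function names are hypothetical.

```python
from collections import defaultdict

# Toy Is a hierarchy (child -> parents) covering both the disorder concepts
# and the attribute value concepts. Simplified and partly assumed.
IS_A = {
    "Cholera": [
        "Infection due to Vibrio",
        "Intestinal infectious disease due to Gram-negative bacteria",
    ],
    "Infection due to Vibrio": ["Infectious disease"],
    "Intestinal infectious disease due to Gram-negative bacteria": [
        "Infectious disease",
        "Disorder of digestive system",
    ],
    "Vibrio cholerae": ["Genus Vibrio"],
    "Intestinal structure": ["Structure of digestive system"],
}

# Defining attribute relationships: concept -> [(attribute type, target)].
# Only the Cholera entries come from the paper; the rest are assumed.
ATTRIBUTES = {
    "Cholera": [
        ("Associated morphology", "Transudate"),
        ("Causative agent", "Vibrio cholerae"),
        ("Finding site", "Intestinal structure"),
    ],
    "Infection due to Vibrio": [("Causative agent", "Genus Vibrio")],
    "Disorder of digestive system": [
        ("Finding site", "Structure of digestive system")
    ],
}


def ancestors(concept):
    """Return all strict ancestors of a concept via Is a links."""
    found, stack = set(), list(IS_A.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(IS_A.get(current, []))
    return found


def additional_attributes(mapped_concepts):
    """Derive attribute values for a category from its mapped concepts."""
    mapped = set()
    for concept in mapped_concepts:
        mapped |= {concept} | ancestors(concept)

    # One value set per attribute type.
    value_sets = defaultdict(set)
    for concept in mapped:
        for attribute_type, target in ATTRIBUTES.get(concept, []):
            value_sets[attribute_type].add(target)

    # Drop values that are ancestors of another value in the same set.
    result = {}
    for attribute_type, values in value_sets.items():
        redundant = set()
        for value in values:
            redundant |= ancestors(value)
        result[attribute_type] = sorted(values - redundant)
    return result


print(additional_attributes(["Cholera"]))
```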

Even if many categories have attributes of a specific attribute type, these attributes are of limited use if many categories share the same attribute value. For example, it is of limited use to know that most categories have attributes of the attribute type Finding site with the attribute value Body structure. We measure the distribution of the attribute values as the proportion of categories that relate attributes of the same attribute type to the same attribute value.
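One way to compute such a distribution is sketched below. The denominator is an assumption: here it is the number of categories that have at least one attribute of the given type, which the text does not state explicitly. The category labels and the value Lung structure are hypothetical placeholders.

```python
from collections import Counter


def value_distribution(category_attributes, attribute_type):
    """Proportion of categories relating an attribute type to each value.

    Assumed denominator: categories with at least one attribute of the type.
    """
    counts = Counter()
    categories_with_type = 0
    for attributes in category_attributes.values():
        values = set(attributes.get(attribute_type, []))
        if values:
            categories_with_type += 1
            counts.update(values)
    if categories_with_type == 0:
        return {}
    return {value: n / categories_with_type for value, n in counts.items()}


# Hypothetical attribute lists for five placeholder categories.
example = {
    "category 1": {"Finding site": ["Intestinal structure"]},
    "category 2": {"Finding site": ["Intestinal structure"]},
    "category 3": {"Finding site": ["Lung structure"]},
    "category 4": {"Finding site": ["Lung structure", "Intestinal structure"]},
    "category 5": {"Causative agent": ["Vibrio cholerae"]},
}

print(value_distribution(example, "Finding site"))
```

A value whose proportion is close to 1 adds little information, since nearly every category with that attribute type shares it.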

Fully defined and primitive ancestors

The quality of the multiple chapter division and the additional attributes depends on how completely modelled the concepts mapped from KSH97-P's categories, and these concepts' ancestors, are. (These concepts are the concepts in the mapped sets.) The mapped concepts and their ancestors are therefore extracted, and the numbers of fully defined concepts and primitive concepts are counted. The numbers of outgoing defining relationships from fully defined and primitive concepts are also counted. The proportion of fully defined concepts in SNOMED CT as a whole is also calculated.

Examples of fully defined concepts and primitive concepts are:

- The concept Digestive system finding is a fully defined concept and is therefore fully defined by its defining relationships' types and targets listed below:
  - Finding site; Structure of digestive system
  - Is a; Finding by site
- The concept Accidental poisoning is a primitive concept and is therefore not fully defined by its defining relationships' types and targets listed below:
  - Is a; Poisoning
- The concept Cholera is a fully defined concept and is therefore fully defined by its defining relationships' types and targets listed below:
  - Associated morphology; Transudate
  - Causative agent; Vibrio cholerae
  - Finding site; Intestinal structure
  - Is a; Infection due to Vibrio
  - Is a; Intestinal infectious disease due to Gram-negative bacteria
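The counting described in this section can be sketched using the three example concepts above as data. The `Concept` class and the `summarize` function are hypothetical names introduced for illustration; the study's actual counts cover all concepts in the mapped sets.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    fully_defined: bool
    # Outgoing defining relationships as (type, target) pairs.
    relationships: list = field(default_factory=list)


CONCEPTS = [
    Concept("Digestive system finding", True, [
        ("Finding site", "Structure of digestive system"),
        ("Is a", "Finding by site"),
    ]),
    Concept("Accidental poisoning", False, [("Is a", "Poisoning")]),
    Concept("Cholera", True, [
        ("Associated morphology", "Transudate"),
        ("Causative agent", "Vibrio cholerae"),
        ("Finding site", "Intestinal structure"),
        ("Is a", "Infection due to Vibrio"),
        ("Is a", "Intestinal infectious disease due to Gram-negative bacteria"),
    ]),
]


def summarize(concepts):
    """Count concepts and outgoing defining relationships per definition status."""
    summary = {
        "fully defined": {"concepts": 0, "relationships": 0},
        "primitive": {"concepts": 0, "relationships": 0},
    }
    for concept in concepts:
        key = "fully defined" if concept.fully_defined else "primitive"
        summary[key]["concepts"] += 1
        summary[key]["relationships"] += len(concept.relationships)
    return summary


print(summarize(CONCEPTS))
```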

Computational environment

The computational methods described above are performed in a relational database management system (PostgreSQL). SNOMED CT, KSH97-P and the mappings are stored in tables and the computations are executed by SQL queries.
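The paper does not list its queries, but a recursive SQL query is one plausible way to build a mapped set and look up chapters in a relational database. The sketch below is an assumption throughout: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, the table and column names are hypothetical, and the toy rows are a simplified hierarchy rather than real SNOMED CT content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE is_a (child TEXT, parent TEXT);
CREATE TABLE category_map (category TEXT, concept TEXT);
CREATE TABLE chapter_map (chapter TEXT, concept TEXT);
""")
conn.executemany("INSERT INTO is_a VALUES (?, ?)", [
    ("Cholera", "Infection due to Vibrio"),
    ("Cholera", "Intestinal infectious disease due to Gram-negative bacteria"),
    ("Infection due to Vibrio", "Infectious disease"),
    ("Intestinal infectious disease due to Gram-negative bacteria",
     "Infectious disease"),
    ("Intestinal infectious disease due to Gram-negative bacteria",
     "Disorder of digestive system"),
])
conn.execute("INSERT INTO category_map VALUES (?, ?)", ("A00- Cholera", "Cholera"))
conn.executemany("INSERT INTO chapter_map VALUES (?, ?)", [
    ("I Certain infectious and parasitic diseases", "Infectious disease"),
    ("XI Diseases of the digestive system", "Disorder of digestive system"),
])

# The recursive common table expression walks Is a links upwards from the
# mapped concepts; UNION (not UNION ALL) removes duplicate concepts.
QUERY = """
WITH RECURSIVE mapped_set(concept) AS (
    SELECT concept FROM category_map WHERE category = ?
    UNION
    SELECT r.parent FROM is_a AS r JOIN mapped_set AS m ON r.child = m.concept
)
SELECT DISTINCT c.chapter
FROM chapter_map AS c JOIN mapped_set AS m ON c.concept = m.concept
ORDER BY c.chapter
"""


def chapters_for(category):
    """Return the chapters reachable from a category's mapped set."""
    return [row[0] for row in conn.execute(QUERY, (category,))]


print(chapters_for("A00- Cholera"))
```

PostgreSQL also supports `WITH RECURSIVE`, although the parameter placeholder syntax depends on the client library used.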