Matching biomedical ontologies based on formal concept analysis

Zhao, Mengyi; Zhang, Songmao; Li, Weizhuo; Chen, Guowei

doi:10.1186/s13326-018-0178-9

Research
Open access
Published: 19 March 2018

Mengyi Zhao^1,2, Songmao Zhang^1, Weizhuo Li^1,2 & Guowei Chen^1,2

Journal of Biomedical Semantics volume 9, Article number: 11 (2018)

Abstract

Background

The goal of ontology matching is to identify correspondences between entities from different yet overlapping ontologies so as to facilitate semantic integration, reuse and interoperability. As a well-developed mathematical model for analyzing individuals and structuring concepts, Formal Concept Analysis (FCA) has been applied to ontology matching (OM) tasks since the beginning of OM research, yet the ontological knowledge exploited in FCA-based methods remains limited. This motivates the study in this paper, i.e., to empower FCA with as much ontological knowledge as possible for identifying mappings across ontologies.

Methods

We propose a method based on Formal Concept Analysis to identify and validate mappings across ontologies, including one-to-one mappings, complex mappings and correspondences between object properties. Our method, called FCA-Map, incrementally generates a total of five types of formal contexts and extracts mappings from the lattices derived. First, the token-based formal context describes how class names, labels and synonyms share lexical tokens, leading to lexical mappings (anchors) across ontologies. Second, the relation-based formal context describes how classes are in taxonomic, partonomic and disjoint relationships with the anchors, leading to positive and negative structural evidence for validating the lexical matching. Third, the positive relation-based context can be used to discover structural mappings. Afterwards, the property-based formal context describes how object properties are used in axioms to connect anchor classes across ontologies, leading to property mappings. Last, the restriction-based formal context describes co-occurrence of classes across ontologies in anonymous ancestors of anchors, from which extended structural mappings and complex mappings can be identified.

Results

Evaluation on the Anatomy, the Large Biomedical Ontologies, and the Disease and Phenotype tracks of the 2016 Ontology Alignment Evaluation Initiative campaign demonstrates the effectiveness of FCA-Map and its competitiveness with the top-ranked systems. FCA-Map can achieve a better balance between precision and recall for large-scale domain ontologies by constructing multiple FCA structures, whereas it performs unsatisfactorily for smaller ontologies with fewer lexical and semantic expressions.

Conclusions

Compared with other FCA-based OM systems, the study in this paper is a more comprehensive attempt to push the envelope of the Formal Concept Analysis formalism in ontology matching tasks. Five types of formal contexts are constructed incrementally, and their derived concept lattices are used to cluster the commonalities among classes at the lexical and structural levels, respectively. Experiments on large, real-world domain ontologies show promising results and reveal the power of FCA.

Background

Ontologies aim to model domain conceptualizations so that applications built upon them can interoperate with each other by sharing the same meanings. Such knowledge sharing and reuse can be severely hindered by the fact that ontologies for the same domain are often developed for various purposes, differing in coverage, granularity, naming, structure and many other aspects. Ontology matching (OM) techniques aim to alleviate this heterogeneity by identifying correspondences across ontologies. Ontology matching can be performed at the element level and the structure level [1]. The former considers ontology classes and their instances independently, using, for example, string-based and language-based techniques, whereas the latter exploits relations among entities, using, for example, graph-based and taxonomy-based techniques. Most ontology matching systems [2–8] adopt both element-level and structure-level techniques to achieve better performance.

Life sciences is one of the most successful application areas of Semantic Web technology, and many biomedical ontologies have been developed and utilized in real-world applications. These ontologies cover different yet overlapping domains and are often of large scale, including, for example, the Foundational Model of Anatomy (FMA) [9] and Adult Mouse Anatomy (MA) [10] for anatomy, National Cancer Institute Thesaurus (NCI) [11] for disease, and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) [12] for clinical medicine. Moreover, efforts such as the Unified Medical Language System (UMLS) [13] integrate various biomedical systems so as to enhance their reuse and interoperability. For such biomedical domain ontologies, the annual Ontology Alignment Evaluation Initiative (OAEI) [14] sets three competition tracks, namely the Anatomy, the Large Biomedical Ontologies, and the Disease and Phenotype tracks, which have attracted many state-of-the-art ontology matching systems [2–4, 7, 8].

Among the first batch of OM algorithms and tools proposed in the early 2000s, FCA-Merge [15] distinguished itself by using the Formal Concept Analysis (FCA) formalism to derive mappings from classes sharing textual documents as their individuals. Proposed by Rudolf Wille [16], FCA is a well-developed mathematical model for analyzing individuals and structuring concepts. FCA starts with a formal context consisting of a set of objects, a set of attributes, and a binary relation between them. A concept lattice, or Galois lattice, can be computed from a formal context, where each node represents a formal concept composed of a subset of objects (the extent) together with their common attributes (the intent). The extent and the intent of a formal concept uniquely determine each other in the lattice. Moreover, the lattice represents a concept hierarchy in which one formal concept is a sub-concept of another if its objects are contained in those of the latter.
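The notions of formal context, extent, intent and sub-concept can be illustrated with a minimal sketch. The toy context below is invented for illustration and does not come from the paper, and the function names (`common_attributes`, `common_objects`, `formal_concepts`, `is_subconcept`) are hypothetical. Concepts are enumerated by brute force over all subsets of objects, which is only feasible for very small contexts and is not the lattice-construction algorithm used by FCA-Map.

```python
from itertools import combinations

# Hypothetical toy context: object -> set of attributes (not from the paper).
CONTEXT = {
    "frog": {"aquatic", "terrestrial"},
    "fish": {"aquatic"},
    "dog": {"terrestrial", "mammal"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def common_objects(attributes):
    """Objects that possess every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if attributes <= attrs}


def formal_concepts():
    """Brute force: derive the intent of every object subset, then its extent."""
    concepts = set()
    objects = sorted(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset)
            extent = common_objects(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def is_subconcept(c1, c2):
    """c1 is a sub-concept of c2 iff the extent of c1 is contained in that of c2."""
    return c1[0] <= c2[0]


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Each printed pair is closed in both directions: the intent is exactly the set of attributes common to the extent, and the extent is exactly the set of objects that have every attribute of the intent, which is why the two determine each other.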

Both ontologies and FCA aim at modeling “concepts” in hierarchical structures. The purpose of an ontology is to represent “a shared understanding of the domain of interest” [17] that can be queried and reasoned upon in an automated way. FCA, on the other hand, is a conceptual clustering technique with solid mathematical foundations that allows concept hierarchies to be derived from datasets. Ontologies and FCA can complement each other, as analyzed in [18] from an application point of view. FCA can naturally be applied to constructing ontologies in ontology engineering [19–21], and is also widely used in data analysis, information retrieval, and knowledge discovery.

Following in the footsteps of FCA-Merge, several OM systems continued to use FCA as well as its alternative formalisms, exploiting different entities as the sets of objects and attributes for constructing formal contexts [22–26]. FCA-OntMerge [23], for example, utilizes the classes of ontologies and their attributes to form its formal context, whereas in [22] the formal context is composed of ontology classes as objects and terms of a domain-specific thesaurus as attributes. The type of formal context determines the information used for ontology matching, and we observed that some intrinsic and essential ontological knowledge has not yet been exploited, including both textual information within classes (e.g., class labels and synonyms) and relationships among classes (e.g., ISA, sibling and disjointness relations, as well as properties and axioms).

This motivated the study in this paper, i.e., empowering FCA with as much ontological information as possible for identifying and validating mappings across ontologies. Our method, called FCA-Map, incrementally generates a total of five types of formal contexts and extracts mappings from the lattices derived. First, the token-based formal context describes how class names, labels and synonyms share lexical tokens, leading to lexical mappings (anchors) across ontologies. Second, the relation-based formal context describes how classes are in taxonomic, partonomic and disjoint relationships with the anchors, leading to positive and negative structural evidence for validating the lexical matching. Third, after conflict repair, the positive relation-based context can be used to discover structural mappings. Afterwards, the property-based formal context describes how object properties are used in axioms to connect anchor classes across ontologies, leading to property mappings. Last, the restriction-based formal context describes co-occurrence of classes across ontologies in anonymous ancestors of anchors, from which extended structural one-to-one mappings and complex mappings can be identified.
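The first of these steps, the token-based formal context, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the class identifiers and names are invented, tokenization is a plain lower-cased alphanumeric split, and only names with identical token sets across the two ontologies are reported as anchors. The helper names `tokenize`, `build_context` and `lexical_anchors` are hypothetical.

```python
import re

# Hypothetical classes: made-up identifiers mapped to their names and synonyms.
SOURCE = {
    "src:HeartValve": ["Heart valve", "Cardiac valve"],
    "src:Lung": ["Lung"],
}
TARGET = {
    "tgt:valve_of_heart": ["valve of heart", "heart valve"],
    "tgt:left_lung": ["left lung"],
}


def tokenize(name):
    """Lower-case a name and split it into a set of alphanumeric tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", name.lower()))


def build_context(*ontologies):
    """Formal context: objects are (class, name) pairs, attributes are tokens."""
    return {
        (cls, name): tokenize(name)
        for onto in ontologies
        for cls, names in onto.items()
        for name in names
    }


def lexical_anchors(source, target):
    """Pair classes across ontologies whose names have identical token sets."""
    context = build_context(source, target)
    anchors = set()
    for (cls_s, _name_s), tokens_s in context.items():
        if cls_s not in source:
            continue
        # Extent of the concept generated by this name:
        # every name whose token set contains all of its tokens.
        extent = [obj for obj, toks in context.items() if tokens_s <= toks]
        for obj in extent:
            cls_t = obj[0]
            # Assumption: keep only exact token-set matches from the other ontology.
            if cls_t in target and context[obj] == tokens_s:
                anchors.add((cls_s, cls_t))
    return anchors


if __name__ == "__main__":
    print(lexical_anchors(SOURCE, TARGET))
```

In this toy input, "Lung" and "left lung" fall into the same extent because they share the token "lung", but they are not reported as an anchor since their token sets differ; how such partial overlaps are treated is a design choice that this sketch does not attempt to settle.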

We participated in the three OAEI 2016 tracks related to the biomedical domain, and the results demonstrate the effectiveness of FCA-Map and its competitiveness with the OAEI top-ranked OM systems. FCA-Map is one of the three winners of the Disease and Phenotype track of the OAEI 2016 campaign. Our method is suitable for aligning large-scale domain ontologies with rich lexical and structural knowledge, due to a comprehensive construction of multiple FCA structures using names, hierarchies, properties, and axioms. This requires that ontologies provide meaningful lexical symbols and terms for classes, deep taxonomic hierarchies, and a large number of classes and expressive logical axioms specifying restrictions on properties linking classes. Such conditions can be satisfied by many ontologies in the biomedical domain, for which FCA-Map is effective and succeeds in discovering mappings that are missed by other OM systems.

The rest of the paper is organized as follows. We first introduce the basic definitions and characteristics of FCA. An overview of the FCA-Map method is then presented, followed by five sections describing the five types of formal contexts and the derivation of mappings in detail. The evaluation section presents a comprehensive group of experiments, including the respective empirical results of the five steps as well as step-wise comparisons with counterparts. The evaluation also includes comparisons with OAEI 2016 top-ranked systems and previous FCA-based OM systems. Finally, we analyze in depth the advantages and limitations of FCA-Map in contrast with other OM systems and FCA-based systems, and discuss future work, followed by a conclusion.

Preliminaries

Formal Concept Analysis (FCA) is a mathematical theory of data analysis based on applied lattice and order theory. FCA constructs formal contexts for objects and their attributes, and then derives concept hierarchies which constitute lattices. A formal context is defined as a triple $\mathbb{K}:=(G,M,I)$, where $G$ is a set of objects, $M$ a set of attributes, and $I \subseteq G \times M$ a binary relation between $G$ and $M$; $gIm$, i.e., $(g,m) \in I$, reads: object $g$ has attribute $m$ [27]. Formal contexts are often illustrated as binary tables, as exemplified by Table 1, where rows correspond to objects, columns to attributes, and a cell is marked with “×” if the object in its row has the attribute in its column. In Table 1, a marked cell indicates that the animal listed in its row possesses the feature in its column.

Table 1 An example formal context $\mathbb {K}_{e}$
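As a rough illustration of the triple $(G, M, I)$ and its cross-table view, the sketch below stores a formal context explicitly and renders it as a binary table. The animals and features are invented and do not reproduce the contents of Table 1; the class name `FormalContext` and its methods are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormalContext:
    objects: tuple        # G
    attributes: tuple     # M
    incidence: frozenset  # I, a set of (g, m) pairs

    def has(self, g, m):
        """True iff gIm holds, i.e. (g, m) is in I."""
        return (g, m) in self.incidence

    def cross_table(self):
        """Render rows as objects and columns as attributes, marking gIm with 'x'."""
        width = max(len(g) for g in self.objects)
        lines = [" " * width + " | " + " | ".join(self.attributes)]
        for g in self.objects:
            cells = [
                ("x" if self.has(g, m) else "").center(len(m))
                for m in self.attributes
            ]
            lines.append(g.ljust(width) + " | " + " | ".join(cells))
        return "\n".join(lines)


# Hypothetical animals and features; these are NOT the contents of Table 1.
K = FormalContext(
    objects=("eagle", "trout"),
    attributes=("flies", "swims"),
    incidence=frozenset({("eagle", "flies"), ("trout", "swims")}),
)

if __name__ == "__main__":
    print(K.cross_table())
```

Storing $I$ as a set of pairs keeps the membership test `has(g, m)` a direct transcription of $(g,m) \in I$; a dense boolean matrix would be an equally valid representation for small contexts.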
