AML overview
AML is an ontology matching system originally developed to tackle the challenges of matching large biomedical ontologies [11], as its namesake and predecessor AgreementMaker [19] was not designed to handle ontologies of this size. While AML’s scope has since expanded, biomedical ontologies have remained one of the main drivers behind its continued development.
AML’s ontology matching pipeline is divided into three phases: ontology loading, matching, and filtering. The pipeline is illustrated in Fig. 1.
In the ontology loading phase, the input ontologies are loaded using the OWL API [25], then parsed into AML’s data structures [11]. The most important of these are the Lexicon, which stores all the lexical information of an ontology in normalized form, and the RelationshipMap, which stores the structural information.
In the matching phase, AML’s various matching algorithms (or matchers) are executed and combined. These include [11, 12, 26]:
- The LexicalMatcher, which finds literal full-name matches between the Lexicon entries of two ontologies.
- The WordMatcher, which finds matches between entities by computing the word overlap between their Lexicon entries.
- The StringMatcher, which finds matches between entities by computing the string similarity between their Lexicon entries using the ISub metric [27].
- The ThesaurusMatcher, which finds literal full-name matches involving synonyms inferred from an automatically generated thesaurus, as we will detail in the next subsection.
- The MediatingMatcher, which employs the LexicalMatcher to align each of the input ontologies to a third background ontology, and then intersects those alignments to derive an alignment between the input ontologies.
- The XRefMatcher, which is analogous to the MediatingMatcher, but relies primarily on OBO [10] cross-references between the background ontology and the input ontologies.
- The LogicalDefMatcher, which matches classes that have equal or corresponding OBO [10] logical definitions, as we will detail in a subsequent subsection.
In the filtering phase, AML applies algorithms that remove problem-causing mapping candidates from the preliminary alignment to generate the final alignment. The problems that are addressed include cardinality conflicts (i.e., cases where a class of one ontology is mapped to more than one class of the other ontology) and logical conflicts (i.e., cases where two or more mappings cause the input ontologies to become unsatisfiable when merged via those mappings).
Cardinality conflicts are resolved using the heuristic Selector algorithm, which selects mappings in descending order of similarity score in one of its three modes: ‘strict’, in which all cardinality conflicts are resolved; ‘permissive’, which accepts cardinality conflicts in the case of similarity score ties; and ‘hybrid’, which accepts conflicting pairs of mappings with high similarity score (above 0.75) and otherwise behaves as the ‘permissive’ mode [11]. Logical conflicts are resolved by the Repairer algorithm [23].
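The greedy selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not AML's actual Selector: the `Mapping` type and the `select` function are hypothetical names, only the 'strict' and 'permissive' modes are covered, and a tie is taken to mean an exactly equal score.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    source: str
    target: str
    score: float


def select(mappings, mode="strict"):
    """Greedily pick mappings in descending order of similarity score."""
    if mode not in ("strict", "permissive"):
        raise ValueError("unsupported mode: " + mode)
    selected = []
    best_source = {}  # source class -> score of its first selected mapping
    best_target = {}  # target class -> score of its first selected mapping
    for m in sorted(mappings, key=lambda m: m.score, reverse=True):
        # Scores of already selected mappings that share a class with m.
        rivals = [s for s in (best_source.get(m.source), best_target.get(m.target))
                  if s is not None]
        tie = all(s == m.score for s in rivals)
        # 'strict' rejects every conflict; 'permissive' keeps conflicts that tie.
        if not rivals or (mode == "permissive" and tie):
            selected.append(m)
            best_source.setdefault(m.source, m.score)
            best_target.setdefault(m.target, m.score)
    return selected


candidates = [
    Mapping("a1", "b1", 0.9),
    Mapping("a1", "b2", 0.9),  # ties with the mapping above on class a1
    Mapping("a2", "b2", 0.8),
    Mapping("a3", "b3", 0.7),
]
strict = select(candidates, "strict")
permissive = select(candidates, "permissive")
```

In this toy input, the 'strict' mode drops the second mapping of a1, whereas the 'permissive' mode keeps both tied mappings of a1 and consequently rejects the lower-scored mapping to b2.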
Handling large ontologies
There are three key strategies implemented by AML and other efficient ontology matching systems to match large ontologies: hash-based searching, parallelization, and search space reduction. Large ontologies also pose problems with respect to the memory requirements of the similarity matrix.
Hash-based searching
The hash-based searching strategy is the most critical strategy for scalability, as it effectively reduces the time complexity of the matching problem from quadratic to linear. This strategy relies on using data structures based on HashMaps, with inverted indices, to store the lexical information of the ontologies. By inverted indices, we mean that rather than having the class ids as keys, their lexical attributes (e.g., the various labels and synonyms, or the words these contain) are used as keys and the values are the sets of ids of the classes that have each attribute. This enables matching systems to simply check whether each lexical attribute of one ontology occurs in the other, rather than making pairwise comparisons of the classes of the two ontologies. Since the attributes are HashMap keys, and HashMap access normally has O(1) time complexity, the hash-based searching strategy has O(n) complexity overall, where n is the number of lexical attributes in the ontology with the fewest attributes. By contrast, the traditional pairwise matching strategy has O(mn) complexity, where m and n are the numbers of lexical attributes in the two ontologies to match.
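To make the inverted-index idea concrete, the following Python sketch uses dictionaries in place of Java HashMaps. The function names and the toy class ids are hypothetical, and the sketch only finds literal full-name matches; it is not AML's code.

```python
from collections import defaultdict


def build_inverted_index(lexicon):
    """Map each (normalized) name to the set of ids of the classes that have it.

    `lexicon` maps a class id to an iterable of its names.
    """
    index = defaultdict(set)
    for class_id, names in lexicon.items():
        for name in names:
            index[name].add(class_id)
    return index


def lexical_match(index_a, index_b):
    """Return pairs (id from A, id from B) of classes sharing a name."""
    # Iterate over the smaller index and probe the larger one, so the number
    # of hash lookups is bounded by the smaller number of attributes.
    swapped = len(index_a) > len(index_b)
    small, large = (index_b, index_a) if swapped else (index_a, index_b)
    matches = set()
    for name, ids_small in small.items():
        ids_large = large.get(name)  # expected O(1) lookup
        if ids_large is None:
            continue
        for s in ids_small:
            for l in ids_large:
                matches.add((l, s) if swapped else (s, l))
    return matches


onto_a = {"A:1": ["heart"], "A:2": ["nervous system"]}
onto_b = {"B:7": ["heart", "cardiac organ"], "B:8": ["lung"], "B:9": ["liver"]}
index_a = build_inverted_index(onto_a)
index_b = build_inverted_index(onto_b)
matches = lexical_match(index_a, index_b)
```

Here only two lookups are made (one per name in the smaller index), instead of comparing each of the two names of the first ontology with each of the four names of the second.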
The one limitation of hash-based searching is that it is usually restricted to finding equal attributes—at least when default Java String hash keys are used, as is the case in AML. Thus, it can be employed for literal full-name matches (LexicalMatcher), for matches based on overlapping words (WordMatcher), or even overlapping n-grams (not implemented in AML), but not for traditional string similarity comparisons (StringMatcher). Moreover, the effectiveness of the hash-based searching strategy hinges heavily on normalizing the lexical attributes a priori, in order to maximize the number of equal entries found.
In the case of AML, lexical attributes are normalized upon entry in the Lexicon, during the ontology loading stage. This normalization consists in removing all non-word non-digit characters (except parentheses and dash), inserting white spaces where capitalization is found within words (e.g., “hasPart” becomes “has Part”), and finally converting all characters to lower case. However, because biomedical ontologies may include special formulas (chemical or otherwise), AML uses patterns to detect whether a lexical attribute is a normal word-based name or a formula. In the latter case, the only normalization done is the replacement of underscores with white spaces.
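The normalization steps above could be sketched as follows. This is a rough approximation under several assumptions: the `FORMULA_HINT` pattern is a hypothetical placeholder (the text does not specify the patterns AML uses to detect formulas), removed characters are replaced with a white space rather than deleted, and camel-case splitting is limited to a lower-case letter followed by an upper-case one.

```python
import re

# Hypothetical placeholder: a capitalized symbol, a count, and another symbol
# (e.g. "C6H12O6") is taken as a hint that the name is a formula.
FORMULA_HINT = re.compile(r"[A-Z][a-z]?\d+[A-Z]")


def normalize(name):
    """Normalize a lexical attribute before it is stored as an index key."""
    if FORMULA_HINT.search(name):
        # Formulas are left untouched except for underscores.
        return name.replace("_", " ")
    # Drop everything except letters, digits, parentheses, dashes and spaces
    # (underscores are treated as separators here, which is an assumption).
    text = re.sub(r"[^A-Za-z0-9()\- ]", " ", name)
    # Insert a space where capitalization occurs within a word.
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    # Collapse runs of white space and convert to lower case.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


examples = [normalize("hasPart"),
            normalize("Mixed_Mesodermal (Mullerian) Tumor!"),
            normalize("C6H12O6_hydrate")]
```

Normalizing before indexing is what allows two differently formatted labels to collide on the same hash key.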
Parallelization
Parallelization is a common strategy for improving computational efficiency that exploits the multi-core architecture of modern CPUs. In the context of ontology matching, it typically consists of distributing the computational load across the available cores by either running different (matching) algorithms in parallel or dividing an algorithm into a set of tasks and running those in parallel. While parallelization does not affect the computational complexity of the underlying algorithms, it can reduce their execution time by a factor of up to N, where N is the number of available CPU cores.
AML’s StringMatcher and Repairer algorithms are both implemented for parallelization via subdivision into parallel tasks, given that they are the two main bottlenecks in AML’s matching pipeline. AML’s remaining matching and filtering algorithms are not implemented for parallelization because they have linear complexity and run in at most a few seconds for even the largest ontologies, so the gain from parallelizing them would be negligible relative to AML’s total run time.
Search space reduction
Under search space reduction, we include the two families of strategies that aim to reduce the search space of the ontology matching problem—partitioning and pruning—as well as the strategy that aims to reduce the scale of the alignment repair problem—modularization.
Partitioning or blocking consists in dividing the ontologies into (usually vertical) partitions or blocks in order to transform a single large matching problem into several smaller ones [28]. Its simplest application is to reduce the memory requirements of the matching task, as is the case in AML’s WordMatcher algorithm. However, it can also be used to reduce the search space of the matching problem by determining which blocks have a significant overlap (typically using a hash-based searching strategy) and attempting to match only those [29]. In this application, it can improve not only the efficiency but also the effectiveness of the matching process, by excluding false positives.
Pruning encompasses any strategy that dynamically avoids comparing parts of the ontologies without partitioning them beforehand [28]. The most common of these strategies is precisely hash-based searching, as it effectively only makes comparisons between entities that have equal HashMap indices (be they names, words, or n-grams). In addition to this form of pruning, AML employs another form called local matching when applying traditional pairwise matching algorithms (such as the StringMatcher) to large ontologies. This strategy consists of matching entities only in the neighborhood of mapped entities found using more efficient (and reliable) hash-based search algorithms. Like blocking, it not only improves computational efficiency but can also help filter false positives.
Modularization consists of identifying the classes that are semantically relevant for determining whether an alignment is coherent in order to reduce the search space of the repair problem. It is akin to partitioning, but is carried out after the matching stage, and contemplates both the input ontologies and the alignment between them. To enable modularization and reduce the complexity of the repair problem, repair algorithms tend to consider simplifications of the Description Logic of OWL—for instance, the repair algorithms of both AML and LogMap are based on propositional logic [13, 23]. AML’s modularization reduces the search space of the repair problem both with regard to the classes that must be tested for satisfiability (since most tests are logically redundant) and with regard to the classes that must be searched (only those with multiple parents, or involved in mappings or logical restrictions) [23].
Similarity matrices
Another consideration that is critical for matching large ontologies is that the memory requirement of a similarity matrix between two ontologies scales quadratically with their size. For example, for the FMA-SNOMED whole task of the OAEI large biomedical ontologies track, the similarity matrix would require an unwieldy 72 GB of RAM if similarity scores were stored with 8-byte precision. The strategy that AML and other efficient matching systems employ to circumvent this problem is to store a sparse matrix with only the meaningful similarity scores (i.e., those above a certain threshold, such as 0.5). In the case of AML, this matrix is stored in the form of both a list of mapping candidates, to enable sorting and selection, and a HashMap-based table, to enable efficient searching. Each of AML’s matchers produces one such sparse matrix, or preliminary alignment, which can be combined with others either by simple union (keeping the highest score for the same mapping) or hierarchically (by adding only mappings from a less precise matcher that do not conflict with those of more precise matchers).
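A sparse alignment of this kind, together with the two combination modes, might look like the Python sketch below. The class and method names are hypothetical, scores at or above the threshold are kept (an assumption), and the sorted list is derived on demand from the hash tables rather than stored alongside them as in AML.

```python
class SparseAlignment:
    """Stores only the similarity scores that reach a threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.by_source = {}  # source id -> {target id: score}
        self.by_target = {}  # target id -> {source id: score}

    def add(self, source, target, score):
        if score < self.threshold:
            return  # not meaningful enough to store
        if score > self.get(source, target):  # keep the highest score
            self.by_source.setdefault(source, {})[target] = score
            self.by_target.setdefault(target, {})[source] = score

    def get(self, source, target):
        return self.by_source.get(source, {}).get(target, 0.0)

    def mappings(self):
        """All stored mappings, sorted by descending score."""
        triples = [(s, t, score)
                   for s, row in self.by_source.items()
                   for t, score in row.items()]
        return sorted(triples, key=lambda m: m[2], reverse=True)

    def union(self, other):
        """Simple union, keeping the highest score for the same mapping."""
        result = SparseAlignment(min(self.threshold, other.threshold))
        for alignment in (self, other):
            for s, t, score in alignment.mappings():
                result.add(s, t, score)
        return result

    def extended_with(self, less_precise):
        """Hierarchical combination: add only non-conflicting mappings."""
        result = self.union(SparseAlignment(self.threshold))  # copy of self
        for s, t, score in less_precise.mappings():
            if s not in self.by_source and t not in self.by_target:
                result.add(s, t, score)
        return result


lexical = SparseAlignment()
lexical.add("A:1", "B:1", 0.9)
lexical.add("A:2", "B:2", 0.3)  # below the threshold, discarded
word = SparseAlignment()
word.add("A:1", "B:1", 0.7)
word.add("A:1", "B:4", 0.8)
word.add("A:3", "B:3", 0.6)
union = lexical.union(word)
hierarchical = lexical.extended_with(word)
```

In the toy data, the union keeps the competing mapping of A:1 to B:4, whereas the hierarchical combination rejects it because A:1 is already mapped by the more precise alignment.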
Handling the rich vocabulary of biomedical ontologies
Processing lexical annotations
AML, like most ontology matching systems that perform well in the biomedical domain, takes into account a wide range of lexical annotations from biomedical ontologies. Namely, AML stores in the Lexicon the local names (when not alphanumeric codes), labels, and all annotations with properties corresponding to labels or synonyms (e.g., “prefLabel”, “hasExactSynonym”, “FULLSYN”). The various annotations are condensed into four lexical categories: ‘localName’, ‘label’, ‘exactSynonym’, and ‘otherSynonym’. While this mapping is automatic, it covers the large majority of the annotation properties presently in use in biomedical ontologies and thesauri.
One strategy that, to the best of our knowledge, solely AML employs is that it assigns different numeric weights to each of its lexical categories, and uses these weights to score each mapping of lexical origin. The weighting scheme employed by AML is fixed, meaning that each lexical category is given a predetermined weight that reflects its expected reliability. This approach helps improve the effectiveness of AML’s Selector as it leads to fewer similarity ties and to mappings based on more reliable annotations being scored higher than those based on less reliable ones.
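The effect of a fixed weighting scheme can be illustrated with a short Python sketch. Both the weight values and the choice of scoring a match by the product of the two weights are assumptions made for this illustration; the text does not state the values or the formula AML uses.

```python
# Hypothetical weights, one per lexical category (values are illustrative).
WEIGHTS = {
    "localName": 1.0,
    "label": 0.95,
    "exactSynonym": 0.9,
    "otherSynonym": 0.8,
}


def lexical_score(category_a, category_b):
    """Score a literal match from the categories of the two matching entries.

    Assumption: the score is the product of the two category weights.
    """
    return WEIGHTS[category_a] * WEIGHTS[category_b]


# Two candidate mappings for the same class that would tie at 1.0 if every
# literal match were scored equally.
label_to_label = lexical_score("label", "label")
label_to_other_synonym = lexical_score("label", "otherSynonym")
```

With such weights, a match between two labels outranks a match between a label and a loosely related synonym, so a score-ordered selection step has fewer ties to resolve.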
Inferring new synonyms
AML employs several strategies for automatically generating new synonyms, with the goal of improving the coverage and effectiveness of its hash-based searching algorithms. Having more synonyms increases the likelihood that corresponding concepts are described using equal lexical entries, and thus will tend to increase recall, but may also decrease precision.
One strategy AML employs is to automatically generate synonyms for classes by removing stop words from their names, using a predefined stop word list, as well as by removing name portions within parentheses. For example, for the SNOMED lexical entry “structure of nervous system”, AML generates the synonym “nervous system” by removing the leading stop words “structure” and “of”, and adds this synonym to the Lexicon, assigning it to all classes to which the original entry was assigned. Analogously, for the NCI lexical entry “mixed mesodermal (mullerian) tumor”, AML generates the synonym “mixed mesodermal tumor” by removing the section within parentheses.
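Both transformations are simple string operations, as the following Python sketch suggests. The stop word list shown is a hypothetical three-word stand-in for AML's predefined list, and stop words are removed wherever they occur in the name, which is an assumption.

```python
import re

# Hypothetical stand-in for a predefined stop word list.
STOP_WORDS = {"structure", "of", "the"}


def strip_stop_words(name):
    """Remove stop words from a normalized, space-separated name."""
    return " ".join(w for w in name.split() if w not in STOP_WORDS)


def strip_parentheses(name):
    """Remove every parenthesized portion of a name."""
    text = re.sub(r"\s*\([^)]*\)", "", name)
    return re.sub(r"\s+", " ", text).strip()


def generate_synonyms(name):
    """Return the new, non-empty variants of a name."""
    candidates = {strip_stop_words(name), strip_parentheses(name)}
    return {c for c in candidates if c and c != name}


snomed_synonyms = generate_synonyms("structure of nervous system")
nci_synonyms = generate_synonyms("mixed mesodermal (mullerian) tumor")
```

Each generated synonym would then be added to the index under the same class ids as the original entry, so that it can take part in hash-based matching.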
Another strategy AML employs for synonym generation consists in generating a thesaurus by comparing the various annotations of each class, and then using this thesaurus to generate new synonyms [12, 18]. For example, given a lexical analysis of the annotations ‘stomach serosa’ and ‘gastric serosa’ for Mouse Gross Anatomy Ontology (MA) class MA_0001626, AML would add to its thesaurus that ‘stomach’ and ‘gastric’ are synonymous words. It would then use this information to generate new synonyms for lexical entries containing either of the words by replacing it with the other. In order to contain the loss in precision that this strategy tends to generate, AML employs it in a dedicated matching algorithm, the ThesaurusMatcher, which finds only exact matches involving synonyms generated by the thesaurus.
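A deliberately simplified version of this idea is sketched below in Python. It assumes that two annotations of the same class yield a synonym pair only when they have the same number of words and differ in exactly one position; the lexical analysis performed by AML is not detailed here and may well differ. All function names are hypothetical.

```python
from itertools import combinations


def infer_word_synonyms(annotations_by_class):
    """Build a thesaurus of word pairs from the annotations of each class."""
    thesaurus = set()
    for names in annotations_by_class.values():
        for a, b in combinations(sorted(set(names)), 2):
            words_a, words_b = a.split(), b.split()
            if len(words_a) != len(words_b):
                continue
            diffs = [(x, y) for x, y in zip(words_a, words_b) if x != y]
            if len(diffs) == 1:  # names differ in exactly one word
                thesaurus.add(frozenset(diffs[0]))
    return thesaurus


def expand(name, thesaurus):
    """Generate synonyms of a name by swapping in thesaurus words."""
    words = name.split()
    synonyms = set()
    for i, word in enumerate(words):
        for pair in thesaurus:
            if word in pair:
                (other,) = pair - {word}
                synonyms.add(" ".join(words[:i] + [other] + words[i + 1:]))
    return synonyms


thesaurus = infer_word_synonyms(
    {"MA_0001626": ["stomach serosa", "gastric serosa"]}
)
new_synonyms = expand("gastric mucosa", thesaurus)
```

The name "gastric mucosa" is an invented input used only to show the replacement step.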
Finally, AML can also use background knowledge sources to generate synonyms, but this strategy is detailed in the next subsection.
Exploiting background knowledge
Background knowledge selection
The problem of automatically identifying relevant sources of background knowledge has been the subject of several studies [8, 9]. Most rely on analyzing the background knowledge sources to determine their overlap with the input ontologies, yet overlap does not imply usefulness. A background knowledge source is only useful if it contains (lexical or structural) knowledge not contained in the input ontologies and which is relevant to match them, or in other words, if we can find new mappings by using it (assuming it is reliable, and thus the mappings will mostly be correct). Given that, when employing a hash-based search algorithm, the difference in cost between computing a background knowledge alignment and computing an overlap is negligible, we might as well do the former and obtain a more direct measure of usefulness.
These are the foundations of AML’s algorithm for automatic selection of background knowledge sources [8]. This algorithm employs the concept of mapping gain, defined as the relative number of new mappings that an alignment would add to another alignment, as a measure of usefulness. In a first stage, it uses the mapping gain over the baseline LexicalMatcher alignment to measure the individual usefulness of each candidate background knowledge source, and preselect them. In a second stage, it iterates through the preselected sources in descending order of individual mapping gain, recomputes the mapping gain over the current baseline alignment, and if significant, adds that background knowledge alignment to the baseline. Thus, it can not only identify the most promising individual background knowledge source, but also select a near-optimal combination of multiple background knowledge sources.
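The two-stage procedure can be sketched in Python as follows. Alignments are modelled as plain sets of (source, target) pairs, the mapping gain is taken to be the number of new mappings divided by the size of the baseline, and the significance threshold `min_gain` is a hypothetical parameter; all three are assumptions made for the sake of the example.

```python
def mapping_gain(candidate, baseline):
    """Relative number of mappings in `candidate` that are new to `baseline`."""
    if not baseline:
        return float("inf") if candidate else 0.0
    return len(candidate - baseline) / len(baseline)


def select_background(baseline, candidates, min_gain=0.1):
    """Pick background sources whose alignments add enough new mappings.

    `candidates` maps a source name to the alignment obtained through it.
    """
    # Stage 1: individual gain over the baseline, used to preselect and rank.
    individual = {name: mapping_gain(alignment, baseline)
                  for name, alignment in candidates.items()}
    preselected = sorted(
        (name for name, gain in individual.items() if gain >= min_gain),
        key=lambda name: individual[name],
        reverse=True,
    )
    # Stage 2: recompute the gain over the growing baseline before adding.
    current = set(baseline)
    chosen = []
    for name in preselected:
        if mapping_gain(candidates[name], current) >= min_gain:
            current |= candidates[name]
            chosen.append(name)
    return chosen, current


baseline = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
candidates = {
    "X": baseline | {("a5", "b5"), ("a6", "b6")},  # individual gain 0.5
    "Y": {("a5", "b5")},                           # individual gain 0.25
    "Z": {("a1", "b1")},                           # no new mappings
}
chosen, combined = select_background(baseline, candidates)
```

Source Y passes the first stage but is rejected in the second, because the only mapping it offers has already been contributed by X.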
Information sources
Like most matching systems, AML relies primarily on the lexical information of background knowledge ontologies (MediatingMatcher). However, when OBO cross-references are available, it can use them instead of or in addition to the lexical information via its XRefMatcher [26]. Cross-references are essentially manually-curated mappings between an OBO ontology and others, listed in the ontology itself. For example, the UBERON class UBERON_0001275 (“pubis”) includes cross-references (via annotation property “hasDbXRef”) to FMA class 16595 (“pubis”) and NCI class C33423 (“pubic bone”). AML’s XRefMatcher employs these cross-references instead of performing lexical matches between the input ontologies and the background knowledge ontology, then like the MediatingMatcher, intersects the background knowledge alignments to derive an alignment between the two input ontologies. In the example above, if we were matching FMA to NCI using UBERON as a background knowledge source, it would map the FMA class to the NCI class because they are referenced by the same UBERON class.
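The derivation of a mapping from shared cross-references can be sketched in Python as below. The data layout (a dictionary from background class to a list of prefixed identifiers such as "FMA:16595") and the function name are hypothetical; they only serve to reproduce the "pubis" example.

```python
def match_via_xrefs(xrefs, source_prefix, target_prefix):
    """Map source and target classes referenced by the same background class.

    `xrefs` maps a background class id to the identifiers it cross-references.
    Returns triples (source id, target id, mediating background class).
    """
    mappings = set()
    for bk_class, refs in xrefs.items():
        sources = [r for r in refs if r.startswith(source_prefix)]
        targets = [r for r in refs if r.startswith(target_prefix)]
        # Every source/target pair sharing this background class is mapped.
        mappings.update((s, t, bk_class) for s in sources for t in targets)
    return mappings


uberon_xrefs = {
    "UBERON_0001275": ["FMA:16595", "NCI:C33423"],  # "pubis"
}
fma_to_nci = match_via_xrefs(uberon_xrefs, "FMA:", "NCI:")
```

A background class that cross-references only one of the two input ontologies produces no mapping, which is the situation in which lexical matches to the background ontology can complement the cross-references.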
Cross-references do not necessarily correspond to equivalence relations; all that is implied is a close semantic overlap. However, the same could also be said of ontology mappings: even if formally equivalence is always implied, the strictness with which it is meant varies from mapping to mapping. Thus, we found cross-references to be more reliable than literal lexical matches for inferring background knowledge mappings. For this reason, AML’s XRefMatcher supersedes its MediatingMatcher, as it uses cross-references when these are available, but complements them with lexical matches when the latter provide at least twice the coverage of the input ontology. Thus, it contemplates cases such as cross-references only being available for one of the input ontologies, as well as being available for both but only covering part of them.
Background knowledge usage
In addition to the traditional use of background knowledge ontologies as mediators, AML can also use them for lexical expansion, i.e., to generate new synonyms in the input ontologies. This strategy consists in adding, for each class of each of the ontologies to match that has a correspondence to a class of the background knowledge ontology, all the lexical entries of the latter as new synonyms. These correspondences must first be established by mapping the input ontologies to the background knowledge ontology, via either the MediatingMatcher or the XRefMatcher.
Given that the problem of handling large ontologies is compounded when using background knowledge ontologies, as not one but three matching tasks are required, the lexical expansion strategy enables AML to harness the knowledge contained in background knowledge ontologies more efficiently. It makes no difference from the use of background knowledge ontologies as mediators with regard to finding full-name matches, but it allows for partial matches to be indirectly derived from the background knowledge ontology with a single use (rather than three) of either the WordMatcher or the StringMatcher. However, deriving indirect partial matches can lead to a significant decrease in precision, meaning that this strategy can be less reliable than the mediating strategy.
Using logical definitions
AML has recently begun exploring the use of the logical definitions encoded in OBO Foundry ontologies [10] for ontology matching [12]. Logical definitions (or cross-products) correspond to composite mappings, where a class of one ontology is declared as equivalent to the intersection of two or more other classes of different ontologies. For example, the Human Phenotype Ontology (HP) [30] class HP_0000892 (“bifid ribs”) corresponds to Phenotypic Quality Ontology [31] class PATO_0000403 (“cleft”) inhering in the UBERON class UBERON_0002228 (“rib”) with modifier PATO_0000460 (“abnormal”), as depicted in Fig. 2. They are not strictly background knowledge in the sense that they are included in the ontologies themselves, but they do correspond to mappings to external ontologies. AML’s LogicalDefMatcher maps classes that have identical logical definitions. Continuing from the previous example, it would detect that Mammalian Phenotype Ontology (MP) [32] class MP_0000153 (“rib bifurcation”) has the exact same logical definition as HP_0000892 and thus map the two classes, as shown in Fig. 2. This is an example of a mapping that could not be found through lexical or structural matching approaches, but which logical definitions enable us to find.
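Matching on identical logical definitions lends itself to the same hash-based approach used for names, as the Python sketch below suggests. Each definition is modelled as a set of (relation, class) pairs; the relation labels "quality", "inheres_in", and "modifier" are hypothetical shorthand for the example in the text, not the identifiers used in the ontologies.

```python
from collections import defaultdict


def match_logical_definitions(defs_a, defs_b):
    """Map classes of two ontologies that have identical logical definitions.

    Each argument maps a class id to an iterable of (relation, class) pairs.
    """
    # Index the second ontology by definition, so each lookup is a hash access.
    index_b = defaultdict(set)
    for class_id, definition in defs_b.items():
        index_b[frozenset(definition)].add(class_id)
    return {
        (a, b)
        for a, definition in defs_a.items()
        for b in index_b.get(frozenset(definition), ())
    }


bifid = [("quality", "PATO_0000403"),       # "cleft"
         ("inheres_in", "UBERON_0002228"),  # "rib"
         ("modifier", "PATO_0000460")]      # "abnormal"
hp_defs = {"HP_0000892": bifid}
mp_defs = {"MP_0000153": list(reversed(bifid))}  # same definition, other order
hp_mp = match_logical_definitions(hp_defs, mp_defs)
```

Using an unordered set as the key makes the comparison independent of the order in which the components of a definition are listed.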
Evaluation
Datasets
The datasets used in this study were the OAEI 2016 datasets from the Anatomy, Large Biomedical Ontologies, and Disease and Phenotype tracks [14]:
- The Anatomy track consists of matching the Mouse Gross Anatomy Ontology [33] with the portion of the NCI Thesaurus [4] describing the human anatomy. It is evaluated using a manually curated reference alignment.
- The Large Biomedical Ontologies track features six matching tasks that consist in the pairwise matching of FMA [3], NCI [4], and SNOMED [5] in two modalities: small overlapping fragments, and whole ontologies. The evaluation is based on reference alignments derived automatically from the UMLS Metathesaurus [17].
- The Disease and Phenotype track includes two tasks, one consisting of mapping the Human Disease Ontology (DOID) [34] to the Orphanet and Rare Diseases Ontology (ORDO), and another consisting of mapping the Human Phenotype Ontology (HP) [30] to the Mammalian Phenotype Ontology (MP) [32]. The evaluation carried out in the OAEI 2016 was primarily based on consensus alignments that include all mappings found by either 2 or 3 participating matching systems.
Settings
To evaluate the impact of the various challenges of matching biomedical ontologies and the strategies for tackling them, we conducted a number of tests, which are further detailed in the “Results” section.
All tests were carried out on a personal computer with an Intel i5-4570 CPU @ 3.20GHz, with 10GB RAM allocated to Java, and Windows 7 64-bit operating system. Except where otherwise noted, the StringMatcher was run concurrently on 4 CPU threads, and all other matching algorithms were run using a single CPU thread.
When AML’s complete matching pipeline is mentioned, it refers to the matching pipeline employed for the OAEI 2016 [12]. The sources of background knowledge available to AML were also the same as it used in the OAEI 2016: the Uber Anatomy Ontology (UBERON) [2], the Human Disease Ontology (DOID) [34], and the Medical Subject Headings (MeSH) [35].
Tests where only the run time was being assessed were carried out on all datasets. Tests where the F-measure was being assessed were carried out on only the Anatomy and Large Biomedical Ontologies datasets (except where otherwise noted) since a consensus alignment, as used in the evaluation of the Disease and Phenotype track, was deemed insufficiently accurate for the purpose of this study.
In the final test of this study, we performed a manual evaluation of the mappings found uniquely through logical definitions from the HP-MP task (as logical definitions are only available for the ontologies in this task). These mappings were produced with older versions of the logical definitions of the HP ontology, which mapped to the FMA rather than to UBERON. Thus, to derive HP-MP mappings based on logical definitions, the cross-references between UBERON and FMA were used to provide correspondences between the logical definitions, when the definitions were otherwise identical.