The first question that emerges from these results is why there are so few matches between such rich ontologies and real-life anatomical annotations from the user community. The tools we used to evaluate the ontologies provided complementary approaches to matching the data; as a result, a variety of issues were uncovered.
Some of the discrepancies are accounted for by the fact that the tools were designed for strict matching, a deliberate choice to keep the false-positive rate low during automated lexical processing of known-heterogeneous input. However, genuine mismatches between the terms as users provide them and as they are represented in the ontologies accounted for much of the low hit rate. Our aim is to automate these mapping processes in future, and for our use cases a low false-positive rate is important. In future work we can extend the methods with fuzzier matching techniques, e.g., Levenshtein distance-based methods, which should improve recall for adjectival forms.
The most obvious issues with the data that would block matching to the ontologies were resolved by standardizing the data during preprocessing, before the test was run. However, even after preprocessing, there were recurring problems that interfered with successful matching.
Most of the mismatches were due to annotations containing composite terms, where one or both of the terms, taken separately, actually would have matched. For example, Liver/Kidney does not match any entity in either ontology, but Liver alone and Kidney alone matched in both. Sometimes only one of the terms would have matched; for example, in Acetabulum and pelvic soft tissues, while pelvic soft tissues is too vague and would have required clarification, Acetabulum could have been an exact match. Tools that are able to split up the user’s annotations into individual anatomical terms, or even mine the user’s annotations for new anatomical terms, as well as the provision of searchable cross-products for compositional terms, can resolve some of these problems.
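A naive splitter along these lines could recover the matchable parts of composite annotations. The delimiter set below is an assumption drawn from the examples above, and such splitting can wrongly divide legitimate multiword terms (e.g. a hypothetical Head and neck), so curation would still be needed:

```python
import re


def split_composite(annotation: str) -> list[str]:
    """Split an annotation on delimiters seen in the data:
    slashes, commas, and the conjunction 'and'."""
    parts = re.split(r"\s*/\s*|\s*,\s*|\s+and\s+", annotation)
    return [p.strip() for p in parts if p.strip()]
```

Applied to the examples above, `split_composite("Liver/Kidney")` yields two independently matchable terms, and `split_composite("Acetabulum and pelvic soft tissues")` isolates Acetabulum, which could then be matched exactly even though the second part remains too vague.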
Sometimes it was unclear whether the composite term was actually referring to two different entities, or whether it was simply a redundancy, for example, Adrenal cortex, adrenal gland. Breaking down that composite term would have yielded either one or two matches, depending on whether the user intended two different entities or was using Adrenal gland redundantly to modify Adrenal cortex.
Occasionally the user would refer to a cell when it was obvious the sample referred to a tissue (example: Adipocyte vs. Adipose tissue), and whilst the cell name the user entered did not match, the intended tissue type actually would have.
Entities in anatomical ontologies are nouns for the most part, so when the user entered an adjective referring to the anatomical structure, such as Abdominal for Abdomen, or Arterial for Artery, the adjective would not match the ontology, yet was closely related to a term that was present.
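Because many anatomical adjectives are irregular (including Latin-derived forms such as renal for kidney), a suffix rule alone would not suffice; a curated lookup is one plausible approach. The table below is a hypothetical fragment for illustration only:

```python
# Hypothetical adjective-to-noun lookup; real coverage would require
# curation, since anatomical adjectives are often irregular.
ADJECTIVE_TO_NOUN = {
    "abdominal": "abdomen",
    "arterial": "artery",
    "renal": "kidney",
    "hepatic": "liver",
}


def noun_form(term: str) -> str:
    """Map an adjectival annotation to its noun form, if known;
    otherwise pass the term through unchanged."""
    return ADJECTIVE_TO_NOUN.get(term.lower(), term)
```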
Shorthand, such as Antrum for Pyloric antrum, or Both ventricles for Left ventricle of heart and Right ventricle of heart also kept the matching rate artificially low. Most of those examples could be expanded to full names of entities in the anatomy ontologies, but out of context, it was impossible to determine what a few meant, such as Ventral or 11 different tissues. If a human curator cannot know what the annotation means, mapping becomes an impossible task for any automatic or curated tool.
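Where the intended expansion is recoverable by a curator, a shorthand table could automate it; genuinely ambiguous annotations such as Ventral would simply pass through unresolved. The entries below are a hypothetical sketch built from the examples in the text:

```python
# Hypothetical shorthand-expansion table built from curated examples;
# each entry maps a shorthand annotation to one or more full labels.
SHORTHAND = {
    "antrum": ["Pyloric antrum"],
    "both ventricles": ["Left ventricle of heart",
                        "Right ventricle of heart"],
}


def expand_shorthand(term: str) -> list[str]:
    """Expand a shorthand annotation; unknown terms pass through unchanged."""
    return SHORTHAND.get(term.lower(), [term])
```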
Occasionally, match failures arose because users named a process that implies an anatomical location, such as Colon pinch biopsy, BA (bronchioalveolar) lavage, or Bone marrow, flushed from femur. The latter could also be treated as a composite term, although breaking it down into entities would require the tool to handle “flushed from” in order to find the boundary between them.
Sometimes a mismatch occurred because the user employed a synonym of a term that was in the ontology, but the synonym itself was not recorded there. Sometimes an omission from an ontology was quite surprising; for example, the term Anterior tibialis was missing from both Uberon and the FMA. Adding such synonyms to both ontologies would improve their utility.
The FMA and Uberon are designed for different purposes, so there is no consensus between them as to exactly what entities belong in an anatomical ontology. For example, Alveolar macrophage is included as a discrete entity in the FMA, whereas Uberon does not contain it, explicitly regarding it as a composite to be generated from Alveolus and Macrophage rather than as belonging in the ontology itself. These differing definitions, based on design decisions that are not apparent to the user, have an impact on whether that user’s terms can be expected to match terms in the ontologies being used.
Uberon handles embryological and non-human anatomical entities better than the FMA does. For this reason, Uberon performed relatively better in programmatically matching the heterogeneous data from this community.
Mutual mismatch issues
The implicit assumption of the tools is that the mapping between list terms and ontology terms should be 1:1. This meant that there were sometimes approximate matches to a closely related term even in the absence of an exact match. For example, the approximate match was sometimes via a superset or subset relationship, such as Abdominal fat and Abdominal fat pad, or Fascia and Connective tissue. Other proposed matches crossed levels of abstraction, for example, Right lung as opposed to Lung.
Additionally, a few cases of quantitative annotations, such as 75% kidney, 25% liver, were present in the annotations; these will be ruled out of scope in future testing, as very few current ontologies can handle quantitative data. However, their presence does indicate a currently unmet user need in data annotation.
Ontology classes tend to be expressed in the singular, whilst annotations are written in both singular and plural. Both tools matched singular terms better than the same terms in the plural; given the variation in working styles, tools that access ontologies for real-world applications will need to handle singular-plural variation better.
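A minimal sketch of the kind of singular-plural normalisation such tools could apply follows. The rules below are deliberately naive assumptions; a proper lemmatizer would be needed for irregular forms (e.g. testes/testis) and for terms that merely end in s (e.g. pancreas):

```python
def singularize(term: str) -> str:
    """Naive rule-based singularization of an annotation term."""
    lower = term.lower()
    if lower.endswith("ies"):
        return term[:-3] + "y"      # arteries -> artery
    if lower.endswith("s") and not lower.endswith("ss"):
        return term[:-1]            # kidneys -> kidney
    return term
```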
We have established that, although we were able to map almost half of the terms from the use cases to the ontologies, the process required a great deal of time, effort, and manual curation. There remains a vast gap between the way users use anatomical terms in free text annotation and the way they are represented in two of the richest anatomical ontologies. This exercise provided preliminary insight into the following issues:
Which terms are available in which source(s)? Uberon was able to match more embryological and non-human terms in our data than the FMA did. This fact indicates the effect that the design decisions and scope of each ontology will have on users desiring to match their annotations with an ontology.
Which areas require concentration in ontology development in order to obtain as much coverage as other areas have? Although they are the most difficult to represent rigorously in an ontology, and are consequently underrepresented in anatomical ontologies, brain terms and embryological terms make up the bulk of the annotation data. As a consequence, species-specific and cross-species anatomical ontologies need to represent that data, meaning that the particular difficulties of representing it need to be addressed.
What maps, what does not map, and why? Simple terms such as Liver, Kidney, Adrenal gland, and Retina match very well in each ontology. Compositional terms, such as Alveolar macrophage, for example, tend to map poorly in each ontology, because of lexical similarity and granularity issues. Some of the compositional terms are to be expected in sampling (such as Bone marrow from femur); others are simply an artifact of combining samples or acquiring samples that cross multiple structures (Liver, kidney, adrenal gland, adrenal cortex); there needs to be a way of dealing with each scenario.
What duplications and errors are our tools able to determine in the ontologies used in the comparison, and what suggestions would we make for additions and modifications to the source ontologies? We found omissions of surprisingly common terms and synonyms utilized in the user community, such as Anterior tibialis for "tibialis anterior". We also found that Uberon had the term Ureter twice, with two separate IDs, and requested a consolidation.
What suggestions do we want to make to the tool developers for functionality that would make it easier for users to obtain better matches? Currently there is a large risk of false positives in matching. Lemmatisation and stemming would reduce the need for pre-processing, but would exacerbate the tendency toward false positives. We would suggest tools tailored more precisely to the specific needs of the anatomy domain. Misspellings, hybrid terms with elements of multiple languages, and other flawed input data present a challenge to matching the canonical terms in ontologies. The input data needs to be cleaned up and made consistent, a task that was done manually for this study but which is prohibitive at scale.
The requirements for cross-products in many annotation use cases. Terms such as Bone marrow from femur failed to match, even though both Bone marrow and Femur were in the ontologies.
These insights will also inform our future efforts in developing and refining ontology-matching tools. Although the purpose of this study was not to evaluate the tools per se, some interesting findings emerged that will provide a basis for future refinement of the tools. Ontology Mapper was far more sensitive to potential mappings than the anatomist was, tending toward false positives such as proposing to match Perisoteum of ilium with Whole embryo. This was an artifact of its Double Metaphone algorithm. Zooma did not provide any false positives, as a result of its exact matching, but made many false negative errors. The discrepancy between Zooma's exact matches and the anatomist's is an interesting finding. The numbers should have been the same. Possible reasons for the discrepancy include human error, a possible bug in the code, or the presence of non-printing characters in the data, recognisable by Zooma but invisible to the anatomist. We will follow up on this in future work. Another issue for future work is aligning matches along semantic content rather than the lexical matching that sufficed for this exploratory study. The lexical matching approach worked for such general vertebrate structures as Liver or Kidney, but for annotations containing more detailed accounts of species-specific structures (e.g., zebrafish Frontal bone, rat Anterior prostate), this approach will need to be refined in order to avoid another source of false positives.
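Phonetic over-matching of the kind seen with Double Metaphone can be illustrated with the simpler Soundex algorithm, used here purely as a stand-in (neither tool uses Soundex): distinct anatomical terms such as ileum (part of the intestine) and ilium (part of the pelvis) collapse to the same phonetic key, so a purely phonetic matcher would conflate them.

```python
def soundex(word: str) -> str:
    """Classic Soundex: first letter plus up to three digit codes."""
    codes: dict[str, str] = {}
    for group, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in group:
            codes[ch] = digit
    word = word.lower()
    key = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:   # skip repeated codes
            key += digit
        if ch not in "hw":            # h/w do not separate equal codes
            prev = digit
    return (key + "000")[:4]
```

Here `soundex("ileum")` and `soundex("ilium")` both yield the key I450, even though the terms denote unrelated structures; this is the mechanism by which phonetic algorithms trade false negatives for false positives.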
In order for ontologies to realize their potential, they need to be used. The user must perceive a benefit from their use, whether that benefit takes the form of ease of data entry, time saved, or the replacement of manual inspection with automation. The current state of anatomical ontologies leaves a gap between users' needs and the ontologies available to meet them. There is a real and growing need for tools such as Zooma, Ontology Mapper, and others that can complement the functions of ontologies in bridging that gap, removing barriers between the ontology and the community of users it is intended to serve.