Merge process
The goal of the merge was to maintain meaningful anatomical groupings for use by biologist end-users, and to enhance the description logic axiomatization to support more advanced reasoning that had been limited or absent in the original four source ontologies. The merger consisted of two phases. First, classes from one or more source ontologies that had text definitions, labels, or synonyms similar to those of Uberon classes, or that already had equivalence or taxon-specific subclass relations to Uberon (see Methods), were manually compared. Where needed, the inclusion criteria of the Uberon class were broadened or, in some cases, narrowed on the basis of more expert definitions from one of the source ontologies. Second, classes from any source ontology that were not represented in Uberon were placed in an OWL-formatted extension ontology, which imports the core Uberon and extends classes in the core. As an example of the first situation, the class ‘pectoral girdle skeleton’ was previously present in Uberon, as well as in VSAO [16] and a few other source ontologies. Because the VSAO definitions had been fully expert-vetted, the central representation in Uberon was adjusted to conform to VSAO when these classes were imported. In contrast, and as an example of the second situation, the class ‘extracleithrum’ was not previously represented in Uberon, and thus was given a new identifier in the Uberon namespace (UBERON:4200022) and placed in the extension ontology (see Figure 1). This arrangement enabled editing rights for the more taxonomically specific classes to be distributed to the domain experts. The contribution and evolution of each source ontology is shown in Figure 2, which highlights the pre-merge cross-references (Xrefs) as well as the temporal relationships between these source ontologies and Uberon. An example of how the various classes relate to one another is shown in Figure 1.
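The two phases can be pictured as a triage step that precedes manual curation. The following is only a minimal sketch under stated assumptions: the record layout, the `triage` function, the `core_xrefs` field, the label ‘shoulder girdle bones’, and the `CORE:`/`SRC:` identifiers are all hypothetical, and exact name matching stands in for the manual comparison of definitions described above.

```python
def triage(source_classes, core_index):
    """Split source classes into manual-review and extension candidates.

    core_index maps lower-cased core labels/synonyms to core class ids.
    """
    review, extension = [], []
    for cls in source_classes:
        names = [cls["label"], *cls.get("synonyms", [])]
        # Phase 1 candidates: a shared name or an existing cross-reference.
        hits = {core_index[n.lower()] for n in names if n.lower() in core_index}
        hits.update(cls.get("core_xrefs", []))
        if hits:
            review.append((cls["id"], sorted(hits)))
        else:
            # Phase 2 candidates: nothing comparable in the core ontology.
            extension.append(cls["id"])
    return review, extension


# Hypothetical identifiers and records, for illustration only.
core_index = {"pectoral girdle skeleton": "CORE:0001"}
source_classes = [
    {"id": "SRC:0001", "label": "Pectoral girdle skeleton"},
    {"id": "SRC:0002", "label": "extracleithrum", "synonyms": []},
    {"id": "SRC:0003", "label": "shoulder girdle bones", "core_xrefs": ["CORE:0001"]},
]
review, extension = triage(source_classes, core_index)
print(review)
print(extension)
```

In this toy input, the two classes that share a name or cross-reference with the core are queued for manual comparison, while ‘extracleithrum’ becomes a candidate for the extension ontology.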
Challenges
Class labels
Expanding the scope of Uberon posed a number of challenges, particularly for the teleost fish and amphibian content. One challenge was that class labels that were unambiguous within the scope of one ontology became ambiguous when merged into Uberon. For example, the AAO class ‘manubrium’ (AAO:0000680), which represents a cranial structure, could be confused with ‘manubrium of sternum’ (UBERON:0002205), an unrelated structure; the former was therefore renamed ‘manubrium of hyale’ (UBERON:3000680).
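A simple way to surface such ambiguities is to index every label and synonym and report names claimed by more than one class. The sketch below is illustrative only: the function `find_name_clashes` is hypothetical, and the synonym ‘manubrium’ attached to UBERON:0002205 is an assumption made so that the example produces a clash.

```python
from collections import defaultdict


def find_name_clashes(classes):
    """Return names (labels or synonyms) that are used by more than one class."""
    by_name = defaultdict(set)
    for class_id, names in classes.items():
        for name in names:
            by_name[name.strip().lower()].add(class_id)
    return {name: sorted(ids) for name, ids in by_name.items() if len(ids) > 1}


classes = {
    "AAO:0000680": ["manubrium"],
    # Assumption for illustration: "manubrium" recorded as a synonym here.
    "UBERON:0002205": ["manubrium of sternum", "manubrium"],
}
clashes = find_name_clashes(classes)
print(clashes)
```

Each reported name would still need expert review, since a shared name may indicate either a true equivalence or, as with ‘manubrium’, two unrelated structures.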
Superclasses and relations
Another challenge in merging multiple ontologies was that numerous classes lacked a superclass or other relationships. This was evident after the merge by the large number of classes appearing at the root or directly below high-level nodes. These were easy targets for correction, and domain experts were consulted for their correct placement in the ontology. For example, ‘postminimus’ (UBERON:3010205) was originally placed only as a subClassOf the high-level node ‘anatomical entity’ in the AAO. This class is now asserted as a subClassOf ‘cartilage element’ (UBERON:0007844) and part_of some ‘tarsal skeleton’ (UBERON:0009879). Additional file 1: Table S1 holds a list of object properties and example uses.
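Classes of this kind can be listed mechanically before being passed to domain experts. The following sketch assumes a toy representation in which each class maps to its asserted superclasses; the function `poorly_placed`, the `expected_top_level` allow-list, and the class ‘tarsal cartilage’ are hypothetical.

```python
ROOT = "anatomical entity"


def poorly_placed(parents, expected_top_level=frozenset()):
    """List classes whose only asserted superclass is the root (or none at all)."""
    return sorted(
        cls
        for cls, supers in parents.items()
        if cls != ROOT
        and cls not in expected_top_level
        and set(supers) <= {ROOT}
    )


# Toy hierarchy mirroring the pre-correction placement of 'postminimus'.
parents = {
    "anatomical entity": [],
    "cartilage element": ["anatomical entity"],
    "postminimus": ["anatomical entity"],
    "tarsal cartilage": ["cartilage element"],  # hypothetical class
}
flagged = poorly_placed(parents, expected_top_level={"cartilage element"})
print(flagged)
```

The allow-list reflects the fact that some classes legitimately sit directly below the root; only the remainder are candidates for re-placement.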
Reconciling terminological differences from different domains
The process of integrating AOs built for different purposes also highlighted inconsistencies in terminology between and within zoological and/or medical nomenclatures. A good illustration of this problem comes from the terms used for different regions of the limb. In the comparative morphology literature, “-podium” terms are commonly used in reference to the vertebrate appendicular skeleton [25, 26]. These terms were introduced by Haeckel [27] in his 1895 treatise, “Systematische Phylogenie,” to refer to skeletal elements and their developmental anlagen; they were not originally intended to refer to composite bone/flesh limb segments. In the developmental literature, however, authors sometimes use these terms to refer to “limb segments” (e.g. [26]), usually in the context of the limb buds [26–30]. To reconcile these different uses, we label terms such as ‘skeleton of manual acropodium’ to refer to the skeleton and ‘manual acropodium region’ to refer to the limb segment. Another challenge is the use of some terms (such as acropodium) to refer to different structures in different contexts. In some papers “acropodium” only refers to the phalanges (e.g. [31]), but in others it refers to the entire manus skeleton excluding the mesopodium (e.g. [32, 33]). In keeping with the definitions created by Haeckel we decided to use ‘acropodium’ (or ‘acropodial skeleton’) for the phalanges and introduce a new term, the ‘digitopodium’, for the metapodium/acropodium complex (Figure 3).
Documenting and implementing design patterns
Another challenge in arriving at a cohesive unified ontology was the difference in modeling strategies, design patterns, and terminological conventions across the source ontologies. An example of this was the representation of joints in the different ontologies prior to the merge. In TAO, classes representing different skeletal elements had relationships to specific joints; for example, both the ‘quadrate’ and the ‘anguloarticular’ had overlaps relationships to ‘quadrate-anguloarticular joint’. In Uberon, the dependency is reversed: for any given skeletal element there is no assumption of a relationship to a particular joint class, but joints are assumed to imply the presence of certain elements. For example, in Uberon the ‘quadrate-articular joint’ has two connected_to relationships, one to the ‘quadrate’ and one to the ‘articular’ (anguloarticular). In comparing the two styles, we opted for the latter pattern. Although the representation in TAO was valid at the level of teleosts, the broader taxonomic diversity represented in Uberon encompasses variation in the pattern of connectivity of skeletal elements across vertebrates. In Uberon, joints are defined by the skeletal elements they connect to rather than by the overlap of various skeletal elements; because individual bones may vary in their connectivity across species, this modeling pattern is more stable.
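The direction of the dependency can be illustrated with a toy data structure in which only joints carry connected_to assertions, and the joints of a given element are derived by inverting that mapping. This is a sketch under stated assumptions, not the OWL representation used in Uberon; the function `joints_by_element` is hypothetical.

```python
from collections import defaultdict

# Joint-centred pattern: each joint lists the elements it is connected_to.
joints = {
    "quadrate-articular joint": ("quadrate", "articular"),
}


def joints_by_element(joints):
    """Derive, for each skeletal element, the joints that connect to it."""
    index = defaultdict(list)
    for joint, elements in joints.items():
        for element in elements:
            index[element].append(joint)
    return dict(index)


element_index = joints_by_element(joints)
print(element_index["quadrate"])
```

Because nothing is asserted on the skeletal elements themselves, a taxon in which the quadrate articulates differently requires only a different joint class rather than a change to ‘quadrate’.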
Another example of a difference in modeling pattern is how the various ontologies represented the connection of teeth to skeletal elements. When the ontologies were initially combined, teeth associated with different bones were variably related to those bones using, for example, connected_to, part_of, or overlaps. The relationship of teeth to jaws, for instance, is represented via an attachment relationship (attaches_to) in Uberon, but via a part_of relationship in TAO. Further, many fishes have teeth in places that would be unusual for a mammal, such as in the pharynx (see Figure 4). We therefore documented a set of core design and modeling patterns [34] to be applied throughout the ontology, and we proceeded to unify the combined ontology along these lines. As a result, all the teeth in Uberon, whether associated with the mandible and maxilla as in mammals or with the vomer as in fishes, have the same relationship (attaches_to) to their supporting structures. This is a major advantage for users wishing to query over the combined ontology, and it provides a modeling pattern that is more stable across taxa.
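The unification step for teeth can be sketched as a rewrite over relationship triples. The code below is a simplified illustration: the function `unify_tooth_attachment`, the class names ‘vomerine tooth’ and ‘mandibular tooth’, and the sample edges are hypothetical, and a blanket rewrite stands in for what was in practice a reviewed, pattern-driven edit.

```python
LEGACY_RELATIONS = {"connected_to", "part_of", "overlaps"}


def unify_tooth_attachment(edges, teeth, supporting_structures):
    """Rewrite tooth-to-support edges so that they all use attaches_to."""
    unified = []
    for subject, relation, target in edges:
        if (
            subject in teeth
            and target in supporting_structures
            and relation in LEGACY_RELATIONS
        ):
            relation = "attaches_to"
        unified.append((subject, relation, target))
    return unified


# Hypothetical edges as they might look immediately after combining sources.
edges = [
    ("vomerine tooth", "part_of", "vomer"),
    ("mandibular tooth", "connected_to", "mandible"),
    ("vomer", "part_of", "skull"),
]
unified = unify_tooth_attachment(
    edges,
    teeth={"vomerine tooth", "mandibular tooth"},
    supporting_structures={"vomer", "mandible"},
)
print(unified)
```

Edges that do not link a tooth to a supporting structure, such as the last one, are left untouched, so a single query on attaches_to then retrieves all teeth regardless of taxon.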
A final example that illustrates the need for a common design pattern is the representation of muscles and their attachment. In the course of vertebrate evolution, the jaw closing (jaw adductor) muscles have changed from single, simple muscles to complex, highly differentiated groups of muscles several times. The TAO and the ZFA have a single class, the ‘adductor mandibulae complex’ (TAO:0000311, ZFA:0000311), even though the adductor mandibulae in most teleosts has at least four distinct parts (called A1, A2, A3, Aw). Similarly, in the AAO, there was a single class called ‘jaw muscles’ (AAO:0000247), although amphibians are usually described as having a separate internal and external adductor mandibulae (or levator mandibulae internus and externus, respectively). This broad representation, based on subclass and partonomy alone, was insufficient to describe the complexity. We therefore implemented a common design pattern, based additionally on innervation, connectivity, and function, that enables granular representation and better classification of the individual muscles in the adductor complex. For example, the ‘masseter muscle’ (UBERON:0001597) is described as follows:
'capable of part of' some 'GO:mastication'
'has muscle insertion' some 'mandible'
'has muscle origin' some 'zygomatic arch'
'has muscle origin' some 'maxilla'
'part of' some 'cheek'
innervated_by some 'masseteric nerve' (a branching part of the trigeminal nerve)
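To show how such a pattern supports classification and querying, the axioms above can be held as simple property/filler pairs. This is only an illustrative sketch: the `Restriction` type, the `fillers` helper, and the uniform quoting of property names in `render` are assumptions, not part of the Uberon tooling.

```python
from typing import NamedTuple


class Restriction(NamedTuple):
    """An existential restriction: <prop> some <filler>."""

    prop: str
    filler: str

    def render(self) -> str:
        # Quotes every name uniformly, in the style of the listing above.
        return f"'{self.prop}' some '{self.filler}'"


MASSETER = [
    Restriction("capable of part of", "GO:mastication"),
    Restriction("has muscle insertion", "mandible"),
    Restriction("has muscle origin", "zygomatic arch"),
    Restriction("has muscle origin", "maxilla"),
    Restriction("part of", "cheek"),
    Restriction("innervated_by", "masseteric nerve"),
]


def fillers(axioms, prop):
    """Collect the fillers asserted for one property."""
    return [axiom.filler for axiom in axioms if axiom.prop == prop]


origins = fillers(MASSETER, "has muscle origin")
print(origins)
print(MASSETER[0].render())
```

Describing every muscle with the same small set of properties is what makes it possible to ask, for instance, for all muscles with an origin on a given bone or innervated by a given nerve.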
As is clear from the aforementioned examples, maintaining a consistent style of modeling across a large ontology is challenging. In our ontological work, we follow the software methodology of documenting common ‘design patterns’ [35], which has already been applied to biological ontologies [36]. For every design pattern, a document is created and represented in OWL as part of the ontology itself [34, 37]. In this way, documents are coupled directly to the structures to which they relate, facilitating their application. A script is used to generate web pages for every document. The document pertaining to the modeling of joints, for example, is available and includes status, contributors, discussion, and a summary of issues as implemented in the ontology [38].
Integration of homologous groupings in vHOG
The vHOG integration presented some specific challenges, as it represents groupings of homologous structures rather than a formal anatomical description [17]. These manually reviewed groupings allowed the verification and correction of Xrefs from Uberon to ssAOs. Some highly derived organs present in extant taxa are merged in vHOG because they originate from a common ancestral structure. For instance, the swim bladder in ray-finned fish is hypothesized to be homologous to the terrestrial vertebrate lung [39]. While these structures are represented as different terms in Uberon, they are a single term in vHOG, named ‘lung - swim bladder’. Separate terms are maintained in Uberon in such cases, to avoid creating classes that represent putative structures or structures not described in extant species. The information contained in vHOG was exported by the Bgee group to an external annotation file, allowing explicit homology relationships to be asserted between Uberon entities that are believed to derive from a common ancestral structure. This also allows the evidence supporting these assertions to be provided formally. The annotation file is available at [40]. Some of these homology assertions are duplicated in the ontology using the OWL object property ‘homologous_to’, which represents the relationship between homologous entities. However, we are planning to maintain these in a single place, as a distinct OWL module derived from the vHOG association file that can be imported and extended separately.
Distinctions based on the developmental state of the same organ are treated similarly. Terms such as ‘future brain’ (embryonic precursor of the brain) and ‘brain’ had been merged in vHOG into a single class, which makes sense from the perspective of evolutionary history. These merges required disentangling for proper integration into Uberon, where structures at different developmental stages are kept separate and classification is based on structure, function, lineage, etc. Nevertheless, the relations between developmental structures that were grouped in vHOG can still be retrieved from Uberon, using the object properties transformation_of and immediate_transformation_of.
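Retrieving the structures that vHOG had grouped then amounts to following transformation_of links. The sketch below assumes, for simplicity, that each structure has at most one asserted precursor; the function `developmental_precursors` and the entry ‘hypothetical earlier primordium’ are illustrative assumptions.

```python
def developmental_precursors(structure, transformation_of):
    """Follow transformation_of links from a structure back to its precursors."""
    chain, seen = [], {structure}
    current = structure
    while current in transformation_of:
        current = transformation_of[current]
        if current in seen:  # guard against accidental cycles in the data
            break
        seen.add(current)
        chain.append(current)
    return chain


# 'brain' transformation_of 'future brain' follows the text; the second
# link is a made-up placeholder added only to show a longer chain.
transformation_of = {
    "brain": "future brain",
    "future brain": "hypothetical earlier primordium",
}
precursors = developmental_precursors("brain", transformation_of)
print(precursors)
```

A query of this kind recovers the developmental grouping on demand, while the ontology itself keeps ‘brain’ and ‘future brain’ as distinct classes.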
Attribution
Allocation of responsibilities, coordination of editing rights, and attribution were critical parts of the merging process among our different communities. Major contributions consist of additions to Uberon via the tracker, design documents, meetings, and workshops from multiple domain experts. Proposed changes to the core are submitted through the Uberon tracker [41] and vetted by the larger community. Because keeping track of contributions is both very important to this community and “scientific good practice”, and because it facilitates further discussion regarding design or definitions, a more sophisticated approach to attribution is needed. Contributions to the ontology are recorded as metadata in the ontology, at the level of classes, axioms, and the ontology itself, using references to design documents and to individuals via database cross-references or ORCID IDs. Design documents also have authors and can be included directly in the ontology as instances (see above) via an import file, as well as being available online and linked via the Uberon.org website.