An ontology for major histocompatibility restriction
© Vita et al. 2016
Received: 29 September 2015
Accepted: 3 January 2016
Published: 11 January 2016
MHC molecules are a highly diverse family of proteins that play a key role in cellular immune recognition. Over time, different techniques and terminologies have been developed to identify the specific type(s) of MHC molecule involved in a specific immune recognition context. No consistent nomenclature exists across different vertebrate species.
To correctly represent MHC-related data in The Immune Epitope Database (IEDB), we built upon a previously established MHC ontology and created an ontology to represent MHC molecules as they relate to immunological experiments.
This ontology models MHC protein chains from 16 species, deals with different approaches used to identify MHC, such as direct sequencing versus serotyping, relates engineered MHC molecules to naturally occurring ones, connects genetic loci, alleles, protein chains and multi-chain proteins, and establishes evidence codes for MHC restriction. Where available, this work is based on existing ontologies from the OBO Foundry.
Overall, representing MHC molecules provides a challenging and practically important test case for ontology building, and could serve as an example of how to integrate other ontology building efforts into web resources.
Keywords: Major histocompatibility complex, Ontology, MHC, Immune epitope
The Immune Epitope Database (www.iedb.org) presents thousands of published experiments describing the recognition of immune epitopes by antibodies, T cells, or MHC molecules. The data contained in the IEDB is primarily derived through manual curation of published literature, but also includes some directly submitted data, primarily from NIAID funded epitope discovery contracts. The goal of the current work was to represent MHC data as they are utilized by immunologists to meet the needs of the IEDB users. We collected user input at workshops, at conferences, and through the IEDB help system regarding how users wanted to retrieve data from the IEDB regarding MHC restriction. These requests were used to identify goals for this ontology project, and the final ontology was evaluated on whether it could answer them. As shown in Additional file 1: Table S1, an example of such a request was to be able to query for epitopes restricted by MHC molecules with serotype ‘A2’ and retrieve not only serotyped results but also those where the restriction is more finely mapped, e.g. to MHC molecule A*02:01, which has serotype A2. We set out to logically represent the relationships between the genes encoding MHC, the haplotypes linking together groups of genes in specific species, and the individual proteins comprising MHC complexes, in order to present immunological data in an exact way and to improve the functionality of our website.
Our work builds on MaHCO, an ontology for MHC developed for the StemNet project, using the well-established MHC nomenclature resources of the international ImMunoGeneTics information system (IMGT, http://www.imgt.org) for human data and The Immuno Polymorphism Database (IPD, http://www.ebi.ac.uk/ipd) for non-human species. MaHCO contains 118 terms for MHC across human, mouse, and dog. We were encouraged by the success of MaHCO in expressing official nomenclature using logical definitions.
However, we needed to extend it for the purpose of the IEDB to include data from a growing list of 16 species, as well as data about MHC protein complexes (not just MHC alleles), haplotypes and serotypes. Thus, our current work goes beyond MaHCO, and we have utilized this opportunity to also enhance the integration with other ontological frameworks.
We used the template feature of the open source ROBOT ontology tool to specify the content of our ontology in a number of tables. Most of the tables correspond to a single “branch” of the ontology hierarchy, in which the classes have a consistent logical structure, e.g. gene loci, protein chains, mutant MHC molecules, haplotypes, etc. The OWL representation of our ontology is generated directly from the tables using ROBOT. This method enforces the ontology design patterns we have chosen for each branch, and makes certain editing tasks easier than with tools such as Protégé.
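The table-driven idea can be illustrated with a small sketch. The Python below is not ROBOT and does not use ROBOT's template syntax; it only mimics the principle that every row of a branch table is expanded through one fixed pattern. The column names, the pattern text (loosely styled after Manchester syntax), and the example rows are all hypothetical and are not taken from the MRO tables.

```python
import csv
import io

# Hypothetical table for a "protein chain" branch: one row per class.
# Columns and rows are illustrative only, not copied from MRO.
CHAIN_TABLE = """label\tgene\ttaxon
HLA-A*02:01 chain\tHLA-A\tHomo sapiens
H2-Kb chain\tH2-K\tMus musculus
"""

# One fixed logical pattern for the whole branch; every row goes through it,
# so all classes in the branch end up with the same logical shape.
CHAIN_PATTERN = (
    "Class: '{label}'\n"
    "    SubClassOf: 'protein'\n"
    "    SubClassOf: 'gene product of' some '{gene} locus'\n"
    "    SubClassOf: 'in taxon' some '{taxon}'\n"
)


def expand_table(table_text, pattern):
    """Expand each tab-separated row into a class definition string."""
    rows = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [pattern.format(**row) for row in rows]


definitions = expand_table(CHAIN_TABLE, CHAIN_PATTERN)
print("\n".join(definitions))
```

Because the pattern lives in one place, changing the design of a branch means editing a single template rather than every class, which is the kind of editing task the table-based workflow is meant to simplify.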
Results and discussion
Our MHC Restriction Ontology (MRO) is available in a preliminary state at https://github.com/IEDB/MRO. It is based on existing ontology terms, including: ‘material entity’ from the Basic Formal Ontology (BFO), ‘protein complex’ from The Gene Ontology (GO), ‘protein’ from The Protein Ontology (PRO), ‘organism’ from The Ontology for Biomedical Investigations (OBI), ‘genetic locus’ from The Reagent Ontology (REO), and ‘has part’, ‘in taxon’, and ‘gene product of’ from The Relation Ontology (RO). The NCBI Taxonomy was used to refer to each species. Although MRO is not yet complete, we strive to conform to Open Biological and Biomedical Ontologies (OBO) standards. MRO currently contains 1750 classes and nearly 9000 axioms, including more than 2100 logical axioms. Its DL expressivity is “ALEI”, and the HermiT reasoner completes reasoning in less than 10 seconds on a recent laptop.
Synonyms were also included, as immunologists often utilize synonyms that are either abbreviations or based on previous states of the nomenclature. The current MHC nomenclatures for various species have been revised through several iterations. In order to ensure accuracy and remain up to date with the latest nomenclature, we referred to the well-established MHC nomenclature resources of the IMGT and IPD. For specific species where the literature was most challenging, such as chicken, cattle, and horse, we collaborated with experts in these fields. These experts reviewed the encoded hierarchy by checking whether the inferred parentage hierarchy in their area of expertise reflected their input.
The data in the ontology drives the Allele Finder on the IEDB website, available at http://goo.gl/r8Tgrz, an interactive application that allows users to browse MHC restriction data in a hierarchical format. We evaluated the ability of MRO to meet the needs of IEDB users, as shown in Additional file 1: Table S1, and found that it met our initial goals. Currently the ontology is used only behind the scenes, but we have requested a namespace and permanent identifiers from The Open Biomedical Ontologies (OBO). As soon as these identifiers are in place, they will be utilized and displayed on the IEDB website to allow users to link out to the ontology.
In MHC binding and elution assays, the exact MHC molecule studied is typically known; however, this is often not the case for T cell assays. When a T cell responds to an epitope, the identity of the MHC molecule presenting the epitope may not be known at all, may be narrowed down to a subset of all possible molecules, or may be exactly identified. In the context of T cell assays, the MHC restriction can be determined by the genetic background of the host, the conditions of the experiment, or the biological process being measured; therefore we represent MHC molecules at a variety of levels and specify the rationale behind the determined restriction using evidence codes.
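As a minimal sketch of what recording a restriction at different levels of resolution, together with its rationale, might look like, consider the Python below. The three levels, the class and field names, and the evidence labels are hypothetical placeholders chosen for illustration; they are not MRO's actual terms or evidence codes.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """How precisely the restricting MHC is known (illustrative levels)."""
    MHC_CLASS = 1   # only the MHC class is known
    SEROTYPE = 2    # narrowed down to a subset of molecules
    MOLECULE = 3    # exactly identified molecule


@dataclass(frozen=True)
class Restriction:
    mhc_term: str   # label of the term at the level that is known
    level: Level
    evidence: str   # hypothetical evidence label, not a real code


records = [
    Restriction("HLA-A*02:01", Level.MOLECULE, "MHC binding assay"),
    Restriction("HLA-A2", Level.SEROTYPE, "serotyped donor"),
    Restriction("MHC class I", Level.MHC_CLASS, "type of T cell response"),
]


def at_least(items, level):
    """Keep restrictions known at the given level of detail or finer."""
    return [r for r in items if r.level.value >= level.value]


for r in at_least(records, Level.SEROTYPE):
    print(r.mhc_term, "|", r.evidence)
```

Keeping the evidence next to the term lets a user decide whether a coarse or inferred restriction is acceptable for a given query.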
The serotype of an MHC molecule, defined by antibody staining patterns, is relevant in immunology as this was the method of choice to identify MHC molecules until quite recently. In contrast to molecular definitions of MHC molecules based on their specific nucleotide or amino acid sequence, serotyping classifies MHC molecules based entirely on antibody binding patterns to the MHC molecule. These patterns are linked to the panel of antibodies used. Changing the antibody panel changes the serotype of a molecule. This can result in “serotype splits”, where MHC molecules that were considered identical using one antibody panel are later found to be two different molecules using a different antibody panel. To reflect this extrinsic nature of serotyping, we refer to serotypes as information entities rather than physical entities. Alternatively, the concept of serotype could also be modeled as collections of binding dispositions, but we chose what we thought was the simpler approach.
MHC molecules for all 16 species currently having MHC data in the IEDB are modeled to give users the ability to browse the tree in multiple ways and search IEDB data broadly, by entire MHC class, for example, or narrowly by a specific MHC protein chain. As new MHC molecules are encountered, they can be easily incorporated into this ontology.
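The user request mentioned earlier, querying by serotype ‘A2’ and also retrieving results mapped to molecules that have that serotype, can be sketched as a simple lookup. The Python below is an illustration only: the record identifiers are made up, the function name is hypothetical, and of the serotype links shown only A*02:01 having serotype A2 is stated in the text.

```python
# Illustrative "has serotype" links from molecule to serotype.
# Only the A*02:01 -> A2 link is stated in the text; the rest are examples.
HAS_SEROTYPE = {
    "HLA-A*02:01": "A2",
    "HLA-A*02:06": "A2",
    "HLA-A*01:01": "A1",
}

# Made-up assay records: (epitope id, restriction as curated).
RECORDS = [
    ("epitope-1", "A2"),           # restriction known only by serotype
    ("epitope-2", "HLA-A*02:01"),  # restriction mapped to a molecule
    ("epitope-3", "HLA-A*01:01"),
    ("epitope-4", "HLA-A*02:06"),
]


def restricted_by_serotype(records, serotype):
    """Return epitope ids restricted by the serotype itself or by any
    molecule linked to that serotype."""
    hits = []
    for epitope_id, restriction in records:
        if restriction == serotype or HAS_SEROTYPE.get(restriction) == serotype:
            hits.append(epitope_id)
    return hits


print(restricted_by_serotype(RECORDS, "A2"))
# prints ['epitope-1', 'epitope-2', 'epitope-4']
```

In the ontology itself this expansion would come from the modeled relationships rather than a hand-written dictionary; the sketch only shows why linking molecules to serotypes makes the broader query possible.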
In conclusion, we formally represented MHC data building on established ontologies in order to represent MHC restrictions as required by immunologists. Accordingly, we modeled MHC molecules as a protein complex of two chains and established the relationships between the genes encoding these proteins, the haplotypes expressed by specific species, and the MHC classes. Traditional serotype information was also related to specific MHC molecules. Both precise and inferred MHC restriction were captured, along with the experimental evidence upon which each restriction was established. We will continue to formalize this work and will release a completed interoperable ontology later this year. Thus, MHC data in the IEDB is now presented to its users in a hierarchical format which simplifies searching the data and additionally instructs users on the inherent relationships between MHC genes and MHC restriction.
MHC: Major histocompatibility complex
IEDB: The Immune Epitope Database
APC: Antigen presenting cell
HLA: Human leukocyte antigen
IPD: Immuno Polymorphism Database
MRO: MHC Restriction Ontology
BFO: Basic Formal Ontology
OBI: Ontology for Biomedical Investigations
OBO: The Open Biomedical Ontologies
We wish to thank Kirsten Fischer Lindahl and Lutz Walter for their kind assistance with the mouse and rat MHC molecule nomenclatures, respectively.
The Immune Epitope Database and Analysis Project is funded by the National Institutes of Health [HHSN272201200010C].
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Vita R, Overton JA, Greenbaum JA, Ponomarenko J, Clark JD, Cantrell JR, et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. 2015;43(Database issue):D405–12.
- Sette A, Fleri W, Peters B, Sathiamurthy M, Bui HH, Wilson S. A roadmap for the immunomics of category A–C pathogens. Immunity. 2005;22(2):155–61.
- DeLuca DS, Beisswanger E, Wermter J, Horn PA, Hahn U, Blasczyk R. MaHCO: an ontology of the major histocompatibility complex for immunoinformatic applications and text mining. Bioinformatics. 2009;25(16):2064–70.
- Overton JA, Dietze H, Essaid S, Osumi-Sutherland D, Mungall CJ. ROBOT: A command-line tool for ontology development. Lisbon, Portugal: 5th International Conference on Biomedical Ontology; 2015. July 29.
- Simon J, Dos Santos M, Fielding J, Smith B. Formal ontology for natural language processing and the integration of biomedical databases. Int J Med Inform. 2006;75(3–4):224–31.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9.
- Natale DA, Arighi CN, Barker WC, Blake JA, Bult CJ, Caudy M, et al. The Protein Ontology: a structured representation of protein forms and complexes. Nucleic Acids Res. 2011;39:D539–45.
- Brinkman RR, Courtot M, Derom D, Fostel JM, He Y, Lord P, et al. OBI consortium. Modeling biomedical experimental processes with OBI. J Biomed Semantics. 2010;1:S7.
- Brush MH, Vasilevsky N, Torniai C, Johnson T, Shaffer C, Haendel MA. Developing a Reagent Application Ontology within the OBO Foundry Framework. Buffalo, NY: Proceedings of the International Conference on Biomedical Ontology; 2011.
- Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, et al. Relations in biomedical ontologies. Genome Biol. 2005;6(5):R46.
- Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2009;37:D5–15.
- Smith B, Ashburner M, Rosse C, Bard C, Bug W, Ceusters W, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007;25:1251–5.
- Glimm B, Horrocks I, Motik B, Stoilos G, Wang Z. HermiT: an OWL 2 reasoner. J Automated Reasoning. 2014;53(3):245–69.
- Chibucos MC, Mungall CJ, Balakrishnan R, Christie KR, Huntley RP, White O, Blake JA, Lewis SE, Giglio M. Standardized description of scientific evidence using the Evidence Ontology (ECO). Database. 2014: 1–11.