The mouse pathology ontology, MPATH; structure and applications
© Schofield et al.; licensee BioMed Central Ltd. 2013
Received: 4 February 2013
Accepted: 19 August 2013
Published: 13 September 2013
The capture and use of disease-related anatomic pathology data for both model organism phenotyping and human clinical practice requires a relatively simple nomenclature and coding system that can be integrated into data collection platforms (such as computerized medical record-keeping systems) to enable the pathologist to rapidly screen and accurately record observations. The MPATH ontology was originally constructed in 2000 by a committee of pathologists for the annotation of rodent histopathology images, but is now widely used for coding and analysis of disease and phenotype data for rodents, humans and zebrafish.
MPATH is divided into two main branches describing pathological processes and structures based on traditional histopathological principles. It does not aim to include definitive diagnoses, which would generally be regarded as disease concepts. It contains 888 core pathology terms in an almost exclusively is_a hierarchy nine layers deep. Currently, 86% of the terms have textual definitions, and the ontology contains relationships and logical axioms linking it to other ontologies such as the Gene Ontology.
MPATH was originally devised for the annotation of histopathological images from mice but is now being used much more widely in the recording of diagnostic and phenotypic data from both mice and humans, and in the construction of logical definitions for phenotype and disease ontologies. We discuss the use of MPATH to generate cross-products with qualifiers derived from a subset of the Phenotype and Trait Ontology (PATO) and its application to large-scale high-throughput phenotyping studies. MPATH provides a largely species-agnostic ontology for the descriptions of anatomic pathology, which can be applied to most amniotes and is now finding extensive use in species other than mice. It enables investigators to interrogate large datasets at a variety of depths, use semantic analysis to identify the relations between diseases in different species and integrate pathology data with other data types, such as pharmacogenomics.
Since the late eighteenth century, when achromatic lenses and reliable histological stains began to be available, investigators of anatomic pathology, and particularly in the mid-nineteenth century the innovators of cellular pathology such as Rudolf Virchow, developed and applied terminologies to describe their observations [1, 2]. These depended on the “school” to which the pathologists belonged, but more importantly on the etiologic or mechanistic paradigm in which they were working. One of the great achievements of the nineteenth century was the recognition of the universality of pathological processes and entities and their occurrence in multiple species as recognisable manifestations of the same underlying processes. It was, nevertheless, a century before broadly accepted and rationally structured pathology terminologies were developed (e.g. ). The development of pathology terminologies has to an extent occurred independently of disease terminologies and nosologies, partly as a result of the much longer history of classifying diseases, and partly due to the inherited preconceptions of the nature of disease in clinical medicine.
The distinction between pathological and clinical descriptions of disease, disorders and predispositions is still not satisfactorily resolved. However, in recent years there have been attempts to rationalise the definitions of these concepts and their relation to each other as part of a broadly applicable model of disease, rather than treating a disease as an unstructured collection of manifestations or phenotypes which are found in the class of individuals forming the basis of a diagnosis. Issues about severity, time course, organ involvement etc. are beginning to be addressed, but it is remarkable that even treating diseases as a “bag” of phenotypes has been shown to provide a powerful approach in establishing the relationships between diseases, and the presence of related diseases in different organisms [7–10]. What has recently been identified as important, nevertheless, is that the tissue-specific resolution of the recording of lesions, and the ability to record the pattern of disease within an individual, has proved vital for GWAS mapping of predisposing genetic variants in inbred strains of mice, allowing each class of lesion to be analysed in isolation [11, 12].
The discipline of pathology may be broken down into clinical and anatomic pathology. The former is concerned with clinical chemistry, hematology, clinical microbiology and emerging sub-specialities such as molecular diagnostics and proteomics. The latter, which forms the domain of MPATH, deals with the histological, histochemical or immunohistochemical observations of alterations in tissue composition or architecture. Both branches of the medical specialty, which are increasingly merging, may be viewed as aspects of phenotyping, and both provide subtypes of the clinical signs associated with ongoing disease processes, the results of developmental abnormalities, or the historical presence of disease.
The universality of the repertoire of responses to underlying genetic or extrinsic insults means that gross and histopathologically-defined phenotypes are some of the most useful phenotypes for relating diseases between different species, and constitute some of the most species-agnostic phenotype descriptors. This makes a pathologic term-based ontology a crucial tool in experimental and clinical phenotype data capture.
The development of systematic human pathologic nomenclatures has been driven by the efforts of the College of American Pathologists, from the development of the Systematized Nomenclature of Pathology (SNOP) over 40 years ago to the current SNOMED-CT, with cross references to UMLS, the NCI thesaurus and other terminologies. The ICD, now in its 11th revision, and the associated ICD-O v3 for cancer also contain descriptions of many pathological lesions.
The other driver for pathologic terminology standardisation has been coding of lesions from toxicopathology. The American Society of Toxicopathology (STP), working with the Registry of Industrial Toxicology Animal-data (RITA) database group in Europe, has produced several internationally accepted nomenclature systems, particularly focusing on proliferative lesions. Recently, the STP has undertaken a major harmonization exercise for rodent pathology, the INHAND (International Harmonization of Nomenclature and Diagnostic Criteria for Lesions in Rats and Mice) initiative. So far this group has reported on the hepatobiliary, respiratory, nervous and urinary systems [17–20]. For some time the National Cancer Institute’s Mouse Models of Human Cancer consortium (MMHCC) has been examining the classification of tumours in genetically engineered mice. MMHCC has produced a consensus base terminology for neoplasias of the major organ systems, which has been presented in a series of papers over the last decade.
Despite the huge value of these resources, none is currently constructed as an ontology with meaningful axioms to support inference and automated reasoning, and to that end we developed MPATH to describe lesions that arise in laboratory mice.
The MPATH ontology was constructed ab initio by a group of clinical and veterinary pathologists in 2000 and has since been revised and augmented on a regular basis by an evolving group of US and European pathologists. It is clear from more than a decade of experience that expert input and manual curation are essential to generate an accurate and functional resource. One strategy for building the ontology has been to integrate it into large-scale phenotyping and diagnostic programs, so that the pathologists use it on a daily basis and have fields in which to add missing terms or synonyms that they are more familiar with, thereby constantly increasing its coverage and practical value.
MPATH is largely congruent with the upper level Ontology for General Medical Science (OGMS), founded on the Basic Formal Ontology (BFO). Pathological bodily process (OGMS:0000061) and pathological anatomical structure (OGMS:0000077) are broadly mappable to the upper levels of MPATH: MPATH:596 (pathological process) and MPATH:603 (pathological anatomical entity), respectively. However, more detailed mapping is difficult. For example, the MPATH experts view congenital malformations as pathological anatomical structures, whereas OGMS views them as distinct; similarly, MPATH views inflammation as a pathological process, whereas OGMS does not include it as a pathological bodily process. Until such discrepancies are resolved, integration of MPATH into the OGMS framework will be problematic.
From the point of view of application, the most important mappings for MPATH are to the Human Phenotype Ontology (125), the Mammalian Phenotype ontology (111), the Disease Ontology (231), SNOMED-CT (867) and the NCIt (566), reflecting the emphasis on the domain of anatomic pathology rather than disease.
Currently, 86% of classes have textual definitions. Each class is in the mouse pathology namespace and is uniquely identified by a URI of the form: http://purl.obolibrary.org/OBO/MPATH_n. The main ontology is available in both the OBO Flatfile Format and the Web Ontology Language (OWL). MPATH is housed in a Subversion repository and is made available via the OBO registry, Bioportal (http://purl.bioontology.org/ontology/MPATH) and the project’s website http://mpath.googlecode.com/. MPATH contains relationships and other logical axioms to other ontologies such as the Gene Ontology (GO), the Cell Type ontology (CL) and the Phenotype And Trait Ontology (PATO). For example, the MPATH term transitional cell metaplasia (MPATH:172) represents a metaplastic response of the transitional epithelium, for example in the bladder, to give squamous metaplasia and glandular metaplasia. To allow computational access to these relations, we use the derives-from relation and relate metaplasia (MPATH:549), an MPATH term that denotes an abnormal transformation of a differentiated adult cell or tissue of one kind into a differentiated tissue of another kind, with the CL term transitional epithelial cell (CL:0000244).
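The identifier scheme and the derives-from example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stanza below is a hypothetical OBO-style fragment written for this example (it is not copied from the released MPATH file), the choice to attach the relation to MPATH:172 with an is_a link to MPATH:549 is one reading of the text, and the helper names `mpath_uri` and `parse_stanza` are invented here.

```python
import re

# URI pattern as stated in the text; "n" is the numeric part of the identifier.
MPATH_URI_PREFIX = "http://purl.obolibrary.org/OBO/MPATH_"


def mpath_uri(curie: str) -> str:
    """Convert an identifier such as 'MPATH:172' into its class URI."""
    match = re.fullmatch(r"MPATH:\s*(\d+)", curie.strip())
    if match is None:
        raise ValueError(f"not an MPATH identifier: {curie!r}")
    return MPATH_URI_PREFIX + match.group(1)


def parse_stanza(text: str) -> dict[str, list[str]]:
    """Parse one OBO-style [Term] stanza into a tag -> list-of-values mapping."""
    tags: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.split("!", 1)[0].strip()  # drop trailing '! comment' text
        if not line or line.startswith("["):
            continue  # skip blank lines and the stanza header
        tag, _, value = line.partition(":")
        tags.setdefault(tag.strip(), []).append(value.strip())
    return tags


# Hypothetical stanza for illustration only (assumed structure, not MPATH source).
EXAMPLE = """
[Term]
id: MPATH:172
name: transitional cell metaplasia
is_a: MPATH:549 ! metaplasia
relationship: derives_from CL:0000244 ! transitional epithelial cell
"""

term = parse_stanza(EXAMPLE)
print(mpath_uri(term["id"][0]))
print(term["relationship"])
```

A production pipeline would more likely load the OBO or OWL release with a dedicated ontology library; the hand-written parser is used here only to keep the sketch self-contained.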
Traditionally pathologists have relied on a narrative form of recording their definitive diagnoses, making use of morphologic, etiologic, and disease-based terms that collectively provide a diagnosis useful for clinical patient management. This is particularly important for non-neoplastic lesions, where it can be complex to capture important subtleties of distribution, severity, microscopic sub-type and anatomical location, for example. Whilst this is the gold standard, it is not possible to compute on data recorded in this way, and it is very difficult to tabulate and quantitatively analyse the collected information. There are strong arguments, mainly from experience in toxicologic pathology, that a descriptive (anatomic) rather than diagnostic coding is the most objective and useful way to code pathology-based observations. This is especially relevant to the examination of mutant mice, where traditional etiologic or summative diagnostic terms are simply not available because of the novelty of the lesion or its presentation, for instance where mice are manipulated to model human conditions such as lung or mammary tumours [11, 25, 26] that have not previously been reported to occur spontaneously in mice. In many cases, a disease diagnosis implies a particular pathogenesis or etiology based on the spontaneous disease, which is not appropriate for disease caused by a genetic challenge, or by genetic and external challenges combined. This latter issue is of particular concern to practicing pathologists, and in the development of MPATH we have been urged to include some diagnostic terms as well as descriptive anatomic ones.
Many tissue responses are common to multiple anatomical sites, and as far as possible the verbosity (ontology “bloat”) of specifying a particular response in multiple tissues has been avoided. The additional topographical or anatomical information needed for a description comes from an anatomy ontology, generally the MA or EMAP ontologies for the mouse. However, there is often an intrinsic anatomical element embedded in the term, or traditional pathology usage includes information about the cell type or tissue of origin. This is most frequent with the neoplasias, and we felt that such terms were best included in their familiar form. Figure 1B shows how anatomically predicated classes such as hepatocellular carcinoma have multiple parents, providing relations in this case to both carcinoma and hepatic tumor superclasses. Most observations made by pathologists using MPATH are, nevertheless, cross-products using a combination of an MPATH term and an anatomical (MA) or cell type (CL) [24, 29] component. This strategy provides all of the necessary coverage.
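The two ideas in this paragraph, classes with multiple is_a parents and observations recorded as cross-products, can be sketched as follows. This is a minimal illustration, not the MPATH data model: the hierarchy fragment is keyed by class label, only the two parents of hepatocellular carcinoma come from the text, and the "neoplasm" root, the `Observation` record and the `matches` helper are assumptions made for the example.

```python
from dataclasses import dataclass

# Toy is_a fragment keyed by class label. Only the hepatocellular carcinoma
# parents are taken from the text; the shared "neoplasm" root is an assumption.
IS_A = {
    "hepatocellular carcinoma": ["carcinoma", "hepatic tumor"],
    "carcinoma": ["neoplasm"],
    "hepatic tumor": ["neoplasm"],
    "neoplasm": [],
}


def ancestors(label: str) -> set[str]:
    """Return every superclass reachable through is_a links."""
    seen: set[str] = set()
    stack = list(IS_A.get(label, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


@dataclass(frozen=True)
class Observation:
    """A cross-product annotation: pathology term plus anatomical component."""
    lesion: str                        # MPATH class label
    site: str                          # anatomy (MA) or cell type (CL) label
    qualifiers: tuple[str, ...] = ()   # optional PATO-derived qualifiers


def matches(obs: Observation, lesion_class: str) -> bool:
    """True if the observed lesion is the queried class or one of its subclasses."""
    return obs.lesion == lesion_class or lesion_class in ancestors(obs.lesion)


obs = Observation("hepatocellular carcinoma", "liver", ("multifocal to coalescing",))
print(sorted(ancestors(obs.lesion)))
print(matches(obs, "carcinoma"), matches(obs, "hepatic tumor"))
```

Because the lesion class has two parents, the same record is retrieved by a query for carcinomas and by a query for hepatic tumors, which is the practical benefit of the multiple inheritance described above.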
Examples of pathology term qualifiers now incorporated into PATO
Lesion dependent; often size, number and characteristics.
Extremely acute and aggressive
Beginning abruptly with marked intensity
Between acute and chronic
Slow progress and long continuance
Coexistence of chronic process and superimposed acute process
Single well delineated lesion
Single lesion with expansion into surrounding tissue
Multifocal to coalescing: multiple lesions, some interconnecting with each other
No appreciable pattern
Not circumscribed or limited
Affecting all regions without specificity of distribution
Confined to one side only
Involving both sides
Relating to a segment
The strategy adopted was originally designed to describe histopathology images for the Pathbase mouse pathology database, but lends itself readily to a wide range of coding applications. The MPATH strategy has been adopted by two major high-throughput studies. A combination of MPATH and PATO is being used for the capture of pathology data from the genome-wide mutant mouse phenotyping project, KOMP2, run as part of the International Mouse Phenotyping Consortium, where the MPATH approach is being used in the primary phenotyping pipeline by the Toronto Centre for Phenogenomics and other centres carrying out histopathology. MPATH has also been adopted for the MoDIS database to capture and analyse pathology data from a massive aging study which has systematically phenotyped 31 of the most important inbred mouse strains. Complete necropsies were carried out on mice at 12 and 20 months of age (cross-sectional study) and on moribund mice in the life span (longitudinal) study. Nearly 2,000 mice were necropsied, generating more than 50,000 slides. Lesion incidence and severity data for all organs are now being applied in highly successful GWAS studies of age-associated disease.
MPATH has proved to be additionally useful in dealing with the recoding of multi-species legacy data from non-standard nomenclatures, permitting integration of otherwise siloed data. Examples are the European Radiobiology Archive (ERA) database, where 6,700 human diagnoses were recoded from ICD-8 and the Klinischer Diagnosenschleussel to MPATH/FMA, and the Northwestern University Janus radiobiology database (http://janus.northwestern.edu/janus2/), where 50,000 individual mouse records have been coded to MPATH to link the two datasets. Recently the ontology has been applied to zebrafish phenotype data in the ZFIN database, indicating a useful application of MPATH to non-mammalian species which could be developed further.
The PATO framework was built with the intention of providing an integration platform for phenotype data between species and between data types. According to the PATO framework, phenotype data can be described by combining species-specific ontologies (such as the various anatomy ontologies) or species-agnostic ontologies such as GO with the various qualities provided by the PATO ontology, in order to describe the affected entities in a phenotype manifestation. PATO can be used for annotation either directly, in a so-called post-composed (post-coordinated) manner, or for providing logical definitions (equivalence axioms) to ontologies containing a set of precomposed (pre-coordinated) phenotype terms [22, 37–39]. For further discussion see .
Rather than using a pre-composed phenotype ontology such as MP or HPO, phenotypes may be described using the Entity–Quality (EQ) formalism. In the EQ method, a phenotype is characterized by an affected Entity and a Quality (from PATO) that specifies how the entity is affected. The affected entity can either be a biological function or process such as specified in GO, or an anatomical entity. The phylogenetic conservation, at least within the amniotes, of most histopathologic lesions or processes makes MPATH an important core ontology in writing logical definitions, and we have used it extensively in defining classes in the major pre-composed phenotype ontologies. MPATH is also an important component ontology of our recently developed semantic approaches to comparative phenomics, PhenomeNET and Mousefinder [8, 9].
Composition of logical definitions is a time-consuming task for which there are currently several approaches to automation using class label segmentation, entity recognition and lexical matching to core ontologies. These approaches can be useful for suggesting definitions where the class label is a composite of, for example, anatomy and process (MA + GO). Automated decomposition of unilexical terms, such as are found in the neoplasias, is much more difficult, though text mining of definitions from other ontologies such as NCIt for lexically matching labels may help expert curators to establish simpler definitions for these classes.
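The EQ idea can be made concrete with a small record type. The sketch below is an assumption-laden illustration rather than the formal pattern used in PhenomeNET or in any released logical definitions: the `EQ` class, its `located_in` field, the `shared_pathology` helper and the example labels (including the quality label "present") are all chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EQ:
    """A post-composed phenotype: affected Entity plus a PATO Quality."""
    entity: str                       # here an MPATH class label
    quality: str                      # PATO quality label
    located_in: Optional[str] = None  # species-specific anatomy label, if any

    def render(self) -> str:
        """Human-readable form of the statement."""
        where = f" located_in {self.located_in}" if self.located_in else ""
        return f"E: {self.entity}{where}; Q: {self.quality}"


def shared_pathology(a: EQ, b: EQ) -> bool:
    """Compare only the species-agnostic parts (pathology term and quality)."""
    return (a.entity, a.quality) == (b.entity, b.quality)


# Illustrative statements; the anatomy labels stand in for MA and FMA classes.
mouse = EQ("inflammation", "present", "liver (MA)")
human = EQ("inflammation", "present", "liver (FMA)")

print(mouse.render())
print(mouse == human, shared_pathology(mouse, human))
```

The two statements differ as whole records because their anatomy terms come from different species-specific ontologies, yet they agree on the pathology component, which is the part MPATH contributes when phenotypes are compared across species.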
Whilst MPATH was originally designed to support rodent, and particularly mouse, pathology, the extensive overlap with human pathology means that most of the terms may be used in a human context and linked to the foundational model of anatomy (FMA) as the anatomy ontology. Extending MPATH to become a mammalian pathology ontology encompassing human pathology is a major undertaking, but we have established that the current structure and upper level classes would readily support the inclusion of human terminology. Initially we will import terms for neoplasias from the CINEAS codes (Central Information System for Hereditary Diseases and Synonyms; http://www.cineas.org/; Prof Rolf Sijmons, pers. comm.). SNOMED-CT, UMLS and ICD-O v3 will be mined for terms not currently in MPATH which relate to anatomic pathology. Terms already covered by existing ontologies such as the Disease Ontology (DO) may be referenced using MIREOT. DO classifies diseases largely by anatomical site and not by disease process or class, and overlaps only slightly with MPATH, as it is concerned for the main part with summative diagnostic entities. For example, there is no “inflammation” superclass in DO for the tissue-specific inflammatory conditions described.
Use of MPATH to construct logical definitions for DO classes would potentially add a further dimension to the richness and applicability of DO.
The power of the description of pathological lesions to discriminate between diseases, and therefore between models of human disease, is substantial. We recently estimated the information content (IC) of pre-composed MP ontology terms used to code phenotypes in the EUMODIC mouse phenotyping pipeline, which included or excluded anatomic pathology descriptions, using their logical definitions. Pathology-related phenotypes were shown to have a significantly greater discriminatory power than other in vivo assays, strongly supporting the use of these assays in the development of mouse models of human diseases.
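The text does not give the formula used for the IC estimate. A common corpus-based definition, assumed here purely for illustration, is IC(t) = -log2 p(t), where p(t) is the fraction of annotated items that carry term t or any of its subclasses. The sketch below applies that definition to invented toy data; the function name, the ancestor table and the annotation set are all hypothetical.

```python
import math
from collections import Counter


def information_content(
    annotations: dict[str, set[str]],
    ancestors: dict[str, set[str]],
) -> dict[str, float]:
    """IC(t) = log2(N / n_t): N annotated items, n_t items carrying t or a subclass."""
    counts: Counter[str] = Counter()
    for terms in annotations.values():
        closure: set[str] = set()
        for term in terms:
            closure.add(term)
            closure |= ancestors.get(term, set())  # propagate up the hierarchy
        counts.update(closure)                      # each item counted once per term
    total = len(annotations)
    return {term: math.log2(total / n) for term, n in counts.items()}


# Invented toy data: four mouse lines annotated at different depths.
ANCESTORS = {
    "hepatocellular carcinoma": {"carcinoma", "neoplasm"},
    "carcinoma": {"neoplasm"},
}
ANNOTATIONS = {
    "line1": {"hepatocellular carcinoma"},
    "line2": {"carcinoma"},
    "line3": {"neoplasm"},
    "line4": {"neoplasm"},
}

ic = information_content(ANNOTATIONS, ANCESTORS)
for term in sorted(ic, key=ic.get):
    print(term, ic[term])
```

Under this definition, rarely used and more specific terms score higher, which is one way to express the greater discriminatory power attributed above to detailed pathology descriptions.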
Further development and application of MPATH will inevitably depend on community engagement and we encourage anyone with an interest to provide feedback.
CINEAS: Central Information System for Hereditary Diseases and Synonyms
EMAP: EMouse atlas project
ERA: European radiobiology archive
EUMODIC: European mouse disease clinic
FMA: Foundational model of anatomy
GWAS: Genome-wide association study
HPO: Human phenotype ontology
ICD: International classification of disease
ICD-O: International classification of disease – oncology
IMPC: International Mouse Phenotyping Consortium
INHAND: International Harmonization of Nomenclature and Diagnostic Criteria for Lesions in Rats and Mice
KOMP2: Knockout mouse project 2
MA: Mouse anatomy ontology
MIREOT: Minimum information to reference an external ontology term
MMHCC: Mouse Models of Human Cancer consortium
MPATH: Mouse pathology ontology
MP: Mouse phenotype ontology
NCIt: National Cancer Institute Thesaurus
OBO: Open biological ontology
PATO: Phenotype and trait ontology
RITA: Registry of Industrial Toxicology Animal-data
SNOMED-CT: Systematized nomenclature of medicine clinical terms
STP: Society for toxicopathology
UMLS: Unified medical language system.
The authors would like to thank those who have contributed to the development and application of MPATH over the years. This work was funded by the European Commission (contract QLRI-1999-00320), the Ellison Medical Foundation, and the National Institutes of Health (AG25707 for the Shock Aging Center, CA89713, AR056635 and AR063781 to JPS, and HG004838-04 to PNS).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.