Overview of ontology contents
MicrO (version 1.3, released on March 23, 2016) consists of ~2550 classes (plus thousands of synonyms) derived from text contained in the taxonomic descriptions of diverse prokaryotic taxa that span the archaeal and bacterial domains of life. MicrO incorporates more than 12,000 additional relevant terms from 19 other ontologies in the OBO Foundry Library and these imported terms are connected to MicrO classes using a large number of logical axioms (over 24,130, with 5,446 specific to MicrO). The largest categories of classes in the ontology include assays (enzymatic, metabolic, and phenotypic assays), microbiological culture media and media ingredients, and prokaryotic qualities (including colony morphologies, shapes, and sizes). Other types of classes (such as those describing prokaryotic cell and cell parts) are scattered and nested within GO classes. Finally, a handful of classes in MicrO are scattered in various other parts of the ontology. The large-scale architecture of classes of material entities, processes, and qualities in MicrO, and how they nest in other ontologies, is shown in Additional file 1: Figures S1-S3.
Prokaryotic chemical entities
More than 750 new chemical classes were entered into ChEBI as a result of MicrO development. New ChEBI classes include minerals (including sulfide minerals), stains/dyes, metabolic substrates, lipids, inorganic chemicals, and antibiotics. In addition, requests were made to add synonyms (188) to existing and new ChEBI classes. Many microbiologically specific chemical mixtures, however, were retained under MicrO. These were categorized into ‘defined inorganic chemical mixture’ (62 classes), ‘undefined inorganic chemical mixture’ (4 classes), ‘defined organic chemical mixture’ (29 classes), and ‘undefined organic chemical mixture’ (121 classes; Additional file 1: Figure S4). Examples of defined inorganic chemical mixtures include ‘trace elements solution SL-6’ and ‘modified MJ synthetic sea water’. Examples of undefined inorganic chemical mixtures, used as ingredients in microbiological culture media, include ‘filtered aged seawater’ and ‘sea salt’. Examples of defined organic chemical mixtures include ‘Balch vitamin solution’, ‘dried bovine hemoglobin’, and ‘hemin solution’. Examples of undefined organic chemical mixtures include ‘clarified rumen fluid’, ‘ox bile salts’, ‘egg yolk oil’, ‘laked rabbit blood’, and ‘inspissated serum’. Additional classes were created for complex mixtures that were produced from hydrous, enzymatic, or chemical extraction of other material entities (e.g., ‘yeast extract’, ‘proteose peptone’, ‘casamino acids’, ‘crude oil extract’, and ‘casein hydrolysate’).
Culture media recipes
Microbiological culture media recipes (~910 classes) were included under the parent class OBI:‘culture medium’ (Fig. 1). Annotations include the recipe, the citation or web link to the recipe, and synonyms of the class. Logical axioms specify the chemical ingredients used for each medium (connecting MicrO terms to ChEBI terms). Value Partitions were created to categorize different types of culture media. One Value Partition relates to the pH of the medium: strongly acidic (pH <4), moderately acidic (pH 4–5.5), slightly acidic (pH 5.5–6.5), near neutral (pH 6.5–7.5), slightly alkaline (pH 7.5–8.5), moderately alkaline (pH 8.5–10.0), or strongly alkaline (pH >10.0). Another relates to the salinity of the medium, using salinity ranges that are commonly used in biology: freshwater (<0.05 % salts), brackish (0.05–3.0 %), marine (3.0–5.0 %), or hypersaline (>5.0 %). A third relates to the redox (oxidation-reduction) potential of the medium: oxidizing (oxygen or air present and no reducing agents), mildly reducing (containing organosulfides or thiosulfate), or strongly reducing (containing cysteine, glutathione, 2-mercaptoethanol, dithiothreitol, sodium sulfide, hydrogen sulfide, dithionite, or titanium citrate). Covering axioms were put in place for each of the Value Partitions. These logical axioms were designed to facilitate future studies that rely on the logical inference power of the ontology to gain higher-order knowledge of microbial taxa based on the chemical composition of their growth media, such as studies seeking to identify correlations between phylogeny and culture medium chemistry [45]. Finally, the logical axioms can help fill knowledge gaps for MicroPIE. For instance, taxonomic descriptions often state the types of media in which an organism is capable of growing. The logical inference power of the ontology allows MicroPIE to compute the chemical conditions under which that organism is capable of growing, even if given only the name of the culture medium.
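The three Value Partitions described above can be sketched as simple classification functions. This is an illustrative Python sketch only, not MicrO's OWL implementation: the function and variable names are hypothetical, ingredients are matched as plain lower-case strings rather than ChEBI classes, and the handling of shared boundary values (for example, pH 5.5 is assigned to the lower category) is an assumption, since the published ranges share their endpoints.

```python
from typing import Iterable, List, Tuple

# Each entry: (upper bound, whether the bound is inclusive, category label).
# Assumption: shared endpoints such as pH 5.5 fall in the lower category.
Partition = List[Tuple[float, bool, str]]

PH_PARTITION: Partition = [
    (4.0, False, "strongly acidic"),       # pH < 4
    (5.5, True, "moderately acidic"),      # pH 4-5.5
    (6.5, True, "slightly acidic"),        # pH 5.5-6.5
    (7.5, True, "near neutral"),           # pH 6.5-7.5
    (8.5, True, "slightly alkaline"),      # pH 7.5-8.5
    (10.0, True, "moderately alkaline"),   # pH 8.5-10.0
]

SALINITY_PARTITION: Partition = [
    (0.05, False, "freshwater"),  # < 0.05 % salts
    (3.0, True, "brackish"),      # 0.05-3.0 %
    (5.0, True, "marine"),        # 3.0-5.0 %
]

# Reducing agents named in the text, as simplified string labels.
STRONG_REDUCTANTS = {
    "cysteine", "glutathione", "2-mercaptoethanol", "dithiothreitol",
    "sodium sulfide", "hydrogen sulfide", "dithionite", "titanium citrate",
}
MILD_REDUCTANTS = {"organosulfide", "thiosulfate"}


def classify(value: float, partition: Partition, top_label: str) -> str:
    """Return exactly one category per value, mimicking a covering axiom."""
    for upper, inclusive, label in partition:
        if value < upper or (inclusive and value == upper):
            return label
    return top_label


def ph_class(ph: float) -> str:
    return classify(ph, PH_PARTITION, "strongly alkaline")  # pH > 10.0


def salinity_class(percent_salts: float) -> str:
    return classify(percent_salts, SALINITY_PARTITION, "hypersaline")  # > 5.0 %


def redox_class(ingredients: Iterable[str]) -> str:
    """Classify redox state from ingredient names.

    Assumption: a medium without any listed reducing agent is prepared
    under oxygen or air and is therefore treated as oxidizing.
    """
    names = {name.lower() for name in ingredients}
    if names & STRONG_REDUCTANTS:
        return "strongly reducing"
    if names & MILD_REDUCTANTS:
        return "mildly reducing"
    return "oxidizing"
```

Given only a medium's recorded pH, salt content, and ingredient list, these functions return one category from each partition, which is the kind of conclusion a reasoner would draw from the medium's class definition.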
Assays
A large number of classes (~570) describe microbiological diagnostic assays, under the parent class OBI:‘assay’. Assays include cell staining assays, commercial suites of diagnostic assays (e.g., API microbial ID test kits, Biolog, RapID, and VITEK), salinity, pH, and redox assays, a large number of organic carbon metabolism assays (including organic acid alkalinization assays, organic carbon assimilation assays, organic carbon fermentation assays, and organic carbon fermentation/oxidation assays), milk reactivity assays, motility assays, hemadsorption/hemagglutination/hemolysis assays, coagulase assays, growth response assays (including growth response to various antibiotics, inorganic chemicals, and organic chemicals), and finally a large number of specific enzymatic assays (e.g., ‘beta-galactosidase assay’, ‘catalase assay’, ‘lecithinase assay’, ‘pyruvate decarboxylase assay’).
Assays, with axioms connecting substrates, products, and enzymatic activities, were important to have in the ontology because most prokaryotic taxonomic descriptions report the outcomes of particular assays performed on the isolate being described. Logical axioms for this set of classes tended to be more complex than for other classes. The assays are logically connected to chemical entities (e.g., ‘is an assay for the metabolic product’ some ‘hydrogen sulfide’) and processes (e.g., ‘is an assay for the biological process of’ some ‘cell motility’ and ‘is an assay for the enzymatic activity of’ some ‘tryptophanase activity’; Fig. 2 and Additional file 1: Figure S5). Logical axioms also include the enzymatic substrates (some of which are colorimetric compounds, such as ‘5-bromo-4-chloro-3-indolyl beta-D-galactoside’) and products, and the culture medium used to perform the test (e.g., ‘is an assay using the culture medium’ some ‘sulfide indole motility agar’).
Sometimes, taxonomic descriptions report lists of enzymatic reactions that were tested, with a positive or negative result for each (e.g., positive for valine arylamidase), while other times they report lists of the substrates hydrolyzed or not hydrolyzed (e.g., L-valine-2-naphthylamide hydrolyzed). The structure of the ontology connects these two kinds of report and recognizes that they both relate to the same enzymatic trait (in this case, valine arylamidase activity, assayed using the L-valine arylamidase assay). This is accomplished by including the assay substrate (in this case L-valine-2-naphthylamide) in the logical axiom for the valine arylamidase assay class.
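The way an assay class ties an enzyme-based report and a substrate-based report to the same trait can be sketched as follows. This is a minimal Python illustration, not MicrO's actual axioms: the `Assay` record, the lookup functions, and the exact label strings are hypothetical stand-ins for the assay class and its substrate and enzymatic-activity axioms.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Assay:
    """Simplified stand-in for an assay class and two of its axioms."""
    name: str
    enzymatic_activity: str      # 'is an assay for the enzymatic activity of'
    substrates: FrozenSet[str]   # substrates named in the assay's axioms


# One illustrative entry, using the example from the text.
ASSAYS: List[Assay] = [
    Assay(
        name="valine arylamidase assay",
        enzymatic_activity="valine arylamidase activity",
        substrates=frozenset({"L-valine-2-naphthylamide"}),
    ),
]


def trait_from_enzyme(enzyme: str) -> Optional[str]:
    """Resolve a report such as 'positive for valine arylamidase'."""
    wanted = f"{enzyme} activity"
    for assay in ASSAYS:
        if assay.enzymatic_activity == wanted:
            return assay.enzymatic_activity
    return None


def trait_from_substrate(substrate: str) -> Optional[str]:
    """Resolve a report such as 'L-valine-2-naphthylamide hydrolyzed'."""
    for assay in ASSAYS:
        if substrate in assay.substrates:
            return assay.enzymatic_activity
    return None
```

Both lookups return the same activity label for the valine arylamidase example, so two descriptions that report the result differently end up with the same character.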
Prokaryotic qualities
Several classes (97) were created to describe prokaryotic qualities. These include prokaryotic cell part qualities (such as ‘gas vacuole quality’, ‘thylakoid quality’, ‘Gram stain quality’, and ‘prokaryotic cell wall lysis susceptibility’), prokaryotic cell qualities (such as ‘cell granulation’, ‘cell pigmentation’, ‘cell size quality’, and ‘flagellar quality’), and ‘prokaryotic colony quality’. Classes also included prokaryotic metabolic qualities (‘aerobic’, ‘microaerophilic’, ‘aerotolerant’, ‘obligately aerobic’, ‘photofermentative’, ‘chemolithoautotrophic’, ‘photoorganoheterotrophic’, etc.) and prokaryotic physiological qualities (including ‘barophilic’, ‘obligately barophilic’, ‘barotolerant’, and ‘requires magnesium for growth’).
Prokaryotic cell and cellular components
Many new classes (255) were placed under the parent ‘prokaryotic cell’ including ‘flagellated cell’ (with subclasses including ‘multiply flagellated’, ‘amphilophotrichous cell’, ‘amphitrichous cell’, ‘lophotrichous cell’, and ‘peritrichous cell’), ‘gas vacuolated cell’, ‘granulated cell’, ‘nanocytes’, and ‘pigmented cell’. Classes under ‘morphologically distinct prokaryotic cell’ include ‘bacilloid cell’, ‘cuboidal cell’, ‘pear-shaped cell’, and ‘prosthecate cell’. Classes under ‘prokaryotic differentiated cell’ include ‘hormogonium’, ‘central endospore’, ‘lateral endospore’, ‘subterminal endospore’, ‘basal heterocyte’, and ‘terminal heterocyte’. Classes under ‘prokaryotic metabolically differentiated cell’ include ‘autotroph’, ‘obligate aerobe’, and ‘chemoorganoheterotroph’. Classes under ‘prokaryotic physiologically differentiated cell’ include ‘acidophile’, ‘obligate barophile’, ‘thermophile’, and ‘facultative halophile’. Classes under ‘differentiated cyanobacterial filament part’ include ‘conical apical cell’, ‘tapered by apical narrowing’, ‘isopolar metameric’, ‘multiseriate filament’, and ‘subterminal meristematic zones’.
Classes (49) were created to describe prokaryotic colonies. The structural organization of classes relating to colonies with distinct morphologies, sizes, and shapes mirrors the class organization of ‘morphology’, ‘size’, and ‘shape’ in PATO (Additional file 1: Figure S6). This facilitated the construction of logical axioms between classes in MicrO and PATO. For example, under the parent class ‘prokaryotic colony’ were placed the classes ‘morphologically distinct colony’, ‘physically distinct colony’, and ‘colony having distinct process quality’. ‘Morphologically distinct colony’ is logically defined as ‘prokaryotic colony’ and ‘has morphology’ some ‘PATO:morphology’.
MicrO classes of cell parts (~128 classes) include ‘pseudopeptidoglycan-based cell wall’, ‘teichoic acid-based cell wall’, ‘sheath’, and ‘proteinaceous sheath’. Additional prokaryotic cell parts include ‘cyanobacterial filament part’, ‘filament branch’, ‘trichome’, ‘heteropolar trichome’, ‘tapered trichome’, ‘isopolar trichome’, ‘trichome part’, ‘apical cell’, ‘basal heterocyte’, ‘medial cell’, ‘necritic cell’, etc. Under ‘cyanobacterial filament’, classes include ‘multi-trichomous filament’, ‘multiseriate filament’, ‘biseriate filament’, and ‘uniseriate filament’. We plan to submit term requests for relevant classes of cell parts that should belong in GO.
Prokaryotic biological processes
Finally, 41 classes were created to define prokaryotic biological processes (e.g., lithotrophy, mixotrophy, and anaerobic respiration using various electron acceptors and donors). These classes are embedded within GO classes, and may be expanded upon and incorporated into GO in the future. Logical axioms connect these biological processes with chemical entities (e.g., ‘uses electron acceptor’ some ‘nitrate’, ‘uses carbon source’ some ‘organic molecular entity’), other processes (e.g., ‘has part’ some ‘phototrophy’ and ‘has part’ some ‘heterotrophy’), and biological entities (e.g., ‘is prokaryotic metabolic process occurring in’ some ‘mixotroph’).
Object and datatype properties
To connect classes in MicrO to those in external ontologies, we imported object properties from IAO, OBI, RO, and Uberon. We also created ~77 new object and datatype properties to relate microbial-specific classes to one another (Additional file 1: Table S2). Many of the new object properties are nested within OBI or RO parent properties. New object properties were assigned definitions and (when possible) domains and ranges.
Application and future directions
Microbial diversity is vast. Our ontology did not focus on pathogenic phenotypes (such as hosts, target organs, and diseases). These are areas that will need further integration with other existing ontologies (for example, with OMP, the Disease Ontology, Infectious Disease Ontology, the Pathogenic Disease Ontology, and the Human Disease Ontology) [46–48]. MicrO also did not focus on microbial habitats. Development of ENVO is ongoing, and the incorporation of microbial habitats into ENVO is a potentially fruitful new approach for integrating MicrO with ENVO. Also, there are a number of new prokaryote-focused ontologies in development focusing on microbial metagenomic metadata and microbial habitats/environments (such as MEOWL; Microbial Environments described using OWL; https://github.com/hurwitzlab/meowl). These can be incorporated into MicrO and formal logical axiom linkages added to further increase the axiomatization of microbial terms. Finally, our ontology did not cover traits associated with microbial eukaryotes.
In the near future, we plan to incorporate MicrO into our developing NLP program (MicroPIE), and in doing so expect to greatly increase the capabilities of MicroPIE. Currently, MicroPIE relies on term lists, which treat each term as an individual entity. MicroPIE therefore cannot determine that the terms ‘rod’, ‘bacillus’, ‘bacilli’, ‘elongated cocci’, and ‘short cylinders’ are all synonyms for the same concept (a bacillus shape). MicrO, with its controlled vocabulary, logical axioms, and annotations including synonyms, can inform NLP programs like MicroPIE that these are indeed the same class, and hence streamline the functionality of the algorithm. The ontology will help MicroPIE recognize that terms such as ‘mixotroph’ and ‘mixotrophic’ point to the same concept (the ability to carry out the process of mixotrophy). The ontology will also reduce confusion by identifying synonymous concepts among the varied ways in which the results of prokaryotic diagnostic assays are reported (as discussed above).
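The difference between a flat term list and synonym-aware lookup can be sketched in a few lines of Python. This is a hypothetical illustration: the dictionary below stands in for MicrO's synonym annotations, the concept labels and the `normalize` function are invented for the example, and nothing here reflects MicroPIE's actual code.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for class labels and their synonym annotations,
# using the examples given in the text.
CONCEPT_SYNONYMS: Dict[str, List[str]] = {
    "bacillus shape": [
        "rod", "bacillus", "bacilli", "elongated cocci", "short cylinders",
    ],
    "mixotrophy": ["mixotroph", "mixotrophic"],
}

# Reverse index: every synonym (and the label itself) points to its concept.
TERM_TO_CONCEPT: Dict[str, str] = {}
for concept, synonyms in CONCEPT_SYNONYMS.items():
    TERM_TO_CONCEPT[concept] = concept
    for synonym in synonyms:
        TERM_TO_CONCEPT[synonym] = concept


def normalize(term: str) -> Optional[str]:
    """Map an extracted term to its concept, or None if it is unknown."""
    return TERM_TO_CONCEPT.get(term.strip().lower())
```

With a plain term list, ‘rod’ and ‘short cylinders’ would be recorded as two unrelated character states; with the reverse index both resolve to the same concept.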
Because of the logical inference power provided by the ontology, MicrO will allow algorithms like MicroPIE to infer new information about a microbial taxon that is not explicitly stated in the taxonomic description. For example, if an organism metabolizes glucose and is photosynthetic, MicrO-enabled MicroPIE can infer that it is a photoorganotroph. If an organism grows optimally at 89 °C, MicrO-enabled MicroPIE can infer that it is a hyperthermophile (given that the logical definition for a hyperthermophile in MicrO constrains an organism’s optimal growth temperature to being above 85 °C). If an organism has akinetes, MicrO-enabled MicroPIE will be able to infer that it is in the Nostocales or Stigonematales (two orders in the Cyanobacteria). These inferred character states can help to populate cells of a matrix that can be quite sparse when NLP is used to extract literal characters from text.
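The three inferences described above can be sketched as explicit rules. This Python sketch only imitates what an OWL reasoner would derive from MicrO's logical definitions: the `TaxonDescription` record, its field names, and the small set of organic compounds are hypothetical, and the 85 °C threshold is applied to the optimal growth temperature, following the logical definition cited in the text.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical stand-in for ChEBI's classification of organic compounds.
ORGANIC_COMPOUNDS = {"glucose"}


@dataclass
class TaxonDescription:
    """Characters extracted literally from a taxonomic description."""
    optimal_growth_temp_c: Optional[float] = None
    photosynthetic: bool = False
    metabolized: Set[str] = field(default_factory=set)
    cell_structures: Set[str] = field(default_factory=set)


def infer_traits(desc: TaxonDescription) -> Set[str]:
    """Return character states implied by, but not stated in, the description."""
    inferred: Set[str] = set()
    # Hyperthermophile: optimal growth temperature above 85 degrees C.
    if desc.optimal_growth_temp_c is not None and desc.optimal_growth_temp_c > 85.0:
        inferred.add("hyperthermophile")
    # Photoorganotroph: photosynthetic and metabolizes an organic compound.
    if desc.photosynthetic and desc.metabolized & ORGANIC_COMPOUNDS:
        inferred.add("photoorganotroph")
    # Akinetes indicate membership in one of two cyanobacterial orders.
    if "akinete" in desc.cell_structures:
        inferred.add("Nostocales or Stigonematales")
    return inferred
```

Each inferred state could then fill a cell of the character matrix that literal text extraction would have left empty.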
Additionally, MicrO will be able to support a future generation of bioinformatics capabilities for the microbiological community. For example, because MicrO connects phenotypic information and diagnostic assays with the enzymatic activities in GO, it could be used to support future work aimed at connecting microbial phenotypes with genotypes (i.e., the gene content in genomes). Exciting new tools and approaches for connecting phenotypes with genotypes are being developed for metazoans [49–51]. These tools could be adapted and expanded to function similarly with microbial taxa and microbial genomes in the future, given that the field of microbiology now has a rich ontology. In this manner, MicrO could be a useful tool for researchers in the fields of metagenomics and the evolution of microbial phenotypic traits.