Using description logics to evaluate the consistency of drug-class membership relations in NDF-RT
© Winnenburg et al.; licensee BioMed Central. 2015
Received: 16 December 2014
Accepted: 27 February 2015
Published: 28 March 2015
The NDF-RT (National Drug File Reference Terminology) is an ontology that describes drugs and their properties and supports computerized physician order entry systems. NDF-RT’s classes are mostly specified using only necessary conditions and lack sufficient conditions, which limited its usefulness until asserted drug-class relations were recently added. The addition of these asserted drug-class relations presents an opportunity to compare them with drug-class relations that can be inferred using the properties of drugs and drug classes in NDF-RT.
We enriched NDF-RT’s drug classes with sufficient conditions, added property equivalences, and then used an OWL reasoner to infer drug-class membership relations. We compared the inferred class relations to the recently added asserted relations derived from FDA Structured Product Labels.
The inferred and asserted relations only match in about 50% of the cases, due to incompleteness of the drug descriptions and quality issues in the class definitions.
This investigation quantifies and categorizes the disparities between asserted and inferred drug-class relations and illustrates issues with class definitions and drug descriptions. In addition, it serves as an example of the benefits DL can add to ontology development and evaluation.
Keywords: Ontology, Description logics, Quality assurance, National Drug File-Reference Terminology
We rely on ontologies throughout biomedicine, from the life sciences to the clinic [1]. As Electronic Health Record adoption increases in the clinic, so too will the reliance on the ontologies that facilitate their meaningful use. Clinical decision support and analytics are functions supported by ontologies. For example, computerized physician order entry (CPOE) systems typically leverage drug ontologies to ensure that patients are safely prescribed drugs in accordance with clinical guidelines (e.g., [2]).
An example of such an ontology is the National Drug File-Reference Terminology (NDF-RT), an extension to the drug formulary used by the Veterans Administration and developed using a description logics (DL) formalism. It provides a rich description of pharmacologic classes in reference to properties, such as mechanism of action, physiologic effect, chemical structure and therapeutic intent. NDF-RT can be leveraged to prevent a patient allergic to penicillin drugs from being prescribed amoxicillin, a penicillin antibacterial.
However, NDF-RT specifies only necessary conditions for membership in the pharmacologic classes, not sufficient conditions. (In DL parlance, these classes are “primitive”, not defined.) As a consequence, a DL reasoner is unable to automatically classify drugs as members of a given pharmacologic class, even when both drugs and pharmacologic classes are described in terms of the same properties. The inability to classify drugs into their classes limits the usefulness of NDF-RT in systems like CPOE that rely on such information.
In previous work, where we overcame this limitation by augmenting the pharmacologic classes with necessary and sufficient conditions, we found that we could infer drug-class membership relations effectively [3]. Specifically, we demonstrated the use of a modified version of NDF-RT for clinical decision purposes (patient classification). One limitation of this work was that we did not evaluate the inferred drug-class membership relations beyond our proof-of-concept application.
NDF-RT recently integrated authoritative drug-class membership assertions extracted from the Structured Product Labels (package inserts) by the Food and Drug Administration (FDA), along with a specification of the drugs in terms of the same properties used for specifying the classes. These assertions remove the drug-class membership limitation we highlighted earlier, instead providing explicit drug-class membership relations that do not rely on DL reasoning. But precisely because these asserted drug-class relations have been made independently of the logical definitions of the classes, there is the possibility for the asserted and inferred drug-class membership relations to be inconsistent.
The objective of this work is to evaluate the consistency of the drug-class membership relations that were inferred from the pharmacologic class definitions and drug descriptions, against the newly asserted, authoritative drug-class membership relations. This evaluation is also an indirect contribution to the assessment of the class definitions and the drug descriptions in terms of completeness and consistency (i.e., agreement between information sources).
NDF-RT drugs and classes
The National Drug File Reference Terminology (NDF-RT) is a resource developed by the Department of Veterans Affairs (VA), Veterans Health Administration, as an extension of the VA National Drug File [4]. Like other modern biomedical terminologies, NDF-RT is developed using description logics and is available in native XML format. The version used in this study, dated November 3, 2014, was the latest available; we downloaded it from [5] and derived our augmented representation from it.
This version covers 7,287 active moieties (DRUG_KIND, level = ingredient), as well as 543 Established Pharmacologic Classes (EPCs) specified in reference to some of the properties of the active moieties. NDF-RT now contains several sources of relations between drugs and their properties. The April 2014 version of NDF-RT introduced a new set of relations between drugs and their properties originating from the class indexing file released as part of DailyMed, identified by the suffix “FDASPL”. Moreover, this version also introduced authoritative drug-class membership assertions from the same source. Finally, NDF-RT also provides a specification of the EPCs in reference to the same properties used for describing the drugs themselves, provided by “Federal Medication Terminologies subject matter experts” and identified by the suffix “FMTSME”. In this work, we focus on the drug-property assertions from FDASPL, class-property assertions from FMTSME, and drug-class assertions provided by the FDA.
In short, Description Logics (DL) are a set of logical constructs with which one can develop ontologies. Krötzsch and colleagues provide a more formal introduction to DL [6]. Like other knowledge representation methods, DL allows one to specify, in a computable fashion, the entities (i.e., classes) that exist in a given domain and the relationships (i.e., relations) between them. In comparison to older methods of knowledge representation, DL ensures common, unambiguous semantics so that the ontology’s interpretation is consistent across software and users. This consistent logical underpinning enables the use of reasoners, which are programs that compute (i.e., infer) the logical entailments (i.e., conclusions) of a given ontology. For example, if Alprostadil has physiologic effect Venous dilation and Venous dilation is-a Vasodilation, a reasoner concludes that Alprostadil has physiologic effect Vasodilation.
A typical approach to developing ontologies with DL is to specify a set of properties that each class has (e.g., Penicillin antibacterial has ingredient Penicillin and treats or prevents Bacterial infection; Antiseptic treats or prevents Bacterial infection) and then infer the additional relations among classes. With a set of specified classes, a reasoner can then classify them into an inferred hierarchy. In our example, the inferred hierarchy would show that Penicillin antibacterial is-a Antiseptic. In the context of this study, NDF-RT uses this same approach, specifying EPCs in terms of their properties. Unlike the example above, however, pharmacologic classes in NDF-RT (EPCs) are “primitive”, in that they only specify the necessary conditions of class membership, and therefore prevent a reasoner from constructing a useful inferred hierarchy. Later, we describe how we enrich NDF-RT with sufficient conditions so that we can take full advantage of a reasoner.
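The two examples in this passage can be traced with a small Python sketch. It is a toy illustration and not a DL reasoner: classes are modeled as sets of (property, value) pairs, and the helper names (`IS_A`, `entails`, `subsumed_by`) are hypothetical, introduced only for this illustration.

```python
# Toy illustration of DL-style entailment and subsumption; not a real reasoner.

# Value hierarchy (child -> parent), taken from the Alprostadil example in the text.
IS_A = {"Venous dilation": "Vasodilation"}


def ancestors_or_self(term):
    """Return the term plus everything it is-a, following IS_A upward."""
    found = {term}
    while term in IS_A:
        term = IS_A[term]
        found.add(term)
    return found


def entails(description, restriction):
    """True if some (property, value) pair in the description implies the restriction."""
    prop, value = restriction
    return any(p == prop and value in ancestors_or_self(v) for p, v in description)


def subsumed_by(sub_definition, super_definition):
    """A defined class is-a another if it satisfies every restriction of the other."""
    return all(entails(sub_definition, r) for r in super_definition)


alprostadil = {("has physiologic effect", "Venous dilation")}

penicillin_antibacterial = {
    ("has ingredient", "Penicillin"),
    ("treats or prevents", "Bacterial infection"),
}
antiseptic = {("treats or prevents", "Bacterial infection")}

print(entails(alprostadil, ("has physiologic effect", "Vasodilation")))  # True
print(subsumed_by(penicillin_antibacterial, antiseptic))                 # True
print(subsumed_by(antiseptic, penicillin_antibacterial))                 # False
```

The subsumption test only works because both classes are treated as defined (their property sets are read as necessary and sufficient). With primitive classes, sharing a property would not justify the is-a conclusion.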
In this work, we use OWL, the web ontology language, a web standard for developing ontologies that leverages DL. OWL is the de facto standard for biomedical ontologies and there is a suite of tools for developing OWL ontologies, including development environments such as Protégé [7] and reasoners such as HermiT [8].
In addition to being used as a framework for building ontologies, DL has been shown to be useful for reasoning with biomedical entities, including protein phosphatases [9] and penetrating injuries [10]. However, to our knowledge, DL reasoning has not yet been applied to the automatic classification of drugs, except for our previous work on anti-coagulants [3].
NDF-RT is used frequently as a resource for standardizing pharmacologic classes (e.g., [11,12]). However, investigators generally use the drug properties as classes (e.g., drugs that have the physiologic effect “decreased coagulation activity” for anti-coagulants), rather than the EPCs. Moreover, only asserted relations are used in most investigations, as opposed to inferred drug-class relations.
The specific contribution of this paper is the augmentation of the logical definitions of pharmacologic classes in NDF-RT to enable the automatic inference of drug-class membership relations using a DL reasoner. We substantially extend our previous work on anticoagulants, by generalizing it to all pharmacologic classes and providing a comparison to authoritative, asserted drug-class relations from the FDA.
Our approach to evaluating inferred drug-class membership relations in NDF-RT is summarized as follows. First, we converted the NDF-RT data from their original format (XML) to a DL format (OWL). This conversion process augments the EPCs with necessary and sufficient conditions. These conditions allow a DL reasoner to classify drugs into their respective classes using the class definitions and the properties of drugs. We created two OWL datasets. The first, used as a gold standard, contains only the asserted, authoritative drug-class relations. In the second, these asserted relations were removed, so that the only drug-class relations present after classification are those inferred by the reasoner. We ran a DL reasoner and then compared inferred and asserted drug-class relations from the perspective of drugs and from that of classes.
In order to restrict this investigation to clinically significant drugs, we mapped all NDF-RT ingredients to RxNorm and required that ingredients be linked to clinical drugs. We further normalized all ingredients to base ingredients in RxNorm, to abstract away from minor differences in ingredients, including salts, esters and complexes, which rarely affect drug-class membership. In practice, we mapped the “precise ingredients” in RxNorm (e.g., albuterol sulfate) to their base ingredient (albuterol). Multi-ingredient drugs were ignored, because there is often more variability in their classification.
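The normalization step can be sketched as follows. This is a minimal illustration that assumes the RxNorm mappings have already been loaded into memory; the dictionary `PRECISE_TO_BASE`, the set `HAS_CLINICAL_DRUG` and the function `normalize` are hypothetical names, and only the albuterol pair comes from the text.

```python
# Hypothetical in-memory extracts from RxNorm; only the albuterol pair is from the text.
PRECISE_TO_BASE = {"albuterol sulfate": "albuterol"}
HAS_CLINICAL_DRUG = {"albuterol"}  # base ingredients linked to at least one clinical drug


def normalize(ingredients):
    """Return the base ingredient for a drug, or None if the drug is out of scope."""
    if len(ingredients) != 1:
        return None  # multi-ingredient drugs are ignored
    name = ingredients[0].lower()
    base = PRECISE_TO_BASE.get(name, name)  # salts, esters, complexes -> base ingredient
    return base if base in HAS_CLINICAL_DRUG else None


print(normalize(["Albuterol Sulfate"]))         # albuterol
print(normalize(["albuterol", "ipratropium"]))  # None
```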
Augmenting pharmacologic classes with sufficient conditions
In order to produce the two OWL datasets used for comparing asserted and inferred drug-class relations, we started by creating a “baseline” OWL representation from the original XML dataset, which we used as our asserted dataset (dataset “A”). Next, as previously described in [3], we transformed the primitive EPCs into defined classes by taking the existing set of properties for each class (i.e., necessary conditions) and using them to “define” the class. In particular, all properties are folded into a single owl:equivalentClass (≡) axiom, thereby specifying necessary and sufficient conditions of each class. For the purpose of this work, we focus on the three main properties used for the description of the drugs (mechanism of action, physiologic effect and chemical structure). Additionally, we leveraged the therapeutic intent relations (may_treat and may_prevent) present in NDF-RT, because many EPCs refer to them in their definitions. These relations link drugs and EPCs to disease entities.
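Because the class definitions use the FMTSME properties while the drug descriptions use the FDASPL properties (and the legacy NDF-RT therapeutic intent relations), we added the following property equivalences so that the reasoner can match one against the other:
- has_MoA_FMTSME ≡ has_MoA_FDASPL (for mechanism of action),
- has_PE_FMTSME ≡ has_PE_FDASPL (for physiologic effect),
- has_Chemical_Structure_FMTSME ≡ has_Chemical_Structure_FDASPL,
- may_treat_FMTSME ≡ may_treat_NDFRT, and
- may_prevent_FMTSME ≡ may_prevent_NDFRT.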
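The effect of the property equivalences listed above can be illustrated with a short Python sketch. It is not the XSL/OWL pipeline used in the study: the drug `example_drug` and the functions `canonical` and `is_member` are hypothetical, the 'Pharmaceutical Preparations' conjunct of the class definition is omitted, and the hierarchy of property values is ignored for brevity.

```python
# Property equivalences from the text: class-side name -> drug-side name.
EQUIVALENT = {
    "has_MoA_FMTSME": "has_MoA_FDASPL",
    "has_PE_FMTSME": "has_PE_FDASPL",
    "has_Chemical_Structure_FMTSME": "has_Chemical_Structure_FDASPL",
    "may_treat_FMTSME": "may_treat_NDFRT",
    "may_prevent_FMTSME": "may_prevent_NDFRT",
}


def canonical(prop):
    """Map a property name to one representative of its equivalence pair."""
    return EQUIVALENT.get(prop, prop)


def is_member(drug_description, class_definition, use_equivalences=True):
    """True if the drug satisfies every restriction of the defined class."""
    rename = canonical if use_equivalences else (lambda p: p)
    drug = {(rename(p), v) for p, v in drug_description}
    return all((rename(p), v) in drug for p, v in class_definition)


# Definition of beta-Adrenergic Agonist [EPC] as quoted in the text
# (the 'Pharmaceutical Preparations' conjunct is left out of this sketch).
beta_agonist_epc = {("has_MoA_FMTSME", "Adrenergic beta-Agonists [MoA]")}

# Hypothetical drug description, stated with the FDASPL property.
example_drug = {("has_MoA_FDASPL", "Adrenergic beta-Agonists [MoA]")}

print(is_member(example_drug, beta_agonist_epc))                          # True
print(is_member(example_drug, beta_agonist_epc, use_equivalences=False))  # False
```

Without the equivalences, the class definition and the drug description never share a property name, so no membership could be inferred even for a perfectly described drug.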
Inferring relations between drugs and EPCs
A secondary benefit of the classification with an OWL reasoner is that it creates a hierarchy of the pharmacologic classes themselves, based on their logical definitions. For example, beta2-Adrenergic Agonist [EPC] (N0000175779) is inferred to be a subclass of beta-Adrenergic Agonist [EPC] (N0000175555), because the definition of beta2-Adrenergic Agonist [EPC] shown earlier is more specific than that of beta-Adrenergic Agonist [EPC] ('Pharmaceutical Preparations' and (has_MoA_FMTSME some 'Adrenergic beta-Agonists [MoA]')). For this reason, we reclassified both OWL datasets, although no inferred drug-class relations were generated in dataset “A”.
Comparing asserted and inferred drug-class relations
We compared asserted (dataset “A”) and inferred (dataset “I”) drug-class relations from the perspective of drugs and pharmacologic classes, respectively. In both cases, we issued queries against the OWL datasets (after reclassification). For each drug, we queried its set of pharmacologic classes in each dataset and determined which classes are common to both datasets vs. specific to one dataset. For example, the drug albuterol (N0000147099) has the same class in both datasets, beta2-Adrenergic Agonist [EPC] (N0000175779). In contrast, the drug hydrochlorothiazide (N0000145995) has an asserted relation to Thiazide Diuretic [EPC] (N0000175419), but an inferred relation to Thiazide-like Diuretic [EPC] (N0000175420). For each pharmacologic class, we queried its set of drugs in each dataset and determined which drugs are common to both datasets vs. specific to one dataset. In order to consider higher-level classes to which drugs are not direct members, we used the transitive closure of the hierarchical relation rdfs:subClassOf. As a consequence, a given class will have as members not only its direct drugs, but also the members of all its subclasses. For example, in both the “A” and “I” datasets, the class beta-Adrenergic Agonist [EPC] has the base ingredient albuterol as an indirect member through its subclass class beta2-Adrenergic Agonist [EPC]. Of note, the salt ingredient albuterol sulfate is ignored as a result of the normalization to RxNorm base ingredients described earlier.
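The comparison logic described above can be sketched in Python. The study issued SPARQL queries against a triple store; the snippet below is a simplified in-memory stand-in, with hypothetical function names, a single-parent class hierarchy, and toy datasets built only from examples given in the text. The "compatible" category, which requires a hierarchy-aware comparison, is left out.

```python
# Subclass links after classification (child -> parent); this pair is from the text.
SUBCLASS_OF = {"beta2-Adrenergic Agonist [EPC]": "beta-Adrenergic Agonist [EPC]"}


def with_superclasses(cls):
    """Transitive closure of rdfs:subClassOf for one class, including the class itself."""
    closure = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        closure.add(cls)
    return closure


def members(drug_classes, target):
    """Drugs that are direct or indirect members of the target class."""
    return {drug for drug, classes in drug_classes.items()
            if any(target in with_superclasses(c) for c in classes)}


def compare(asserted, inferred):
    """Label the difference between two sets (classes of a drug, or drugs of a class)."""
    if not asserted and not inferred:
        return "unrelated"
    if not inferred:
        return "asserted only"
    if not asserted:
        return "inferred only"
    only_a, only_i = asserted - inferred, inferred - asserted
    if not only_a and not only_i:
        return "identical"
    if only_a and only_i:
        return "additional in both"
    return "additional in asserted" if only_a else "additional in inferred"


# Toy datasets "A" (asserted) and "I" (inferred), limited to examples from the text.
dataset_a = {
    "albuterol": {"beta2-Adrenergic Agonist [EPC]"},
    "hydrochlorothiazide": {"Thiazide Diuretic [EPC]"},
    "losartan": {"Angiotensin 2 Receptor Blocker [EPC]"},
}
dataset_i = {
    "albuterol": {"beta2-Adrenergic Agonist [EPC]"},
    "hydrochlorothiazide": {"Thiazide-like Diuretic [EPC]"},
    "losartan": set(),
}

for drug in sorted(dataset_a):
    print(drug, "->", compare(dataset_a[drug], dataset_i.get(drug, set())))
print(members(dataset_i, "beta-Adrenergic Agonist [EPC]"))  # {'albuterol'}
```

The same `compare` function serves both perspectives: it is applied to the classes of each drug, or to the sets returned by `members` for each class.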
The modifications described above were performed using an XSL (eXtensible Stylesheet Language) transformation. The resulting OWL file was classified with HermiT 1.2.2 [8]. Protégé 5.0 was used for visualization purposes [7]. The OWL file containing the inferences computed by the reasoner was loaded in the open source triple store Virtuoso 7.10 [13]. The query language SPARQL was used for querying drug-class relations.
Asserted and inferred drug-class relations
Table 1. Drug-class relations (direct), drug perspective
Drugs related to drug classes:
- Drugs with identical sets of classes for the asserted and inferred drug-class relations
- Drugs with compatible sets of classes (each class from the asserted set is identical to or hierarchically related to a class in the inferred set)
- Drugs with additional drug-class relations in the asserted set only
- Drugs with additional drug-class relations in the inferred set only
- Drugs with additional drug-class relations in both the asserted and inferred sets
- Drugs with asserted drug-class relations only (no inferred relations)
- Drugs with inferred drug-class relations only (no asserted relations)
- Total number of related drugs
Table 2. Drug-class relations (direct and indirect), class perspective
Drug classes related to drugs:
- Classes with identical sets of drugs for the asserted and inferred drug-class relations
- Classes with additional drug-class relations in the asserted set only
- Classes with additional drug-class relations in the inferred set only
- Classes with additional drug-class relations in both the asserted and inferred sets
- Classes with asserted drug-class relations only (no inferred relations)
- Classes with inferred drug-class relations only (no asserted relations)
- Total number of related classes
Perspective of drugs
For each drug, we compare the set of (direct) pharmacologic classes in datasets “A” and “I”. The various types of differences observed between asserted and inferred drug-class relations are presented in Table 1. The largest category corresponds to drugs with identical sets of asserted and inferred drug-class relations (50%). For example, the drug imatinib has the same class Kinase Inhibitor [EPC] in both datasets. Drugs with asserted drug-class relations, but lacking inferred drug-class relations represent 23% of the cases. For example, the drug losartan has the class Angiotensin 2 Receptor Blocker [EPC] in dataset “A”, but no class in dataset “I”.
Perspective of pharmacologic classes
For each pharmacologic class, we compare the set of (direct and indirect) drug members in datasets “A” and “I”. The various types of differences observed between asserted and inferred drug-class relations are presented in Table 2. As we observed for drugs, the largest category corresponds to EPCs with identical sets of asserted and inferred drug-class relations (52%). For example, the class Monoamine Oxidase Inhibitor [EPC] has the same five drugs in both datasets, including isocarboxazid and rasagiline. EPCs with asserted drug-class relations but no inferred drug-class relations represent about 27% of the cases. For example, the class Quinolone Antimicrobial [EPC] has eight drugs in dataset “A”, including ofloxacin and levofloxacin, but no members in dataset “I”.
Disparities between asserted and inferred drug-class relations
As mentioned in the results, the largest category of disparity consists of missing inferred drug-class relations, including cases where there are no inferred relations at all and cases where inferred relations cover only part of the asserted relations. Missing inferences should not be interpreted as an inherent failure of the OWL reasoner to identify drug-class relations, but rather as a symptom of issues with the completeness and quality of class definitions and drug descriptions (see below for details). For example, lurasidone, a drug indicated for the treatment of schizophrenia, has an asserted but not an inferred drug-class relation to Atypical Antipsychotic [EPC], because the therapeutic intent of lurasidone (Schizophrenia and Disorders with Psychotic Features) is not described in the dataset. In fact, FDASPL asserts no drug property at all for lurasidone. Another example is the drug ofloxacin mentioned earlier. In this case, the asserted EPC (Quinolone Antimicrobial [EPC]) is not inferred because its definition includes both may_treat Infectious Diseases and may_prevent Infectious Diseases, while the drug description only includes treatment, not prevention (e.g., may_treat Klebsiella Infections). Similarly, the description of the drug ipilimumab is too underspecified to match the definition of its asserted class, CTLA-4-directed Blocking Antibody [EPC]. In addition to has_MoA CTLA-4-directed Antibody Interactions, which is in the drug description, the EPC definition also refers to physiologic effects (has_PE Increased Immunologic Activity and has_PE Increased T Lymphocyte Activation).
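The ofloxacin and ipilimumab cases can be reproduced with a small diagnostic sketch that lists which restrictions of a class definition a drug description fails to satisfy. The function names are hypothetical, the definitions contain only the restrictions mentioned in the text (the real definitions may contain more), and the is-a link from Klebsiella Infections to Infectious Diseases is an assumption made for illustration.

```python
# Assumed disease hierarchy (child -> parent), for illustration only.
IS_A = {"Klebsiella Infections": "Infectious Diseases"}


def generalizations(value):
    """Return the value plus all of its ancestors in IS_A."""
    seen = {value}
    while value in IS_A:
        value = IS_A[value]
        seen.add(value)
    return seen


def unmet_restrictions(drug_description, class_definition):
    """Restrictions of the class definition that the drug description does not satisfy."""
    satisfied = {(p, g) for p, v in drug_description for g in generalizations(v)}
    return sorted(r for r in class_definition if r not in satisfied)


# Only the restrictions mentioned in the text are modeled here.
quinolone_antimicrobial = {
    ("may_treat", "Infectious Diseases"),
    ("may_prevent", "Infectious Diseases"),
}
ofloxacin = {("may_treat", "Klebsiella Infections")}

ctla4_blocking_antibody = {
    ("has_MoA", "CTLA-4-directed Antibody Interactions"),
    ("has_PE", "Increased Immunologic Activity"),
    ("has_PE", "Increased T Lymphocyte Activation"),
}
ipilimumab = {("has_MoA", "CTLA-4-directed Antibody Interactions")}

print(unmet_restrictions(ofloxacin, quinolone_antimicrobial))
# [('may_prevent', 'Infectious Diseases')]
print(unmet_restrictions(ipilimumab, ctla4_blocking_antibody))
# [('has_PE', 'Increased Immunologic Activity'), ('has_PE', 'Increased T Lymphocyte Activation')]
```

A non-empty result means the reasoner cannot infer membership. Listing the unmet restrictions also shows whether the gap lies in the drug description (a missing property) or in a class definition that may be too demanding.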
Inferences with no corresponding asserted relations
The number of cases (156 drugs and 43 classes) where inferred drug-class relations are found when there is no asserted drug-class relation (or a different asserted drug-class relation) is interesting as it can help detect potentially missing asserted relations. For example, the drug bupropion has a single asserted relation to the structural class Aminoketone [EPC]. However, it has an inferred relation to Norepinephrine Reuptake Inhibitor [EPC] (through its mechanism of action, Norepinephrine Uptake Inhibitors [MoA]). In this case, the set of asserted relations, which we use as our reference, seems to be incomplete. Another example is the drug isosorbide, an anti-angina agent, for which we correctly infer the class Anti-anginal [EPC], while no asserted EPC is present. Here again, the reference is incomplete.
Inconsistent drug-class relations due to granularity differences
Drug-class relations from dataset “A” tend to associate drugs with more specific classes than in dataset “I”. For example, the antibiotic amikacin is associated with Aminoglycoside Antibacterial [EPC] (through asserted relations), but with the less specific Aminoglycoside [EPC] (through inferred relations). The reason here is similar to what was described earlier for the antibiotic ofloxacin, i.e., discrepancy between may_treat and may_prevent vs. only may_treat properties on the side of the EPC and the drug, respectively. As shown in Table 1, we identified 127 drugs for which the classes in sets “A” and “I” are hierarchically related. Of these, there are only 4 cases with an inferred relation to a class that is more specific than the class involved in the asserted relation.
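A granularity check of this kind can be sketched as follows. The subclass link between the two aminoglycoside classes is assumed from the description above ("less specific"), and the function names are hypothetical.

```python
# Assumed subclass link (child -> parent), based on the amikacin example in the text.
SUBCLASS_OF = {"Aminoglycoside Antibacterial [EPC]": "Aminoglycoside [EPC]"}


def ancestors(cls):
    """Strict superclasses of a class, following SUBCLASS_OF upward."""
    found = set()
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found


def relation(asserted_cls, inferred_cls):
    """Describe how an asserted class relates to an inferred class."""
    if asserted_cls == inferred_cls:
        return "identical"
    if inferred_cls in ancestors(asserted_cls):
        return "asserted more specific"
    if asserted_cls in ancestors(inferred_cls):
        return "inferred more specific"
    return "unrelated"


def compatible(asserted, inferred):
    """Every asserted class is identical to, or hierarchically related to, an inferred class."""
    return all(any(relation(a, i) != "unrelated" for i in inferred) for a in asserted)


amikacin_asserted = {"Aminoglycoside Antibacterial [EPC]"}
amikacin_inferred = {"Aminoglycoside [EPC]"}

print(relation("Aminoglycoside Antibacterial [EPC]", "Aminoglycoside [EPC]"))
# asserted more specific
print(compatible(amikacin_asserted, amikacin_inferred))  # True
```

In this sketch, identical sets also count as compatible; when tallying categories as in Table 1, identical sets would be counted first and removed before this check.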
Specific contribution of the therapeutic intent relations
The DailyMed indexing file provided by the FDA (FDASPL) only contains drug descriptions in reference to mechanism of action, physiologic effect and chemical structure, not therapeutic intent. However, many EPC definitions refer to may_treat and may_prevent relations. Therefore, no drug-class relations to these classes can be inferred, because the corresponding relations are missing from the drug descriptions. Therapeutic intent relations are available for the drugs as part of the set of legacy relations provided by NDF-RT (not FDASPL). We used these relations to complement the relations from FDASPL in order to maximize our chances to infer drug-class relations to the EPCs. We assessed the specific contribution of the therapeutic intent relations to the inference of drug-class relations by computing a “baseline” without using the therapeutic intent relations and comparing it to our dataset “I”.
Table: Specific contributions of enhancement step
Drug perspective: # drugs with additional in both; total # drugs; total # pairs
Drug class perspective: # classes with additional in both; total # classes; total # pairs
For example, the drug citalopram was only associated with the inferred class Serotonin Reuptake Inhibitor [EPC] in the baseline (based on its mechanism of action), which was also its asserted EPC. In addition, it acquires a relation to Mood Stabilizer [EPC] when using the therapeutic intent relations, resulting in one additional inferred class compared to the asserted class. This example illustrates why the use of therapeutic intent relations does not significantly increase the number of drugs with similar sets of asserted and inferred classes.
Description logics and quality assurance
There is a range of automated ontology quality assurance methods in the literature [14]. The results of this work highlight the usefulness of DL for that task. Here, we enriched the logic in NDF-RT to enable us to evaluate the quality and completeness of new, explicitly-added knowledge. Indeed, such rich logic allows for a quick evaluation at minimal cost. In this work, we had a reference against which to compare. However, when a gold standard is not available, DL reasoners can still check consistency and satisfiability, automatically detecting logical contradictions that usually indicate an error in the ontology. For instance, Horridge et al. used reasoning to identify contradictions within ICD-11 [15]. Unfortunately, even considering the benefits of a richly defined ontology, Noy and colleagues confirmed empirically that most biomedical ontologies do not use rich semantics but instead rely mostly on simple hierarchical subsumption relations [16].
As we rely increasingly on ontologies, it is important to ensure their content is complete and correct. In this work, we developed a methodology to evaluate the content of NDF-RT using description logics. We found that the inferred and asserted relations only matched in about 50% of the cases. Ideally, the asserted and inferred drug-class relations should be identical. Our results suggest that there is an opportunity for quality assurance of NDF-RT content (completeness of the drug descriptions and quality of the class definitions). This work serves as an exemplar of how DL can enhance ontology development and evaluation and shows ontology developers that a little semantics can go a long way.
Abbreviations
NDF-RT: National Drug File-Reference Terminology
XML: Extensible markup language
EPC: Established pharmacologic class
XSL: Extensible stylesheet language
OWL: Web ontology language
ATC: Anatomic therapeutic chemical classification system
FDA: Food and Drug Administration
SPARQL: SPARQL Protocol and RDF query language
ICD-11: International classification of diseases, 11th revision
This work was supported by the Intramural Research Program of the NIH, National Library of Medicine (NLM).
1. Bodenreider O, Stevens R. Bio-ontologies: current trends and future directions. Brief Bioinform. 2006;7:256–74.
2. Bright TJ, Yoko Furuya E, Kuperman GJ, Cimino JJ, Bakken S. Development and evaluation of an ontology for guiding appropriate antibiotic prescribing. J Biomed Inform. 2012;45:120–8.
3. Bodenreider O, Mougin F, Burgun A. Automatic determination of anticoagulation status with NDF-RT. In: 13th ISMB'2010 SIG meeting "Bio-ontologies". 2010. p. 140–3.
4. Lincoln MJ, Brown SH, Nguyen V, Cromwell T, Carter J, Erlbaum M, et al. U.S. Department of Veterans Affairs Enterprise Reference Terminology strategic overview. Stud Health Technol Inform. 2004;107:391–5.
5. National Drug File Reference Terminology (NDF-RT). [http://evs.nci.nih.gov/ftp1/NDF-RT/]
6. Krötzsch M, Simancik F, Horrocks I. A Description Logic Primer. arXiv preprint arXiv:1201.4089. 2012. p. 1–17.
7. Protégé. [http://protege.stanford.edu/]
8. HermiT. [http://hermit-reasoner.com/]
9. Wolstencroft K, Lord P, Tabernero L, Brass A, Stevens R. Protein classification using ontology classification. Bioinformatics. 2006;22:e530–8.
10. Rubin DL, Dameron O, Musen MA. Use of description logic classification to reason about consequences of penetrating injuries. AMIA Annu Symp Proc. 2005:649–53.
11. Zhu Q, Jiang G, Wang L, Chute CG. Standardized drug and pharmacological class network construction. Stud Health Technol Inform. 2013;192:1125.
12. Wang L, Jiang G, Li D, Liu H. Standardizing drug adverse event reporting data. Stud Health Technol Inform. 2013;192:1101.
13. Virtuoso. [http://virtuoso.openlinksw.com/]
14. Zhu X, Fan JW, Baorto DM, Weng C, Cimino JJ. A review of auditing methods applied to the content of controlled biomedical terminologies. J Biomed Inform. 2009;42:413–25.
15. Horridge M, Parsia B, Noy NF, Musen MA. Reasoning based quality assurance of medical ontologies: a case study. AMIA Annu Symp Proc. 2014:671–80.
16. Noy NF, Mortensen JM, Alexander PR, Musen MA. Mechanical Turk as an ontology engineer? Using microtasks as a component of an ontology engineering workflow. In: Proceedings of the 4th Annual ACM Web Science Conference. 2013. p. 262–71.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.