Extracting drug-enzyme relation from literature as evidence for drug drug interaction
© Zhang et al. 2016
Received: 15 November 2015
Accepted: 11 February 2016
Published: 7 March 2016
Information about drug–drug interactions (DDIs) is crucial for computational applications such as pharmacovigilance and drug repurposing. However, existing sources of DDIs suffer from low coverage, low accuracy and low agreement. One common type of DDI is related to the mechanism of drug metabolism: a DDI may be caused by different interactions (e.g., substrate, inhibitor) between drugs and enzymes in the drug metabolism process. Thus, information from drug-enzyme interactions (DEIs) serves as important supportive evidence for DDIs. Further, potential DDIs that are only implicit in the literature could be detected by inference and reasoning over DEIs.
In this article, we propose a hybrid approach that combines a machine learning algorithm with trigger words and syntactic patterns for DEI relation extraction from biomedical literature. The extracted DEI relations are used for reasoning to infer potential DDI relations, based on a defined drug-enzyme ontology incorporating biological knowledge.
Evaluation results demonstrate that the performance of DEI relation extraction is promising, with an F-measure of 84.97 % on the in vivo dataset and 65.58 % on the in vitro dataset. Further, the inferred DDIs achieved a precision of 83.19 % on the in vivo dataset and 70.94 % on the in vitro dataset, respectively. A further examination showed that the overlaps between our inferred DDIs and those present in DrugBank were 42.02 % on the in vivo dataset and 19.23 % on the in vitro dataset, respectively.
This paper proposed an effective approach to extract DEI relations from biomedical literature. Potential DDIs not present in existing knowledge bases were then inferred based on the extracted DEIs, demonstrating the capability of the proposed approach to detect DDIs with scientific evidence for pharmacovigilance and drug repurposing applications.
Keywords: Drug-enzyme interaction; Pharmacokinetic drug-drug interactions; Semantic graph kernel; Ontology-based inference; Relation extraction; Literature mining
Drug–drug interaction (DDI) is a situation in which one drug alters the effect of another drug in a clinically meaningful way. It has been shown to be one of the major causes of adverse drug reactions and a threat to public health [2–4]. Existing resources of DDIs include expert-curated knowledge bases such as DiDB (http://www.druginteractioninfo.org/) and DrugBank (http://www.drugbank.ca/), as well as pharmacy clinical decision-support systems. Significant efforts have been invested to incorporate DDIs into various data sources. However, existing sources suffer from low coverage, low accuracy and low agreement.
Under such circumstances, scientific evidence revealing the mechanism behind drug interactions is necessary to support reliable DDI information. One common type of DDI is related to the mechanism of drug metabolism. For example, suppose drug A is a substrate of enzyme E, i.e., enzyme E is responsible for the metabolism of drug A. If the enzyme is inhibited or induced by drug B, the metabolism of drug A may be affected. Thus, the bioavailability of drug A could differ from what is expected, potentially causing adverse effects. Therefore, drug-enzyme interactions (DEIs) serve as an important type of supportive evidence for DDIs. In addition, DDIs not explicitly stated in text may be detected by linking and reasoning over DEIs published in different scientific articles.
Since newly reported DEIs are rapidly accumulating in the huge archive of scientific literature, text mining techniques are needed to automatically extract DEIs as supportive scientific evidence for DDIs. One pilot work in this direction tried to extract the relations between drugs and enzymes based on properties of drug metabolism; potential DDIs were then detected by inference and reasoning. In this line of work, sentences in PubMed were stored as parse trees in a database, and SQL queries consisting of keywords and simple syntactic and semantic constraints were used to extract DEIs. SemRep, a widely used tool for extracting relations from biomedical literature, also uses rule-based methods to extract DEI relations.
One problem with current DEI extraction methods is that their performance tends to be poor, since sentences in scientific literature tend to be long and structurally complex. Hence, data-driven statistical methods such as machine learning algorithms are needed to improve performance. Furthermore, current DDI inference does not make use of biological knowledge about concept hierarchies. For example, if the drug Delavirdine is an inhibitor of CYP3A, it could be an inhibitor of all enzymes in the CYP3A subfamily, such as CYP3A4. Potential DDIs between Delavirdine and drugs that are substrates of CYP3A4 could then be inferred. In this way, more implicit potential DDIs may be identified.
In this article, we propose a hybrid approach to extracting DEI relations. First, related drug-enzyme pairs are extracted from sentences using an all-path graph kernel based machine-learning algorithm. Specific DEI relation types are then assigned according to trigger words and syntactic patterns. After that, variations of drug and enzyme names are normalized to remove redundant relations. In the last step, inference rules are built based on the drug-enzyme ontology and biological knowledge about mechanisms of drug metabolism and interaction. Using these inference rules, the extracted DEI relations are then used to reason about and infer potential DDI relations.
Our approach differs from existing approaches in two ways. First, we propose a hybrid method to improve the performance of DEI relation extraction. Second, we establish an ontology-based inference process that incorporates hierarchical relations between enzymes. Our evaluation results on the DEI corpus demonstrate that our proposed approach significantly outperforms SemRep. Moreover, implicit DDI relations are inferred with supportive evidence from DEIs, which may contribute to existing DDI knowledge bases such as DrugBank.
Two DEI datasets, consisting of in vivo studies and in vitro studies, were used in this study. Our method involves three steps. First, related drug-enzyme pairs were extracted using an all-path graph kernel based machine-learning model. Different relation types were then assigned based on the trigger words and syntactic patterns. Second, variations of drug and enzyme names were normalized to remove redundant relations. In the last step, inference rules were built on the basis of drug-enzyme ontology and biological knowledge about mechanisms of drug metabolism and interaction. Using these inference rules, the extracted DEI relations were used for reasoning about potential DDI relations.
Example sentences with drug enzyme relations from literature
Sentence with drug enzyme interaction
Rifampin (INN, rifampicin) is a potent inducer of CYP3A4 and some other CYP enzymes.
Rifalazil-32-hydroxylation in microsomes was completely inhibited by CYP3A4-specific inhibitors (fluconazole, ketoconazole, miconazole, troleandomycin) and drugs metabolized by CYP3A4 such as cyclosporin A and clarithromycin, indicating that the enzyme responsible for the rifalazil-32-hydroxylation is CYP3A4.
Statistics of drug enzyme relation datasets
Our relation extraction method consisted of three steps. First, we represented sentences with dependency-based syntactic structures. Second, all-path graph kernels describing the syntactic connections within the sentences were generated from those representations. A Support Vector Machine (SVM) classifier was trained on the graph kernels to generate a predictive model and to identify whether a candidate drug-enzyme pair was related. In the last step, trigger words and syntactic patterns of the different mechanisms of metabolism (i.e., "substrate", "inhibitor", "inducer") were used to assign specific DEI relation types.
Sentences with candidate DEI pairs were represented by their dependency syntactic structures. For generalization, the specific drug/enzyme names in a candidate DEI pair were replaced with "Drug"/"Enzyme" placeholders in a preprocessing step. For example, CYP2C9 and sildenafil in S1 were replaced with Enzyme1 and Drug1.
S1: CYP2C9 exhibited substantial sildenafil N-demethylase activity.
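The placeholder substitution illustrated by S1 can be sketched in Python as follows. This is a minimal illustration, assuming drug and enzyme mentions have already been recognized upstream; the function name `blind_entities` is hypothetical, and giving every recognized mention an indexed placeholder (rather than only the candidate pair) is a simplification of this sketch.

```python
import re

def blind_entities(sentence, drugs, enzymes):
    """Replace recognized mentions with indexed placeholders (Drug1, Enzyme1, ...).

    Hypothetical helper: `drugs` and `enzymes` are surface strings found by an
    upstream recognizer. Mentions must start and end with word characters
    because the pattern is anchored with \\b.
    """
    mapping = {}
    for prefix, names in (("Drug", drugs), ("Enzyme", enzymes)):
        for index, name in enumerate(names, start=1):
            mapping[name] = f"{prefix}{index}"
    if not mapping:
        return sentence
    # Longest names first, so a longer mention wins over any shorter prefix of it.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], sentence)

s1 = "CYP2C9 exhibited substantial sildenafil N-demethylase activity."
print(blind_entities(s1, drugs=["sildenafil"], enzymes=["CYP2C9"]))
# prints: Enzyme1 exhibited substantial Drug1 N-demethylase activity.
```

Because the placeholders replace the actual names, the learner sees "Enzyme1 ... Drug1 ... activity" patterns that generalize across different drugs and enzymes.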
All-path graph kernel
A graph kernel calculates the similarity between two input graphs by comparing the relations between common vertices. The weights of the relations are calculated using all possible paths between each pair of vertices. Our method follows the all-paths graph kernel proposed by Airola et al. The kernel represented the target pair using graph matrices based on two sub-graphs. The first sub-graph represented the structure of the sentence using the dependency graph; the second sub-graph represented the word sequence in the sentence, where each word vertex carried its lemma, its position relative to the target pair and its POS tag. All edges in this second sub-graph received a weight of 0.9, following the original all-paths kernel (see Fig. 1(b)).
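The path-summing idea can be written compactly: with a weighted adjacency matrix A, the summed weights of all paths between vertices are W = A + A^2 + A^3 + ... = (I - A)^-1 - I (valid only when the series converges), the graph is represented by projecting W onto vertex labels, G = L^T W L, and the kernel of two graphs is the sum of the elementwise products of their G matrices. The pure-Python sketch below follows that formulation on a toy three-word linear sub-graph with edge weight 0.9. The helper names are hypothetical, and the sketch omits the dependency sub-graph, its shortest-path-dependent edge weights, and the kernel normalization that a full implementation of Airola et al.'s kernel would include.

```python
from collections import defaultdict

def invert(matrix):
    """Gauss-Jordan inverse of a small dense square matrix (list of lists)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def all_path_weights(adj):
    """W = sum_{k>=1} A^k = (I - A)^-1 - I: summed weights of all paths i -> j."""
    n = len(adj)
    i_minus_a = [[(1.0 if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]
    inv = invert(i_minus_a)
    return [[inv[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

def graph_features(adj, labels):
    """Project vertex-level path weights onto label pairs (G = L^T W L)."""
    w = all_path_weights(adj)
    feats = defaultdict(float)
    for i, li in enumerate(labels):
        for j, lj in enumerate(labels):
            if w[i][j] != 0.0:
                for a in li:
                    for b in lj:
                        feats[(a, b)] += w[i][j]
    return dict(feats)

def all_paths_kernel(g1, g2):
    """Kernel value: sum over shared label pairs of G1[a, b] * G2[a, b]."""
    return sum(v * g2.get(k, 0.0) for k, v in g1.items())

# Toy linear sub-graph "Drug1 -> inhibit -> Enzyme1", every edge weighted 0.9.
adj = [[0.0, 0.9, 0.0], [0.0, 0.0, 0.9], [0.0, 0.0, 0.0]]
g = graph_features(adj, [{"Drug1"}, {"inhibit"}, {"Enzyme1"}])
print(g[("Drug1", "Enzyme1")])  # two-step path: 0.9 * 0.9
```

The two-step path from Drug1 to Enzyme1 contributes 0.9 x 0.9, so longer connections between the candidate pair are still counted but down-weighted.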
Relation type assignment
Trigger words and syntactic patterns of different DEI relation types
Relation type | Trigger words & syntactic patterns
Substrate | Drug … mediated/catalyzed/metabolized by Enzyme
Substrate | Enzyme … responsible for/contribute to Drug metabolism
Inhibitor | Drug … an inhibitor of Enzyme
Inhibitor | Enzyme inhibitor (Drug)
Inhibitor | Enzyme inhibit Drug …activity
Inducer | Drug … as a potent inducer of Enzyme
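Assigning a relation type from the trigger patterns in the table above can be approximated with regular expressions over placeholder-substituted sentences. The patterns below are hypothetical renderings of most of the table rows (the "Enzyme inhibit Drug …activity" row is omitted), and checking inducer and inhibitor patterns before substrate patterns is an assumption of this sketch, not a documented rule of the original system.

```python
import re

# Hypothetical regex renderings of the trigger patterns; they expect sentences
# in which drug/enzyme names were already replaced by Drug<n>/Enzyme<n>.
PATTERNS = [
    ("isInducerOf", re.compile(r"\bDrug\d+\b.*\binducer of\b.*\bEnzyme\d+\b", re.I)),
    ("isInhibitorOf", re.compile(r"\bDrug\d+\b.*\binhibitor of\b.*\bEnzyme\d+\b", re.I)),
    ("isInhibitorOf", re.compile(r"\bEnzyme\d+\s+inhibitors?\s*\(?\s*Drug\d+\b", re.I)),
    ("isSubstrateOf", re.compile(
        r"\bDrug\d+\b.*\b(?:mediated|catalyzed|metabolized) by\b.*\bEnzyme\d+\b", re.I)),
    ("isSubstrateOf", re.compile(
        r"\bEnzyme\d+\b.*\b(?:responsible for|contributes? to)\b.*\bDrug\d+\b.*\bmetabolism\b",
        re.I)),
]

def assign_relation_type(blinded_sentence):
    """Return the first relation type whose pattern matches, or None."""
    for relation, pattern in PATTERNS:
        if pattern.search(blinded_sentence):
            return relation
    return None

print(assign_relation_type("Drug1 is a potent inducer of Enzyme1 and some other CYP enzymes."))
# prints: isInducerOf
```

In the hybrid design, a rule like this only runs on pairs the statistical model has already judged related, so a sentence that matches no pattern is left without a specific type rather than forced into one.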
In the DEI datasets employed in this study, drug names were recognized using DrugBank and regular expressions for various drug metabolites; enzyme names were recognized using regular expressions for various forms of enzymes. Many variations of drug and enzyme names were annotated in the datasets. For example, "CBZ" is an abbreviation of the drug "Carbamazepine", and both "P4503A4" and "3A4" are mentions of the enzyme "CYP3A4". Hence, drug and enzyme names were first normalized to reduce relation redundancy before the reasoning step. Drug names were normalized to concepts in the Unified Medical Language System (UMLS) using MetaMap. Enzyme names were normalized to CYP450 enzymes as defined in the Human Cytochrome P450 Allele Nomenclature Database (http://www.cypalleles.ki.se/). The number of extracted DEIs was reduced accordingly.
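A minimal sketch of the normalization and de-duplication step follows. It does not call MetaMap or UMLS: `DRUG_ALIASES` is a hypothetical stand-in for MetaMap concept mapping, and the enzyme regular expression only covers CYP surface forms like those mentioned above (e.g., "P4503A4", "3A4"), not the full allele nomenclature.

```python
import re

# Hypothetical alias table standing in for MetaMap/UMLS concept normalization.
DRUG_ALIASES = {"cbz": "carbamazepine"}

# Accepts forms such as "CYP3A4", "P4503A4", "P450 3A4", "cytochrome P450 3A4", "3A4".
ENZYME_RE = re.compile(r"^(?:cytochrome\s+)?(?:CYP|P450)?\s*(\d{1,2}[A-Z]\d{0,2})$", re.I)

def normalize_drug(name):
    key = name.strip().lower()
    return DRUG_ALIASES.get(key, key)

def normalize_enzyme(name):
    m = ENZYME_RE.match(name.strip())
    # Unrecognized forms are kept unchanged rather than guessed.
    return "CYP" + m.group(1).upper() if m else name.strip()

def deduplicate(relations):
    """Normalize (drug, relation, enzyme) triples; drop duplicates, keep first-seen order."""
    seen = {}
    for drug, relation, enzyme in relations:
        seen.setdefault((normalize_drug(drug), relation, normalize_enzyme(enzyme)), None)
    return list(seen)

triples = [("CBZ", "isInducerOf", "P4503A4"),
           ("Carbamazepine", "isInducerOf", "CYP3A4"),
           ("carbamazepine", "isInducerOf", "3A4")]
print(deduplicate(triples))
# prints: [('carbamazepine', 'isInducerOf', 'CYP3A4')]
```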
Knowledge representation and reasoning
Drug-enzyme ontology definition
Logic facts definition for drug-drug interaction inference
Drug-enzyme relations:
isSubstrateOf (d, e): Drug d is metabolized by enzyme e
isInhibitorOf (d, e): Drug d inhibits the activity of enzyme e
isInducerOf (d, e): Drug d induces the activity of enzyme e
Enzyme-enzyme relation:
isAncestorOf (e1, e2): Enzyme e1 is an ancestor of enzyme e2 in the enzyme family
Drug-drug relation:
DDI (d1, d2): Drug d1 and drug d2 have an interaction
Drug enzyme ontology based inference
Rule 1: isSubstrateOf (d1, e) and isInhibitorOf (d2, e) → DDI (d1, d2)
Rule 2: isSubstrateOf (d1, e) and isInducerOf (d2, e) → DDI (d1, d2)
Rule 3: isSubstrateOf (d1, e1) and isAncestorOf (e1, e2) → isSubstrateOf (d1, e2)
Rules 1 and 2 encode the knowledge that if drug d1 is a substrate of enzyme e and drug d2 is an inhibitor/inducer of enzyme e, then drugs d1 and d2 have a potential interaction. Rule 3 states that a descendant enzyme inherits the isSubstrateOf relation from its ancestors. Similar inheritance rules were defined for the other drug-enzyme relations based on the enzyme hierarchy. The HermiT reasoner, which can check the consistency of ontologies, compute the classification hierarchy, and explain inferences (Horrocks et al., 2012), was employed for DDI relation inference. The ontology can be downloaded from https://sbmi.uth.edu/ontology/files/DEIOntology.owl.
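The rules can also be illustrated without an OWL reasoner. The following Python sketch is a plain forward-chaining substitute for the HermiT-based inference, not the authors' ontology: it assumes isAncestorOf is transitive, applies the Rule 3 style of inheritance to all three drug-enzyme relations, and, as an additional assumption, skips DDI (d, d) pairs when one drug is both the substrate and the inhibitor/inducer of the same enzyme. The facts in the usage lines are illustrative, based on the Delavirdine example in the Introduction.

```python
from itertools import product

def transitive_closure(pairs):
    """Close isAncestorOf under transitivity (assumed, like an OWL transitive property)."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c} - closure
        if not new:
            return closure
        closure |= new

def inherit(facts, ancestors):
    """Rule 3 and its analogues: a fact about enzyme e1 also holds for each descendant e2."""
    return set(facts) | {(d, e2) for (d, e1) in facts for (a, e2) in ancestors if a == e1}

def infer_ddis(substrate, inhibitor, inducer, ancestor):
    """Apply Rules 1-3 to (drug, enzyme) facts and (e1, e2) isAncestorOf facts.

    Returns (d1, d2) pairs: d1 is the substrate drug, d2 the inhibitor/inducer.
    """
    anc = transitive_closure(ancestor)
    substrates = inherit(substrate, anc)
    perpetrators = inherit(inhibitor, anc) | inherit(inducer, anc)  # Rules 1 and 2
    return {(d1, d2) for (d1, e1) in substrates for (d2, e2) in perpetrators
            if e1 == e2 and d1 != d2}

# Illustrative facts: Delavirdine inhibits CYP3A; CYP3A is an ancestor of CYP3A4.
print(infer_ddis(substrate={("quinidine", "CYP3A4")},
                 inhibitor={("delavirdine", "CYP3A")},
                 inducer=set(),
                 ancestor={("CYP3A", "CYP3A4")}))
# prints: {('quinidine', 'delavirdine')}
```

The inhibitor fact stated for CYP3A is inherited by CYP3A4, which is exactly the kind of implicit DDI the hierarchy is meant to expose. An OWL reasoner additionally offers consistency checking and explanations, which this sketch does not attempt.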
Machine learning (ML) algorithm
SVM algorithms are the dominant ML methods among existing DDI extraction systems (Segura-Bedmar et al., 2013). This study used the sparse version of regularized least squares (RLS), also known as the least squares SVM, to learn the DEI prediction model based on the all-path graph kernel.
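For intuition, a dense (non-sparse) kernel RLS learner fits dual coefficients a = (K + λI)^-1 y on ±1 labels and scores a new instance by f(x) = Σ_i a_i k(x_i, x), reading f(x) > 0 as "DEI". The sketch below is not the sparse RLS implementation used in the study: `lam` is a hypothetical regularization value, the function names are hypothetical, and `kernel` can be any kernel function, such as an all-paths graph kernel.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def train_rls(train, labels, kernel, lam=1.0):
    """Dense kernel RLS: dual coefficients a = (K + lam * I)^-1 y."""
    n = len(train)
    k = [[kernel(train[i], train[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(k, labels)

def predict_rls(train, coef, kernel, x):
    """Decision value f(x) = sum_i a_i k(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(coef, train))

# Toy usage with a linear kernel on 1-D points; labels +1 = "DEI", -1 = "NDEI".
dot = lambda u, v: sum(p * q for p, q in zip(u, v))
X, y = [[1.0], [2.0], [-1.0], [-2.0]], [1, 1, -1, -1]
coef = train_rls(X, y, dot, lam=1.0)
print(predict_rls(X, coef, dot, [3.0]) > 0)  # prints: True
```

With a linear kernel this reduces to ridge regression, which makes the toy result easy to check by hand (f(3) = 18/11); swapping in a graph kernel changes only the `kernel` argument.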
POS tags and dependency trees of the datasets were generated by the Stanford parser. We used the standard evaluation measures (precision, recall and F-measure) to evaluate performance. We evaluated our system on each test dataset after training on the corresponding training dataset. Because our datasets were imbalanced, with many more "NDEI" relations than "DEI" relations, the same candidate drug-enzyme pair appearing in multiple instances may be classified as "DEI" in one instance and as "NDEI" in another. In this case, we treated the candidate pair as a true "DEI" pair to enhance the precision. Hence, the performance of relation extraction was evaluated at the entity level instead of the sentence level.
We compared the DEI relation extraction performance of the all-path graph kernel based model (GraphKernel) with that of the java Simple Relation Extraction (JSRE) model. JSRE is another state-of-the-art relation extraction model, which has demonstrated performance comparable to the all-path graph kernel based model in protein-protein interaction extraction [14, 23]. Different kernel options and parameters provided by JSRE were examined by 10-fold cross-validation on the training datasets. The best JSRE performance, achieved with the shallow linguistic context kernel and default parameters, was used for comparison in our study. A further comparison was made with SemMedDB, an existing knowledge base of literature relations built using the SemRep system. To select relations between drugs and genes from SemMedDB, PMIDs were used as one of the query constraints, ensuring that the selected relations came from the same publications as the test datasets.
Comparison of generated DDI relations with DrugBank: for each drug, we examined the overlap between the generated DDI relations and those in DrugBank. Specifically, novel DDI relations generated in our study were examined by checking their supportive evidence.
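The entity-level evaluation described above can be sketched as follows: per-sentence predictions are aggregated so that a drug-enzyme pair counts as positive if any of its instances is predicted "DEI", and precision, recall and F-measure are then computed over pairs. The function names and the input format are hypothetical.

```python
from collections import defaultdict

def aggregate_entity_level(predictions):
    """predictions: iterable of (drug, enzyme, label) tuples, one per sentence instance.

    A pair is positive if ANY of its instances was predicted "DEI".
    """
    pair_positive = defaultdict(bool)
    for drug, enzyme, label in predictions:
        pair_positive[(drug, enzyme)] |= (label == "DEI")
    return {pair for pair, positive in pair_positive.items() if positive}

def precision_recall_f1(predicted, gold):
    """Standard set-based precision, recall and F-measure over drug-enzyme pairs."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

preds = [("rifampin", "CYP3A4", "DEI"), ("rifampin", "CYP3A4", "NDEI"),
         ("azelastine", "CYP2D6", "NDEI"), ("quinidine", "CYP3A4", "DEI")]
gold = {("rifampin", "CYP3A4"), ("lidocaine", "CYP1A2")}
print(precision_recall_f1(aggregate_entity_level(preds), gold))  # prints: (0.5, 0.5, 0.5)
```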
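One way to compute the DrugBank comparison is sketched below, under stated assumptions: DDI pairs are treated as unordered and case-insensitive; overlap is taken as the fraction of inferred pairs found in the reference set (the exact denominator is an assumption here); the reference pairs are assumed to have been exported from DrugBank separately; and, for simplicity, all drugs are pooled rather than compared drug by drug. Pairs missing from the reference are returned so their supporting evidence can be checked manually. The function name is hypothetical.

```python
def ddi_overlap(inferred, reference):
    """Return (fraction of inferred pairs present in reference, set of novel pairs).

    DDI (d1, d2) and DDI (d2, d1) describe the same interaction, so pairs are
    compared as case-insensitive frozensets.
    """
    def norm(pairs):
        return {frozenset((a.lower(), b.lower())) for a, b in pairs}
    inf, ref = norm(inferred), norm(reference)
    rate = len(inf & ref) / len(inf) if inf else 0.0
    return rate, inf - ref

rate, novel = ddi_overlap({("quinidine", "itraconazole"), ("lidocaine", "fluvoxamine")},
                          {("Itraconazole", "Quinidine")})
print(rate)  # prints: 0.5
```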
Results and discussion
Performance of drug-enzyme relation extraction
Drug enzyme relation extraction performance
Drug enzyme relation assignment performance
Performance of drug-drug interaction inference
Performance of drug drug relation inference
DEIs are important supportive evidence for DDIs. This study applied a hybrid approach to DEI relation extraction from biomedical literature. Reasoning was then conducted on the extracted DEIs to infer potential DDI relations, incorporating biological knowledge into a drug-enzyme ontology. Evaluation results demonstrated the effectiveness of our approach: potential DDIs were inferred with reliable precision (in vivo: 80.30 %; in vitro: 72.09 %), indicating its capability to detect DDIs with scientific evidence.
GraphKernel obtained much higher precision and lower recall than JSRE (Table 5), suggesting that the two models have complementary strengths on the DEI datasets. One potential explanation lies in the kernels underlying the two models. JSRE relies only on shallow linguistic features of text, such as tokens, POS tags and lemmas, while GraphKernel combines shallow linguistic features with more complex structural syntactic features. JSRE therefore imposes looser constraints on the text than GraphKernel, leading to the higher recall of JSRE and the higher precision of GraphKernel. Overall, GraphKernel outperformed JSRE significantly on the in vivo dataset (F1: 84.97 % vs. 78.50 %), with a slightly lower F1 on the in vitro dataset (F1: 65.58 % vs. 66.20 %). This indicates that there is room for further improvement in relation extraction from the in vitro dataset.
As shown in Table 5, our approach significantly outperformed SemRep in DEI relation extraction. One possible reason is that SemRep is a general information extraction tool for biomedical literature that is not focused on DEI relations, whereas our model was trained on datasets dedicated to DEI relations. Another possible reason is that, instead of relying solely on rule-based methods as SemRep does, our approach first applies a statistical machine-learning model to recognize related drug-enzyme pairs, which removes false positive DEI pairs and improves performance. For example, in the sentence "the possibility of in vivo drug interaction of azelastine and other drugs that are mainly metabolized by CYP2D6", the candidate pair of azelastine and CYP2D6 matches the pattern of the isSubstrateOf relation. However, it is a false positive and is removed in the first step by the statistical model.
Examples of inferred drug drug interactions and supportive evidence from literature
Drugs with interaction
We performed a study in healthy volunteers to investigate the relative inductive effect of CBZ and OXCZ on CYP3A4 activity using the metabolism of quinidine as a biomarker reaction…We confirm a clinically significant inductive effect of both OXCZ and CBZ. (PMID: 17346248)
Lidocaine is metabolized by cytochrome P450 3A4 (CYP3A4) and CYP1A2 enzymes…We conclude that inhibition of CYP1A2 by fluvoxamine considerably reduces the presystemic metabolism of oral lidocaine… (PMID: 16918719)
Quinidine is eliminated mainly by CYP3A4-mediated metabolism… Itraconazole increases plasma concentrations of oral quinidine, probably by inhibiting the CYP3A4 isozyme during the first-pass and elimination phases of quinidine. (PMID: 9390107)
Involvement of human liver cytochrome P4502B6 in the metabolism of propofol… orphenadrine, a CYP2B6 inhibitor, reduced the rate constant of propofol by liver microsomes by 38 % (P < 0.05)… (PMID: 11298076)
Rifalazil-32-hydroxylation in microsomes was completely inhibited by CYP3A4-specific inhibitors (fluconazole, …) … indicating that the enzyme responsible for the rifalazil-32-hydroxylation is CYP3A4. (PMID: 10923859)
Although our DEI relation extraction method achieved an F1 of 84.97 % on the in vivo dataset, the F1 of 65.58 % on the in vitro dataset is still low. Based on our empirical observation, the main reason for the performance difference between the two datasets lies in the differences in their linguistic structures, which originate from the difference between in vivo and in vitro studies. In vivo studies focus on evaluating the effect of an investigational drug on other drugs by checking changes in pharmacokinetic parameters. In contrast, in vitro studies can qualitatively reveal the mechanisms of a potential DDI based on the observation of enzyme kinetics parameters. Thus, sentences in the in vitro dataset contained more drug-enzyme interactions, but they were also much more complex than those in the in vivo dataset, with more clauses, long conjunctive structures and rare patterns. When we examined the errors of DEI relation extraction, especially in the in vitro dataset, we found that the major causes of false negatives include conjunctive structures of drugs/enzymes (e.g., "Studies using the CYP3A4 inhibitors ketoconazole, troleandomycin, and erythromycin") and rare patterns not covered by the statistical model (e.g., "Induction of CYP2C9 would explain the increased systemic elimination of glipizide"). On the other hand, the major causes of false positives include the failure to capture context information that differentiates positive from negative relations (e.g., the word "confirm" indicates uncertainty about the DEI relation in the sentence "… to confirm that fluvoxamine inhibits CYP2C19"), and wrong predictions between drugs and enzymes across multiple clauses, as in the sentence "Greater inhibition was produced by the less selective CYP3A inhibitors parathion, quinidine, and ketoconazole; CYP1A inhibitors were ineffective."
These problems should be addressed in future work to further improve DEI relation extraction. Specifically, more advanced methods tailored to the in vitro dataset should be explored, including automatic pattern recognition to identify conjunctive structures of drugs/enzymes, splitting multiple clauses before feature extraction, and expanding the keywords that indicate uncertainty (e.g., "to determine" and "was examined").
One limitation of our current work is the size of the annotated corpus. For practical use, we plan to apply our system to all related articles in PubMed to obtain a more comprehensive list of DEIs and potential DDIs. Further improvements to our system may also be needed after evaluation on a larger DEI corpus. In addition to narrative text describing DEIs, tables of DEIs with interaction details in published full-text articles are another valuable resource that we plan to incorporate. Extracting DEIs from tables is more straightforward and potentially yields more accurate results than extracting them from text. However, compared with titles and abstracts, which are accessible through MEDLINE, automatic access to full text, and hence to tables, is limited. These two resources could therefore complement each other for mining DEIs from biomedical literature. In future work, methods for mining tables from DEI-related articles will be explored. Another drawback of our current approach to DDI relation inference is that the specific conditions required for DEIs and DDIs to occur, such as drug dosages, were not considered. Such conditions are critical for supporting evidence of DDI relations and should be taken into consideration in the next step.
Our study proposes a hybrid approach that combines a machine-learning algorithm with rule-based patterns to extract DEIs from biomedical literature, from which potential DDI relations can be inferred by reasoning. Evaluation results demonstrate that our DEI relation extraction significantly outperformed SemRep, with an F-measure of 84.97 % on the in vivo dataset and 65.58 % on the in vitro dataset. Moreover, potential DDIs not present in DrugBank were also inferred, indicating that the proposed approach could be used to detect DDIs supported by scientific evidence of drug metabolism and interaction.
This work was supported by Cancer Prevention & Research Institute of Texas [R1307]; GM10448301, and LM011945.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Goodman LS. Goodman and Gilman's the Pharmacological Basis of Therapeutics. New York: McGraw-Hill; 1996.
- Hall MJ, DeFrances CJ, Williams SN, Golosinskiy A, Schwartzman A. National hospital discharge survey: 2007 summary. Natl Health Stat Report. 2010;29(29):1–20.
- Niska R, Bhuiya F, Xu J. National hospital ambulatory medical care survey: 2007 emergency department summary. Natl Health Stat Report. 2010;26(26):1–31.
- Becker ML, Kallewaard M, Caspers PWJ, Visser LE, Leufkens HGM, Stricker BH. Hospitalisations and emergency department visits due to drug–drug interactions: a literature review. Pharmacoepidemiol Drug Saf. 2007;16(6):641–51.
- Saverno KR, Hines LE, Warholak TL, Grizzle AJ, Babits L, Clark C, et al. Ability of pharmacy clinical decision-support software to alert users about clinically important drug–drug interactions. J Am Med Inform Assoc. 2011;18(1):32–7.
- Percha B, Altman RB. Informatics confronts drug–drug interactions. Trends Pharmacol Sci. 2013;34(3):178–84.
- Wang LM, Wong M, Lightwood JM, Cheng CM. Black box warning contraindicated comedications: concordance among three major drug interaction screening programs. Ann Pharmacother. 2010;44(1):28–34.
- Abarca J, Malone DC, Armstrong EP, Grizzle AJ, Hansten PD, Van Bergen RC, et al. Concordance of severity ratings provided in four drug interaction compendia. J Am Pharm Assoc (2003): JAPhA. 2003;44(2):136–41.
- Hines LE, Malone DC, Murphy JE. Recommendations for generating, evaluating, and implementing drug-drug interaction evidence. Pharmacother: J Hum Pharmacol Drug Ther. 2012;32(4):304–13.
- Tari L, Anwar S, Liang S, Cai J, Baral C. Discovering drug–drug interactions: a text-mining and reasoning approach based on properties of drug metabolism. Bioinformatics. 2010;26(18):i547–53.
- Herrero-Zazo M, Segura-Bedmar I, Martínez P, Declerck T. The DDI corpus: an annotated corpus with pharmacological substances and drug–drug interactions. J Biomed Inform. 2013;46(5):914–20.
- Kilicoglu H, Shin D, Fiszman M, Rosemblat G, Rindflesch TC. SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics. 2012;28(23):3158–60.
- Moltke LL, Greenblatt DJ, Granda BW, Giancarlo GM, Duan SX, Daily JP, et al. Inhibition of human cytochrome P450 isoforms by nonnucleoside reverse transcriptase inhibitors. J Clin Pharmacol. 2001;41(1):85–91.
- Airola A, Pyysalo S, Björne J, Pahikkala T, Ginter F, Salakoski T. All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning. BMC Bioinformatics. 2008;9 Suppl 11:S2.
- Wu H-Y, Karnik S, Subhadarshini A, Wang Z, Philips S, Han X, et al. An integrated pharmacokinetics ontology and corpus for text mining. BMC Bioinformatics. 2013;14(1):35.
- Bunescu RC, Mooney RJ. A Shortest Path Dependency Kernel for Relation Extraction. Proceedings of HLT/EMNLP '05. Vancouver: ACL; 2005.
- Bodenreider O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32 suppl 1:D267–70.
- Aronson AR, Lang F-M. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 2010;17(3):229–36.
- McGuinness DL, Van Harmelen F. OWL web ontology language overview. W3C recommendation. 2004;10(10).
- Horridge M, Bechhofer S. The OWL API: a Java API for OWL ontologies. Semantic Web. 2011;2(1):11–21.
- De Marneffe M-C, MacCartney B, Manning CD. Generating Typed Dependency Parses from Phrase Structure Parses. Proceedings of LREC 2006. 2006.
- Giuliano C, Lavelli A, Romano L. Exploiting shallow linguistic information for relation extraction from biomedical literature. Citeseer; 2006.
- Tikk D, Solt I, Thomas P, Leser U. A detailed error analysis of 13 kernel methods for protein–protein interaction extraction. BMC Bioinformatics. 2013;14(1):12.