Extraction of potential adverse drug events from medical case reports
© Gurulingappa et al.; licensee BioMed Central Ltd. 2012
Received: 22 November 2012
Accepted: 22 November 2012
Published: 20 December 2012
The sheer amount of information about potential adverse drug events published in medical case reports poses major challenges for drug safety experts who must perform timely monitoring. Efficient strategies for the identification and extraction of information about potential adverse drug events from free‐text resources are needed to support pharmacovigilance research and pharmaceutical decision making. Therefore, this work focusses on the adaptation of a machine learning‐based system for the identification and extraction of potential adverse drug event relations from MEDLINE case reports. It relies on a high quality corpus that was manually annotated using an ontology‐driven methodology. Qualitative evaluation of the system showed robust results. An experiment with large scale relation extraction from MEDLINE delivered under‐identified potential adverse drug events not reported in drug monographs. Overall, this approach provides a scalable auto‐assistance platform for drug safety professionals to automatically collect potential adverse drug events communicated as free‐text data.
Adverse drug effects are a very serious issue that confronts patients, healthcare providers, regulatory authorities and drug manufacturers. While clinical trials are a stringent measure for detecting risks associated with drug usage, wide use in the field may reveal additional risks that are not detectable in clinical trials because of the limited number of patients involved. After marketing approval, undesired effects of drugs are reported to the authorities through so‐called Spontaneous Adverse Event Reporting Systems (SAERS), and these reports are analyzed in a timely manner to ensure the safe use of drugs. A well‐known problem of pharmacovigilance, however, is underreporting, namely the low number of reports that the authorities receive. Case reports published in the scientific biomedical literature represent an important resource complementary to the SAERS because of their abundance, rapid rate of generation, and the valuable information they contain. Because the scientific literature is unstructured, its manual analysis is challenging, cumbersome, and labor intensive.
In recent years, automatic natural language processing (NLP) and information extraction (IE) techniques have gained considerable popularity. They include the identification of biomedical named entities, relations between the entities, or events associated with them. Notable efforts have been invested in mining potential adverse drug events from different forms of free‐text data. Examples include Wang et al., who applied the MedLEE system to discharge summaries to identify medication events and entities that could be potential adverse drug effects; these were detected using the strength of statistical association based on their co‐occurrences. Leaman et al. proposed a lenient NLP model for extracting adverse effects of drugs from social media such as blogs. Gurulingappa et al. developed a machine learning‐based system for classifying the sentences in MEDLINE case reports that assert potential adverse drug events. However, to the authors’ knowledge, there has been limited focus on the identification of semantic relationships between drugs and adverse events in text. This is partly due to the unavailability of suitable open access corpora that could be used for technology development and benchmarking. Extracting relations between drugs and adverse effects can facilitate appropriate indexing, precise searching, visualization, and faster information tracing, and can improve the sensitivity of signal detection in pharmacovigilance.
The use of an ontology of adverse drug events for automated signal generation in pharmacovigilance has already been proposed, and its application to information retrieval was exploited by the same group a few years later in the VIGITERMES project. There, the OntoEIM adverse event ontology was used to extend the dictionary of adverse event entities, normalize queries, and consolidate annotations, achieving 29% precision and 67% recall on MEDLINE abstracts. Automatic extraction of potential adverse drug events from clinical records is an active area of research. Mining social internet message boards to identify potential adverse drug events has also been reported; in that work, drug‐event pairs were extracted using only the co‐occurrence of terms within a window of 20 tokens, and machine learning was used only for de‐identification for privacy protection. This work reports on the adaptation of a machine learning‐based system for identifying the relations between drugs and adverse effects in MEDLINE case reports; it relies on an ontology‐driven manually annotated corpus that strictly follows semantic annotation guidelines developed for clinical text. The system has been qualitatively evaluated and studied for its ability to support real‐time pharmacovigilance studies.
Counts of entities and relations in ADE‐EXT corpus subsets
Conditions (adverse effect)
Relation extraction workflow
For the identification and extraction of drug‐condition entity pairs that constitute a potential adverse event relation, the Java Simple Relation Extraction (JSRE) system was employed. JSRE provides a re‐trainable and scalable supervised classification platform that uses Support Vector Machines (SVMs) with different kernels specially designed for NLP and relation extraction. All sentences in ADE‐EXT‐TRAIN and ADE‐EXT‐TEST containing drug‐condition pairs labelled as either True or False were transformed into the SRE format before being subjected to relation extraction. The SRE format is the JSRE platform’s own way of representing data, in which the tokens of a sentence are enriched with their part‐of‐speech tags, lemmas, and flags indicating whether a token is part of a named entity. Among the available kernels, the shallow linguistic kernel was used throughout, since it has been widely applied and has proved successful in similar relation extraction tasks. ADE‐EXT‐TRAIN was used for training and cross‐validation of JSRE, whereas ADE‐EXT‐TEST was used as an independent test set.
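The transformation described above can be illustrated with a minimal Python sketch. It is not the JSRE converter and does not claim to reproduce the exact SRE file syntax: the `Token` class, the `&&` field separator, the `D`/`C`/`O` role flags and the example sentence ("DrugX induced rash .") are all assumptions made for illustration, and entities are simplified to single tokens. The sketch only shows the idea of enumerating drug‐condition pairs in a sentence and writing each pair as one labelled line whose tokens carry lemma, part‐of‐speech and entity information.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Token:
    text: str
    lemma: str
    pos: str
    entity: str = "O"  # "DRUG", "CONDITION", or "O" for no entity


def candidate_pairs(tokens):
    """Return every (drug index, condition index) pair in one sentence."""
    drugs = [i for i, t in enumerate(tokens) if t.entity == "DRUG"]
    conditions = [i for i, t in enumerate(tokens) if t.entity == "CONDITION"]
    return list(product(drugs, conditions))


def to_record(tokens, drug_idx, cond_idx, is_true):
    """Serialise one candidate pair as a labelled, token-enriched line."""
    fields = []
    for i, tok in enumerate(tokens):
        # Hypothetical role flags: D = drug of this pair, C = condition of this pair.
        role = "D" if i == drug_idx else "C" if i == cond_idx else "O"
        fields.append("&&".join([str(i), tok.text, tok.lemma, tok.pos, tok.entity, role]))
    return f"{int(is_true)}\t{' '.join(fields)}"


# Invented example sentence; tags and lemmas are written by hand.
sentence = [
    Token("DrugX", "drugx", "NNP", "DRUG"),
    Token("induced", "induce", "VBD"),
    Token("rash", "rash", "NN", "CONDITION"),
    Token(".", ".", "."),
]
pairs = candidate_pairs(sentence)
records = [to_record(sentence, d, c, is_true=True) for d, c in pairs]
print(records[0])
```

A sentence with several drugs and conditions yields one record per pair, so the classifier sees each candidate relation in the context of the full sentence.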
Mapping annotation ontology against ontology of adverse events
Results and discussion
Performance evaluation criteria
The performance of relation extraction was evaluated by 10‐fold cross‐validation of the training data. During cross‐validation of the training data and final evaluation over the test set, classification performance was assessed using the F‐score over True‐labelled relations, since these represent the potential adverse event relations between drugs and conditions, which is the relation class under study.
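As a rough sketch of this evaluation criterion, the Python code below computes precision, recall and F‐score for the True class only, and builds index splits for 10‐fold cross‐validation. It assumes the balanced F‐score, F = 2PR / (P + R), and a simple interleaved assignment of examples to folds; the source does not state how folds were formed or how per‐fold scores were combined, so both choices are illustrative. The function names and the toy labels are hypothetical.

```python
def prf_true(gold, predicted):
    """Precision, recall and balanced F-score for the True class only."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; fold i tests items i, i+k, i+2k, ..."""
    for fold in range(k):
        test = list(range(fold, n, k))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test


# Toy labels (invented): 2 true positives, 1 false positive, 1 false negative.
gold = [True, True, False, False, True]
predicted = [True, False, True, False, True]
precision, recall, f_score = prf_true(gold, predicted)

folds = list(k_fold_indices(20, k=10))
```

False‐labelled pairs influence the score only through false positives, which matches the focus on the True relation class.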
Assessment of relation extraction
Assessment of results of relation extraction
Impact of size of the training set on the performance
Impact of size of the training set on relation extraction
Mapping the ADE annotation ontology to the ontology of adverse events
As shown in Figure 2, both the ADE annotation ontology and OAE represent adverse drug reactions using formal ontological methods. Despite this common goal, the two ontologies name the two core entities differently: a Condition in the ADE annotation ontology coincides with a drug adverse event in OAE, and a Drug in the ADE annotation ontology coincides with a drug‐administration in OAE. The ADE ontology additionally introduces the entity dosage, which was not specified in OAE at the time of its development because OAE originally focused on vaccines, for which dosing is not an essential medical concept. Both ADE and OAE model a causal relationship between Condition or Adverse event and Drug or Medical intervention, with the latter being the causal source. The only entity that the CLEF annotation ontology shares with OAE and ADE is Drug‐or‐device, which coincides with a Drug or Medical intervention.
Use case study: large scale relation extraction
An experiment was conducted to understand real‐world use case scenarios for the extraction of potential adverse drug events from text. The trained extraction tool was applied to the whole of MEDLINE, and the extracted relations were then compared with the information in the drug leaflets contained in the SIDER database. Some of the automatically extracted potential adverse drug events that were not present in SIDER were manually investigated for their validity by comparison with the Medicines and Healthcare products Regulatory Agency (MHRA) drug label changes reported in 2009.
Relation extraction from MEDLINE
MEDLINE articles published before 2009 were gathered to form a Medline‐2009 corpus. ProMiner was equipped with DrugBank and MedDRA dictionaries for tagging drugs and conditions occurring in sentences of Medline‐2009. A JSRE model trained over the ADE‐EXT‐TRAIN corpus was applied to classify relations between drugs and conditions as True or False, where a True relation indicates a potential drug‐related adverse event. As a result, 165680 relations were extracted between 1611 drugs and 5079 adverse effects, where drugs and adverse effects were normalized to DrugBank and MedDRA respectively.
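The tagging and normalization step can be pictured with the following simplified Python sketch. It is a stand‐in, not ProMiner: it uses plain case‐insensitive whole‐word lookup, the two small dictionaries are invented, and the identifiers (`DRUG:0001`, `COND:0001`, and so on) are placeholders rather than real DrugBank or MedDRA codes. It shows how synonyms collapse onto one concept identifier, so that relations are counted per normalized drug and adverse effect rather than per surface string.

```python
import re

# Hypothetical dictionaries mapping surface terms to placeholder concept IDs.
DRUG_DICT = {"rituximab": "DRUG:0001", "natalizumab": "DRUG:0002"}
CONDITION_DICT = {
    "progressive multifocal leukoencephalopathy": "COND:0001",
    "pml": "COND:0001",  # synonym normalized to the same concept
}


def tag(sentence, dictionary):
    """Return sorted (start, end, concept_id) hits for whole-word dictionary terms."""
    lowered = sentence.lower()
    hits = []
    for term, concept_id in dictionary.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append((match.start(), match.end(), concept_id))
    return sorted(hits)


def candidate_relations(sentence):
    """Pair every normalized drug with every normalized condition in a sentence."""
    drugs = {cid for _, _, cid in tag(sentence, DRUG_DICT)}
    conditions = {cid for _, _, cid in tag(sentence, CONDITION_DICT)}
    return {(d, c) for d in drugs for c in conditions}


# Invented example sentence.
text = ("Natalizumab therapy was complicated by progressive multifocal "
        "leukoencephalopathy (PML).")
relations = candidate_relations(text)
```

In the workflow described above, each such candidate pair would then be passed to the trained classifier, and only pairs labelled True would be kept.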
Adverse effect extraction from SIDER
Side Effect Resource (SIDER) is a database of adverse drug effects that links 888 drugs to 1450 adverse effects. It has been constructed manually from the summary of product leaflets of each drug. Drugs and their adverse effects were extracted from SIDER version 1.01, which contains drug leaflets published before 2009.
MHRA drug label changes
In 2009, the MHRA proposed safety label updates for 26 drugs. These were of course not all the safety label updates that the MHRA identified in 2009, but those to which the MHRA decided to give particular visibility through its web site. These new adverse drug effects were manually extracted and serve as a reference standard for validating the potential adverse drug events automatically extracted from Medline‐2009 using the trained JSRE model.
Validation of large scale relation extraction
From the MHRA label change dataset, three drugs were arbitrarily chosen for deeper investigation: Rituximab, Efalizumab, and Natalizumab, three anti‐neoplastic and immunomodulatory monoclonal antibodies. For these three drugs, potential adverse drug events were selected from the Medline‐2009 predictions and from SIDER. Potential adverse drug events extracted from Medline‐2009 that are not reported in SIDER were manually checked against the MHRA label changes.
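The comparison described above amounts to set operations per drug, as in the Python sketch below. The check against the MHRA label changes was done manually in this work; the second function only illustrates the logic of that check. The function names, the placeholder drug `drug_x` and the effects `effect_a` to `effect_c` are invented, and the sketch assumes that all three sources have already been normalized to the same vocabulary.

```python
def novel_candidates(medline_predictions, sider, drug):
    """Effects predicted from the literature that the leaflet resource lacks."""
    return medline_predictions.get(drug, set()) - sider.get(drug, set())


def confirmed_by_label_change(candidates, label_changes, drug):
    """Candidates that also appear among the later label changes for the drug."""
    return candidates & label_changes.get(drug, set())


# Placeholder data, not taken from MEDLINE, SIDER or the MHRA.
medline_predictions = {"drug_x": {"effect_a", "effect_b", "effect_c"}}
sider = {"drug_x": {"effect_a"}}
label_changes = {"drug_x": {"effect_b"}}

candidates = novel_candidates(medline_predictions, sider, "drug_x")
confirmed = confirmed_by_label_change(candidates, label_changes, "drug_x")
```

Candidates that are not confirmed, such as `effect_c` here, are not necessarily wrong; they are simply not covered by the reference used for validation.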
Potential adverse drug events extracted from MEDLINE not reported in drug leaflets until 2009 and later introduced in package leaflets
Progressive multifocal leukoencephalopathy
Progressive multifocal leukoencephalopathy
This work reports on the adaptation of a machine learning‐based JSRE system for the identification and extraction of potential adverse events of drugs in scientific case reports. A methodology has been discussed to enrich a sparsely annotated corpus and its subsequent use to build classification models. Evaluation of the system’s performance showed promising results. A use‐case study performed on relation extraction from large scale literature showed the system’s ability to capture valid, under‐reported, and novel potential adverse events not yet present in product leaflets.
The performance of the system can be improved in several ways. In the current experiments, only the default features accepted by JSRE were used. Optimizing the feature representation to include additional features, for instance from syntactic sentence parse trees, may further improve the results. Additional strategies such as post‐processing to classify relations with missing contextual descriptions can help to recover more relations. Furthermore, handling of inter‐sentence relations needs to be considered in order to further increase coverage.
The reported experimental results reflect the current state of this research on identifying potential adverse drug events from text. Several further directions are being pursued. The authors plan to benchmark the performance of several named entity taggers against the ADE corpus for the identification of drug and condition mentions in text. The current experiments were performed on the ADE corpus, since it was the only one available when this work was done; however, while this report was being written a new corpus was published, namely the EU‐ADR corpus. It will be interesting to see whether the performance of JSRE on the ADE corpus differs from its performance on the EU‐ADR corpus.
Similarly, public and commercial relation extraction systems will be benchmarked, and the practical impact of the information extracted from text on predicting drug label changes will be studied in detail.
The use of ontologies for driving information extraction has been reported [24, 25]. We plan to explore various available tools (e.g. ODIE, OBCIE, semantixs) using the OAE ontology and to compare the performance of ontology‐driven or ontology‐based methods for information extraction against the method presented here.
The current work has demonstrated promising results. It has the potential to reduce manual reading time, improve the quality of the signal detection process, and therefore contribute positively to safer use of drugs, to the benefit of patients and society. We speculate that this work could also pave the way for pharmacovigilance applications on social media and multimedia sources.
Harsha Gurulingappa would like to thank his PhD guide Prof. Dr. Martin Hofmann‐Apitius and former colleagues at Fraunhofer Institute SCAI for supporting the foundational aspects of this work.
- Hauben M, Bate A: Decision support methods for the detection of adverse events in post‐marketing data. Drug Discov Today. 2009, 14 (7‐8): 343-357. 10.1016/j.drudis.2008.12.012.
- Vandenbroucke JP: In defense of case reports and case series. Ann Intern Med. 2001, 134 (4): 330-334.
- Wang X, Hripcsak G, Markatou M, Friedman C: Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study. J Am Med Inform Assoc. 2009, 16 (3): 328-337. 10.1197/jamia.M3028.
- Leaman R, Wojtulewicz L, Sullivan R, Skariah A, Yang J, Gonzalez G: Towards internet‐age pharmacovigilance: extracting adverse drug reactions from user posts to health‐related social networks. Proceedings of the 2010 Workshop on Biomedical Natural Language Processing. Edited by: Dina Demner‐Fushman K, Cohen Bretonnel, Ananiadou Sophia, Pestian John, Tsujii Jun’ichi, Webber Bonnie. 2010, Uppsala, Sweden, 117-125. http://delivery.acm.org/10.1145/1870000/1869976/p117-leaman.pdf
- Gurulingappa H, Fluck J, Hofmann‐Apitius M, Toldo L: Identification of Adverse Drug Event Assertive Sentences in Medical Case Reports. First International Workshop on Knowledge Discovery and Health Care Management (KD‐HCM), European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD). Edited by: Rangwala H, Tagarelli A, Wale N, Karypis G. 2011, Athens, Greece, 16-27. http://www.cs.gmu.edu/hrangwal/kd-hcm/proc/KDHCM11_procs.pdf
- Henegar C, Bousquet C, Lillo‐Le Louet A, Degoulet P, Jaulent MC: Building an ontology of adverse drug reactions for automated signal generation in pharmacovigilance. Comput Biol Med. 2006, 36: 748-767. 10.1016/j.compbiomed.2005.04.009.
- Delamarre D, Lillo‐Le Louët A, Guillot L, Jamet A, Sadou E, Ouazine T, Burgun A, Jaulent MC: Documentation in pharmacovigilance: using an ontology to extend and normalize Pubmed queries. Stud Health Technol Inform. 2010, 160 (Pt 1): 518-522.
- Aramaki E, Miura Y, Tonoike M, Ohkuma T, Masuichi H, Waki K, Ohe K: Extraction of adverse drug effects from clinical records. MEDINFO 2010 ‐ Proceedings of the 13th World Congress on Medical Informatics, Series: Studies Health Technology Informatics, Volume 160. Edited by: Safran C. 2010, Cape Town, South Africa: IOS Press, 739-743. 10.3233/978-1-60750-588-4-739.
- Benton A, Ungar L, Hill S, Hennessy S, Mao J, Chung A, Leonard C, Holmes J: Identifying potential adverse effects using the web: A new approach to medical hypothesis generation. J Biomed Informatics. 2011, 44: 989-996.
- Roberts A, Gaizauskas R, Hepple M, Demetriou G, Guo Y, Roberts I, Setzer A: Building a semantically annotated corpus of clinical texts. J Biomed Informatics. 2009, 42: 950-966. 10.1016/j.jbi.2008.12.013.
- Gurulingappa H, Mateen‐Rajput A, Roberts A, Fluck J, Hofmann‐Apitius M, Toldo L: Development of a Benchmark Corpus to Support the Automatic Extraction of Drug‐related Adverse Effects from Medical Case Reports. J Biomed Informatics. 2012, 45: 885-892. 10.1016/j.jbi.2012.04.008.
- Hanisch D, Fundel K, Mevissen HT, Zimmer R, Fluck J: ProMiner: rule‐based protein and gene entity recognition. BMC Bioinformatics. 2005, 6 (Suppl 1): S14. 10.1186/1471-2105-6-S1-S14.
- Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y, Eisner R, Guo AC, Wishart DS: DrugBank 3.0: a comprehensive resource for ’omics’ research on drugs. Nucleic Acids Res. 2011, 39 (Database issue): D1035-D1041. 10.1093/nar/gkq1126.
- Merrill GH: The MedDRA paradox. Proceedings of the AMIA 2008 Annual Symposium. 2008, Washington, DC, USA, 470-474. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655972/pdf/amia-0470-s2008.pdf
- Giuliano C, Lavelli A, Pighin D, Romano L: FBK‐IRST: Kernel Methods for Semantic Relation Extraction. Proceedings of the Fourth International Workshop on Semantic Evaluations. Edited by: Richard W, Lluís M, Agirre E. 2007, Prague, Czech Republic, 141-144. http://aclweb.org/anthology-new/S/S07/S07-1000.pdf
- Burges C: A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery. 1998, 2: 121-167.
- Tikk D, Thomas P, Palaga P, Hakenberg J, Leser U: A comprehensive benchmark of kernel methods to extract protein‐protein interactions from literature. PLoS Comput Biol. 2010, 6: e1000837. 10.1371/journal.pcbi.1000837.
- Roberts A, Gaizauskas R, Hepple M, Demetriou G, Guo Y, Roberts I, Setzer A: The CLEF corpus: semantic annotation of clinical text. Proceedings of the AMIA Symposium. 2007, Chicago, IL, USA, 625-629. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655900/pdf/amia-0625-s2007.pdf
- Ogren P: Knowtator: a Protégé plug‐in for annotated corpus construction. Proceedings of the 2006 conference of the North American chapter of the association for computational linguistics on human language technology. Edited by: Moore Robert C, Bilmes Jeff, Chu‐Carroll Jennifer, Sanderson Mark. 2006, New York, NY, USA, 273-275. http://aclweb.org/anthology-new/N/N06/N06-4006.pdf
- Yongqun H, Zuoshuang X, Sarntivijai S, Toldo L, Ceusters W: AEO: A Realism‐Based Biomedical Ontology for the Representation of Adverse Events. “Representing Adverse Events” at the International Conference on Biomedical Ontology. Edited by: Courtot M, Goldfain A, Yongqun He O, Ruttenberg A. 2011, Buffalo, NY, USA. http://icbo.buffalo.edu/2011/workshop/adverse-events/docs/papers/HeAEICBO2011_submission.pdf
- Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P: A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol. 2010, 6: 343. 10.1038/msb.2009.98.
- van Mulligen E, Fourrier‐Reglat A, Gurwitz D, Molokhia M, Nieto A, Trifiro G, Kors J, Furlong L: The EU‐ADR Corpus: Annotated Drugs, Diseases, Targets, and their Relationships. J Biomed Informatics. 2012, 45: 879-884. 10.1016/j.jbi.2012.04.004.
- Toldo L, Gurulingappa H, Mateen‐Rajput A, Kors J, Suri S, Tayrouz Y: Impact of Automatic Detection of Adverse Events on Prediction of Drug Label Changes. J Pharmacoepidemiology and Drug Saf. 2012, [Submitted].
- Wimalasuriya D, Dou D: Ontology‐based information extraction: an introduction and a survey of current approaches. J Information Sci. 2010, 36: 306-323. 10.1177/0165551509360123.
- Pandit S, Honavar V: Ontology‐guided extraction of complex nested relationships. 22nd IEEE International Conference on tools with artificial intelligence (ICTAI). Edited by: Pierre M. 2010, Arras, France, 173-178. http://dx.doi.org/10.1109/ICTAI.2010.98
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.