The contributions of this study are severalfold. First, we developed an ontology-based SciMiner literature mining approach and applied it to the FDA TAC 2017 dataset. Identifying all ADRs from free-text ADR descriptions is a major challenge. Our MedDRA/OAE-based SciMiner approach successfully mined the FDA TAC 2017 dataset, with a special focus on 53 neuropathy-inducing drugs (NIDs). Our study demonstrates the important role of the MedDRA controlled terminology and of ontologies (e.g., ChEBI, OAE, and ODNAE) in literature mining and subsequent ADR analysis. Second, we constructed an ADR-ADR network and applied centrality analysis to identify the hub ADRs in the network. Third, among the 53 NIDs, our ChEBI-based analysis identified three benzimidazole drugs that share a drug class effect on 43 ADRs. An OAE analysis of these ADRs further identified many enriched ADR categories. Based on these results, we hypothesize that proton-pump inhibition, a role common to all three benzimidazole drugs, may participate in different pathways leading to these ADRs. To our knowledge, this study is the first ontology-based drug class effect analysis derived from literature mining.
The present study is based on a subset of US FDA drug labels that was included in the Adverse Drug Reaction Extraction from Drug Labels track of the 2017 Text Analysis Conference (TAC). We used this dataset as a proof of concept and to develop a prototype version of ADR-SciMiner. We assumed that if an ADR is mentioned in the label of a drug, it is associated with that drug. However, an ADR mention may occur within a negation or speculation statement, such as ‘depression was not observed as an ADR of the drug’ or ‘depression might be an ADR of the drug’. More semantically oriented NLP techniques are therefore needed to determine whether an ADR is truly associated with a drug.
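One simple way to approximate this check is a cue-based filter in the spirit of NegEx-style algorithms. The sketch below is illustrative only: the cue lists, the function names, and the sentence-level scope are assumptions and are not part of ADR-SciMiner. It flags an ADR mention as negated or speculative whenever a trigger word appears anywhere in the same sentence.

```python
import re

# Hypothetical, deliberately tiny cue lists; not the paper's rules.
NEGATION_CUES = ["not", "no", "without", "absence of", "denied"]
SPECULATION_CUES = ["might", "may", "possibly", "suspected", "could"]


def _has_cue(text: str, cues: list[str]) -> bool:
    # \b word boundaries keep 'no' from matching inside 'nausea' or 'not'.
    return any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues)


def classify_mention(sentence: str, adr: str) -> str:
    """Label an ADR mention as 'negated', 'speculative', 'affirmed', or 'absent'.

    Simplification: any cue anywhere in the sentence applies to the ADR;
    cue scope and position (handled by NegEx-style algorithms) are ignored.
    """
    text = sentence.lower()
    if adr.lower() not in text:
        return "absent"
    if _has_cue(text, NEGATION_CUES):
        return "negated"
    if _has_cue(text, SPECULATION_CUES):
        return "speculative"
    return "affirmed"
```

Because cue scope is ignored, a sentence such as ‘nausea, but not rash, was reported’ would wrongly mark nausea as negated; resolving such scope is exactly the kind of context problem that more semantic methods would need to address.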
To identify the most salient ADRs associated with NIDs, we created separate ADR-ADR networks for NIDs and non-NIDs using an association threshold of 50%; that is, two ADRs were connected by an edge if they co-occurred in at least 50% of the NIDs (or non-NIDs). Six of the central ADRs in the NID-specific network were also present in the non-NID-specific network, suggesting that these ADRs are prevalent and commonly co-occur in both NID and non-NID cases. The other ADRs in Table 2 are central only in the NID-specific network, which suggests that they may be more NID-specific. As future work, we plan to extend the network analysis by adding the drugs themselves as nodes to create bipartite drug-ADR networks. The types of relations between drugs and ADRs can be identified using the Interaction Network Ontology (INO) [24].
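The network construction described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each drug is represented by the set of ADR terms mined from its label, co-occurrence is counted per drug, and hubs are ranked by degree centrality. The paper does not specify the centrality measure here, so degree centrality is an illustrative choice, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def build_adr_network(
    drug_adrs: dict[str, set[str]], threshold: float = 0.5
) -> set[tuple[str, str]]:
    """Connect two ADRs if both are mentioned for >= threshold of the drugs."""
    n_drugs = len(drug_adrs)
    counts: dict[tuple[str, str], int] = {}
    for adrs in drug_adrs.values():
        # Sorting gives each unordered pair one canonical (a, b) key.
        for a, b in combinations(sorted(adrs), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return {pair for pair, c in counts.items() if c / n_drugs >= threshold}


def degree_centrality(edges: set[tuple[str, str]]) -> dict[str, float]:
    """Degree divided by (number of nodes - 1); only connected ADRs are nodes."""
    nodes = {n for edge in edges for n in edge}
    if not nodes:
        return {}
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {n: d / (len(nodes) - 1) for n, d in degree.items()}


if __name__ == "__main__":
    # Toy data, invented for illustration (not from the TAC 2017 labels).
    toy = {
        "drug_a": {"nausea", "headache", "dizziness"},
        "drug_b": {"nausea", "headache"},
        "drug_c": {"nausea", "dizziness", "rash"},
        "drug_d": {"headache", "rash"},
    }
    edges = build_adr_network(toy, threshold=0.5)
    for adr, score in sorted(degree_centrality(edges).items()):
        print(adr, score)
```

Swapping in another measure (for example, betweenness) would only change `degree_centrality`; the 50% edge rule lives entirely in `build_adr_network`, and running it once on NID labels and once on non-NID labels yields the two networks being compared.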
Our study identified three benzimidazole drugs (i.e., lansoprazole, pantoprazole, and omeprazole) that induce similar ADR profiles. Overall, these three drugs have been found to be safe based on their associated ADR reports [33,34,35]. For example, a previous study of 10,008 users of lansoprazole in daily practice found that the most frequently reported lansoprazole ADRs were diarrhoea, headache, nausea, skin disorders, dizziness, and generalized abdominal pain/cramps, with no evidence of rare ADRs [33]. The current study found many ADRs associated with each of these three drugs; 43 ADRs are shared by all three, including behavioral and neurological, digestive, muscular, and skin ADRs. Diarrhoea, one of these 43 ADRs, was found to be a common reason for discontinuing pantoprazole [34].
A previous study reported that these three drugs have similar drug-drug interaction profiles (most commonly with vitamin K antagonists), suggesting a class effect [36]. According to ODNAE records [14], lansoprazole, omeprazole, and pantoprazole are all associated with neuropathy adverse events. Our study found 43 AEs shared by these three benzimidazole drugs. Interestingly, many of these AEs are also hubs of the highly enriched NID-specific network identified by centrality analysis of our literature mining data. It is likely that these three benzimidazole drugs, which function as proton-pump inhibitors, use the same or similar pathways to induce neuropathy adverse events.
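The class-level comparison above amounts to a set intersection over the drugs that share an ontology class, followed by an overlap with the network hubs. The sketch below assumes that each drug has already been mapped to its ChEBI classes (represented here as plain label strings) and that hub ADRs come from a centrality step; all names and ADR sets are hypothetical toy data, not the study's results.

```python
def class_shared_adrs(
    drug_adrs: dict[str, set[str]],
    drug_classes: dict[str, set[str]],
    target_class: str,
) -> set[str]:
    """Return the ADRs mentioned for every drug annotated with target_class."""
    members = [
        drug
        for drug, classes in drug_classes.items()
        if target_class in classes and drug in drug_adrs
    ]
    if not members:
        return set()
    # set.intersection accepts any number of sets; the first acts as 'self'.
    return set.intersection(*(drug_adrs[d] for d in members))


if __name__ == "__main__":
    # Invented ADR sets and class labels, for illustration only.
    drug_adrs = {
        "lansoprazole": {"diarrhoea", "headache", "nausea"},
        "pantoprazole": {"diarrhoea", "headache", "rash"},
        "omeprazole": {"diarrhoea", "headache", "dizziness"},
        "drug_x": {"diarrhoea"},
    }
    drug_classes = {
        "lansoprazole": {"benzimidazoles"},
        "pantoprazole": {"benzimidazoles"},
        "omeprazole": {"benzimidazoles"},
        "drug_x": {"other class"},
    }
    hubs = {"headache", "paraesthesia"}  # hypothetical hub ADRs
    shared = class_shared_adrs(drug_adrs, drug_classes, "benzimidazoles")
    print(sorted(shared))         # ['diarrhoea', 'headache']
    print(sorted(shared & hubs))  # ['headache']
```

Using a drug-to-set-of-classes mapping rather than a single class per drug reflects the fact that a ChEBI entity can sit under several parent classes.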
Our ontology-based drug class effect study is novel in several respects compared with the original report of this approach [15]. First, whereas the previous report used drug package insert information, our study uses data generated by literature mining of FDA-provided case report data. Second, given the large amount of AE data for each drug, we were able to identify many AEs shared by a class of drugs, in our case 43 AEs associated with the three benzimidazole drugs. Our OAE-based analysis further identified common patterns among these AEs. Such a high-throughput analysis was not reported in the previous package insert-based studies.
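The text does not state which statistic underlies the ‘enriched’ OAE categories. One standard option, shown here purely as an assumption, is a one-sided hypergeometric test per category: given a background set of ADRs, the shared ADRs, and a mapping from each ADR to the ontology classes it falls under (for example, its OAE ancestors), it asks how surprising the number of shared ADRs in each category is. The function names and category labels are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def category_enrichment(
    selected: set[str],
    background: set[str],
    categories: dict[str, set[str]],
) -> dict[str, tuple[int, int, float]]:
    """Per category: (hits in selected, hits in background, one-sided p-value).

    categories maps each ADR to its ontology classes; no multiple-testing
    correction is applied in this sketch.
    """
    selected = selected & background
    N, n = len(background), len(selected)
    all_cats = {c for adr in background for c in categories.get(adr, set())}
    results = {}
    for cat in all_cats:
        K = sum(1 for adr in background if cat in categories.get(adr, set()))
        k = sum(1 for adr in selected if cat in categories.get(adr, set()))
        results[cat] = (k, K, hypergeom_sf(k, N, K, n))
    return results
```

In practice, the p-values across categories would also need a correction such as Benjamini-Hochberg, and ancestor propagation in the ontology means that categories are not independent tests.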
ADR identification performance is not yet optimal, and there is still much room for improvement. Most ADR terms falsely identified by SciMiner fall into three types: (1) incorrect mapping of acronyms to ADRs (e.g., ‘all’, as in ‘all patients’, mapped to ‘acute lymphocytic leukaemia’); (2) ADRs that may not be caused by the current drug (e.g., ‘caution is needed in patients with diabetes’); and (3) ADRs that occur as discontinuous entities in text (e.g., ‘corneal ulceration’ is an ADR, but it does not occur as a continuous text fragment in ‘corneal exposure and ulceration’). Integrating other dictionaries such as SNOMED CT [37] into ADR-SciMiner will be explored to expand the ADR dictionary and thereby improve recall. Determining whether a term is an acronym for an ADR, determining whether an ADR mentioned in a drug label is actually caused by that drug, and detecting ADRs that occur as discontinuous text fragments all require a deeper semantic understanding of sentences that takes into account the context (i.e., the surrounding words) of an ADR mention. Our current method is dictionary- and rule-based and does not consider the context in which an ADR occurs. These challenges could be tackled by machine learning methods with features that capture context information and exploit syntactic analyses of the sentences, such as their dependency parses.
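A minimal sketch of dictionary-based tagging makes error types (1) and (3) concrete and shows one rule-based mitigation for type (1). The assumptions are greedy longest-match over word tokens, a toy dictionary, and a hypothetical rule that acronym entries match only when written in upper case; this is not ADR-SciMiner's actual matching logic.

```python
import re


def tag_adrs(
    text: str,
    dictionary: dict[str, str],
    acronyms: set[str],
    max_len: int = 5,
) -> list[tuple[str, str]]:
    """Greedy longest-match tagging; returns (surface text, ADR term) pairs.

    dictionary maps lowercased terms to preferred ADR names; keys listed in
    acronyms only match when the text spells them in upper case.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    found: list[tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        matched_len = 0
        # Try the longest candidate span first (greedy longest match).
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            span = tokens[i:i + length]
            key = " ".join(t.lower() for t in span)
            if key not in dictionary:
                continue
            surface = " ".join(span)
            # Hypothetical acronym rule: 'ALL' matches, 'All'/'all' do not.
            if key in acronyms and not surface.isupper():
                continue
            found.append((surface, dictionary[key]))
            matched_len = length
            break
        # Skip past the matched span, or advance one token if nothing matched.
        i += matched_len or 1
    return found
```

With a dictionary containing ‘all’ (mapped to ‘acute lymphocytic leukaemia’) and ‘corneal ulceration’, the rule skips ‘All patients’ but still tags ‘Patients with ALL’, while ‘corneal exposure and ulceration’ yields no match at all, illustrating why discontinuous entities need syntactic analysis rather than contiguous string matching. The upper-case rule would itself fail on a sentence written entirely in capitals, which is one reason context-aware methods are preferable.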
As future work, we plan to develop machine learning-based methods to improve the accuracy of ADR tagging and of detecting associations between ADRs and drugs. We will also extend our approach to all available structured drug labels in the DailyMed database, maintained by the US National Library of Medicine of the National Institutes of Health. DailyMed currently contains listings of 95,513 drugs submitted to the US FDA, about 28,000 of which are human prescription drugs. Our ontological study of NIDs will be extended using this larger drug label dataset.
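As one illustration of the context information such a tagger could use, the sketch below builds per-token feature dictionaries from a word and its neighbors within a fixed window. This is a hypothetical design, not the planned method; per-token dictionaries of this kind are the input format used by sequence labelers such as sklearn-crfsuite, and the window size and feature names are assumptions.

```python
def token_features(tokens: list[str], i: int, window: int = 2) -> dict[str, object]:
    """Features for tokens[i]: its own shape plus neighboring words."""
    tok = tokens[i]
    feats: dict[str, object] = {
        "word.lower": tok.lower(),
        "word.isupper": tok.isupper(),  # helps separate 'ALL' from 'all'
        "word.istitle": tok.istitle(),
        "word.suffix3": tok[-3:].lower(),
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        # Pad positions that fall outside the sentence.
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats[f"ctx[{offset:+d}].lower"] = word
    return feats


def sentence_features(tokens: list[str]) -> list[dict[str, object]]:
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

Here, the feature `ctx[+1].lower = 'patients'` attached to ‘All’ is the kind of signal a trained model could learn to associate with a non-ADR reading; dependency-parse features could be added to the same dictionaries to address discontinuous mentions.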