Ontology-based time information representation of vaccine adverse events in VAERS for temporal analysis
© Tao et al.; licensee BioMed Central Ltd. 2012
Received: 26 November 2012
Accepted: 27 November 2012
Published: 20 December 2012
The U.S. FDA/CDC Vaccine Adverse Event Reporting System (VAERS) provides a valuable data source for post-vaccination adverse event analyses. The structured data in the system have been widely used, but the information in the write-up narratives is rarely included in these kinds of analyses. Because the narratives are unstructured, the data embedded in them are difficult to use in further studies.
We developed an ontology-based approach to represent the data in the narratives in a “machine-understandable” way, so that it can be easily queried and further analyzed. Our focus is the time aspect of the data, for time trend analysis. The Time Event Ontology (TEO), Ontology of Adverse Events (OAE), and Vaccine Ontology (VO) are leveraged for the semantic representation. A VAERS case report is presented as a use case for the ontological representations. The advantages of using our ontology-based Semantic Web representation and data analysis are emphasized.
We believe that representing both the structured data and the data from write-up narratives in an integrated, unified, and “machine-understandable” way can improve research for vaccine safety analyses, causality assessments, and retrospective studies.
Effective analyses of time trends for post-vaccine adverse events (AEs) can enhance clinical research in different areas such as vaccine safety analyses, causality assessments, and retrospective studies. The FDA/CDC Vaccine Adverse Event Reporting System (VAERS) provides a valuable data set for these purposes. VAERS maintains a database of reports of AEs following vaccination. These reports contain both structured data (e.g., gender, age, vaccination date, and onset date) and short narratives that usually provide more detailed descriptions of the vaccination, the related events, and their time constraints.
The structured data in the VAERS database have been widely leveraged in different medical analyses for vaccine adverse events [2, 3]. The unstructured nature of the narratives, however, makes the data embedded in them difficult for use in further analyses. These narratives usually contain additional valuable information (e.g., patient ages that were not reported in a structured way, vaccination doses, and durations or time stamps for multiple events following vaccination) that could potentially lead to more effective and concrete clinical analyses, and perhaps important clinical insights.
It is challenging, however, to process this time-related data hidden in the narratives from the AE reports. The VAERS receives 30,000 reports annually. Manually processing these reports is tedious and expensive. Even if the related information has been successfully marked and extracted, the temporal relations needed for time pattern recognition are often not explicitly expressed in the original documents, but rather need to be inferred.
Previous research indicated that Semantic Web tools and technologies provide a viable solution for modeling heterogeneous data, conducting scalable querying over the data, and inferring new knowledge [7–10]. We believe there are some unique benefits to applying these Semantic Web techniques to healthcare data: (1) the World Wide Web Consortium (W3C) recommendations provide a shared set of constructs which enable better interoperability between applications that exchange machine-understandable information; (2) the formal semantics of the Web Ontology Language (OWL) offer consistency checking for data represented using it; (3) the decidability and computability features of OWL-DL (Description Logic) can provide enough expressiveness for semantically defining concepts and their relationships in a way that supports reasoning; (4) the Rule Interchange Format (RIF) provides a concrete expression language for rules defining clinical guidelines, as well as a standard for interchange between rules specified in different rule languages; and (5) the linked data feature of RDF graphs can bring together information concerning a particular instance from heterogeneous sources.
The rest of the paper is organized as follows. We first introduce the three ontologies (TEO, OAE, and VO) and how they can be used to represent time in the area of vaccine adverse events. A VAERS case report is then presented as a use case for the ontological representations. The advantages of using our ontology-based Semantic Web representation and data analysis are emphasized. Finally, we draw conclusions and discuss further improvements.
TEO and time relation reasoning [a]
Time instants and intervals
<event1> rdf:type Event;
<event1> rdfs:label "vaccinated w/MMR";
<event1> hasTime <tInst1>;
<tInst1> rdf:type TimeInstant;
<event1> hasOrigTime "July 6th";
Here we created a new OWL individual <event1>, which is of type TEO:Event and has the label "vaccinated w/MMR". We further specify that it has a timestamp <tInst1>, which is of type TEO:TimeInstant. We can keep track of the original expression of this time instant using the property hasOrigTime, and have the system normalize the time expression to a standard form and store it using hasNormalizedTime. The system can also infer the level of granularity of the time instant. In this example, it is “day”. Note that in this original text write-up, we do not know the year of the event (we only know July 6th), but the year can usually be inferred from the structured information in the same record.
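The normalization step described above can be illustrated with a minimal Python sketch. It is not the TEO or TIMER implementation: the function name normalize_time, the returned dictionary layout, and the placeholder year 2000 are assumptions made only for illustration (the narrative does not state the year, which would come from the structured part of the record).

```python
import re
from datetime import datetime


def normalize_time(orig_time, year_from_record):
    """Normalize a day-level expression such as "July 6th" to ISO 8601.

    year_from_record is a hypothetical parameter standing in for the year
    taken from the structured fields of the same report.
    """
    # Drop ordinal suffixes ("6th" -> "6") so strptime can parse the day.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", orig_time.strip())
    parsed = datetime.strptime(f"{cleaned} {year_from_record}", "%B %d %Y")
    return {
        "hasOrigTime": orig_time,                         # original wording
        "hasNormalizedTime": parsed.date().isoformat(),   # standard form
        "granularity": "day",                             # month and day given
    }


# Placeholder year: the narrative itself gives only "July 6th".
t_inst1 = normalize_time("July 6th", 2000)
print(t_inst1)
```

The sketch handles only "Month day" expressions in an English locale; other granularities (month, year, time of day) would need their own patterns.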
<durat1> rdf:type Duration;
<event1> rdf:type Event;
<event1> rdfs:label "BRIEF GENERALIZED SEIZURE";
<event2> rdf:type Event;
<event2> rdfs:label "DTP";
<event1> after <event2> ;
<event1> rdf:type Event;
<event1> rdfs:label "vaccination";
<event2> rdf:type Event;
<event2> rdfs:label "she developed a fever";
<event2> after <event1>;
<a1> rdf:type owl:Axiom;
<a1> owl:annotatedSource <event1>;
<a1> owl:annotatedProperty after;
<a1> owl:annotatedTarget <event2>;
<a1> hasTimeOffset <durat1>;
The TEO provides a representation mechanism to model temporal information stated within adverse event reports in a “machine-understandable” way. Sorting the events on a timeline or answering clinically significant time-related questions, however, usually cannot be accomplished by querying only the information explicitly stated within the reports. Semantic inference is often required to fully answer such questions.
The following is an example of narrative text within an adverse event report, which would require semantic reasoning:
“18 month-old vaccinated w/MMR on July 6th. Eighteen days after vaccination she developed a fever of 104 and macular rash of the face, torso & legs. Dx: vasculitis 2 wks later. Patient hospitalized 08-08”.
There are five events within this example: vaccinated w/MMR (event1), fever (event2), macular rash (event3), vasculitis (event4), and patient hospitalization (event5). The durations between event1 and event2/event3, and between event2/event3 and event4, are explicitly stated. The reader, however, needs to infer that patient hospitalization occurred after the vasculitis was diagnosed, which was after the vaccination, the fever, and the rash. A reader can also infer from the text the actual dates of the fever, the rash, and the vasculitis diagnosis.
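The date inference for this narrative can be worked through with simple date arithmetic. The following Python sketch is only an illustration, not the TIMER implementation; the year is a placeholder assumption (the narrative gives none), and "2 wks later" is read as two weeks after the fever and rash, as in the description above.

```python
from datetime import date, timedelta

ASSUMED_YEAR = 2000  # placeholder; the narrative does not state the year

vaccination = date(ASSUMED_YEAR, 7, 6)       # "vaccinated w/MMR on July 6th"
fever = vaccination + timedelta(days=18)     # "Eighteen days after vaccination"
rash = fever                                 # reported together with the fever
vasculitis = fever + timedelta(weeks=2)      # "Dx: vasculitis 2 wks later"
hospitalization = date(ASSUMED_YEAR, 8, 8)   # "Patient hospitalized 08-08"

# Order the five events on a timeline (ties are broken alphabetically).
timeline = sorted([
    (vaccination, "vaccinated w/MMR"),
    (fever, "fever"),
    (rash, "macular rash"),
    (vasculitis, "vasculitis"),
    (hospitalization, "patient hospitalization"),
])
for day, label in timeline:
    print(day.isoformat(), label)
```

Under these assumptions the fever and rash fall on July 24 and the vasculitis diagnosis on August 7, one day before the hospitalization on August 8.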
The TIMER system provides an Application Programming Interface (API) to infer this kind of temporal relation automatically after the information is represented in the Semantic Web notations.
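As a rough illustration of this kind of inference, the sketch below computes the transitive closure of a set of "after" relations. It does not use or reproduce the TIMER API; the function infer_after and the event names are hypothetical, and the relation "hospitalization after vasculitis" is assumed to have already been derived by comparing dates.

```python
from itertools import product


def infer_after(stated):
    """Return the transitive closure of a set of (later, earlier) pairs."""
    closure = set(stated)
    changed = True
    while changed:
        changed = False
        # If a is after b and b is after d, then a is after d.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


# Relations taken as given for the example narrative (illustrative names).
stated = {
    ("fever", "vaccination"),
    ("rash", "vaccination"),
    ("vasculitis", "fever"),
    ("vasculitis", "rash"),
    ("hospitalization", "vasculitis"),  # assumed derived from the dates
}

inferred = infer_after(stated) - stated
for later, earlier in sorted(inferred):
    print(f"{later} after {earlier}")
```

A full temporal reasoner would also handle relations such as overlap, containment, and durations; this sketch covers only the transitivity of "after".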
OAE and VO usage
The OAE is an OWL ontology for representing adverse events. The OAE has been developed by following the OBO Foundry principles, including openness, collaboration, and use of a common shared syntax. OAE is aligned with top-level ontologies such as the Basic Formal Ontology (BFO) and the Relation Ontology (RO).
In the current version of OAE, the term ‘adverse event’ is defined as a pathological bodily process (OGMS:0000061) that occurs after a medical intervention and is likely induced by the medical intervention. Examples of medical interventions include vaccination, drug administration, usage of medical devices, and surgery. An adverse event may or may not be caused by a medical intervention. In OAE, we specifically define the term ‘causal adverse event’ as a pathological bodily process that is induced or caused by a medical intervention. Currently, OAE has 2,464 representational units: 981 terms with OAE-specific identifiers, and the remaining terms imported from existing ontologies including BFO, RO, and the Ontology of Biomedical Investigations (OBI). Importing external ontology terms ensures semantic interoperability among ontologies and avoids generating duplicate terms.
The VO is a community-based ontology in the domain of vaccines and vaccination. Like TEO and OAE, VO is developed in OWL and aligned with BFO and RO. VO has classified all existing vaccines licensed for human and animal use in the U.S.A. and Canada. For each licensed vaccine, VO also includes relevant attributes such as vaccine type, vaccine component (e.g., antigen, adjuvant, and preservative), vaccination route, manufacturer, and the disease and pathogen targeted by the vaccine. All these data are organized in an ontological format with a shared syntax, supporting automated reasoning and SPARQL queries.
MedDRA (http://www.meddramsso.com/) is used as the default controlled vocabulary for describing adverse event terms in VAERS. To better represent adverse events in OAE, we have cross-referenced many OAE terms with MedDRA terms. Compared to MedDRA, OAE uses a formal ontology format with machine-parseable logical definitions and structures. OAE also imports vaccine-specific information from VO, making it an ideal platform for analyzing vaccine time events.
In order to make OAE, VO, and TEO work seamlessly, alignments between these three ontologies have been made. All these ontologies use BFO as the top ontology. TEO is imported to OAE as a middle-layer ontology for representation of time. All adverse events in OAE are subclasses of TEO events (i.e., BFO term ‘processual_entity’). Therefore, these adverse event classes in OAE automatically inherit TEO methods for time representation.
OAE and VO modeling
OAE and VO can be used to represent adverse events and related vaccine information in an ontological format. The original case reports use MedDRA to represent symptoms. In OAE, an adverse event (AE) is a pathological bodily process that starts with a medical intervention (e.g., vaccination) and ends with the discovery of a symptom (e.g., fever), a sign (e.g., increased blood cell count), or a process (e.g., influenza viral infection). Representing an AE in OAE as such a whole process lets us capture the various variables that contribute to the adverse events (e.g., patient age and sex, and vaccination dose and route) as well as the time intervals between different subprocesses. For example, we represent the nausea in Figure 3 as ‘nausea AE’ that occurs after a vaccination in a specific vaccine host. The OAE representation shows that the patient had adverse events in different areas (e.g., skin, joint, and digestive systems). Note that an ‘influenza AE’ may not be induced by influenza virus. A better term to represent the case is the OAE term ‘influenza like illness AE’ (OAE_0000100). The patient in the reported case was vaccinated with the vaccine Engerix-B, which is semantically defined in VO with a VO ID of VO_0010711. VO provides the hierarchical information as well as the semantic assertions associated with different vaccines. In the future, vaccine information can be directly imported to OAE to support efficient automated reasoning.
TEO modeling of the time events in this case
Visualization of time events
Interpretation of time event analysis results
Many symptoms are shared but may occur at different time points. For example, rash is shared by the first two patients, one at day 3, and the other at day 8. Influenza and arthralgia symptoms also are reported by the first two patients but at different days. The third patient had an injection site reaction on the second day instead of the first day (for the first two patients).
Frequency of a symptom may be different. Nausea occurred twice for the first patient. However, it occurred only once for the second patient and did not happen in the third patient.
Sequence of the adverse events may be different: The first patient had an injection site reaction then arthralgia. The two symptoms also showed up in the second patient in the opposite order.
Our approach provides a way to identify these differences, which can be used for further investigation.
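The three kinds of difference listed above (time point, frequency, and sequence) can be computed once each patient's events are placed on a timeline. The Python sketch below is an illustration only: the patient timelines are hypothetical values loosely modeled on the observations above, not data from the VAERS reports, and the function names are assumptions.

```python
from collections import Counter

# Hypothetical timelines: (day after vaccination, symptom). Illustrative only.
timelines = {
    "patient_A": [(1, "injection site reaction"), (2, "nausea"),
                  (3, "rash"), (5, "nausea")],
    "patient_B": [(1, "nausea"), (2, "injection site reaction"), (8, "rash")],
}


def first_onset(timeline):
    """Map each symptom to the first day on which it was reported."""
    onset = {}
    for day, symptom in sorted(timeline):
        onset.setdefault(symptom, day)
    return onset


def compare(a, b):
    """Compare two timelines on onset day, frequency, and order."""
    onset_a, onset_b = first_onset(a), first_onset(b)
    freq_a = Counter(symptom for _, symptom in a)
    freq_b = Counter(symptom for _, symptom in b)
    shared = sorted(set(onset_a) & set(onset_b))
    return {
        "onset_differs": {s: (onset_a[s], onset_b[s])
                          for s in shared if onset_a[s] != onset_b[s]},
        "frequency_differs": {s: (freq_a[s], freq_b[s])
                              for s in shared if freq_a[s] != freq_b[s]},
        "order_differs": (sorted(shared, key=onset_a.get)
                          != sorted(shared, key=onset_b.get)),
    }


result = compare(timelines["patient_A"], timelines["patient_B"])
print(result)
```

In this toy data, rash appears on day 3 for one patient and day 8 for the other, nausea occurs twice versus once, and the order of the shared symptoms differs.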
Conclusions and future work
In this paper, we introduce an ontology-based approach for representing time-related information from the VAERS repository. We believe that the ability to represent both the structured data and the data from write-up narratives in an integrated, unified, and “machine-understandable” way can enable research in vaccine safety analyses, causality assessments, and related retrospective studies. For example, time-based analysis can improve the assessment of AE causality. With a large number of reports, a strong statistical correlation between vaccination and AEs can be identified with the help of the temporal relations between AEs and vaccinations.
Based on the representation mechanisms defined in the ontology, we will implement tools for automatically extracting information such as event names, vaccine names, and temporal relationships from the VAERS system, and for semantically annotating the extracted information with respect to the three ontologies. Our Natural Language Processing team has developed a framework to extract temporal relations from discharge summaries. We are currently working on linking this framework to the TIMER framework for semi-automatic information extraction. Another future direction is to build a tool for statistical analysis on top of the data integrated through the ontological representation.
[a] Some information about the TEO introduction is from the TEO web page: http://informatics.mayo.edu/CNTRO/index.php/TEO, which is maintained by Dr. Cui Tao.
[b] Note that in Figure 2, “FLU-LIKE SYMPTOMS” in the write-up was annotated as MedDRA term “influenza” (in the symptom section). This is an inaccurate alignment done by the VAERS database as MedDRA has a more accurate term “influenza like illness” (MedDRA ID: 10022004).
The project was done when Hannah Yang was a visiting student intern in Mayo Clinic.
This project was supported by the National Science Foundation under Grant #0937060 and the National Center for Biomedical Ontologies (NCBO) to C.T. and the NIH–NIAID grant R01AI081062 to Y.H.
- The U.S. FDA/CDC Vaccine Adverse Event Reporting System (VAERS). Available from: http://vaers.hhs.gov/index
- Haber P: Internet-based reporting to the vaccine adverse event reporting system: a more timely and complete way for providers to support vaccine safety. Pediatrics. 2011, 127 (Suppl 1): S39-44.
- Klepper MJ, Edwards B: Individual case safety reports–how to determine the onset date of an adverse reaction: a survey. Drug Saf. 2011, 34 (4): 299-305. 10.2165/11588490-000000000-00000.
- He Y: AEO: a realism-based biomedical ontology for the representation of adverse events. 2011, International Conference on Biomedical Ontologies (ICBO), Adverse Event Representation Workshop
- Stead WW, Li HS: Computational Technology for Effective Health Care: Immediate Steps and Strategic Directions. 2009
- The Time Event Ontology (TEO) web site: cited 2012; Available from: http://informatics.mayo.edu/CNTRO/index.php/TEO
- Tao C: Time-oriented question answering from clinical narratives using semantic-web techniques. Proceedings of the 9th international semantic web conference on The semantic web - Volume Part II. 2010, Shanghai, China: Springer-Verlag, 241-256.
- Semantic Web Health Care and Life Sciences (HCLS) Interest Group: Available from: http://www.w3.org/2001/sw/hcls/
- SemanticHealth Report: Semantic interoperability for better health and safer healthcare, European Commission. 2009
- Luciano JS: The Translational Medicine Ontology and Knowledge Base: driving personalized medicine by bridging the gap between bench and bedside. J Biomed Semantics. 2011, 2 (Suppl 2): S1. 10.1186/2041-1480-2-S2-S1.
- Linking Open Data: Available from: http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
- Tao C: CNTRO: A Semantic Web Ontology for Temporal Relation Inferencing in Clinical Narratives. AMIA Annu Symp Proc. 2010, 2010: 787-91.
- Tao C, Solbrig HR, Chute CG: CNTRO 2.0: A Harmonized Semantic Web Ontology for Temporal Relation Inferencing in Clinical Narratives. AMIA Summits Transl Sci Proc. 2011, 64-68.
- Bittner T, Smith B: Normalizing Medical Ontologies Using Basic Formal Ontology. 2004, Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik: Biometrie und Epidemiologie, Tagungsband der 49.
- Allen JF: Maintaining Knowledge About Temporal Intervals. Communications of the ACM. 1983, 26 (11): 832-843. 10.1145/182.358434.
- Allen JF: Maintaining knowledge about temporal intervals. Readings in qualitative reasoning about physical systems. 1990, Morgan Kaufmann Publishers Inc, 361-372.
- Smith B: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007, 25 (11): 1251-5. 10.1038/nbt1346.
- Smith B: Relations in biomedical ontologies. Genome Biology. 2005, 6 (5): R46. 10.1186/gb-2005-6-5-r46.
- Brinkman RR: Modeling biomedical experimental processes with OBI. J Biomed Semantics. 2010, 1 (Suppl 1): S7. 10.1186/2041-1480-1-S1-S7.
- He Y: VO: Vaccine Ontology. The 1st International Conference on Biomedical Ontology (ICBO 2009). 2009
- Wongsuphasawat K: LifeFlow: visualizing an overview of event sequences. Proceedings of the 2011 annual conference on Human factors in computing systems. 2011, Vancouver, BC, Canada: ACM, 1747-1756.
- Sohn S: Comprehensive Temporal Information Discovery from Discharge Summaries: Medical Events, Time, and Tlink Identification. I2B2 Competition. 2012
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.