Table 1 Decisions that are made during the process of integrating sources that can influence downstream pharmacovigilance analyses

From: Large-scale adverse effects related to treatment evidence standardization (LAERTES): an open scalable system for linking pharmacovigilance evidence sources with clinical data

| Data type | Feature | Option for variability | Performance questions |
|---|---|---|---|
| Product labels | Product label outcome mention | Named entity performance (PPV and sensitivity) | Do improvements in entity recognition performance improve system recall and precision? |
| | | Section location (e.g., anywhere vs. specific sections) | Does identifying which sections are more informative than others reduce noise? |
| | Frequency information | Threshold variation | Does incorporating ADE frequency improve performance? What cut-off should be used? |
| Pharmacovigilance DBs (e.g., FAERS, MedEffect, VigiBase) | Minimum detectable relative risk | Threshold variation | What is the appropriate cut-off for MDRR? Is it HOI-specific? |
| | | Database(s) chosen | Does the choice of database influence the value of MDRR for this task? |
| | Risk identification method | Disproportionality metric | Which metric (e.g., PRR, EBGM, IC) leads to the best performance? Is it HOI-specific? |
| | Number of cases in FAERS | Threshold variation | What is the appropriate cut-off for the number of case reports? |
| Drug indication DB | Indication listings in FDB | Yes/no and when mentioned | Does using on-label and off-label indication knowledge improve performance? |
| Indexed literature | Number of relevant publications from the indexed literature | Threshold variation | Is there an appropriate cut-off for the number of publications? How does it vary across specific HOIs and drugs? |
| | Source of relevant publications from the indexed literature | Varying the combination of sources | Should we be selective about the sources used, or choose all of them? |
| | Drug and outcome mention in relevant indexed literature | Named entity performance | Do improvements in entity recognition performance improve system recall and precision? |
| | | Main MeSH terms vs. supplemental | What is the value of MeSH supplemental terms relative to the primary index terms? |
| | | Scientific discourse tag of the location of mention (e.g., intro, methods, results, conclusions) | Does limiting identification of drug-HOI co-mentions to specifically tagged text excerpts improve performance? |
| | | Publication type label (randomized trial, case report, etc.) | Should the publication type of the drug-HOI co-mention be tracked and possibly weighted to improve performance? |
| | | Source of publication type label (Embase, MeSH) | Is one publication type indexing system better than the other for the question-answering task, or should they be combined? |
| | | Topic of the source publication based on latent semantic indexing | Does using tags assigned to text sources by latent semantic indexing as a feature improve system performance? |
| Observational health data (claims + EHR) | Minimum detectable relative risk | Threshold variation | What is the appropriate cut-off for MDRR? Is it HOI-specific? |
| | | Database(s) chosen | Does the choice of database influence the value of MDRR for this task? |
| | Risk identification method | Analytic method | Which method (e.g., disproportionality analysis, self-controlled case series, IC temporal pattern discovery, high-dimensional propensity score) leads to the best performance? Is it HOI-specific? |
| | Cohort selection | Patient ethnicity, age, sex, co-morbidities, concurrent medications | Does cohort selection using these features affect model performance? What cohort size and diversity are appropriate to reduce noise and bias? |
| | Drug exposure conditions | Length of exposure, dosage | Does setting a minimum exposure duration and/or using drug dosage information improve performance? |
| | Study replicability | Number of locations for confirming results | How many replicates of the study should be performed at different institutions? |
| | Observation period | Observation duration threshold | Does setting a minimum observation period duration improve performance? |

1. PPV: positive predictive value; OMOP: Observational Medical Outcomes Partnership; ADE: adverse drug event; MDRR: minimum detectable relative risk; HOI: health outcome of interest; DB: database; FAERS: Food and Drug Administration Adverse Event Reporting System; PRR: proportional reporting ratio; EBGM: empirical Bayes geometric mean; IC: information component; FDB: First Databank (commercial drug knowledge base); MeSH: Medical Subject Headings; EHR: electronic health record
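The disproportionality metrics named in the pharmacovigilance DB rows are all derived from the same 2×2 table of report counts for a single drug-HOI pair. The sketch below is an illustration that assumes such counts are already available in memory. It computes PRR and a shrunk IC point estimate, then combines a metric cut-off with a minimum case-count threshold, which are the two threshold decisions the table lists for spontaneous reporting data. EBGM is omitted because it requires fitting an empirical Bayes gamma-mixture prior across all drug-HOI pairs. The names `Contingency2x2`, `prr`, `ic_shrunk`, and `flag_signal`, the counts, and all cut-off values are hypothetical and do not describe the LAERTES implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Contingency2x2:
    """Report counts for one drug-HOI pair (hypothetical container).

    a: reports mentioning the drug and the HOI
    b: reports mentioning the drug but not the HOI
    c: reports mentioning the HOI but not the drug
    d: reports mentioning neither
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def prr(t: Contingency2x2) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    if t.a + t.b == 0 or t.c == 0:
        return math.nan  # undefined without drug reports or background HOI reports
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ic_shrunk(t: Contingency2x2) -> float:
    """Information component point estimate, log2((O + 0.5) / (E + 0.5)).

    O = a is the observed count; E = (a + b)(a + c) / N is the count expected
    if drug and HOI were reported independently. The +0.5 terms shrink the
    estimate toward 0 when counts are small.
    """
    if t.n == 0:
        return math.nan
    expected = (t.a + t.b) * (t.a + t.c) / t.n
    return math.log2((t.a + 0.5) / (expected + 0.5))


def flag_signal(
    t: Contingency2x2,
    metric: Callable[[Contingency2x2], float],
    metric_cutoff: float,
    min_cases: int,
) -> bool:
    """Hypothetical decision rule: require at least `min_cases` drug-HOI
    reports AND a metric value strictly above `metric_cutoff`."""
    if t.a < min_cases:
        return False
    value = metric(t)
    return not math.isnan(value) and value > metric_cutoff


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    pair = Contingency2x2(a=20, b=980, c=200, d=98_800)
    print(f"PRR = {prr(pair):.2f}, IC = {ic_shrunk(pair):.2f}")
    # Vary the case-count threshold, as in the "Number of cases" row.
    for min_cases in (3, 10, 25):
        print(min_cases, flag_signal(pair, prr, metric_cutoff=2.0, min_cases=min_cases))
```

Swapping `prr` for `ic_shrunk` (for example with a cut-off of 0 on the point estimate rather than on a credibility bound) or changing `min_cases` is the kind of variation the table's performance questions ask about. Which combination performs best is the empirical question the table poses; this sketch does not answer it.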