Discovering associations between adverse drug events using pattern structures and ontologies

Background Patient data, such as electronic health records or adverse event reporting systems, constitute an essential resource for studying Adverse Drug Events (ADEs). We explore an original approach to identify frequently associated ADEs in subgroups of patients. Results Because ADEs have complex manifestations, we use formal concept analysis and its pattern structures, a mathematical framework that allows generalization using domain knowledge formalized in medical ontologies. Results obtained with three different settings and two different datasets show that this approach is flexible and allows extraction of association rules at various levels of generalization. Conclusions The chosen approach permits an expressive representation of a patient ADEs. Extracted association rules point to distinct ADEs that occur in a same group of patients, and could serve as a basis for a recommandation system. The proposed representation is flexible and can be extended to make use of additional ontologies and various patient records.


Background
Adverse Drug Events (ADEs) occur unevenly in different groups of patients. Their causes are multiple: genetic, metabolic, interactions with other substances, etc. Patient data, in the form of either Electronic Health Records (EHRs) or adverse effects reports have been successfully used to detect ADEs [1,2]. We hypothesize that mining EHRs may reveal that subgroups of patients sensitive to some drugs are also sensitive to others. In such a case, several ADEs, each caused by different drugs, could be found to occur frequently in a subgroup of patients. While this is known to be true in certain classes of drugs, we further hypothesize that such associations can be found across different classes. We propose a method to identify these frequently associated ADEs in patients subgroups. The main issue to reach this goal is that ADE manifestations are complex and that they are reported in variable manners. Indeed, ADEs are not limited to the simple case of "one drug causing one phenotype" but may be an association between several drugs and several phenotypes. Furthermore, these drugs and phenotypes can be reported using different vocabularies and with varying levels of detail. For instance, two clinicians may report the same ADE caused by warfarin, an anticoagulant drug, either as "warfarin toxicity" or with a more precise description such as "ulcer bleeding caused by warfarin". As such, biomedical ontologies provide helpful resources to consider the semantic relationships between ADEs.
In [3], Roitmann et al. proposed a vector representation of patient ADE profiles: a patient is represented by a feature vector in which each feature is one phenotype experienced by the patient. All phenotypes are here considered as independent features. This representation is used with clustering algorithms to group patients into clusters in which prevalent drugs and phenotypes can be identified. This work could be expanded by considering biomedical ontologies coupled with a semantic similarity measure such as the one described in Devignes et al. [4], to cluster together patients taking distinct but similar drugs and expressing distinct but similar phenotypes. However, a limitation of a vector representation is that it aggregates all ADEs of a patient in a single object. In this paper, we propose a representation of the ADEs of a patient that preserves the distinctness of these events.
In [5], Winnenburg et al. extracted drug-phenotype pairs from the litterature to explore the relationships between drugs, drug classes and their adverse reactions. Adverse event signals are computed both at the drug and drug class levels. This work illustrates that some drug classes can be associated with a given adverse effect, and further investigates the association at the individual drug level. In cases where the association with the adverse effect is present for every drug in the class, it demonstrates the existence of a class effect. Otherwise, the association is present for only some drugs of the class, and cannot be intrinsically attributed to the class itself. This result shows that it is possible to consider ADEs either at the invididual drug level or at the drug class level. The approach we propose in this paper addresses this possibility, both at the level of ADE representation and inside the data mining approach itself, which allows generalization with biomedical ontologies. In addition, we are also capable of detecting ADE associations involving different classes of drugs.
For this purpose, we use an extension of Formal Concept Analysis (FCA) [6] called pattern structures [7] in combination with ontologies to enable semantic comparison of ADEs. FCA has been successfully used for signal detection in pharmacovigilance: in [8,9], FCA is used to detect signals in a dataset of ADEs described with several drugs causing a phenotype. In this case, FCA permits to mine for associations between a set of drugs and a phenotype. In this article, pattern structures allow us to extend the descriptions of ADEs with biomedical ontologies, and to mine higher-order associations, i.e., associations between ADEs.
We experimented with two types of datasets. A first dataset was extracted from EHRs of patients diagnosed with Systemic Lupus Erythematosus (SLE), a severe autoimmune disease. Such patients frequently experience ADEs as they often take multiple and diverse drugs indicated for SLE or derived pathologies [10]. Our second dataset was extracted from the U.S. Food & Drug Administration Adverse Event Reporting System (FAERS). This dataset was linked to biomedical ontologies thanks to a novel resource, AEOLUS [11].

ADE definition
An ADE is a complex event in that it may often involve several drugs, and manifest through several phenotypes. An ADE can then be characterized by a set of drugs, and a set of phenotypes. To facilitate comparison between ADEs, we consider sets of active ingredients of drugs, rather than sets of commercial drug names. In the rest of this article, we use the term "drug" to denote an active ingredient. In this study, we represent an ADE as a pair (D i , P i ), where D i is a set of drugs, and P i is a set of phenotypes. Table 1 presents examples of ADEs that could be extracted from the EHRs, and will serve here as a running example. Table 2 provides the origin and label of each ontology class code used in this article.

SLE EHR dataset from STRIDE
Our first dataset is a set of 6869 anonymized EHRs of patients diagnosed with SLE, extracted from STRIDE, the EHR data warehouse of Stanford Hospital and Clinics [12] between 2008 and 2014. It documents about 451,000 hospital visits with their relative dates, diagnoses encoded as ICD-9-CM phenotype codes (International Classification of Diseases, Ninth Revision, Clinical Modification) and drug prescriptions as a list of their ingredients, represented by RxNorm identifiers.
We first establish a list of ADE candidates for each patient EHR. From each two consecutive visits in the EHR, we extract the set of drugs D i prescribed during the first visit and the diagnoses P i reported during the second. The interval between the two consecutive visits must be less than 14 days, as it is reasonable to think that a side effect should be observed in such a time period after prescription. Moreover, Table 3 shows that increasing this interval does not significantly increase the number of patients in our dataset. An ADE candidate C i is thus a pair of sets C i = (D i , P i ). We retain in P i only phenotypes reported as a side effect for at least one drug of D i in the SIDER 4.1 database of drug indications and side effects [13]. We remove candidates where P i is empty. Furthermore, we remove an ADE candidate (D 1 , P 1 ) if there exists for the same patient another ADE candidate (D 2 , P 2 ) such that Class labels: ICD 599.8 is "other specified disorders of the urethra and urinary tract", ICD 599.9 is "unspecified disorders of the urethra and urinary tract", ICD 719.4 is "pain in joint"    In such cases, where several ADEs have comparable sets of drugs, we only retain the ADE with the maximal set, i.e., the most specialized set of drugs. Indeed, as we aim to find associations between different ADEs, we thus avoid considering multiple times such similar sets of drugs. Finally, we keep only patients having experienced at least two ADEs, as our goal is to mine frequently associated ADEs. After filtering, we obtain a total of 3286 ADEs for 548 patients presenting at least two ADEs.

FAERS dataset
FAERS publishes a database gathering ADEs reported by patients, healthcare professional and drug manufacturers in the United States. It is used for postmarketing pharmacovigilance by the U.S. Food & Drug Administration, data mining of signals in pharmacovigilance [2] or of adverse drug-drug interactions [14]. A recently-published resource, AEOLUS [11] maps FAERS drugs and phenotypes representations to RxNorm and SNOMED CT (Systematized Nomenclature of Medicine -Clinical Terms) respectively. We used this tool to rebuild a database of FAERS reports, linked to RxNorm and SNOMED CT, from the fourth quarter of 2012 to the second quarter of 2016 included. Each FAERS report lists a set of prescribed drugs D i and the a of experienced phenotypes P i . Thus, we can formalize each report as a pair of sets (D i , P i ). These reports are grouped in cases, enabling us to identify additional reports that follow up an initial ADE. We selected, in the FAERS database, cases with multiple reported ADEs, excluding ADEs where the set of drugs is included in another ADE of the same case. With these constraints, we extract 570 cases with two or more distinct ADEs, for a total of 1148 ADEs.

Medical ontologies
We use three medical ontologies, considering only their class hierarchy, to enable semantic comparisons of drugs and phenotypes when comparing ADEs: • ICD-9-CM describes classes of phenotypes, as it is used in STRIDE to describe diagnoses; • SNOMED CT is an ontology of medical terms, which we use to describe the phenotypes of FAERS, using the mappings provided by AEOLUS; • The Anatomical Therapeutic Chemical Classification System (ATC) describes classes of drugs. In this work, we used only the three most specific levels of ATC: pharmacological subgroups, chemical subgroups and chemical substances.

Association rule mining
Assocation rule mining [15] is a method for discovering frequently associated items in a dataset. Association rule mining is performed on a set of transactions, represented as sets of items. Association Rules (ARs) are composed of two sets of items L and R, and are noted L → R. Such a rule is interpreted as "when L occurs in a transcation, R also occurs". Note that ARs do not express any causal or temporal relationship between L and R. ARs are qualified by several metrics, including confidence and support. The confidence of a rule is the proportion of transactions containing L that also contains R. The support of a rule is the number of transactions containing both L and R. For instance, if a rule A, B → C has a confidence of 0.75 and a support of 5, then, C occurs in 3 4 of the transactions where A and B occur, and A, B, C occur together in 5 transactions. Note that the support may also be represented relatively to the total number of transactions in the dataset, e.g., 5 500 for a dataset of 500 transactions.
Several algorithms for association rule mining, such as Apriori, have been proposed, based on frequent itemsets [16]. Such frequent itemsets can be identified using an itemset lattice [17]. FCA offers facilities for building lattices, identifying frequent itemsets and association rule mining [18]. In the following section, we present FCA and its extension pattern structures, as a method to mine ARs.

Formal concept analysis and pattern structures
Formal Concept Analysis (FCA) [6] is a mathematical framework for data analysis and knowledge discovery. In FCA a dataset may be represented as a concept lattice, i.e., a hierarchical structure in which a concept represents a set of objects sharing a set of properties. In classical FCA, a dataset is composed of a set of objects, where each object is described by a set of binary attributes. Accordingly, FCA permits describing patients with the ADEs they experienced represented as binary attributes, as illustrated in Table 4. The AR ADE 1 → ADE 3 that can be extracted from this dataset has a support of 2 and a confidence of 2 3 . This AR expresses that two thirds of the patients that experienced ADE 1 also experienced ADE 3 , and that the rule was verified by 2 patients (P1 and P3) in the dataset. However, FCA does not take into account the similarity between attributes. For instance, both ADE 3 and ADE 4 could be caused by the same drugs, while presenting slightly different phenotypes. In such a case, we may want to extract a rule expressing that patients who experienced ADE 1 also experienced an ADE similar to ADE 3 or ADE 4 .
Accordingly, approaches extracting ARs from sets of binary attributes are limited as the similarity of attributes is not considered. This is the case of algorithms such as Apriori, or classical FCA approaches.We propose to introduce a more detailed representation of patients ADEs, along with a fine-grained similarity operator.
Pattern structures generalize FCA in order to work with a set of objects with descriptions not only binary but of any nature such as sets, graphs, intervals [7,19]. Particularly, pattern structures have been used to leverage biomedical knowledge contained in ontology-annotated data [20].
A pattern structure is a triple (G, (D, ), δ), where: • G is a set of objects, in our case, a set of patients, • D is a set of descriptions, in our case, representations of a patient's ADEs, • δ is a function that maps objects to their descriptions.
• is a meet operator such that for two descriptions X and Y in D, X Y is the similarity of X and Y : X Y is a description of what is common between descriptions X and Y . It defines a partial order ≤ on elements of D. X ≤ Y denotes that Y is a more specific description than X, and is by definition Table 4 Example of a binary table to be used for extraction of associations between ADEs using Formal Concept Analysis (FCA) Generalization on object descriptions is performed through the use of the meet operator. In the following section, we define three distinct meet operators ( 1 , 2 , 3 ) that enable considering similarities between ADE descriptions at different levels of granularity. This section also illustrates the application of pattern structures.
In pattern structures, the derivation operator . defines a Galois connection between sets of objects and descriptions, as follows: Intuitively, A is the most precise description for the set of objects A, and d is the set of objects described by a description more specific than d. A pattern concept is a pair (A, d) with A = d and d = A. Pattern structures enable building a lattice of pattern concepts, which allow associating a set of patients with a shared description of their ADEs, based on their similarity.
In our study, G is the set of patients that are related through δ to the description of their ADEs in D. We have designed different experiments using pattern structures, each providing their own definition of the triple (G, (D, ), δ).

Experimental design
In this section, we describe three experiments to extract ARs between ADEs. Each one defines a different representation of patient ADEs and a different setting of pattern structures, making increasing use of ontologies. Table 4 presents a naive representation of patient ADEs. However, we want a representation that takes into account similarity between ADEs, instead of considering ADEs as independent attributes. Accordingly, we propose in this first experiment a representation that groups ADEs with high level phenotypes and we define an operator to compare their sets of drugs.

Experiment 1: Pattern structure without semantic comparison
We define here the pattern structure (G, (D 1 , 1 ), δ 1 ): objects are patients, and a patient description of D 1 is a vector of sub-descriptions, with first-level ICD-9-CM classes as dimensions. Each sub-description is a set of drug prescriptions, i.e., a set of sets of drugs. For instance, considering only the two ICD-9-CM classes of Table 5: Here, ADEs are decomposed w.r.t. their phenotypes. Sub-descriptions are associated to a first-level ICD-9-CM Table 5 Example of representation of patient ADEs for (G, (D 1 , 1 ), δ 1 ), with two first-level ICD-9-CM classes: diseases of the genitourinary system (580-629), and of the musculoskeletal system and connective tissue (710-739) Patient ICD 580-629 (genitourinary system) ICD 710-739 {{prednisone, acetaminophen}} {{acetaminophen}} class to represent ADEs: the patient presents a phenotype of that class after taking a prescription in that sub-description. In the example presented in Table 5, the patient P1 experienced an ADE with a phenotype from the ICD-9-CM class 580-629 twice: once after prescription of prednisone, and another time after prescription of acetaminophen.
We define a sub-description as a set of prescriptions, where none of the prescriptions are comparable to each other by the partial order ⊆. We then define the meet operator 1 , such that, for every pair of descriptions (X, Y ) of D 1 : is the unique subset of maximal elements of a set S given any partial order ≤ i . Formally, In the present case, it retains only the most specific set of drugs prescribed in the description. For instance, given four drugs d 1 through d 4 : is the only ⊆-maximal element. Indeed, the semantic of {d 2 } -a prescription that contains the drug d 2 -is more general than the semantic of {d 1 , d 2 } -a prescription that contains both the drugs d 1 and d 2 .
Given that each patient has a description for each first-level ICD-9-CM class, the meet operator defined for a sub-description can be applied to a vector of subdescriptions:  Figure 1 shows the semi-lattice associated with this pattern structure and the data in Table 5. Nevertheless, this  Table 5 using the pattern structure (G, (D 1 , 1 ), δ 1 ), where arrows denote the partial order ≤ 1 example shows that in the absence of semantics between descriptions, generalization rapidly produces empty sets devoid of information.

Experiment 2: Extending the pattern structure with a drug ontology
Using a drug ontology permits to find associations between ADEs related to classes of drugs rather than individual drugs. Thus, we extend the pattern structure described previously to take into account a drug ontology: ATC. Each drug is replaced with its ATC class(es), as shown in Table 6. We notice that the fact that one drug can be associated with several ATC classes is handled by our method as sets of drugs become represented as sets of ATC classes.
We define this second pattern structure (G, (D 2 , 2 ), δ 2 ) where descriptions of D 2 are sets of prescriptions with drugs represented as their ATC classes. In order to compare sets of classes from an ontology O, we define an intermediate meet operator O , for x and y any two sets of classes of O:  classes than x. We then define the meet operator 2 such that for every pair of descriptions (X, Y ) of D 2 : This pattern structure allows generalization of ADEs involving different drugs that share a pharmacological subgroup. For instance:

Experiment 3: Extending the pattern structure with a drug and a phenotype ontology
We define a third pattern structure that permits the use of both ATC and a phenotype ontology for better specialization of phenotypes compared to the previous experiment. As this experimental design can be applied to both the EHR and FAERS datasets, we design a pattern structure that can operate with any drug and phenotype ontologies. We apply it to our EHR dataset with ATC and ICD-9-CM, and to the FAERS dataset with ATC and SNOMED CT.
To avoid over-generalization, we excluded the two mostgeneral levels of ICD-9-CM and the three most-general levels of SNOMED CT. Table 7 illustrates the data representation used with this pattern structure, using ATC and ICD-9-CM. Here, ADEs are represented as vectors D i , P i with two dimensions: the set of drugs D i associated with the set of phenotypes P i . A patient description is then a set of such vectors. Class labels: H02AA03 is desoxycortone, H02AB07 is prednisone, N02BE01 is acetaminophen, ICD 599.8 is "other specified disorders of the urethra and urinary tract", ICD 599.9 is "unspecified disorders of the urethra and urinary tract", ICD 719.4 is "pain in joint" We define the pattern structure (G, (D 3 , 3  The operator ADE applies the ontology meet operator O on both dimensions of the vector representing the ADE, using either ATC or ICD-9-CM as the ontology O. Both dimensions of the resulting vector needs to contain non-root ontology classes for it to constitute a representation of an ADE. If it is not the case, we set it to ∅, ∅ to ignore it in further generalizations. We define the meet operator 3 such that for every pair of descriptions (X, Y ) of D 3 : Compared to 2 , 3 introduces a supplementary level of computation with ADE , which generalizes ADEs and applies O to an additional ontology: ICD-9-CM.

Extraction and evaluation of associations rules
The pattern structures described previously can be used to build concept lattices, where each concept associates a set of patients with the similarity of their ADEs descriptions. Such a concept lattice allows for identifying frequent ADEs descriptions, which can be used for extracting Association Rules (ARs). An AR is identified between two related concepts in the lattice, with descriptions δ(l) and δ(r) such that δ(l) < δ(r). Thus, such an AR comprises a left-hand side L = δ(l) and a right-hand side R = δ(r) − δ(l), where "−" denotes set difference. Such a rule is noted L → R.
This process can be expected to generate a large amount of rules, among which ARs serving our goal of detecting associations between ADEs must be identified. We therefore filter ARs according to the following conditions: • The right-hand side R of the AR contains at least one ADE, noted as (D R , P R ) for which there is no ADE (D L , P L ) in the left-hand side L such that either D R and D L are ≤ O comparable, or P R and P L are ≤ O comparable. This condition ensures that the right-hand side of the rule introduces new drugs and phenotypes unrelated to those of the left-hand side, i.e., the association between the ADEs of both sides is not trivial. • As patients in the EHR dataset are treated for Systemic Lupus Erythematosus (SLE), rules must not include related phenotypes (ICD-9-Cm class 710 and descendants).
ARs extracted from the SLE patients EHR dataset were evaluated by computing their support in the entire STRIDE EHR dataset. Selected ARs with the largest support were transformed into SQL queries, in order to retrieve matching patients from the STRIDE database. Figures 2 and 3 show an overview of ATC drug classes associated by the ARs extracted in the third EHR experiment. We isolated every pair of ATC classes associated by ARs, i.e., one ATC class or one of its subclass is present in the left-hand side of the AR, and one is present in its right-hand side. Figure 2 shows the frequency of such associations and Fig. 3 shows, for the significant ones, the difference to the frequency obtained if the association would be random. For each pair (l, r) of ATC classes, we search for the set of rules of the form L → R, such that l or one of its subclasses appears in L and r or one of its subclasses appears in R and compute their combined support. The combined support of a set of rules is the number of patients described by at least one of these rules. The combined support of all rules having class l in L or class r in R is also calculated and indicated at the beginning of each row for l classes and at the top of each column for r classes. Cells of the Fig. 2 indicate, for each (l, r), the ratio between (i) the combined support of ARs where l appears in L and r appears in R and (ii) the combined support of ARs where l appears in L. This ratio denotes how often the extracted rules associate an ADE where a drug from l with an ADE where drug from r is involved. Note that the total of all ratios is greater than 1 for each row as one rule can associate more than two ATC classes, and one patient can verify more than one rule. Fig. 3 shows significant (p < 0.001, Z-test) deviations from the expected values of these ratios. For each ATC class appearing in right-hand sides of ARs, the expected ratio was computed as the combined support of rules where that class appears in the right-hand side divided by the combined support of all rules. A Z-test was used to assess significance at p < 0.001 of such deviations.

Results
We present in this section the results of the experiments described previously. As the first two experiments make use of the tree structure of ICD-9-CM to simplify the representation of ADEs (as specified in Methods, FAERS phenotypes are mapped onto SNOMED CT rather than ICD-9-CM), they were applied only to the EHR dataset. The third experimental design offers a generalization of the approach to any drug and phenotype ontologies, and was applied to both the EHR and FAERS datasets. We thus present the results of four experiments: three experiments on our EHR dataset using all three experimental designs, and a fourth one on the FAERS dataset using the third experimental design.

Overview of results
The four experiments result in four concept lattices, from which we extract Association Rules (ARs) of the form L → R. Empirically, we only retain ARs with a support of at least 5, and a confidence of at least 0.75. Table 8 presents some statistics about this process in our four experiments.
We observe that the third experiment generates a much larger concept lattice from the EHR dataset than from the FAERS dataset, despite their similar number of patients. Nevertheless, we obtain after filtering only twice as many rules from the EHR dataset in comparison with the FAERS dataset. Moreover, rules extracted from FAERS have generally larger support values. These results can be explained by the differences between the two datasets: the EHR dataset is built from ADEs extracted from EHRs of patients diagnosed with SLE, while the FAERS dataset gathers ADEs reported from the general population. Futhermore, the higher number of ADEs per patient in the EHR dataset tends to increase similarities between patients, thus increasing the number of generated concepts.  Fig. 2 was compared to its expected value assuming a proportional distribution of ATC classes in the right-hand side. Empty cells indicate that the difference between the observed and expected ratios is not significant (p > 0.001, Z-test). Other cells show the difference between the observed and expected ratios, and this difference is significant (p < 0.001, Z-test). p-values where computed using a standard normal table, assuming normal distributions centered on expected ratios Figures 2 and 3 show an overview of ATC drug classes present in ADEs associated by the ARs extracted in the third EHR experiment. Figure 2 shows the frequency of such associations and Fig. 3 shows, for the significant ones, the difference to the frequency obtained if the association would be random. Figure 3 highlights a few positive deviations from the expected association ratios. For instance, we find that ADEs involving Beta-Blocking Agents (C07A) are associated strongly with ADEs involving High-Ceiling Diuretics (C03C). Both classes of drugs are involved in antihypertensive therapy, either separately or in combination. Thus, it is likely that a certain number of patients are prescribed with these two classes of drugs. Our results suggest that among these patients, some could experience distinct ADEs involving each class. We also observe that ADEs involving Antithrombotic Agents (B01A) are significantly associated with other ADEs involving the same class of drugs. Thus, it appears that the proposed approach reveals significant associations of ADEs involving either the same or different classes of drugs. Table 9 presents examples of ADE associations obtained for the three experiments performed on EHRs. In fact, nearly the same rule is found here with varying generalization levels across the three experiments. Note that for readability and comparison purpose, all ARs are expressed in the third experiment formalism. In this example, we observe that the AR from experiment 2 is more general than the AR from experiment 1 (R06A is a super-class of doxylamine in ATC). In the third experiment, more specialized phenotypes are obtained (for instance ICD 586 is a sub-class of ICD 580-629). For each experiment, ADEs can involve a combination of two or more drugs or drug classes. ARs may also associate a pair of ADEs on the lefthand side with a single ADE on the right-hand side as in our thrid experiment. The complete set of filtered rules for each experiment is available online at https://github.com/g-a-perso/ADEassociations/.

Examples of extracted association rules
An overview of the 11 ARs extracted from the third experiment on EHR with support greater than or equal to 8 is presented in Table 10. For instance, we produce the following AR, with support 10 and confidence 0.77: {Benzothiazepine derivatives} , Congestive heart failure → Drugs for peptic ulcer and GORD , {Atrial fibrillation} This rule expresses that 10 13 of patients who present congestive heart failure (ICD 428.0) after prescription of benzothiazepine derivatives (C08DB), also present atrial fibrillation (ICD 427.31) after prescription of a drug for peptic ulcer and gastro-esophageal reflux disease (A02B). This rule holds for 10 patients.

Support of EHR rules in STRIDE
Our EHR dataset is only a small part of the total STRIDE data warehouse that contains about 2 million EHRs. We therefore evaluated the support of the 11 ARs listed in Table 10 in the whole STRIDE data warehouse. Each AR was transformed into an SQL query to retrieve the patients verifying the rule. Table 10 reports the support in the dataset of SLE-diagnosed patients as S 1 and the support in the entire STRIDE database as S 2 . In all cases the support raises from S 1 to S 2 and the increase ratio varies from 2 to 36. This illustrates that the ARs extracted from the SLE EHRs can be relevant to patients outside of the initial dataset.

ADE extraction
We observed a large quantitative difference between the results of our experiments on EHRs and on FAERS. This is explained by the different nature of the two datasets: while the FAERS dataset gathers self-reported ADEs, we built the EHR dataset from ADEs we extracted. As the extraction of ADEs from EHR is not the core of this work, we used a simple method that we do not evaluate here.
This method has inherent limitations. Particularly, there is uncertainty as whether the extracted events are actually caused by the concerned drugs. We acknowledge that our method for ADE detection is not as robust as disproportionality score algorithms [21]. In particular, we could consider confounding factors such as age, sex, comorbidities or concomitant drugs. Nevertheless, we filtered extracted ADEs using SIDER in order to retain only phenotypes that are known as side effects of the drugs listed in that ADE.
Another limitation is that we are considering only drug ingredients, whereas one ingredient may be prescribed in various forms (for instance, eye drops or tablets). Not considering the form of the drug may result in imprecise ADE definitions, as one phenotype may be caused by only some forms of the ingredient. Using the unambiguous encoding of prescriptions of the STRIDE EHR dataset would address this limitation, but was not available in this study. Class labels: A02B is "drugs for peptic ulcer and gastro-oesophagal disease", G04BE is "drugs used in erectile dysfunction", N06BC is "xanthine derivatives", R06A is "antihistamines for systemic use", R06AA is "aminoalkyl ethers" ICD 280-289 is "diseases of the blood and blood-forming organs", ICD 285.9 is "anemia, unspecified", ICD 580-629 is "diseases of the genitourinary system", ICD 586 is "renal failure, unspecified". Here, yohimbine belongs to the class G04BE (drugs used in erectile dysfunction), caffeine belongs to the classe N06BC (xanthine derivatives) and doxylamine belongs to the class R06AA (aminoalkyl ethers)  For these reasons, ADEs extracted from EHRs likely present a relatively high rate of false positives. This is also reflected in the size of the concept lattice we generated from that dataset, as noise increase the number of possible generalizations (see Table 8).

ADE representation
While pattern structures permit detailed descriptions of ADEs, the algorithmic complexity of comparing those descriptions and building the concept lattice needs to be considered. In particular, the size of the concept lattice that needs to be generated proves to be a limiting factor to scale the approach on larger datasets. We observed that the size of the lattice increases as we use more detailed descriptions of ADEs.
One apparent limitation of this work is the absence of temporal relationships between ADEs. We voluntarily did not consider that aspect because the order of occurrence of ADEs can vary between patients. However, in cases of interest, this order can be checked in patient EHRs as pattern structure concepts retain patient identifiers as well as their description. Preliminary investigation for a given subset of patient EHRs reveals that the ADEs of the lefthand side of an AR can occur either before or after the ADEs of the right-hand side of the rule.
In our experiments on EHRs, we only considered side effect phenotypes occuring in a timeframe of 14 days after a prescription, whereas an ADE may manifest much later after the initial prescription. Thus, we only extracted associations between rather short-term ADEs. The representation of ADEs used in the different experiments could be expanded with data about the actual delay between the prescription and the observed phenotypes. This would allow for mining associations in a dataset of both shortterm and long-term ADEs, while retaining the ability to discriminate between these different manifestations. In particular, this could permit extracting associations between short-term and long-term ADEs, where shortterm toxicity to a given drug could be used as a predictor of the long-term toxicity of another.

Associations between ADEs
We use association rule mining to extract associations between frequently co-occuring ADEs. A limitation of that approach is that we cannot infer any causal relationship between these ADEs. However, it appears more meaningful to investigate potential common causes of ADEs associated through an AR, rather than to search a direct causal relationship between involved ADEs. Besides concerns on the quality of the association itself, this limits its interpretation and exploitation: without a proper explanation of the relationship of the two ADEs, the rules cannot be used to guide drug prescription. They can however raise vigilance towards the possible occurence of an additionall ADE.
A large amount of ARs can be extracted from our concept lattices. We automatically filtered a subset of these ARs by excluding rules that do not fit the scope of the study. While the approach we proposed is flexible, it is difficult to compare ARs extracted from very different datasets and expressed with different ontologies. Therefore, we tested selected rules obtained from our SLE-oriented EHR dataset on the whole STRIDE database. The results of these tests indicate that rules extracted from a subset of EHRs (here patients diagnosed with SLE) can apply to a more general set of patients (Table 10). Indeed, SLE patients are susceptible to multiple occurrences of ADEs caused by a wide range of drugs. EHRs of such patients, used in conjunction with biomedical ontologies may then be used to identify frequently associated ADEs. We now need to prioritize these ARs with respect to their importance in terms of cost and risk of the phenotypes present in their right-hand side.

Conclusions
We explore in this paper an approach based on pattern structures to mine EHRs and adverse event reporting systems for commonly associated ADEs. Pattern structures permit to work with an expressive representation of ADEs, which takes into account the multiplicity of drugs and phenotypes that can be involved in a single event. Pattern structures also allow to enhance this representation with diverse biomedical ontologies, enabling semantic comparison of ADEs. To our knowledge, this is the first approach able to consider such detailed representations to mine associations between frequently associated ADEs. The proposed approach is also flexible and can be applied to various EHRs and adverse event reporting systems, along with any linked biomedical ontology. We demonstrated the genericity of the approach on two different datasets, each of them linked to two of three distinct biomedical ontologies.
The kind of extracted ARs presented in this article could serve as a basis for a recommandation system. For instance, such a system could recommand vigilance towards the possible occurence of an ADE based on the ADE history of the patient. Drugs involved in ARs of interest could be investigated, in light of the current knowledge of their mechanisms, to look for possible common causes between associated ADEs. Our chosen representation for ADEs could be further extended to include additional properties of drugs and phenotypes, such as drugs targets annotated with Gene Ontology classes. This could permit to search for association rules taking into account the drug mechanisms.