
Semantic interestingness measures for discovering association rules in the skeletal dysplasia domain



Lately, ontologies have become a fundamental building block in the process of formalising and storing complex biomedical information. With the currently existing wealth of formalised knowledge, the ability to discover implicit relationships between different ontological concepts becomes particularly important. One of the most widely used methods to achieve this is association rule mining. However, while previous research exists on applying traditional association rule mining on ontologies, no approach has, to date, exploited the advantages brought by using the structure of these ontologies in computing rule interestingness measures.


We introduce a method that combines concept similarity metrics, formulated using the intrinsic structure of a given ontology, with traditional interestingness measures to compute semantic interestingness measures in the process of association rule mining. We apply the method in our domain of interest – bone dysplasias – using the core ontologies characterising it and an annotated dataset of patient clinical summaries, with the goal of discovering implicit relationships between clinical features and disorders. Experimental results show that, using the above-mentioned dataset and a voting strategy classification evaluation, the best scoring traditional interestingness measure achieves an accuracy of 57.33%, while the best scoring semantic interestingness measure achieves an accuracy of 64.38%, both at the recall cut-off point 5.


Semantic interestingness measures outperform the traditional ones, and hence show that they are able to exploit the semantic similarities inherently present between ontological concepts. Nevertheless, this is dependent on the domain, and implicitly, on the semantic similarity metric chosen to model it.


Over the course of the last decade, ontologies have become a fundamental building block in the knowledge acquisition and capturing processes in the biomedical domain. Repositories such as BioPortal [1] or the OBO Foundry [2] currently offer a varied range of ontologies, in addition to tool support to visualise, query and integrate concepts hosted by these ontologies. Subsequently, this enables the construction of decision support methods that use ontological background knowledge in order to produce more accurate and more refined outcomes.

Ontologies provide structured and controlled vocabularies and classifications for domain specific terminologies. Their adoption for annotation purposes provides a means for comparing medical concepts on aspects that would otherwise be incomparable. For example, the annotation of a set of disorders (directly or via patient cases) using a particular ontology enables us to compare these disorders, by looking at the underpinning annotation concepts. The actual comparison can be done in an exact or inexact manner. More concretely, one may take into account only those identical concepts that appear in all or some disorders, or may use a semantic similarity measure that relaxes the constraint on identical concepts. Such a semantic similarity measure represents a function that takes two or more ontology concepts and returns a numerical value that reflects the degree of similarity between these concepts in a given ontology. This comparison process represents a key aspect of typical data mining algorithms that form the core of any decision support method. For example, two ontological concepts, such as HP:0004481 (Progressive macrocephaly) and HP:0004482 (Relative macrocephaly) from the Human Phenotype Ontology (HPO) [3], would be treated differently by any classical data mining algorithm because of their symbolic (i.e., lexical grounding) difference. However, these two concepts, like any other two entities in an ontology, are to a certain extent semantically similar – a similarity that can be encoded via an existing or custom-made metric. Replacing exact matching with semantic similarity measures provides novel and exciting opportunities in knowledge discovery and decision support on annotated datasets [4–6].

Association rules [7] are valuable patterns that can be discovered from annotated datasets. An association rule denotes an implication relationship (or a directed co-occurrence) between two sets of items within a transaction. A widely used algorithm to discover such association rules is Apriori [7]. However, regardless of the particular algorithm used, the discovery process has two major challenges: (i) too many rules may be generated (the rule quantity problem); (ii) not all rules are necessarily interesting (the rule quality problem). The solution to the rule quality problem relies on specifying an interestingness measure [8–10] to encode the utility or significance of a pattern. These measures are intended for selecting and ranking patterns according to their potential interest, and they enable highly ranked rules to be immediately presented or used for particular purposes.

Existing work on interestingness measures takes into account only exact matching [10]. Semantic similarities, however, enable novel ways of interpreting data items, and hence may lead to the identification of association rules that are otherwise not discoverable via exact matching. In this manuscript, we advance the state of the art by exploring the application of semantic similarities in widely used interestingness measures in the context of association rule mining. In other terms, we aim to use existing taxonomic relations to calculate so-called “semantic interestingness measures”.

The context of our research is provided by the SKELETOME project [11], which aims to create a community-driven knowledge curation platform for the skeletal dysplasia domain. Skeletal dysplasias are a heterogeneous group of genetic disorders affecting skeletal development. Currently, there are over 450 recognised bone dysplasias, structured in 40 groups. Patients with skeletal dysplasias have complex medical issues including short stature, bowed legs, a larger than average head and neurological complications. Since most skeletal dysplasias are very rare (< 1:10,000 births), data on clinical presentation, natural history and best management practices is sparse. To date, we have developed an ontology, the Bone Dysplasia Ontology (BDO) [12], and a series of decision support methods [6, 13]. BDO has been built using the latest nosology of bone dysplasias [14] that groups disorders according to their overlapping clinical and genetic features. For example, Achondroplasia and Diastrophic dysplasia are similar, and are both part of the FGFR3 Group, because they share a range of clinical features (i.e., short stature with very short arms and legs).

Within this manuscript, we investigate both traditional and semantic interestingness measures in the context of association rule mining, to discover implicit relationships between clinical features and disorders in the skeletal dysplasia domain. The main contributions of this work are the following: (i) firstly, we analyse which of the existing traditional interestingness measures enables a more accurate discovery of association rules in the skeletal dysplasia domain; (ii) secondly, we propose a series of interestingness measures based on semantic similarity metrics using existing ontologies as background knowledge; and (iii) finally, we perform an extensive empirical evaluation to measure the quality of the resulting rules, using an annotated dataset built on real patient data. At the same time, we show that, given an appropriate semantic similarity metric, the semantic interestingness measures outperform the traditional ones.

As already mentioned, our work focuses only on skeletal dysplasias, and hence it investigates the efficiency of the above-described methods only in this domain. However, the generic definition of a semantic interestingness measure proposed in this manuscript is directly applicable in any other domain, while the rest of the research can be used as a guideline for choosing an appropriate domain-specific semantic similarity metric to be applied as part of the overall measure.


This section provides an overview of the foundational blocks of the experiments performed in the context of our research. We start by introducing the Human Phenotype Ontology and the Bone Dysplasia Ontology – i.e., the ontologies used as background knowledge for the semantic similarity metrics. Then, we describe some of the basic notions of semantic similarities, and finally, we discuss some of the traditional interestingness measures.

Human Phenotype Ontology

The Human Phenotype Ontology (HPO) [3] has lately become the de facto controlled vocabulary to capture and represent clinical and radiographic findings. The ontology consists of around 9,000 concepts describing modes of inheritance, onset and clinical disease courses and phenotypic abnormalities. This last category represents around 95% of the ontology and it is the main subject of our study. HPO structures phenotypic abnormalities in a hierarchical manner (via class-subclass relationships) from generic concepts (e.g., HP:0000929 (Abnormality of the skull)) to specific concepts (e.g., HP:0000256 (Macrocephaly)). For instance, HP:0001629 (Ventricular septal defect) is a subclass of the concept HP:0010438 (Abnormality of the ventricular septum) in the sense that a ventricular septal defect is a kind of abnormality of the ventricular septum, and hence every person with a ventricular septal defect can also be said to have an abnormality of the ventricular septum. This goes along the line of the True path rule [15], which states that an annotation with a particular concept implies the path from that concept to the root to be “true”, or more concretely, a valid annotation with all ancestors of that concept.

One obvious advantage of capturing phenotypic information using ontologies is that it enables the design of association mining algorithms that can exploit the semantic relationships between concepts. For instance, an algorithm can be designed to support not only the patterns associated with a concept like HP:0001671 (Abnormality of the cardiac septa), but also those associated with its children, HP:0010438 (Abnormality of the ventricular septum) and HP:0011994 (Abnormality of the atrial septum).

Bone Dysplasia Ontology

The International Skeletal Dysplasia Society (ISDS) Nosology lists all recognised skeletal dysplasias and groups them by common clinical-radiographic characteristics and/or molecular disease mechanisms. The Nosology is revised every 4 years by an expert committee and the updated version is usually published in a medical journal. This is widely accepted as the “official” nomenclature for skeletal dysplasias within the biomedical community, with the latest version being published in 2010 [14].

The Bone Dysplasia Ontology [12] aims to complement the spectrum of existing ontologies and address the specific knowledge representation shortcomings of the ISDS Nosology. Its main role is to provide the scaffolding required for a comprehensive, accurate and formal representation of the genotypes and phenotypes involved in skeletal dysplasias, together with their specific and disease-oriented constraints. As opposed to the ISDS Nosology, the ontology enables a shared conceptual model, formalised in a machine-understandable description, in addition to a continuous evolution and a foundational building block for facilitating knowledge extraction and reasoning. Currently, the structure of the ontology follows closely the grouping of the disorders imposed by the expert committee via the Nosology by using class-subclass relationships between the 40 groups and their associated bone dysplasia members. These groups are then linked via the root concept Bone_Dysplasia.

Semantic similarity

Annotations using Bio-ontologies allow us to compare concepts on various aspects by using their intrinsic semantic similarity. Semantic similarity represents the quantification of the degree of similarity between two or more ontological concepts. For example, the annotation of two bone dysplasias with concepts emerging from the same ontology, e.g., HPO, enables their comparison by looking at the semantic similarity between the concepts used for annotation. In addition to this implicit role, semantic similarity measures can also be used to discover association rules in annotated datasets.

In principle, there are two types of approaches for computing semantic similarity measures: node-based and edge-based. The former uses the nodes and their properties as the data source, whereas the latter uses the edges between nodes and their associated types as the data source. The node-based approaches usually rely on the notion of Information Content (IC) to quantify the informativeness of a concept. The IC value of a node is calculated as the negative log likelihood of its frequency in a large text corpus ($IC(c) = -\log p(c)$), with the intuition that the more probable the appearance of a concept in a corpus, the less information it conveys.

A large number of node-based measures have been proposed using Information Content as a central element, some of the most widely used being listed below, i.e., Resnik [16], Lin [17] and Jiang and Conrath [18]. As a note, in the equations below, MICA denotes the Most Informative Common Ancestor, i.e., the common ancestor of the nodes with the highest Information Content.

Resnik: $$SIM_{Res}(c_1, c_2) = IC(c_{MICA}) \tag{1}$$
Lin: $$SIM_{Lin}(c_1, c_2) = \frac{2\,IC(c_{MICA})}{IC(c_1) + IC(c_2)} \tag{2}$$
Jiang and Conrath: $$SIM_{JC}(c_1, c_2) = 1 - \left(IC(c_1) + IC(c_2) - 2\,IC(c_{MICA})\right) \tag{3}$$
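To make the node-based definitions concrete, the following minimal Python sketch computes Information Content and the Resnik and Lin similarities. The four-level taxonomy and the probabilities p(c) are invented for illustration only; they are not taken from HPO or from any corpus, and the function names are our own.

```python
import math

# Hypothetical toy taxonomy (child -> parent) with made-up occurrence
# probabilities p(c); neither comes from HPO or from a real corpus.
PARENT = {
    "skull": "root",
    "macrocephaly": "skull",
    "progressive": "macrocephaly",
    "relative": "macrocephaly",
}
P = {"root": 1.0, "skull": 0.5, "macrocephaly": 0.2,
     "progressive": 0.05, "relative": 0.1}


def ic(c):
    """Information Content: IC(c) = -log(p(c))."""
    return -math.log(P[c])


def ancestors(c):
    """Return c together with all of its ancestors up to the root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def mica(c1, c2):
    """Most Informative Common Ancestor: the shared ancestor with highest IC."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(common, key=ic)


def sim_resnik(c1, c2):
    return ic(mica(c1, c2))


def sim_lin(c1, c2):
    # Undefined (0/0) when both concepts are the root, since IC(root) = 0.
    return 2 * ic(mica(c1, c2)) / (ic(c1) + ic(c2))


print(mica("progressive", "relative"))
print(round(sim_resnik("progressive", "relative"), 3))
print(round(sim_lin("progressive", "relative"), 3))
```

In this toy setting the two sibling concepts share "macrocephaly" as their MICA, so the Resnik value is simply the IC of that ancestor, while the Lin value normalises it by the IC of the two concepts themselves.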

In the other category, i.e., edge-based approaches, Wu & Palmer [19] proposed a measure based on the length of the path between the Least Common Ancestor (LCA) and the root, and the length of the path between each of the concepts and that common ancestor.

$$SIM_{W\&P}(c_1, c_2) = \frac{2 N_3}{N_1 + N_2 + 2 N_3} \tag{4}$$

where $N_3$ is the length of the path from the LCA to the root, $N_1$ is the length of the path from $c_1$ to the LCA, and $N_2$ is the length of the path from $c_2$ to the LCA.
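The Wu & Palmer measure can be sketched in a few lines of Python. The taxonomy below is a hypothetical tree (real ontologies such as HPO allow multiple parents), and returning 1.0 when both concepts are the root is our own assumption, since the formula is undefined in that case.

```python
# Hypothetical toy tree (child -> parent); not an excerpt of a real ontology.
PARENT = {
    "skull": "root",
    "macrocephaly": "skull",
    "progressive": "macrocephaly",
    "relative": "macrocephaly",
}


def path_to_root(c):
    """Return [c, parent(c), ..., root]."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def wu_palmer(c1, c2):
    a1, a2 = path_to_root(c1), path_to_root(c2)
    # Walking up from c1, the first node also above c2 is the LCA (tree only).
    lca = next(a for a in a1 if a in a2)
    n1 = a1.index(lca)                  # path length from c1 to the LCA
    n2 = a2.index(lca)                  # path length from c2 to the LCA
    n3 = len(path_to_root(lca)) - 1     # path length from the LCA to the root
    denominator = n1 + n2 + 2 * n3
    if denominator == 0:
        return 1.0                      # assumption: root compared with itself
    return 2 * n3 / denominator


print(wu_palmer("progressive", "relative"))
print(wu_palmer("progressive", "skull"))
```

For the two siblings, N1 = N2 = 1 and N3 = 2, giving 4/6; for "progressive" and "skull", N1 = 2, N2 = 0 and N3 = 1, giving 2/4.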

Association rule mining

Association rules [7] provide knowledge in the form of probabilistic “if-then” statements, e.g., I → Q. The head of the association rule (i.e., the if part – I) is called the antecedent, while the body (i.e., the then part – Q) is called the consequent. The antecedent and consequent of an association rule are disjoint – they do not have any items in common. To express uncertainty in association rules, i.e., I → Q with a certain degree of certainty, several metrics can be used, two of the most widely adopted being Support and Confidence (discussed below). A set of association rules aimed at classification is called a predictive association rule set. A class association rule set is a subset of association rules with the specified classes as their consequents. Predictive association rules form a small subset of class association rules. Generally, mining predictive association rules involves the following two steps: (i) find all class association rules from a database, followed by (ii) prune and organise the found class association rules to return a sequence of predictive association rules.

Traditional interestingness measures. As mentioned earlier, the rule discovery process is usually associated with two challenges, one of them being the rule quality problem, i.e., quantifying which of the discovered rules are more interesting. Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. They are intended for selecting and ranking patterns according to their potential interest to the user. Below, we present a number of existing association rule interestingness measures [10], which we have also applied in our experiments. These measures rely on the foundational Support and Confidence metrics.

Let $T = \{t_1, t_2, \ldots, t_n\}$ be a database of $n$ transactions with a set of attributes (or items) $I = \{i_1, i_2, \ldots, i_m\}$. For an itemset $I_X \subseteq I$ and a transaction $t \in T$, we say that $t$ supports $I_X$ if $t$ has values for all the attributes in $I_X$. By $T_{I_X}$ we denote the transactions that contain all attributes in $I_X$.

The Support of $I_X$ is computed as

$$Support(I_X) = \frac{|T_{I_X}|}{n} \tag{5}$$

or the fraction of transactions that include all attributes in $I_X$.

The Confidence of an association rule $I_X \rightarrow Q$, where $Q$ is also an itemset ($Q \subseteq I$) and $Q \cap I_X = \emptyset$, is defined by:

$$Confidence(I_X \rightarrow Q) = \frac{Support(I_X, Q)}{Support(I_X)} \tag{6}$$

or the ratio between the number of transactions that include all items in the consequent ($Q$), as well as in the antecedent ($I_X$) – namely, the Support of the union of $I_X$ and $Q$ – and the number of transactions that include all items in the antecedent (i.e., the Support of $I_X$).
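As an illustration of these two definitions, the sketch below computes Support and Confidence over a small list of transactions represented as Python sets. The item names and the transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of the union divided by the Support of the antecedent."""
    union = set(antecedent) | set(consequent)
    return support(transactions, union) / support(transactions, antecedent)


# Hypothetical transactions; "a" and "b" are features, "x" is a class label.
transactions = [{"a", "b", "x"}, {"a", "x"}, {"a", "b"}, {"b"}]

print(support(transactions, {"a"}))            # 3 of 4 transactions
print(support(transactions, {"a", "x"}))       # 2 of 4 transactions
print(confidence(transactions, {"a"}, {"x"}))  # (2/4) / (3/4)
```

Note that `confidence` raises a `ZeroDivisionError` if the antecedent never occurs; in a mining setting this does not arise because rules are only generated from frequent itemsets.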

Confidence alone may not be enough to assess the descriptive interest of a rule, as rules with high confidence may occur by chance. Such spurious rules can be detected by determining whether the antecedent and the consequent are statistically independent. This inspired a number of measures, including Lift, Conviction, Leverage, Jaccard, Cosine and Correlation Coefficient [8–10]. We provide their mathematical definitions in the following sections.

Materials and methods

Annotation dataset

The rare nature of bone dysplasias makes the data collection particularly challenging. In 2002, the European Skeletal Dysplasia Network (ESDN) was created to alleviate, at least partly, the data sparseness issue. At the same time it aimed to provide a collaborative environment to help with the diagnosis of skeletal dysplasias and to improve the information exchange between researchers. To date, ESDN has gathered over 1,200 patient cases, which have been discussed by its panel of experts. The ESDN case workflow consists of three major steps: (i) a patient case is uploaded and an initial diagnosis is set by the original clinician that referred the case; (ii) the panel of experts discusses the case until an agreement is reached; (iii) the panel of experts recommends a final diagnosis. Among the total number of cases, 744 have a final bone dysplasia diagnosis (the remaining cases were not thought to be true bone dysplasias by the experts), with a total of 114 different skeletal dysplasias covered.

Patient clinical summaries in ESDN are represented in a free text format. The language used within the ESDN clinical summaries suffers from several issues, such as synonymy (several terms having the same meaning) or hyponymy (one term being more specific than another). In order to be able to use this data, we extracted patient phenotypes by annotating the text with corresponding terms from the Human Phenotype Ontology (HPO). The actual annotation process was performed using the National Centre for Biomedical Ontology (NCBO) Annotator [20], an ontology-based web service for annotation of textual sources with biomedical concepts. A bone dysplasia expert (one of the co-authors) has manually validated the resulting HPO annotations to ensure their correctness and to eliminate, in particular, false positives. As a remark, the false negatives resulting from the annotation process may be under-estimated, and could not be validated since we were not able to perform a full-fledged annotation of the clinical summaries. The diagnosis associated with the patient cases has also been annotated with concepts from the Bone Dysplasia Ontology (BDO). More concretely, the final diagnosis set by the panel of experts has been converted to the corresponding BDO concept.

In order to achieve realistic results using association rule mining, from the 114 types of dysplasias present in the ESDN dataset, we chose only those that were represented by more than 10 patient cases. This has reduced our dataset to 394 annotated patient cases (i.e., around 33% of the total number) diagnosed with 15 different bone dysplasias. The set features a total of 441 distinct phenotypes, with an average of 63.67 distinct phenotypes per disorder and an average of 4.49 distinct phenotypes per case. The experiments described in this manuscript use this dataset for training and testing purposes.

Proposed approach

Our goal is to discover association rules from annotated and diagnosed patient cases in order to observe co-occurrence relationships between clinical features and disorders. In other words, we aim to find association rules of the form $\{I_{CF}\} \rightarrow \{I_{BD}\}$, where $I_{CF}$ represents the set of clinical features of a patient and $I_{BD}$ is a bone dysplasia diagnosis. From a conceptual perspective, $I_{CF}$ will comprise annotations assigned to patient cases, or more concretely, HPO concepts. We have adapted the Apriori algorithm by adding two constraints, required to match our aim: (i) every desired itemset must have one set of clinical features and a single dysplasia, and (ii) both candidate itemsets and frequent itemsets can have at most one dysplasia item.

Following the discovery of the desired itemsets, these are partitioned into two components: a component containing the skeletal dysplasia and one containing the phenotypes. A Boolean function that determines the type of a component is used to perform this classification. Subsequently, we calculate the different traditional or semantic interestingness measures between the bone dysplasia component and the phenotype set of the rule.
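The two constraints and the subsequent partitioning can be sketched as follows. This is a simplified level-wise search written for illustration, not the adapted Apriori implementation used in the experiments: it omits the usual subset-based pruning, and the `BDO:` prefix used by the Boolean type test is a naming convention we assume here.

```python
from itertools import combinations


def is_dysplasia(item):
    """Boolean type test. Assumption: BDO items carry a 'BDO:' prefix."""
    return item.startswith("BDO:")


def support(transactions, itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)


def constrained_itemsets(transactions, min_support, max_size=4):
    """Level-wise search keeping at most one dysplasia item per itemset."""
    items = set().union(*transactions)
    level = [frozenset([i]) for i in items
             if support(transactions, frozenset([i])) >= min_support]
    frequent, size = [], 1
    while level and size <= max_size:
        frequent.extend(level)
        # Join step: merge itemsets of the current size into size + 1.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == size + 1}
        # Constraint (ii): candidates hold at most one dysplasia item.
        candidates = {c for c in candidates
                      if sum(is_dysplasia(i) for i in c) <= 1}
        level = [c for c in candidates
                 if support(transactions, c) >= min_support]
        size += 1
    # Constraint (i): clinical features plus exactly one dysplasia.
    return [s for s in frequent
            if len(s) > 1 and sum(is_dysplasia(i) for i in s) == 1]


def split_rule(itemset):
    """Partition an itemset into (phenotype set, dysplasia)."""
    features = frozenset(i for i in itemset if not is_dysplasia(i))
    (dysplasia,) = (i for i in itemset if is_dysplasia(i))
    return features, dysplasia


patients = [
    {"HP:0008921", "HP:0000772", "BDO:Achondroplasia"},
    {"HP:0008921", "HP:0000774", "BDO:Achondroplasia"},
    {"HP:0008905", "HP:0000772", "BDO:Achondroplasia"},
]
for itemset in constrained_itemsets(patients, min_support=0.5):
    print(split_rule(itemset))
```

With a minimum support of 0.5, only the two itemsets pairing a single phenotype (HP:0008921 or HP:0000772) with the dysplasia survive, each of which is then split into an antecedent and a consequent.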

Modelling traditional support in the context of semantic annotations

If an itemset consists of the items $I = \{i_1, i_2, i_3, \ldots, i_m\}$ for the reference concept $RC$ and there are $n$ transactions in the knowledge base $KB$, Support is defined as the proportion of instances of the reference concept $RC$ in the knowledge base which contain the itemset $I$.

$$Support(I, RC, KB) = \frac{\text{Number of instances of concept } RC \text{ that contain the itemset } I}{\text{Total number of instances of the concept } RC} \tag{7}$$

In our case, the reference concept ($RC$) is represented by the patient ($P$) and $KB$ is the annotated dataset. Below we present an example of traditional Support calculation.

Let us consider the following set of clinical features represented by HPO concepts ($cf \in I_{CF}$), in addition to a bone dysplasia:

  • $cf_1$ – HP:0008921 (Neonatal short-limb short stature)

  • $cf_2$ – HP:0008905 (Rhizomelic short stature)

  • $cf_3$ – HP:0000772 (Abnormality of the ribs)

  • $cf_4$ – HP:0000774 (Narrow chest)

  • $bd_1$ – BDO:Achondroplasia

Let us also consider three reference concepts (i.e., patients) $p_1$, $p_2$ and $p_3$ and assume that the KB contains the following itemsets:

  • $I(p_1) = \{I_{cf_1}(p_1), I_{cf_3}(p_1), bd_1\}$

  • $I(p_2) = \{I_{cf_1}(p_2), I_{cf_4}(p_2), bd_1\}$

  • $I(p_3) = \{I_{cf_2}(p_3), I_{cf_3}(p_3), bd_1\}$

where $I_{cf_x}(p_x) = \{cf_x \mid exhibits(p_x, cf_x)\}$. Our goal is to compute the support of the itemset $I(p) = \{I_{cf_1}(p), I_{cf_3}(p), bd_1\}$. We can quickly observe that there is one patient instance that contains this pattern – i.e., $p_1$. Since the total number of patient instances is 3, traditional support is then:

$$Support(I, P, KB) = \frac{1}{3} \approx 0.33 \tag{8}$$

However, a close look at $cf_1$ and $cf_2$ in HPO reveals that these concepts are fairly similar (they have a direct common ancestor in HP:0008873 (Disproportionate short-limb short stature)), but not exactly the same. $cf_3$ and $cf_4$ are in a similar situation, with the parent of HP:0000774 (i.e., HP:0005257 (Thoracic hypoplasia)) being a sibling of $cf_3$. Unfortunately, traditional Support cannot leverage this semantic similarity information as it relies on exact matching. To overcome this issue, we propose an alternative set of semantic interestingness measures (Semantic Support, Semantic Confidence, etc.).

Semantic similarity of items

Our intuition is that by using semantic similarity measures on patient findings (i.e., HPO concepts) we are able to leverage the semantic relationships between phenotypes that cannot otherwise be acquired by typical data mining processes (due to their term-based matching process). As an example, if the background knowledge base lists HP:0000256 (Macrocephaly) as a phenotype of Achondroplasia and a new patient exhibits HP:0004439 (Craniofacial dysostosis), we want to use the semantic similarity value between the two concepts to associate the latter with Achondroplasia with a certain probability. The semantic similarity between the concepts could be inferred, for example, via their common ancestor – HP:0000929 (Abnormality of the skull). Such an association is not possible when employing a typical data mining process since each term would be considered individually and only in the context provided by the background knowledge base.

In principle, a good semantic similarity measure needs to take into account the specific aspects of the target domain. There are, nevertheless, a series of requirements – emerging also from the bone dysplasia domain and the structure of HPO – that are generally applicable:

  •  Given two HPO concepts, we consider them to be more similar if they are closer to each other (i.e., the path between them is shorter). E.g., HP:0004481 (Macrocephaly progressive) will be considered more similar to HP:0000256 (Macrocephaly) than HP:0004488 (Macrocephaly at Birth), because the distance between HP:0004481 and HP:0000256 is 1 whereas the distance between HP:0004481 and HP:0004488 is 2.

  •  Several strategies have been used in choosing the semantic similarity function. Li et al. [21], in their work on modelling and capturing semantic similarity in WordNet, have employed an exponential function to transform the path length between concepts into a similarity value and have shown that the exponential measure significantly outperforms traditional similarity measures. Given that the design philosophies of HPO and WordNet are similar, we derive the similarity between two phenotypes as an exponential function of the path length between their corresponding HPO concepts. The same rationale is valid also for BDO.

  •  In order to be able to calculate the semantic interestingness measures, semantic similarity needs to take values between 0 and 1. At the same time, an exact match should be signalled by a semantic similarity value of 1.

  •  The semantic similarity value of two concepts should be dependent on the specificity of their LCA (i.e., its location in the overall hierarchy). More concretely, we consider the more specific LCA to be more informative. E.g., HP:0004439 (Craniofacial dysostosis) (as an LCA) should be considered more informative than HP:0000929 (Abnormality of the skull), which, in this case, is its direct parent.

In the following we describe a set of domain-oriented semantic similarity functions that satisfy the above-listed requirements.

Domain-specific semantic similarity measures. If $i_1$ and $i_2$ are two items, we define the semantic similarity between them as:

$$SemSim(i_1, i_2) = \frac{Dist(LCA(i_1, i_2), Root)}{Dist(i_1, i_2) + Dist(LCA(i_1, i_2), Root)} \tag{9}$$

where $Dist(LCA(i_1, i_2), Root)$ is the length of the path from $LCA(i_1, i_2)$ to the root and $Dist(i_1, i_2)$ is a distance measure between $i_1$ and $i_2$ that depends on the underlying types of the items.

If the items under scrutiny are phenotypes, we define $Dist(i_1, i_2)$ as shown in Eq. 10.

$$Dist(i_1, i_2) = \begin{cases} 2^{l_x}, & \text{if } i_1 \neq i_2 \\ 0, & \text{if } i_1 = i_2 \neq root \\ 1, & \text{if } i_1 = i_2 = root \end{cases} \tag{10}$$

where $l_x$ is the length of the shortest path between $i_1$ and $i_2$. This formula determines the semantic similarity of two HPO terms based on both the distance between these terms and the location of their LCA in the HPO structure. It can also be observed that the larger the distance between the terms, the less similar they will be. Finally, if two concepts are the same but do not denote the root, the value of the function is 0, while if they do denote the root, the value of the function is 1, to avoid the division by 0 case.

In Eq. 10 the shortest path length is scaled by an exponential function to provide more weight to distance rather than depth. Furthermore, the base and the exponent of this power function aim to overemphasise the similarity between phenotypes when taking into account the HPO structure. Generally, this similarity decreases faster than the distance. For instance, the distance between Macrocephaly and Macrocephaly progressive is 1 and they are very similar, while the distance between Abnormality of Skull and Macrocephaly progressive is 3, with the former being much more generic and different to Macrocephaly progressive than any of the other macrocephalies.

Similar to the phenotype distance described above, if we consider two disorders using the Bone Dysplasia Ontology, we define the same $Dist(i_1, i_2)$ as shown in Eq. 11 – the semantic similarity equation remains unchanged (i.e., as per Eq. 9).

$$Dist(i_1, i_2) = \begin{cases} 10^{\,l_x - 2}, & \text{if } i_1 \neq i_2 \\ 0, & \text{if } i_1 = i_2 \neq root \\ 1, & \text{if } i_1 = i_2 = root \end{cases} \tag{11}$$

where $l_x$ is again the length of the shortest path between $i_1$ and $i_2$.

The rationale behind Eq. 11 is the same as for Eq. 10 (see above), with the remark that the overall similarity between disorders decays at an even higher rate (with the distance in BDO) because of their coarse-grained nature, which has led to a fairly flat structure of the ontology. The structure of the ontology, and more concretely its maximum depth (i.e., 2), has influenced the constant (2) in the exponent of the formula ($l_x - 2$). The intuition is that concepts that belong to the same group, i.e., they are at the second level in the hierarchy and the distance between them is 2 (via the LCA), should receive the highest similarity, after the exact match.
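The following sketch implements the semantic similarity of Eq. 9 with the two distance functions of Eq. 10 and Eq. 11. Both hierarchies are hypothetical trees built only to mirror the path lengths mentioned in the text (the intermediate node `size`, the group `group_B` and the disorders `disorder_X` and `disorder_Y` are invented, and real HPO concepts can have several parents), so the resulting values are illustrative rather than those obtained on the actual ontologies.

```python
def path_to_root(c, parent):
    """Return [c, parent(c), ..., root]."""
    chain = [c]
    while c in parent:
        c = parent[c]
        chain.append(c)
    return chain


def lca_and_path_length(i1, i2, parent):
    """Least common ancestor and shortest path length l_x (tree only)."""
    a1, a2 = path_to_root(i1, parent), path_to_root(i2, parent)
    lca = next(a for a in a1 if a in a2)
    return lca, a1.index(lca) + a2.index(lca)


def make_dist(parent, scale):
    """Build Dist(i1, i2); `scale` maps the path length l_x to a distance."""
    def dist(i1, i2):
        if i1 == i2:
            is_root = i1 not in parent
            return 1.0 if is_root else 0.0
        _, lx = lca_and_path_length(i1, i2, parent)
        return scale(lx)
    return dist


def sem_sim(i1, i2, parent, dist):
    """Eq. 9: depth of the LCA over (distance + depth of the LCA)."""
    lca, _ = lca_and_path_length(i1, i2, parent)
    depth = len(path_to_root(lca, parent)) - 1   # Dist(LCA, Root)
    return depth / (dist(i1, i2) + depth)


# Hypothetical HPO-like tree (child -> parent).
HPO = {"skull": "root", "size": "skull", "macrocephaly": "size",
       "progressive": "macrocephaly", "at_birth": "macrocephaly"}
# Hypothetical BDO-like tree: root -> groups -> disorders.
BDO = {"FGFR3_group": "Bone_Dysplasia", "group_B": "Bone_Dysplasia",
       "Achondroplasia": "FGFR3_group", "disorder_X": "FGFR3_group",
       "disorder_Y": "group_B"}

dist_hpo = make_dist(HPO, lambda lx: 2.0 ** lx)          # Eq. 10
dist_bdo = make_dist(BDO, lambda lx: 10.0 ** (lx - 2))   # Eq. 11

print(sem_sim("progressive", "macrocephaly", HPO, dist_hpo))
print(sem_sim("progressive", "at_birth", HPO, dist_hpo))
print(sem_sim("progressive", "skull", HPO, dist_hpo))
print(sem_sim("Achondroplasia", "disorder_X", BDO, dist_bdo))
print(sem_sim("Achondroplasia", "disorder_Y", BDO, dist_bdo))
```

In this toy setting the similarity drops from 3/5 (parent and child) to 3/7 (siblings) to 1/9 (path length 3 with a shallow LCA), while an exact non-root match yields 1. For the BDO-like tree, two disorders of the same group obtain 1/2 and disorders from different groups obtain 0, because their LCA is the root.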

Semantic support

Given a knowledge base and an itemset, our goal is to automatically derive a score that indicates the proportion of transactions in the knowledge base that contain the itemset at a semantic level, thus going beyond the exact matching methods traditionally used for this task. This needs to take into account the relations between items. We attempt to model the semantic support of an itemset as a function of the semantic similarity of the terms present in the knowledge base and the itemset.

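If we consider a database $T$ with $n$ transactions $\{t_1, t_2, \ldots, t_n\}$ and $m$ items $\{i_1, i_2, \ldots, i_m\}$, the Semantic Support of $\{i_1, i_2, \ldots, i_p\}$ ($p \le m$) is calculated as follows:

$$SemSupport(i_1, i_2, \ldots, i_p) = \frac{1}{n} \sum_{q=1}^{n} \prod_{j=1}^{p} \max_{v = 1, \ldots, |t_q|} SemSim(i_j, i_v) \tag{12}$$

That is, for each transaction $t_q$, every item $i_j$ of the itemset is matched with its most similar item $i_v$ in $t_q$, the best-match similarities are multiplied, and the result is averaged over all transactions.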

The value of the Semantic Similarity (SemSim) ranges from 0 to 1 and so does the value of the Semantic Support.
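A minimal sketch of Semantic Support is given below. It reads the definition as an average, over transactions, of the product of each itemset item's best similarity match within the transaction; this reading is our interpretation of the formula. The similarity values 0.5 and 0.3 are invented placeholders, not values computed from HPO, and items from different ontologies are assumed to have similarity 0.

```python
from math import prod


def sem_support(transactions, itemset, sem_sim):
    """Average over transactions of the product of best-match similarities."""
    total = 0.0
    for t in transactions:
        total += prod(max(sem_sim(i, v) for v in t) for i in itemset)
    return total / len(transactions)


# Hypothetical pairwise similarities; every other pair is treated as 0.
SIM = {frozenset({"cf1", "cf2"}): 0.5, frozenset({"cf3", "cf4"}): 0.3}


def toy_sim(a, b):
    return 1.0 if a == b else SIM.get(frozenset({a, b}), 0.0)


def exact_sim(a, b):
    return 1.0 if a == b else 0.0


# The three patients of the traditional Support example.
patients = [{"cf1", "cf3", "bd1"}, {"cf1", "cf4", "bd1"}, {"cf2", "cf3", "bd1"}]
itemset = ["cf1", "cf3", "bd1"]

print(sem_support(patients, itemset, exact_sim))  # reduces to 1/3
print(sem_support(patients, itemset, toy_sim))    # (1 + 0.3 + 0.5) / 3
```

With exact matching the function reduces to traditional Support (one patient out of three), whereas the placeholder similarities let the second and third patients contribute partially.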

Semantic interestingness measures

Semantic interestingness measures take into account how data items are semantically related. To do so, they make use of the underlying structure of the ontology that hosts the corresponding items (e.g., generalisation, specialisation, etc.). Hence, if we replace the traditional Support element in the confidence calculation with Semantic Support, we get Semantic Confidence. The same process can be applied to the other well-known interestingness measures, such as lift, conviction, etc. Below we list the corresponding semantic calculation for these measures for an association rule $I_X \rightarrow Q$.

$$SemConfidence(I_X \rightarrow Q) = \frac{SemSupport(I_X, Q)}{SemSupport(I_X)} \tag{13}$$
$$SemLift(I_X \rightarrow Q) = \frac{SemConfidence(I_X \rightarrow Q)}{SemSupport(Q)} \tag{14}$$
$$SemConviction(I_X \rightarrow Q) = \frac{1 - SemSupport(Q)}{1 - SemConfidence(I_X \rightarrow Q)} \tag{15}$$
$$SemLeverage(I_X \rightarrow Q) = SemSupport(I_X, Q) - SemSupport(I_X)\,SemSupport(Q) \tag{16}$$
$$SemJaccard(I_X \rightarrow Q) = \frac{SemSupport(I_X, Q)}{SemSupport(I_X) + SemSupport(Q) - SemSupport(I_X, Q)} \tag{17}$$
$$SemCosine(I_X \rightarrow Q) = \frac{SemSupport(I_X, Q)}{\sqrt{SemSupport(I_X)\,SemSupport(Q)}} \tag{18}$$
$$SemCorrelationCoeff(I_X \rightarrow Q) = \frac{SemLeverage(I_X \rightarrow Q)}{\sqrt{SSupp(I_X)\,SSupp(Q)\,(1 - SSupp(I_X))\,(1 - SSupp(Q))}} \tag{19}$$

SSupp in Eq. 19 denotes Semantic Support.
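Since all of these measures are functions of three Semantic Support values, they can be computed together, as in the sketch below. The function name and the input values are hypothetical, and returning infinity for conviction when the confidence equals 1 is our own convention for the division-by-zero case.

```python
from math import sqrt


def semantic_measures(s_x, s_q, s_xq):
    """Derive the measures from SemSupport(I_X), SemSupport(Q), SemSupport(I_X, Q)."""
    conf = s_xq / s_x
    leverage = s_xq - s_x * s_q
    return {
        "confidence": conf,
        "lift": conf / s_q,
        # Assumption: conviction is reported as infinite when confidence is 1.
        "conviction": (1 - s_q) / (1 - conf) if conf < 1 else float("inf"),
        "leverage": leverage,
        "jaccard": s_xq / (s_x + s_q - s_xq),
        "cosine": s_xq / sqrt(s_x * s_q),
        # Undefined when either support equals 0 or 1.
        "correlation": leverage / sqrt(s_x * s_q * (1 - s_x) * (1 - s_q)),
    }


# Hypothetical support values, chosen only to exercise the formulas.
measures = semantic_measures(s_x=0.5, s_q=0.4, s_xq=0.3)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

The same function yields the traditional measures when it is fed traditional Support values, which makes the two families directly comparable.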

Experimental design

We have carried out a series of experiments with the following goals:

  •  Firstly, we aim to analyse the accuracy of the resulting association rules when using existing traditional interestingness measures;

  •  Secondly, we are interested in finding out the same accuracy, but when using the proposed semantic interestingness measures;

  •  Finally, we aim to observe the difference between the accuracies produced via the two methods.

The quality of discovered rules depends on their ability to determine the correct diagnosis. To measure accuracy, we have employed a voting strategy, which is described below.

The purpose of evaluating the discovered rules is to understand the utility of the interestingness measures. Voting allows all firing association rules to contribute to the final prediction. This strategy combines the associations $KF(p_x)$ that fire upon a new patient case $p_x$. A simple voting strategy considers all the rules in $KF(p_x)$, groups the rules by antecedent, and for each antecedent $I_X$ obtains the class corresponding to the rule with the highest confidence. We denote the class voted by an antecedent $I_i$ with a binary function $vote(I_i, bd)$ that takes the value 1 when $I_i$ votes for disorder $bd$, and 0 for any other class, where $\{bd_1, bd_2, \ldots, bd_n\} \subseteq BD$ represents a set of bone dysplasias. The disorder that receives the maximum vote is the most probable diagnosis for patient case $p_x$.

$$\mathrm{TotalVote}(bd_i) = \sum_{I_i \in \mathrm{antecedents}(KF(p_x))} \mathrm{Vote}(I_i, bd_i)$$

Weighted voting is similar to simple voting; however, each vote is multiplied by a factor that quantifies the quality of the vote. In the case of association rules, this can be done using one of the measures defined above.

$$\mathrm{TotalVote}(bd_i) = \sum_{I_i \in \mathrm{antecedents}(KF(p_x))} \mathrm{Vote}(I_i, bd_i) \times \mathrm{QVote}(I_i, bd_i)$$

In our case, $\mathrm{QVote}(I_i, bd_i)$ is the quality of the vote, or more concretely the maximum interestingness of that particular antecedent group.
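The two voting schemes can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: a rule is taken to fire when its antecedent is a subset of the patient's phenotypes, and the `Rule` type, its field names and the example rules are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Set


class Rule(NamedTuple):
    antecedent: FrozenSet[str]   # phenotypes on the left-hand side
    disorder: str                # predicted class
    confidence: float
    interestingness: float       # any measure used as the vote quality


def total_votes(rules: Iterable[Rule], phenotypes: Set[str],
                weighted: bool = False) -> Dict[str, float]:
    """Group firing rules by antecedent and let each group cast one vote."""
    groups = defaultdict(list)
    for rule in rules:
        if rule.antecedent <= phenotypes:  # assumption: subset test means "fires"
            groups[rule.antecedent].append(rule)

    votes: Dict[str, float] = defaultdict(float)
    for group in groups.values():
        # The antecedent votes for the class of its highest-confidence rule.
        winner = max(group, key=lambda r: r.confidence)
        # Weighted voting scales the vote by the group's maximum interestingness.
        weight = max(r.interestingness for r in group) if weighted else 1.0
        votes[winner.disorder] += weight
    return dict(votes)


def rank_disorders(votes: Dict[str, float]) -> List[str]:
    """Disorders ordered from most to least voted."""
    return sorted(votes, key=votes.get, reverse=True)


rules = [
    Rule(frozenset({"A"}), "bd1", 0.8, 0.8),
    Rule(frozenset({"A"}), "bd2", 0.6, 0.6),
    Rule(frozenset({"B"}), "bd2", 0.9, 0.9),
    Rule(frozenset({"A", "B"}), "bd2", 0.7, 0.7),
    Rule(frozenset({"C"}), "bd3", 1.0, 1.0),
]
simple = total_votes(rules, {"A", "B"})
weighted = total_votes(rules, {"A", "B"}, weighted=True)
```

In this toy case antecedent {A} votes for bd1 while {B} and {A, B} vote for bd2, so bd2 is ranked first under both schemes; the rule on {C} does not fire and casts no vote.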

We have performed individual experiments for each of the interestingness measures previously described, using the voting strategy. To assess their efficiency, we have calculated the overall accuracy of the discovered association rules. In all experiments, we compute the prediction accuracy as the overall percentage of correctly predicted disorders at a given recall cut-off point (i.e., by taking into account only the top K predictions for different values of K, where K is the recall cut-off point). Hence, a success represents a correctly predicted disorder (the exact same disorder, and not a sub- or superclass of it), while a miss represents an incorrectly predicted disorder. If $N$ is the total number of test cases and $CP$ is the number of correctly predicted disorders, then $\mathrm{Accuracy} = CP / N$. This is expressed in percentages in Tables 1, 2 and 3 in the Results section.
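A minimal sketch of the accuracy computation at a recall cut-off point K is given below. The function name and the toy predictions are hypothetical; a case counts as a success only when the exact true disorder appears among the top K ranked predictions.

```python
from typing import List, Sequence


def accuracy_at_k(ranked_predictions: Sequence[Sequence[str]],
                  true_disorders: Sequence[str], k: int) -> float:
    """Percentage of test cases whose true disorder is in the top-k predictions."""
    if not true_disorders:
        return 0.0
    correct = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_disorders)
        if truth in ranked[:k]  # exact match only, no sub- or superclasses
    )
    return 100.0 * correct / len(true_disorders)


# Toy example with three test cases (labels are placeholders).
predictions: List[List[str]] = [["a", "b", "c"], ["b", "a"], ["c"]]
truths = ["a", "a", "b"]
acc_k1 = accuracy_at_k(predictions, truths, k=1)
acc_k2 = accuracy_at_k(predictions, truths, k=2)
```

Accuracy can only stay the same or grow as K increases, which is consistent with the upward trend across cut-off points reported in the tables.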

Table 1 Experimental results on finding the quality of association rules, discovered using traditional interestingness measures
Table 2 Experimental results on finding the quality of association rules, discovered using semantic interestingness measures
Table 3 Comparative overview of the experimental results achieved by the traditional and semantic interestingness measures

As mentioned earlier in the manuscript, our annotated dataset consisted of over 300 patient cases, with the clinical features annotated using HPO and the disorders using BDO. In order to provide an accurate view over the prediction of the discovered rules, each experiment has been performed as a 5-fold cross validation with an 80-20 split (80% knowledge base, 20% test data). Tables 1, 2 and 3 list the resulting average accuracy at five different recall cut-off points.

Within each experiment, we have used a relatively low minimum Support of 5/N, where N is the total number of cases, because we are interested in extracting both frequent and occasional associations. Every rule was able to contribute to the voting. Controlling the number of rules using any minimum interestingness threshold can bias the voting and hence, the overall result. Consequently, we have not used this parameter to control the number of rules. Finally, we have used a maximum itemset size of 10 as the computational cost increases exponentially with the itemset size in the association rule mining process.


In this section we present and discuss the experimental results achieved using traditional and semantic interestingness measures. We start with the semantic similarity proposed in the previous sections and then compare its results against a series of classic semantic similarity measures.

Proposed semantic similarity metric

In order to observe the quality improvements brought by semantic interestingness measures over the traditional ones, we have evaluated the discovered rules against real world patient data. As already mentioned, we performed two sets of experiments. Firstly, we have compared and evaluated different traditional interestingness measures. Then, we performed the same experiment but by using semantic interestingness measures. This has enabled us to perform an overall comparison between the two types of measures.

Table 1 lists the experimental results for the traditional measures. A first observation is that Confidence has the overall best behaviour. At any recall cut-off point greater than 1 (K > 1), Confidence outperforms or scores similarly to the other measures. For example, it achieves an accuracy of 46.58% for K = 2 and 53.42% for K = 3, both 1.37% higher than the second scoring measure, Jaccard. The only exception appears for K = 1, where Jaccard outperforms Confidence by 2.74%. A second, interesting, observation is that with the increase in the recall cut-off point, the measures reach a common ground, and hence, achieve the same performance – for K = 5, six of the seven measures score the same accuracy (57.53%).

Each of the measures we have considered in our experiments studies certain properties of the data. Consequently, the above-listed results enable us to reach a better understanding of the underlying nature of the relationships manifested by the data in our bone dysplasia annotated dataset. For example, Confidence measures the level of causality (implication), while Jaccard measures the degree of overlap among the given sets, or in our case, patient phenotypes. This leads to the conclusion that the bone dysplasia data seems to be governed more by causality and overlap than by, for example, co-occurrence, which is described by Lift.

Table 2 lists the experimental results for the semantic interestingness measures. We can easily observe that the results follow the same trend as in the previous experiment. Semantic Confidence has, again, an overall best behaviour for K > 1, outperforming Semantic Jaccard by 1.37% for K = 2 (49.32%) and K = 3 (57.53%) and achieving the same accuracy for K = 4 (61.64%) and K = 5 (64.38%). Semantic Jaccard achieves a better accuracy for K = 1, i.e., 34.25%, which is 2.74% higher than Semantic Confidence. Finally, as in the previous experiment, we observe that the increase in the recall cut-off point leads to a more uniform accuracy across all measures, although they are slightly less aligned, as they do not achieve the exact same accuracy.

A comparative overview of the two types of measures is presented in Table 3, where we can observe that semantic measures achieve better results than the traditional ones. Furthermore, the increase in the recall cut-off point leads to a bigger difference in accuracy, from 2.74% for K = 1 to 6.85% for K = 5.

The main reason behind the increase in accuracy is the use of similarity matching between terms. For instance, an ESDN patient diagnosed with Achondroplasia had the following phenotypes: Rhizomelic short stature, Muscular hypotonia, Hypoplasia involving bones of the extremities and Malar flattening. The classifier using traditional confidence was not able to classify this case correctly, while the classifier using semantic confidence was. The semantic similarity employed by the latter found an association between Rhizomelic short stature and Achondroplasia based on the more generic Short stature phenotype, which is common in Achondroplasia. This represents a clear example where the exact matching used by traditional classifiers fails. Another similar instance was the case of a MED patient that exhibited the following phenotypes: Pes planus (i.e., flat feet), Rhizomelic shortening and Frontal bossing. As in the previous example, the classifier using traditional confidence failed to classify this instance correctly, while the one using semantic confidence succeeded, based on the semantic similarity between Pes planus and the diverse feet abnormalities that characterise MED.

In order to have an accurate view over the classification results, we have checked the statistical significance of the increase in accuracy at recall cut-off point 5. The purpose of this statistical significance testing was to assess the performance of the classification using semantic rules against the performance of the classification using traditional rules, both on the ESDN dataset. Such a test would validate the observed increase in accuracy of 6.85% and would show that it has not been obtained by chance.

Since the comparison is between two different approaches on a single domain (skeletal dysplasias), we have used McNemar’s Chi-squared test with continuity correction [22]. The null hypothesis was that the number of patient cases correctly classified by the classifier using semantic confidence but not by the one using traditional confidence is equal to the number of patient cases correctly classified by the classifier using traditional confidence but not by the one using semantic confidence. Table 4 shows the distribution of the 394 patient cases used in our experimental classification setting: (i) 205 patient cases were correctly classified by both classifiers; (ii) 118 patient cases were misclassified by both classifiers; (iii) 51 patient cases were correctly classified only by the classifier using semantic confidence; and (iv) 20 patient cases were correctly classified only by the classifier using traditional confidence. From this data, the McNemar test statistic with continuity correction is:

$$\chi^2_{\mathrm{McNemar}} = \frac{(|51 - 20| - 1)^2}{51 + 20} = 12.67$$

Table 4 Distribution of classification results in the McNemar’s statistical significance test

A McNemar test value of 12.67 corresponds to a p-value of 0.00037157, which provides strong evidence to reject the null hypothesis. We can, hence, conclude that the semantic interestingness measures we have proposed are able, with the help of the underlying domain ontologies, to take advantage of the similarity matching between the terms in the skeletal dysplasia domain.
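The test statistic and its p-value can be re-derived with the standard library alone. This sketch uses the identity that, for one degree of freedom, the chi-squared upper tail probability equals erfc(sqrt(x / 2)); the function name is hypothetical.

```python
from math import erfc, sqrt
from typing import Tuple


def mcnemar_corrected(b: int, c: int) -> Tuple[float, float]:
    """McNemar's chi-squared statistic with continuity correction and its p-value.

    b and c are the two discordant counts (cases that only one of the
    two classifiers got right).
    """
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Discordant counts from Table 4: 51 (semantic only) and 20 (traditional only).
statistic, p_value = mcnemar_corrected(51, 20)
```

The unrounded statistic is 900/71, so the p-value obtained this way may differ from the quoted one in its later decimal places; it remains far below the usual 0.05 threshold.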

Classic semantic similarity metrics

In order to understand the role played by the semantic similarity metric in the classification based on semantic interestingness, we have experimented with three classic semantic similarities, defined earlier in the paper: Resnik, Lin and Wu & Palmer. The results achieved by each of these metrics are discussed below.

Table 5 lists the experimental results achieved by the semantic interestingness measures employing Resnik as semantic similarity. A first observation is that all measures have performed uniformly, while from a comparative perspective, they performed worse than exact matching and our proposed semantic similarity method. As in the previous experiments, we observe that the increase in the recall cut-off point leads to a more uniform accuracy across all measures. The Resnik semantic similarity method is primarily dependent on the frequency of the most informative common ancestors. If any of the ancestors does not exist in the corpus, the similarity value becomes infinity, i.e., the concepts under scrutiny are completely dissimilar. In the case of our dataset, this is the main issue behind the failure of the Resnik semantic similarity: being a real-world dataset, most patient cases feature concrete (very specific) phenotypes, while common ancestors represent more generic/abstract concepts rarely found in clinical summaries. For example, the semantic similarity of Dolichocephaly and Full cheeks is ∞, due to the fact that the frequency of all their ancestors (Abnormality of the head, Abnormality of head and neck and Phenotype abnormality) in the patient cases is 0.

Table 5 Experimental results on finding the quality of association rules discovered using semantic Interestingness measures that employed Resnik as semantic similarity method

The experimental results for the semantic interestingness measures using the second semantic similarity – Lin – have led to 0% accuracy on all measures and all five recall cut-off points; consequently, we have not included them in a table. As in the case of Resnik, Lin is also heavily dependent on the IC of the common ancestors, and hence suffers from the same issue discussed above. Another problematic aspect of the Lin measure is that, in the context of the ESDN data, it assigns higher similarity values to partial matches than to exact matches. A similarity value of 1 is achieved when the concepts being measured are the exact same – e.g., Short long bones. However, when the concepts are different and any of their ancestors is present in the underlying corpus, the similarity value will usually be greater than 1. This is because the frequency of the ancestors (more abstract concepts) will be less than the frequency of the actual concepts, and IC decreases as frequency increases.

For instance, the semantic similarity value between Macrocephaly and Hypoplasia involving bones of the extremities is 2.19 because the frequency of their most informative common ancestor – Abnormality of the skeletal system – is less than that of both concepts. This ancestor occurs only 5 times in the corpus, whereas Macrocephaly and Hypoplasia involving bones of the extremities occur 41 and 70 times, respectively. The Resnik measure is able to avoid this issue by treating exact and partial matches in the same manner – i.e., directly and only via the IC of the most informative common ancestor and not by further dividing it by the IC of the actual concepts. In an ideal scenario, exact matches should be assigned higher similarity values than partial matches.
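The arithmetic behind the 2.19 value can be checked with a short sketch. It assumes the commonly used formulations IC(c) = −log(freq(c) / N) and Lin(c1, c2) = 2 · IC(MICA) / (IC(c1) + IC(c2)), and it assumes that the corpus size N is the 394 patient cases reported for the significance test; neither assumption is stated alongside this example in the text.

```python
from math import log


def information_content(frequency: int, corpus_size: int) -> float:
    """IC(c) = -log(freq(c) / N); rarer concepts carry more information."""
    return -log(frequency / corpus_size)


def lin_similarity(freq_a: int, freq_b: int, freq_ancestor: int,
                   corpus_size: int) -> float:
    """Lin similarity computed from corpus frequencies.

    freq_ancestor is the frequency of the most informative common ancestor.
    """
    ic_ancestor = information_content(freq_ancestor, corpus_size)
    ic_a = information_content(freq_a, corpus_size)
    ic_b = information_content(freq_b, corpus_size)
    return 2 * ic_ancestor / (ic_a + ic_b)


CORPUS_SIZE = 394  # assumption: number of patient cases in the dataset

# Macrocephaly (41), Hypoplasia involving bones of the extremities (70),
# Abnormality of the skeletal system (5).
value = lin_similarity(41, 70, 5, CORPUS_SIZE)  # approximately 2.19
```

Under these assumptions the sketch reproduces the reported value: because the ancestor is annotated less often than either concept, its IC is larger than theirs and the ratio exceeds 1.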

Finally, Table 6 lists the experimental results for the semantic interestingness measures using the last semantic similarity – Wu & Palmer. We can observe that the results follow fairly closely the trend present in our experiments with the traditional interestingness measures and the semantic interestingness measures employing our proposed metric. Similarly to those results, there is an increase in accuracy with the increase in the recall cut-off point, which also leads to a more uniform accuracy across all measures. Semantic Confidence has an overall best behaviour for K > 1, while Semantic Leverage achieves a better accuracy for K = 1, i.e., 23.29%, which is 2.74% higher than Semantic Confidence.

Table 6 Experimental results on finding the quality of association rules discovered using semantic Interestingness measures that employed Wu & Palmer as semantic similarity method

The Wu & Palmer similarity score ranges between 0 and 1, with 1 denoting an exact match and the rest of the values being assigned based on the depth in the hierarchy and distance between the concepts. This is the main reason behind its good performance – i.e., it uses only structural distances instead of information content. It is, however, biased more towards depth than the actual distance between concepts, or more concretely it is influenced by the depth of the common ancestor of the concepts. In the case of our dataset, and using HPO as background knowledge, this represents an issue because most common ancestors are located at fairly uniform depths (due to the inherent specificity of the terms) and, as such, do not provide enough variety for the final similarity score.
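The depth bias can be illustrated with a small sketch. It assumes the commonly cited formulation sim = 2 · depth(LCA) / (depth(c1) + depth(c2)), with the root at depth 1; the depths used below are invented for illustration and are not taken from HPO.

```python
def wu_palmer(depth_a: int, depth_b: int, depth_lca: int) -> float:
    """Wu & Palmer similarity from node depths (root assumed at depth 1).

    depth_lca is the depth of the deepest common ancestor of the two concepts.
    """
    return 2 * depth_lca / (depth_a + depth_b)


# Two sibling pairs: in both cases the concepts are two edges apart,
# but the common ancestor sits at a different depth.
shallow_siblings = wu_palmer(depth_a=3, depth_b=3, depth_lca=2)
deep_siblings = wu_palmer(depth_a=6, depth_b=6, depth_lca=5)
exact_match = wu_palmer(depth_a=4, depth_b=4, depth_lca=4)
```

Both pairs are equally distant in the hierarchy, yet the deeper pair scores higher. When most common ancestors sit at similar depths, as described above, the scores of different concept pairs end up close to one another.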

In conclusion, none of the classic semantic similarities perform better than the approach we have proposed: node-based similarities are heavily influenced by the presence, or more precisely absence, of the common ancestor in the dataset (which leads to complete dissimilarity), while the edge-based similarity we have experimented with focuses more on the depth of the common ancestor, as opposed to the distance between the concepts, which is more appropriate given our dataset and background knowledge.

Discussion and conclusions

Main findings

In conclusion, based on the annotated bone dysplasia dataset, Confidence appears to be the best interestingness measure regardless of the way in which it is computed, i.e., traditional or semantic. The use of semantics provides a marginal, but consistent, improvement in accuracy over traditional measures. Since the semantic similarity relies on the structure of the underlying ontology, this improvement is heavily dependent on how well the domain ontology reflects the real domain knowledge.

Limitations and generalisation

Every domain is governed by a set of rules. A good semantic similarity measure needs to take into account the rules of the target domain. In our case, we have proposed and used two particular similarity measures, one tailored on the knowledge externalised by HPO and one on the structure of bone dysplasias, provided by BDO. These semantic similarity measures are not necessarily directly applicable to other domains. Consequently, while the definition of semantic support is generic, in order to apply our approach in a different domain, an investigation is required to determine the most appropriate semantic similarity for that domain.

Related work

The literature contains a number of studies on using association rule mining to identify relationships among medical attributes using biomedical ontologies [23–26]. Kumar et al. [23] used association rules to indicate dependence relationships between Gene Ontology terms using an annotation dataset and background knowledge. Myhre et al. [24], on the other hand, have focused entirely on proposing an additional gene ontology layer via discovering cross-ontology association rules from GO annotations. However, none of these approaches use the biomedical ontologies and, in particular, their hierarchical structure to compute interestingness measures. Another set of existing research on applying association rule mining to biomedical ontologies includes studies on mining single level, multi-level and cross-ontology association rules [27–29]. Carmona-Saez et al. [27], for example, mine single level associations between GO annotations and expressed genes from microarray data integrated with GO annotation information. However, as in the previous case, the inherent information provided by the ontology structure is not considered when computing the interestingness measures, which limits, to some extent, the knowledge discovered.

Interestingness measures play an essential role by reducing the number of discovered rules and retaining only those with the best utility, in a post-processing step. Different rule interestingness measures have different qualities or flaws. There is no optimal measure and one way to solve this challenge is to try to find a good compromise. Research has been performed on finding optimal measures for different datasets [8, 9], but by taking into account only traditional interestingness measures.

In summary, prior efforts in association rule mining applied to datasets annotated with biomedical ontology concepts focus on mining normal, cross-ontology and multi-level association rules, but leave out the use of the semantic relationships between the target concepts from the computation of the interestingness measures.


Concepts defined and described by biomedical ontologies, e.g., the Human Phenotype Ontology, enable us to compare medical terms at a semantic level – a comparison that is otherwise not possible. Our research has focused on the use of semantic relationships between patient phenotypes, annotated by HPO entities, in the process of mining association rules. In this manuscript, we have proposed a method that integrates concept similarity metrics into the computation of traditional interestingness measures, with application to finding association rules in the bone dysplasia domain. This method has been applied on an annotated patient dataset and used domain-specific semantic similarities.

Experimental results have led to the conclusion that, for our domain, Confidence is the most accurate measure, independently of the underlying computation method, i.e., traditional or semantic. On the other hand, Semantic Confidence was able to take advantage of the structure of the domain ontologies and of the custom semantic similarity to achieve better results (up to 6.85% better accuracy over the traditional Confidence). In conclusion, these results suggest that, given an appropriate domain-specific ontology, semantic similarities are able to improve the efficiency of traditional interestingness measures in the association rule discovery process, hence enabling a valuable semantic interestingness measures framework.


  1. Whetzel PL, Noy NF, Shah NH, Alexander PR, Nyulas C, Tudorache T, Musen MA:BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Nucleic Acids Res. 2011, 39 (Web Server issue): W541-W545.

  2. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S, The OBI Consortium:The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnol. 2007, 25: 1251-1255. 10.1038/nbt1346.

  3. Robinson PN, Kohler S, Bauer S, Seelow D, Horn D, Mundlos S:The human phenotype ontology: a tool for annotating and analyzing human hereditary disease. Am J Human Genetics. 2008, 83 (5): 610-615. 10.1016/j.ajhg.2008.09.017.

  4. Pesquita C, Faria D, Falcao A, Lord P, Couto F:Semantic similarity in biomedical ontologies. PLoS Comput Biol. 2009, 5 (7): e1000443-10.1371/journal.pcbi.1000443.

  5. Batet M, Sanchez D, Valls A:An ontology-based measure to compute semantic similarity in biomedicine. J Biomed Inform. 2011, 44: 118-125. 10.1016/j.jbi.2010.09.002.

  6. Paul R, Groza T, Zankl A, Hunter J: Semantic similarity-driven decision support in the skeletal dysplasia domain. Proc. of the 11th International Semantic Web Conference (ISWC 2012). 2012, Boston, MA, USA: Springer: Berlin Heidelberg, 164-179.

  7. Agrawal R, Srikant R:Fast algorithms for mining association rules in large databases. Proceedings of the 20th International Conference on Very Large Data Bases. 1994, San Francisco, CA, USA: Santiago de Chile, Chile; Morgan Kaufmann, 487-499.

  8. Azevedo PJ, Jorge AM: Comparing rule measures for predictive association rules. Proceedings of the 18th European conference on Machine Learning. 2007, Warsaw, Poland: Springer: Berlin Heidelberg, 510-517.

  9. Lenca P, Vaillant B, Meyer P, Lallich S:Association rule interestingness measures: experimental and theoretical studies. Quality Measures in Data Mining. Edited by: Guillet F, Hamilton JH. 2007, Springer: Berlin Heidelberg, 51-76.

  10. Geng L, Hamilton JH:Interestingness measures for data mining: A survey. ACM Comput Surv. 2006, 38 (3): 610-615.

  11. Groza T, Zankl A, Li YF, Hunter J: Using semantic web technologies to build a community-driven knowledge curation platform for the skeletal dysplasia domain. Proc. of the 10th International Semantic Web Conference (ISWC 2011). 2011, Bonn, Germany: Springer: Berlin Heidelberg, 81-96.

  12. Groza T, Hunter J, Zankl A:The Bone Dysplasia Ontology: integrating genotype and phenotype information in the skeletal dysplasia domain. BMC Bioinformatics. 2012, 13: 50-10.1186/1471-2105-13-50.

  13. Paul R, Groza T, Hunter J, Zankl A:Decision support methods for finding phenotype–disorder associations in the bone dysplasia domain. PLoS One. 2012, 7 (11): e50614-10.1371/journal.pone.0050614.

  14. Warman ML, Cormier-Daire V, Hall C, Krakow D, Lachman R, LeMerrer M, Mortier G, Mundlos S, Nishimura G, Rimoin DL, Robertson S, Savarirayan R, Sillence D, Spranger J, Unger S, Zabel B, Superti-Furga A:Nosology and classification of genetic skeletal Disorders: 2010 revision. Am J Med Genet Part A. 2011, 155 (5): 943-968. 10.1002/ajmg.a.33909.

  15. The Gene Ontology Consortium:Gene ontology: tool for the unification of biology. Nat Genet. 2000, 25: 25-29. 10.1038/75556.

  16. Resnik P:Using information content to evaluate semantic similarity in a taxonomy. Proc. of the 14th International Joint Conference on Artificial Intelligence. 1995, Montreal, Quebec, Canada; Morgan Kaufmann: San Francisco, CA, USA, 448-453.

  17. Lin D:An information-theoretic definition of similarity. Proc. of the 15th International Conference on Machine Learning. 1998, Madison, Wisconsin, USA; Morgan Kaufmann: San Francisco, CA, USA, 296-304.

  18. Jiang JJ, Conrath DW:Semantic similarity based on corpus statistics and lexical taxonomy. Proc. of the 10th International Conference on Research in Computational Linguistics. 1997, Taiwan; ACL: Stroudsburg, PA, USA, 19-33.

  19. Wu Z, Palmer M:Verb semantics and lexicon selection. Proc. of the 32nd Annual Meeting on Association for Computational Linguistics. 1994, Las Cruces, New Mexico, USA; ACL: Stroudsburg, PA, USA, 133-138.

  20. Jonquet C, Shah NH, Musen MA:The open biomedical annotator. Proc. of the 2010 AMIA Summit of Translational Bioinformatics. 2010, San Francisco, California, USA; AMIA: Bethesda, MD, USA, 56-60.

  21. Li Y, Bandar ZA, McLean D:An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans Knowl Data Eng. 2003, 15 (4): 871-882. 10.1109/TKDE.2003.1209005.

  22. Japkowicz N, Shah M: Evaluating Learning Algorithms: A Classification Perspective. 2011, New York: Cambridge University Press,

  23. Kumar A, Smith B, Borgelt C:Dependence Relationships between Gene Ontology Terms based on TIGR Gene Product Annotations. Proc. of the 3rd International Workshop on Computational Terminology. 2004, Geneva, Switzerland; ACL: Stroudsburg, PA, USA, 31-38.

  24. Myhre S, Tveit H, Mollestad T, Laegreid A:Additional gene ontology structure for improved biological reasoning. Bioinformatics. 2006, 22 (16): 2020-2027. 10.1093/bioinformatics/btl334.

  25. Hemert JV, Baldock R:Mining spatial gene expression data for association rules. Proc. of the 1st International Conference on Bioinformatics Research and Development. 2007, Berlin, Germany; Springer: Berlin Heidelberg, 66-76.

  26. Creighton C, Hanash S:Mining gene expression databases for association rules. Bioinformatics. 2003, 19: 79-86. 10.1093/bioinformatics/19.1.79.

  27. Carmona-Saez P, Chagoyen M, Rodriguez A, Trelles O, Carazo JM, Pascual-Montano A:Integrated analysis of gene expression by association rules discovery. BMC Bioinformatics. 2006, 7: 54-10.1186/1471-2105-7-54.

  28. Tseng VS, Yu HH, Yang SC:Efficient mining of multilevel gene association rules from microarray and gene ontology. J Inform Syst Front. 2009, 11 (4): 433-447. 10.1007/s10796-009-9156-1.

  29. Manda P, Ozkan S, Wang H, McCarthy F, Bridges SM:Cross-ontology multi-level association rule mining in the gene ontology. PLoS One. 2012, 7 (10): e47411-10.1371/journal.pone.0047411.


We gratefully acknowledge the editor and anonymous reviewers whose comments and advice have helped us improve our manuscript. This research is funded by the Australian Research Council (ARC) under the Linkage grant SKELETOME – LP100100156 and the Discovery Early Career Researcher Award (DECRA) – DE120100508.

Author information


Corresponding author

Correspondence to Tudor Groza.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JH and AZ formulated the basic idea behind SKELETOME. JH coordinates the project. TG leads the development of the project. RP and TG designed the experiments. RP ran the experiments. RP and TG analysed the experimental results. AZ provided the domain expertise. RP and TG wrote the manuscript. JH and AZ edited the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Paul, R., Groza, T., Hunter, J. et al. Semantic interestingness measures for discovering association rules in the skeletal dysplasia domain. J Biomed Semant 5, 8 (2014).



  • Association Rule
  • Semantic Similarity
  • Skeletal Dysplasia
  • Achondroplasia
  • Interestingness Measure