A rule-based ontological framework for the classification of molecules
© Magka et al.; licensee BioMed Central Ltd. 2014
Received: 7 May 2013
Accepted: 15 January 2014
Published: 15 April 2014
A variety of key activities within life sciences research involves integrating and intelligently managing large amounts of biochemical information. Semantic technologies provide an intuitive way to organise and sift through these rapidly growing datasets via the design and maintenance of ontology-supported knowledge bases. To this end, OWL—a W3C standard declarative language— has been extensively used in the deployment of biochemical ontologies that can be conveniently organised using the classification facilities of OWL-based tools. One of the most established ontologies for the chemical domain is ChEBI, an open-access dictionary of molecular entities that supplies high quality annotation and taxonomical information for biologically relevant compounds. However, ChEBI is being manually expanded which hinders its potential to grow due to the limited availability of human resources.
In this work, we describe a prototype that performs automatic classification of chemical compounds. The software we present implements a sound and complete reasoning procedure of a formalism that extends datalog and builds upon an off-the-shelf deductive database system. We capture a wide range of chemical classes that are not expressible with OWL-based formalisms such as cyclic molecules, saturated molecules and alkanes. Furthermore, we describe a surface ‘less-logician-like’ syntax that allows application experts to create ontological descriptions of complex biochemical objects without prior knowledge of logic. In terms of performance, a noticeable improvement is observed in comparison with previous approaches. Our evaluation has discovered subsumptions that are missing from the manually curated ChEBI ontology as well as discrepancies with respect to existing subclass relations. We illustrate thus the potential of an ontology language suitable for the life sciences domain that exhibits a favourable balance between expressive power and practical feasibility.
Our proposed methodology can form the basis of an ontology-mediated application to assist biocurators in the production of complete and error-free taxonomies. Moreover, such a tool could contribute to a more rapid development of the ChEBI ontology and to the efforts of the ChEBI team to make annotated chemical datasets available to the public. From a modelling point of view, our approach could stimulate the adoption of a different and expressive reasoning paradigm based on rules for which state-of-the-art and highly optimised reasoners are available; it could thus pave the way for the representation of a broader spectrum of life sciences and biomedical knowledge.
Keywords: Semantic technologies; Knowledge representation and reasoning; Logic programming and answer set programming; Datalog extensions; Cheminformatics
Life sciences data generated by research laboratories worldwide is increasing at an astonishing rate, turning the need to adequately catalogue, represent and index the rapidly accumulating bioinformatics resources into a pressing challenge. Semantic technologies have achieved significant progress towards the federation of biochemical information via the definition and use of domain vocabularies with formal semantics, also known as ontologies [1–3]. OWL, a family of logic-based knowledge representation (KR) formalisms standardised by the W3C, has played a pivotal role in the advent of Semantic technologies. This is to a great extent thanks to the availability of robust OWL-based tools that are capable of deriving knowledge that is not explicitly stated by means of logical inference. In particular, OWL bio- and chemo-ontologies with their intuitive hierarchical structure and their formal semantics are widely used for the building of life sciences terminologies [5, 6].
Taxonomies provide a compelling way of aggregating information, as hierarchically organised knowledge is more accessible to humans. This is evidenced, e.g. by the pervasive use of the periodic table in chemistry, one of the longest-standing and most widely adopted classification schemes in natural sciences. Organising a large number of different objects into meaningful groups facilitates the discovery of significant properties pertaining to that group; these discoveries can then be used to predict features of subsequently detected members of the group. For instance, esters with low molecular weight tend to be more volatile and, so, a newly found ester with low weight is expected to be highly volatile, too. As a consequence, classifying objects on the basis of shared characteristics is a central task in areas such as biology and chemistry with a long tradition of taxonomy use. Due to the availability of performant OWL reasoners, life scientists can employ OWL to represent expert human knowledge and thus drive fast, automatic and repeatable classification processes that produce high quality hierarchies [7, 8]. Nevertheless, a prerequisite is that OWL is expressive enough to model the entities that need to be classified as well as the properties of the superclasses that lie higher up in the hierarchy.
Two main restrictions have been identified in the expressive power of OWL as hindering factors for the representation of biological knowledge [9, 10]. First, due to the tree-model property of OWL (which otherwise accounts for the robust computational properties of the language) one is not able to describe cyclic structures with adequate precision. Second, because of the open-world assumption adopted in OWL (according to which missing information is treated as not known rather than false) it is difficult to define classes based on the absence of certain characteristics. These limitations manifest themselves, among others, via the inability to define a broad range of classes in the chemical domain. For instance, one cannot effectively encode in OWL the class of compounds that contain a benzene ring or the class of molecules that do not contain carbon atoms, i.e. inorganic molecules.
These inadequacies obstruct the full automation of the classification process for chemical ontologies, such as the ChEBI (Chemical Entities of Biological Interest) ontology, an open-access dictionary of molecular entities that provides high quality annotation and taxonomical information for chemical compounds. ChEBI fosters interoperability between researchers by acting as the primary chemical annotation resource for various biological databases such as BioModels, Reactome and the Gene Ontology. Moreover, ChEBI supports numerous tasks of biochemical knowledge discovery such as the study of metabolic networks, identification of disease pathways and pharmaceutical design [14, 15]. ChEBI is manually curated by human experts who annotate and check the validity of existing and new molecular entries. Currently, ChEBI describes 36,660 fully annotated entities (release 110) and grows at a rate of approximately 4,500 entities per year (estimate based on previous releases). Given the size of other publicly available chemical databases, such as PubChem, which contains records for 19 million molecules, there is clearly a strong potential for ChEBI to expand by speeding up curating tasks. ChEBI curating tasks span a wide range of activities such as adding natural language definitions and structure information or classifying chemical entities by determining their position in the ChEBI taxonomy. Thus automating chemical classification could free up human resources and accelerate the addition of new entries to ChEBI.
As the classification of compounds is a key task of the drug development process , the construction of chemical hierarchies has been the topic of various investigations capitalising on logic-based KR [19–23], statistical machine learning (ML) [24–26] and algorithmic [27–29] techniques. In KR approaches, molecule and class descriptions are represented with logical axioms crafted by experts and subsumptions are identified with the help of automated reasoning algorithms; in ML approaches a set of annotated data is used to train a system and the system is then employed to classify new entries. So, KR approaches are based on the explicit axiomatisation of knowledge, whereas ML algorithms specify for new entries superclasses that are highly probable to be correct. As a consequence, the taxonomies produced using logic-based techniques are provably correct (as long as the modelling of the domain knowledge is faithful), but the statistically produced hierarchies (although much faster) need to be evaluated against a curated gold standard. Algorithmic techniques involve the definition of imperative procedures for determining classes of molecules. These approaches are usually much quicker than logic-based techniques but have the disadvantage of requiring a programmer for defining new classes or for modifying the existing ones, as opposed to ontological knowledge bases that can be manipulated and extended by non-programmers. Here, we focus on logic-based chemical classification, which in certain cases can complement statistical and algorithmic approaches [8, 15].
In previous work, we laid the theoretical foundation of nonmonotonic existential rules, an expressive ontology language equipped with a sound and complete reasoning procedure and suitable for the representation of graph-shaped objects; additionally, we demonstrated how nonmonotonic existential rules can be applied to the classification of molecules. The aforementioned formalism addressed the expressivity limitations outlined above; however, the performance of the implementation, although faster than previous approaches, was not satisfactory (more than 7 minutes were needed to classify 70 molecules under 5 chemical classes on a standard desktop computer) and thus failed to confirm the practicability of the formalism.
We present a prototype that performs logic-based chemical classification based on a sound, complete and terminating reasoning algorithm; we model more than 50 chemical classes and we show that the superclasses of 500 molecules are computed in 33 seconds.
We harness the expressive power of nonmonotonic existential rules to axiomatise a variety of chemical classes such as classes based on the containment of functional groups (e.g. esters) and on the exact cardinality of parts (e.g. dicarboxylic acids), classes depending on the overall atomic constitution (e.g. hydrocarbons) and cyclicity-related classes (e.g. compounds containing a cycle of arbitrary length or alkanes).
We present a surface syntax that enables application experts to create ontological description of chemical entities without prior knowledge of logic. The syntax we propose is closer to natural language than to first-order logic notation and is uniquely translatable to logical axioms.
We exhibit a significant speedup in comparison with previous ontology-based chemical classification implementations.
We identify examples of missing and contradictory subsumptions from the expert curated ChEBI ontology that are present and absent, respectively, from the hierarchy computed by our prototype.
Concerning future benefits, our prototype could form the basis of an ontology-mediated application to assist biocurators of ChEBI towards the sanitisation and the enrichment of the existing chemical taxonomy. Automating the maintenance and expansion of ChEBI taxonomy could contribute to a more rapid development of the ChEBI ontology and to the efforts of the ChEBI team to make annotated chemical datasets available to the public. From a modelling point of view, our approach could stimulate the adoption of a different and expressive reasoning paradigm based on rules for which state-of-the-art and highly optimised reasoners are available; it could thus pave the way for the representation of a broader spectrum of life sciences knowledge.
Knowledge base design
The reasoning task carried out using our methodology is the identification of chemical classes for molecules, e.g. assigning water to the class of inorganic molecules or benzene to cyclic molecules. In this section we provide a high-level description of the knowledge base (KB) we built for the purposes of our chemical classification experiments. We use the word ‘classification’ to refer to the detection of subsumptions between molecules and chemical classes rather than to the computation of the partial order for the set comprising the chemical classes and molecules w.r.t. the subclass relation. The KB consists of nonmonotonic existential rules that formally describe molecular structures and chemical classes; this representation can subsequently be used to determine the chemical class subsumers of each molecule. For a formal definition of syntax and semantics of nonmonotonic existential rules as well as decidability proofs, we refer the interested reader to the relevant articles [9, 30, 31].
For each chemical entity that we model using rules, we also provide its axiomatisation in the surface syntax, a less-logician-like syntax which we designed and which enables the ontological description of structured objects without the use of logic. Our surface syntax is in the same style as the Manchester OWL syntax and draws inspiration from a syntax suggested for OWL 2 rules. The main motivation for designing this syntax is to provide a means for creating ontological descriptions in a more succinct way and without the use of special symbols. We have formally defined the surface syntax and its translation into nonmonotonic existential rules, but we have not implemented an ontology editor that would allow users to write axioms in the new syntax. Similarly, we have not conducted experiments evaluating the use of surface syntax by application experts, but given that the Manchester OWL syntax has been well received by non-logicians and there is active development of tools for supporting more human readable ontology query languages, we believe that the suggested syntax has the potential to facilitate curating tasks. Since our main focus is to illustrate the transformation of molecular graphs and chemical class definitions into rules, we omit the technical details and describe our methodology by means of running examples. For a complete specification of the surface syntax including a BNF grammar and mappings to nonmonotonic existential rules we provide an online technical report.
The rule above is a typical first-order implication with a single atomic formula in the body and a conjunction of atomic formulae in the head. Informally, the rule ensures that every time the ascorbic acid molecule is instantiated, its structure is unfolded according to its specified description graph (DG). Thus, triggering of the rule implies that (i) new terms that correspond to the DG's nodes are generated (excluding node 0), e.g. f1(x) represents atom node 1; (ii) each new term is typed according to the label of the relevant node with the help of a unary atomic formula (e.g. o(f1(x))); and (iii) each pair of terms with corresponding nodes connected in the DG is assigned the respective label with the help of a binary atomic formula (e.g. single(f1(x), f7(x))). In order to ensure disjointness of the several molecular structures on the interpretation level, distinct function symbols are used in the rule of each molecule.
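The unfolding of a description graph into a single rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name molecule_rule, the hasAtom predicate linking the molecule to its atoms, the f_<molecule>_<node> naming of function symbols and the textual "->" and "&" connectives are all hypothetical and are not the concrete syntax used by the prototype. The two-atom fragment in the usage line is a toy graph, not the description graph of ascorbic acid.

```python
def molecule_rule(name, atoms, bonds):
    """Build one rule 'name(x) -> conjunction' from a labelled graph.

    atoms: dict mapping a node id (1..n) to a unary predicate, e.g. {1: "o"}
    bonds: list of (node_i, node_j, binary_predicate) triples
    """
    def term(i):
        # One function symbol per node; the molecule name is embedded so
        # that different molecules never share a function symbol.
        return f"f_{name}_{i}(x)"

    # (assumed) link from the molecule node to each of its atom nodes
    head = [f"hasAtom(x, {term(i)})" for i in sorted(atoms)]
    # type every generated term with the label of its node
    head += [f"{atoms[i]}({term(i)})" for i in sorted(atoms)]
    # label every pair of connected terms with the bond type
    head += [f"{label}({term(i)}, {term(j)})" for i, j, label in bonds]
    return f"{name}(x) -> " + " & ".join(head)


print(molecule_rule("toy", {1: "o", 2: "c"}, [(1, 2, "single")]))
```

Embedding the molecule name in each function symbol is one simple way to obtain the distinct function symbols that keep the structures of different molecules disjoint.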
General chemical knowledge and chemical classes
For our experiments, we represented 51 chemical classes using rules; we based our chemical modelling on the textual definitions found in the ChEBI ontology.
We covered a diverse range of classes that can be categorised into four groups. For each class that we discuss, we provide the surface syntax definition and its corresponding translation into one or more rules. Certain classes with an intricate definition (such as the class of cyclic molecules that appears later) are not expressible in surface syntax; these can be directly added as rules. Here we show in full detail only a sample of the rules; the complete set of rules is available in Additional files 1, 2 and 3.
Existence of subcomponents
One can find below the corresponding translations into rules. We define as carbon molecular entities the molecules that contain carbon; polyatomic entities are the entities that contain at least two different atoms. Heteroorganic entities are the ones containing carbon atoms bonded to non-carbon atoms. Carboxylic acids are defined as molecules containing at least one carboxy group (a functional group with formula C(=O)OH) attached to a carbon or hydrogen; due to the implicit hydrogens assumption we are not able to distinguish between an oxygen and a hydroxy group and, so, we need to specify that the oxygen of the hydroxy group is not charged (NOT charged) and participates in only one bond (NOT middleOxygen). Similarly, carboxylic esters contain a carbonyl group connected to an oxygen ((C=O)O) which is further attached to two atoms that are carbon or hydrogen.
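To make two of these definitions concrete, the sketch below gives a procedural reading of "carbon molecular entity" and "heteroorganic entity" over a labelled molecular graph. It is not the rule-based encoding described in the text, where such classes are stated declaratively and recognised by the reasoner; the function names and the graph representation are assumptions made for illustration, and hydrogens are assumed to be implicit, i.e. absent from the graph.

```python
def contains_carbon(atoms):
    """Carbon molecular entity: at least one atom node is labelled 'c'."""
    return any(element == "c" for element in atoms.values())


def is_heteroorganic(atoms, bonds):
    """Heteroorganic entity: some bond joins a carbon to a non-carbon atom.

    atoms: dict node id -> element label (lower case, e.g. "c", "o")
    bonds: list of (node_i, node_j, bond_label) triples
    Hydrogens are assumed implicit, so they never appear as nodes.
    """
    for i, j, _label in bonds:
        # exactly one endpoint of the bond is a carbon
        if (atoms[i] == "c") != (atoms[j] == "c"):
            return True
    return False


# Toy fragments (heavy atoms only): C-O and C-C
c_o = ({1: "c", 2: "o"}, [(1, 2, "single")])
c_c = ({1: "c", 2: "c"}, [(1, 2, "single")])
print(is_heteroorganic(*c_o), is_heteroorganic(*c_c))
```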
Exact cardinality of parts
Determining subclass relations
Finally, we demonstrate how meaningful subsumptions can be derived using a KB containing the rules outlined in the previous two sections. In order to determine the superclasses of a certain molecule, we extend the KB with a suitable fact (i.e., a variable-free atomic formula) and we examine the model that satisfies the KB under the stable model semantics (the addition of the fact and the examination of the model is done automatically by our implementation). A formal definition of the stable model semantics is provided by Gelfond and Lifschitz . Intuitively, the stable model of a KB is the minimal set of facts that are derived by exhaustively applying the existing rules under a particular rule order; a rule is applied if its positive body can be matched to the so far derived facts and no atom of the negative body is in the already produced set of facts for the said matching.
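The intuition of exhaustive rule application with negation can be sketched as follows. This is a deliberately simplified illustration and not the procedure implemented by DLV: rules are assumed to be ground (variable-free) and to be supplied already arranged in layers, so that every negated atom is fully decided by earlier layers. The predicate names hasAtom, containsCarbon and inorganic, and the helper toy_layers, are hypothetical.

```python
def stable_model(facts, layers):
    """Apply ground rules layer by layer until nothing new is derived.

    Each rule is a triple (head, positive_body, negative_body) of atoms
    written as strings. A rule fires when every positive atom has already
    been derived and no negative atom has been derived.
    """
    model = set(facts)
    for layer in layers:
        changed = True
        while changed:
            changed = False
            for head, positive, negative in layer:
                if head in model:
                    continue
                if all(p in model for p in positive) and not any(
                    n in model for n in negative
                ):
                    model.add(head)
                    changed = True
    return model


def toy_layers(mol, atom):
    """Ground two illustrative rules for a molecule with a single atom."""
    return [
        # layer 1: a molecule with a carbon atom contains carbon
        [(f"containsCarbon({mol})", [f"hasAtom({mol},{atom})", f"c({atom})"], [])],
        # layer 2: a molecule not known to contain carbon is inorganic
        [(f"inorganic({mol})", [f"molecule({mol})"], [f"containsCarbon({mol})"])],
    ]


water = stable_model({"molecule(w)", "hasAtom(w,w1)", "o(w1)"}, toy_layers("w", "w1"))
methane = stable_model({"molecule(m)", "hasAtom(m,m1)", "c(m1)"}, toy_layers("m", "m1"))
print("inorganic(w)" in water, "inorganic(m)" in methane)
```

The layering matters: if the second rule were applied before the first, inorganic(m) would be derived for the carbon-containing molecule before containsCarbon(m) had a chance to block it.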
From the stable model atoms we can infer the superclasses of ascorbic acid, that is we deduce that ascorbic acid is, among others, an unsaturated, polyatomic, heteroorganic, cyclic molecular entity that contains carbon and a carboxylic ester. If there is no relevant atom for a chemical class in the stable model, then we conclude that the said class is not a valid subsumer, e.g. since carboxylicAcid(a) is not found in the stable model, carboxylic acid is not a superclass of ascorbic acid.
The KB discussed above contains rules with function symbols in the head, such as the rule used to encode the molecular structure of ascorbic acid. These rules may incur non-termination during the computation of the stable model due to the creation of infinitely many terms. In order to ensure termination of our reasoning process and thus decidability of the employed formalism, we perform a decidability check on the constructed KB. In a nutshell, the decidability check (also known as model-summarising acyclicity) involves transforming the rules of the KB and inspecting the stable models of the transformed KB for the existence of a special symbol. If the KB passes the decidability check, then termination is guaranteed; this is the case for the types of KBs that were previously described. Technical details of the aforementioned condition are out of the scope of this text and can be found in the relevant sources.
CDK-aided parsing. LoPStER parses the molfiles of the molecules to be classified using the Chemistry Development Kit Java library. The molfile is a widely used chemical file format that describes molecular structures with a connection table; e.g. the molfile of ascorbic acid appears on the left of Figure 1. For each molecule, a description graph (e.g. Figure 1 bottom right) representation is generated from its molfile according to a transformation such as the one described for ascorbic acid.
Compilation of the KB. For each molecule the description graph representation is used to produce a set of rules that encode the structure of the molecule, following the translation that was discussed in the previous section. These rules along with the classification rules and the facts necessary to determine subclass relations are combined to produce DLV programs (i.e. sets of rules) that are stored as plain text files on disk. In particular two kinds of DLV programs are created for each molecule, the program needed to perform the decidability check as described before and the program needed to compute subclass relations between the molecules and the chemical classes.
Invoke DLV for decidability check. During this step, the model of the program, which was produced in the previous step for acyclicity testing, is computed. If the check is successful, then execution proceeds to the next stage; otherwise, the program is exited with a suitable output message.
Invoke DLV for model computation. This is the stage where DLV is invoked to compute the stable model of the KB. Due to the check of the previous step, the computation is guaranteed to terminate.
Stable model storage. At this point, the stable model computed by DLV is stored in a file on disk to enable subsequent discovery of the subclass relations.
Subsumptions extraction. This is the final phase where the stable model file is parsed in order to detect the superclasses of each molecule. All the subsumee-subsumer pairs are stored in a separate spreadsheet file on disk.
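The final extraction step can be illustrated with a short sketch. It assumes that the stored stable model is a comma-separated list of atoms inside braces and that every molecule is represented by a constant (such as a for ascorbic acid); the exact file format written by the prototype is not specified here, so both the format and the names extract_subsumptions, class_names and molecule_constants are assumptions.

```python
import re

# A unary atom over a plain constant, e.g. cyclic(a). Atoms over function
# terms such as o(f1(a)) are not matched as class memberships.
UNARY_ATOM = re.compile(r"\b([A-Za-z_]\w*)\(([a-z]\w*)\)")


def extract_subsumptions(model_text, class_names, molecule_constants):
    """Return sorted (molecule, class) pairs found in a stable model.

    class_names: set of predicates that stand for chemical classes
    molecule_constants: dict constant -> molecule name
    """
    pairs = set()
    for predicate, argument in UNARY_ATOM.findall(model_text):
        if predicate in class_names and argument in molecule_constants:
            pairs.add((molecule_constants[argument], predicate))
    return sorted(pairs)


model = "{molecule(a), o(f1(a)), single(f1(a),f7(a)), cyclic(a), carboxylicEster(a)}"
classes = {"cyclic", "carboxylicEster", "carboxylicAcid"}
print(extract_subsumptions(model, classes, {"a": "ascorbic acid"}))
```

A class whose atom is absent from the model, such as carboxylicAcid in this toy model, simply produces no pair, which mirrors the reading that the class is not a subsumer.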
In order to assess the applicability of our implementation, we measured the time required by LoPStER to perform classification of molecules. To obtain test data we extracted molfile descriptions of 500 molecules from the ChEBI ontology. The represented compounds were of diverse size, varying from 1 to 59 atoms. Next, we investigated the scalability of our prototype by altering two different parameters of the knowledge base, namely the number of represented molecules and the type of modelled chemical classes. Initially, we constructed ten DLV programs each of which contained rules encoding 50·i different compounds, where 1≤i≤10, and rules defining the chemical classes (a sample of which was previously described) excluding the cyclicity-related classes (48 classes in total). Next, we repeated the same construction but this time including the rules for the cyclicity-related classes (51 classes in total). In the rest of the section, we refer to the first setting as ‘no cyclic’ and to the second as ‘with cyclic’.
Additionally and in order to optimise the performance, we explored how classification times fluctuate depending on the size of DLV programs. In particular, we partitioned the DLV programs into modules, we measured classification times for each module separately and we summed up the times. Each module contains the facts and the rules describing a subset of the molecules represented in the initial DLV program; the rules defining chemical classes are included in each one of the modules. Thus, the size of each module depends on the number of encoded molecules. We tested modules of various sizes as well as DLV programs without any partitioning for both ‘no cyclic’ mode and ‘with cyclic’ mode. Modifying the size of the module had a clear impact on the measured times and performing classification with the modularised knowledge base was always quicker than with the unpartitioned one; we observed the shortest execution times for module size 50 when testing in ‘no cyclic’ mode and for module size 20 when testing in ‘with cyclic’ mode; the timings we provide next refer to the aforementioned module sizes.
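The partitioning scheme can be sketched as follows: the molecule encodings are split into consecutive chunks of a chosen size, and the rules defining the chemical classes are copied into every chunk. The function name build_modules and the representation of rules as plain strings are assumptions for illustration only.

```python
def build_modules(molecule_rules, class_rules, module_size):
    """Split molecule rules into modules that each repeat the class rules.

    molecule_rules: list with one entry (rule text) per molecule
    class_rules: list of rules defining the chemical classes
    module_size: number of molecules encoded in each module
    """
    modules = []
    for start in range(0, len(molecule_rules), module_size):
        chunk = molecule_rules[start:start + module_size]
        # every module is a self-contained program: molecules + classes
        modules.append(list(chunk) + list(class_rules))
    return modules


molecules = [f"mol{i}(x) -> ..." for i in range(500)]   # placeholder rule texts
classes = ["class rule 1", "class rule 2"]               # placeholder rule texts
print(len(build_modules(molecules, classes, 50)), len(build_modules(molecules, classes, 20)))
```

Because each module is classified independently, the total time is the sum of the per-module times, and the module size becomes a tuning parameter.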
Table 1 Time measurements for classification (columns: No of rules, Time no cyclic, Time with cyclic)
The performance results of Table 1 are encouraging for the practical feasibility of our approach: the classification of 500 molecules was completed in less than 33 seconds for the suite of 51 modelled chemical classes. The drop in classification times between the 50 and 100 molecules case is potentially due to JVM startup overhead. One can also observe that the rules encoding cyclicity-related classes introduce a significant overhead for the classification times. In fact, it is the class that recognises molecules with cycles of arbitrary length that incurs the performance penalty. The rules that encode the class of cyclic molecules need to identify patterns that are extremely frequent in molecular graphs; as a consequence, the amount of computational resources needed to detect ring-containing molecules is much higher. However, since our class definition for cyclic molecules detects compounds with cycles of variable length, which is a significant property for the construction of chemical hierarchies, we consider this overhead acceptable.
Discussion and related work
Concerning expressive power, the current approach allows for the representation of strictly more chemical classes in comparison with other logic-based applications for chemical classification. Villanueva-Rosales and Dumontier describe an OWL ontology of functional groups for the classification of chemical compounds; in their work, they point out the inherent inability of OWL to represent cyclic functional groups and how this impedes the use of OWL in logic-based chemical classification. As a remedy, Hastings et al. employ an extension of OWL for the representation of non-tree-like structures and, thus, for the classification of molecular structures. However, the used formalism only allows for the identification of cycles of fixed length and with alternating single and double bonds. In the current approach we are able to recognise molecules containing cycles of both arbitrary and fixed length and without requiring a particular configuration of bonds.
Moreover, in both approaches outlined above the adopted open world assumption of OWL prevents one from defining structures based on the absence of certain characteristics. In our approach we operate under the closed world assumption which permits the definition of a broad range of chemical classes that were not expressible before such as the class of inorganic, hydrocarbon or saturated compounds. Finally and in comparison with previous work , we take full advantage of the suggested formalism by specifying a much wider range of chemical classes and we do not require from the modeller a precedence relation between the represented structures.
In terms of performance, the classification results appear more promising than previous and related work. Hastings et al. report that a total of 4 hours was required to determine the superclasses of 140 molecules, whereas LoPStER identifies the chemical classes of 500 molecules in less than 33 seconds. LoPStER is also quicker than our previous work, where 450 seconds were needed to classify 70 molecules; per molecule, this is a speedup of about two orders of magnitude. Please note that both cases discussed above considered a subset of the chemical classes used here. Regarding the significant change in speed, we identify the following two factors that could explain it. First, DLV is a more suitable reasoner for our setting due to its bottom-up computation strategy as well as its active maintenance team and frequent releases. Second, we employ a more efficient condition (model-summarising acyclicity instead of semantic acyclicity) in order to obtain termination guarantees, which allows for a more prompt decidability check. Finally, the classification times reported here are slightly improved in comparison with a preliminary version of this paper due to some modelling optimisations and the use of a more recent version of DLV.
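The per-molecule comparison behind these figures is simple arithmetic, reproduced below from the totals quoted in the text. Since "less than 33 seconds" is an upper bound, the resulting speedups are lower bounds; they also compare runs over different sets of chemical classes and different hardware, so they are indicative only. The variable names are illustrative.

```python
def seconds_per_molecule(total_seconds, molecules):
    """Average classification time for one molecule."""
    return total_seconds / molecules


lopster = seconds_per_molecule(33, 500)          # upper bound from the text
previous = seconds_per_molecule(450, 70)         # earlier implementation
hastings = seconds_per_molecule(4 * 3600, 140)   # 4 hours for 140 molecules

speedup_vs_previous = previous / lopster
speedup_vs_hastings = hastings / lopster

print(f"{lopster:.3f} s, {previous:.2f} s, {hastings:.1f} s per molecule")
print(f"speedups: {speedup_vs_previous:.0f}x and {speedup_vs_hastings:.0f}x")
```

The first ratio is close to 100, which is where the "two orders of magnitude" figure comes from; the second is roughly 1,560.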
The chemical classification methodology that we present here is similar to other classification efforts based on semantic technologies, such as classification of proteins or lipids. Wolstencroft et al. use a bioinformatics tool to extract composition information from protein descriptions and subsequently translate this information into OWL axioms; these axioms are next used to classify the proteins using a DL reasoner. Chepelev et al. use a cheminformatics tool to process lipid descriptions and produce annotated lipid specifications that are then classified using an OWL ontology. The motivation of these two investigations is similar to ours, i.e. alleviation of biocurating tasks; what distinguishes the two approaches from ours is the use of a different ontology language and the role that this language plays during classification. In particular, in our work we use nonmonotonic existential rules instead of OWL which, unlike OWL, are able to capture cyclic structures. Also, in the sequence of steps followed by our classification process we do not rely on a cheminformatics functionality to algorithmically annotate the molecular descriptions; instead, the identification of structural features forms an integral part of reasoning. The framework we suggested can be suitable for the domains of lipids and proteins, as long as they are restricted to structures of finite size; however, empirical evaluation would be needed to assess the suitability of the framework in practice. Regarding the application of our prototype to ChEBI classification, it could be used to classify ChEBI molecules under the chemical classes defined here, but more curating effort would be needed to model the thousands of chemical classes that appear in ChEBI.
In this work, we represent and reason about chemical knowledge using an ontology language. However, the majority of axioms constituting the ontology, that is the molecule descriptions, are sourced through molfiles that are parsed using cheminformatics libraries. The information provided by these files includes connectivity between atoms, types of atoms and bonds and charges of atoms. This information is converted into logical axioms that are subsequently processed by an automated reasoning algorithm to identify the chemical classes of the molecules. This approach has the advantage of allowing the knowledge modeller to define new classes in a declarative way, that is without the need of writing code for detecting their subsumees. However, a feature that could be detected using cheminformatics algorithms and become part of the ontology axioms is the existence of ring atoms. The benefits of such a modification could be twofold: it could considerably speed up the computation of all cyclicity-related classes (e.g. determining whether an atom is a ring atom can be done very quickly using the CDK library) and at the same time could allow for the definition of strictly more cyclicity-related classes, such as carbocyclic compounds.
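As an illustration of how cheaply ring membership can be computed outside the reasoner, the sketch below marks an atom as a ring atom when one of its bonds can be removed without disconnecting its two endpoints. This is a generic graph routine written for this example, not the algorithm used by the CDK library; the function name ring_atoms and the input format are assumptions, and each bond is assumed to appear once regardless of its order.

```python
from collections import defaultdict


def ring_atoms(bonds):
    """Return the set of atoms that lie on at least one cycle.

    bonds: list of (atom_i, atom_j) pairs, one entry per bond.
    """
    neighbours = defaultdict(set)
    for u, v in bonds:
        neighbours[u].add(v)
        neighbours[v].add(u)

    def reachable_without_bond(u, v):
        # Depth-first search from u that never uses the bond u-v directly.
        seen, stack = {u}, [u]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if node == u and nxt == v:
                    continue            # skip the bond under test
                if nxt == v:
                    return True         # found an alternative path
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    result = set()
    for u, v in bonds:
        if reachable_without_bond(u, v):
            result.update((u, v))       # both endpoints sit on a cycle
    return result


# Three-membered ring 1-2-3 with a substituent atom 4 attached to atom 3
print(sorted(ring_atoms([(1, 2), (2, 3), (3, 1), (3, 4)])))
```

The resulting set could then be emitted as additional facts about the molecule, so that cyclicity-related class definitions refer to a precomputed property instead of searching for cycles during reasoning.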
An alternative approach could be to build rules from chemical identifiers other than molfiles, such as InChI or preferred IUPAC names. In particular, InChI with its ability to encode isotopic and stereochemical information (which can be critical for biological applications) could lead to richer chemical modelling. Also, widely used chemical databases, such as ChemSpider, could be used as a resource for adding to rules information about molecular properties.
A category of molecules that our framework does not cover is tautomers. A tautomer is each of two or more isomers that exist together in equilibrium, and are readily interchanged by migration of an atom (usually hydrogen) or group within the molecule. InChI handles tautomerism by allowing a compound to contain mobile hydrogen atoms, that is some hydrogens are marked as being able to occur in different positions. This is an approach that could be adopted by our methodology too, if we extended our formalism with the ability to represent disjunctive information. However, enriching nonmonotonic existential rules with disjunction would require altering the design and implementation of the reasoning algorithm, so treating tautomers could be part of a future extension of our framework.
We presented an implementation that performs logic-based classification of chemicals and builds upon a sound and complete reasoning procedure for nonmonotonic existential rules; our prototype relies on the DLV system and is considerably quicker than previous approaches. For our evaluation, we represented a wide variety of chemical classes that are not expressible with OWL-based formalisms and described a surface syntax that could enable cheminformaticians to define ontological descriptions of chemical entities intuitively and without the need to use first-order logic notation; additionally, our software revealed subclass relations that are missing from the manually curated ChEBI ontology as well as some erroneous ones. We demonstrated thus the capabilities of a datalog-based ontology language that displays a favourable trade-off between expressive power and performance for the purpose of structure-based classification.
Further work could involve building an ontology editor for the creation of surface syntax expressions and their automatic conversion into nonmonotonic existential rules. We will also seek to extend our prototype to accommodate subsumption between chemical classes so as to generate a complete multi-level chemical hierarchy using ideas from our recent work [49, 50]. We could extend our formalism with numerical value restrictions in order to express, for example, classes that depend on molecular weight. Moreover, it could be of interest to explore the integration of our prototype with Protégé, Life Sciences platforms and chemical structure visualisation tools [54, 55], as well as to define a mapping of the introduced formalism to RDF.
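As an illustration of what a numerical value restriction on molecular weight would have to express, the following minimal Python sketch tests membership in a weight-bounded class. It is not part of the prototype: the function names, the element table restricted to four elements, and the threshold of 100 are assumptions made purely for the example.

```python
# Standard atomic weights (rounded) for a few common elements.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(atom_counts):
    """Sum the atomic weights over a mapping from element symbol to atom count."""
    return sum(ATOMIC_WEIGHT[element] * n for element, n in atom_counts.items())


def in_weight_class(atom_counts, max_weight):
    """Membership test for a hypothetical class 'molecular weight <= max_weight'."""
    return molecular_weight(atom_counts) <= max_weight


methane = {"C": 1, "H": 4}
glucose = {"C": 6, "H": 12, "O": 6}

# Hypothetical threshold, chosen only for the illustration.
LIGHT_THRESHOLD = 100.0
methane_is_light = in_weight_class(methane, LIGHT_THRESHOLD)
glucose_is_light = in_weight_class(glucose, LIGHT_THRESHOLD)
```

In a rule-based setting the comparison would be a built-in numerical predicate in the rule body rather than a Python function, which is why it requires an extension of the formalism.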
OWL: Web ontology language
ChEBI: Chemical entities of biological interest
W3C: World wide web consortium
KR: Knowledge representation
LoPStER: Logic programming for structure entities reasoner
RDF: Resource description framework.
We would like to thank Dr Chris Batchelor-McAuley for answering our chemistry questions and the anonymous reviewers of this article for providing useful references and highly constructive comments. This work was supported by the Royal Society, the Seventh Framework Program (FP7) of the European Commission under Grant Agreement 318338, “Optique” and the EPSRC projects ExODA, Score! and MaSI3.
- Wolstencroft K, Lord PW, Tabernero L, Brass A, Stevens R: Protein classification using ontology classification. ISMB (Supplement of Bioinformatics). 2006, Oxford University Press, 530-538. http://bioinformatics.oxfordjournals.org/content/22/14/e530
- Chepelev L, Dumontier M: Chemical entity semantic specification knowledge representation for efficient semantic cheminformatics and facile data integration. J Cheminformatics. 2011, 3 (20):
- Chepelev L, Dumontier M: Semantic Web integration of Cheminformatics resources with the SADI framework. J Cheminformatics. 2011, 3 (16):
- Horrocks I, Patel-Schneider PF, van Harmelen F: From SHIQ and RDF to OWL: the making of a web ontology language. J Web Sem. 2003, 1: 7-26. 10.1016/j.websem.2003.07.001.
- Chan J, Kishore R, Sternberg P, Van Auken K: The gene ontology enhancements for 2011. Nucleic Acids Res. 2012, 40 (D1): D559-D564.
- Hastings J, de Matos P, Dekker A, Ennis M, Harsha B, Kale N, Muthukrishnan V, Owen G, Turner S, Williams M, Steinbeck C: The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res. 2013, 41 (Database-Issue): 456-463.
- Wolstencroft K, Brass A, Horrocks I, Lord PW, Sattler U, Turi D, Stevens R: A little semantic web goes a long way in biology. ISWC. 2005, Springer, http://link.springer.com/chapter/10.1007%2F11574620_56
- Chepelev LL, Riazanov A, Kouznetsov A, Low HS, Dumontier M, Baker CJO: Prototype semantic infrastructure for automated small molecule Classification and Annotation in Lipidomics. BMC Bioinformatics. 2011, 12: 303. 10.1186/1471-2105-12-303.
- Magka D, Motik B, Horrocks I: Modelling structured domains using description graphs and logic programming. ESWC, Volume 7295 of Lecture Notes in Computer Science. Edited by: Simperl E, Cimiano P, Polleres A, Corcho Ó, Presutti V. 2012, Springer, 330-344.
- Mungall C: Experiences using logic programming in bioinformatics. ICLP. 2009, Springer, 1-21. [Keynote talk]. http://link.springer.com/chapter/10.1007%2F978-3-642-02846-5_1
- Vardi MY: Why is modal logic so robustly decidable?. Descriptive Complexity and Finite Models DIMACS Workshop. 1996, American Mathematical Society, 149-184.
- Li C, Donizelli M, Rodriguez N, Dharuri H, Endler L, Chelliah V, Li L, He E, Henry A, Stefan MI: BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models. BMC Syst Biol. 2010, 4: 92. 10.1186/1752-0509-4-92.
- Croft D, O’Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath G, Jassal B, Jupe S, Kalatskaya I, Mahajan S, May B, Ndegwa N, Schmidt E, Shamovsky V, Yung C, Birney E, Hermjakob H, D’Eustachio P, Stein L: Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 2011, 39 (Database-Issue): 691-697.
- Hoehndorf R, Dumontier M, Gkoutos GV: Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics. Bioinformatics. 2012, 28 (16): 2169-2175. 10.1093/bioinformatics/bts350.
- Ferreira JD, Couto FM: Semantic similarity for automatic classification of chemical compounds. PLoS Comput Biol. 2010, 6 (9): e1000937. 10.1371/journal.pcbi.1000937.
- The database and ontology of chemical entities of biological interest. [http://www.ebi.ac.uk/chebi/]
- Bolton EE, Wang Y, Thiessen PA, Bryant SH: PubChem: integrated platform of small molecules and biological activities. Ann Reports in Comput Chem. 2008, 4: 217-241.
- Wegner JK, Sterling A, Guha R, Bender A, Faulon JL, Hastings J, O’Boyle NM, Overington JP, van Vlijmen H, Willighagen EL: Cheminformatics. Commun ACM. 2012, 55 (11): 65-75. 10.1145/2366316.2366334.
- Villanueva-Rosales N, Dumontier M: Describing chemical functional groups in OWL-DL for the classification of chemical compounds. OWLED. 2007, CEUR-WS.org, http://ceur-ws.org/Vol-258/paper28.pdf
- Konyk M, Battista ADL, Dumontier M: Chemical knowledge for the semantic web. DILS. 2008, Evry, France: Springer, 169-176.
- Hastings J, Dumontier M, Hull D, Horridge M, Steinbeck C, Stevens R, Sattler U, Hörne T, Britz K: Representing chemicals using OWL, description graphs and rules. OWLED, Volume 614. 2010, CEUR-WS.org, http://ceur-ws.org/Vol-614/owled2010_submission_13.pdf
- Dumontier M: Molecular symmetry and specialization of atomic connectivity by class-based reasoning of chemical structure. OWLED. 2012, CEUR-WS.org, http://ceur-ws.org/Vol-849/paper_33.pdf
- Hastings J, Magka D, Batchelor CR, Duan L, Stevens R, Ennis M, Steinbeck C: Structure-based classification and ontology in chemistry. J Cheminformatics. 2012, 4: 8. 10.1186/1758-2946-4-8.
- King R, Muggleton S, Srinivasan A, Sternberg M: Structure-activity relationships derived by machine learning: the use of atoms and their bond connectives to predict mutagenicity by inductive logic programming. Proceedings of the National Academy of Sciences. 1996, 93: 438-442. 10.1073/pnas.93.1.438.
- Deshpande M, Kuramochi M, Wale N, Karypis G: Frequent substructure-based approaches for classifying chemical compounds. IEEE TKDE. 2005, 17 (8): 1036-1050.
- Grego T, Pesquita C, Bastos HP, Couto FM: Chemical entity recognition and resolution to ChEBI. ISRN Bioinformatics. 2012, 2012: Article ID 619427.
- Bobach C, Böhme T, Laube U, Püschel A, Weber L: Automated compound classification using a chemical ontology. J Cheminformatics. 2012, 4: 40. 10.1186/1758-2946-4-40.
- Sankar P, Aghila G: Design and development of chemical ontologies for reaction representation. J Chem Inform Modeling. 2006, 46 (6): 2355-2368. 10.1021/ci050533x.
- Feldman HJ, Dumontier M, Ling S, Haider N, Hogue CW: CO: A chemical ontology for identification of functional groups and semantic comparison of small molecules. FEBS Lett. 2005, 579 (21): 4685-4691. 10.1016/j.febslet.2005.07.039.
- Grau BC, Horrocks I, Krötzsch M, Kupke C, Magka D, Motik B, Wang Z: Acyclicity notions for existential rules and their application to query answering in ontologies. J Artif Intell Res (JAIR). 2013, 47: 741-808.
- Magka D: Foundations and applications of knowledge representation for Structured entities. PhD thesis. University of Oxford, 2013.
- Horridge M, Drummond N, Goodwin J, Rector AL, Stevens R, Wang H: The Manchester OWL syntax. OWLED, Volume 216 of CEUR Workshop Proceedings. Edited by: Grau BC, Hitzler P, Shankey C, Wallace E. 2006, CEUR-WS.org, http://ceur-ws.org/Vol-216/submission_9.pdf
- Glimm B, Horridge M, Parsia B, Patel-Schneider PF: A syntax for rules in OWL 2. OWLED, Volume 529 of CEUR Workshop Proceedings. Edited by: Hoekstra R, Patel-Schneider PF. 2009, CEUR-WS.org, http://ceur-ws.org/Vol-529/owled2009_submission_16.pdf
- Tudose I, Hastings J, Muthukrishnan V, Owen G, Turner S, Dekker A, Kale N, Ennis M, Steinbeck C: OntoQuery: easy-to-use web-based OWL querying. Bioinformatics. 2013, 29 (22): 2955-2957. 10.1093/bioinformatics/btt514.
- Magka D, Krötzsch M, Horrocks I: A syntax for representing structured entities. Tech. rep., University of Oxford 2013. [http://www.cs.ox.ac.uk/isg/people/despoina.magka/pubs/reports/MagkaKH-SS-13.pdf]
- LoPStER. [https://github.com/magkades/lopster]
- Gelfond M, Lifschitz V: The stable model semantics for logic programming. ICLP/SLP. 1988, MIT press, 1070-1080.
- Cuenca Grau B, Horrocks I, Krötzsch M, Kupke C, Magka D, Motik B, Wang Z: Acyclicity conditions and their application to query answering in description logics. KR 2012. 2012, Rome, Italy: AAAI Press.
- Leone N, Pfeifer G, Faber W, Eiter T, Gottlob G, Perri S, Scarcello F: The DLV system for knowledge representation and reasoning. ACM TOCL. 2006, 7 (3): 499-562. 10.1145/1149114.1149117.
- Dalby A, Nourse JG, Hounshell WD, Gushurst AKI, Grier DL, Leland BA, Laufer J: Description of several chemical structure file formats used by computer programs developed at molecular design limited. J Chem Information and Comput Sci. 1992, 32 (3): 244-255.
- Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL: Recent developments of the chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics. Curr Pharm Des. 2006, 12 (17): 2111-2120. 10.2174/138161206777585274.
- Motik B, Cuenca Grau B, Horrocks I, Sattler U: Representing ontologies using description logics, description graphs, and rules. Art Int. 2009, 173 (14): 1275-1309. 10.1016/j.artint.2009.06.003.
- Heller SR, McNaught AD: The IUPAC international chemical identifier (InChI). Chem Int. 2009, 31: 7.
- McNaught AD, Wilkinson A: Compendium of Chemical Terminology, Volume 1669. 1997, Oxford, UK: Blackwell Science Oxford.
- Pence HE, Williams A: ChemSpider: an online chemical information resource. J Chem Educ. 2010, 87 (11): 1123-1124. 10.1021/ed100697w.
- Boelling C, Dumontier M, Weidlich M, Holzhütter HG: Role-based representation and inference of biochemical processes. ICBO. 2012, CEUR-WS.org, http://ceur-ws.org/Vol-897/session3-paper14.pdf
- Low H, Baker C, Garcia A, Wenk M: An OWL-DL ontology for classification of lipids. ICBO. 2009, Nature precedings, 3-3. http://precedings.nature.com/documents/3542/version/1
- Sang LH: Knowledge representation and ontologies for lipids and lipidomics. Master’s Thesis. 2009.
- Magka D, Krötzsch M, Horrocks I: Computing stable models for nonmonotonic existential rules. IJCAI. Edited by: Rossi F. 2013, IJCAI/AAAI, http://www.aaai.org/ocs/index.php/IJCAI/IJCAI13/paper/view/6598
- Krötzsch M, Magka D, Horrocks I: Concrete results on abstract rules. LPNMR, Volume 8148 of Lecture Notes in Computer Science. Edited by: Cabalar P, Son TC. 2013, Corunna, Spain: Springer, 414-426.
- Magka D, Kazakov Y, Horrocks I: Tractable extensions of the description logic EL with numerical datatypes. J Autom Reasoning. 2011, 47 (4): 427-450. 10.1007/s10817-011-9235-0.
- Protégé Ontology Editor. [http://protege.stanford.edu]
- Spjuth O, Alvarsson J, Berg A, Eklund M, Kuhn S, Mäsak C, Torrance GM, Wagener J, Willighagen EL, Steinbeck C, Wikberg JES: Bioclipse 2: A scriptable integration platform for the life sciences. BMC Bioinf. 2009, 10: 397. 10.1186/1471-2105-10-397.
- Jmol: an open-source Java viewer for chemical structures in 3D. [http://www.jmol.org]
- Krause S, Willighagen EL, Steinbeck C: JChemPaint - using the collaborative forces of the internet to develop a free editor for 2D chemical structures. Molecules. 2000, 5 (10): 93-98.
- Klyne G, Carroll JJ, McBride B: Resource description framework (RDF) concepts and abstract syntax. W3C Recommendation. 2004, 10. http://www.w3.org/TR/rdf-concepts/
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.