BAO 2.0 native organization and main components
The new BAO 2.0 formally describes perturbation bioassays in the domain of drug and probe discovery, such as small molecule HTS assays and screening results for the purpose of categorizing the assays and outcomes by concepts that relate to the screening model system (format), assay method, the biology interrogated in the assay (such as a protein target or biological process), the detection method (how does the assay work), and types of results (endpoints). BAO 2.0 is organized into several major sections, which include multiple levels of subcategories of subsumption class hierarchies. A number of specific object property relationships were created to connect the classes and develop a knowledge representation.
The main categories in BAO 2.0, termed components, include bioassay, assay biology, assay method, assay format, assay endpoint, and assay screened entity (Figure 1). Each of these component classes includes the subsumption trees of terms corresponding to the category and additional trees of related terms to describe each of the main components properly and formally. In BAO 2.0, we incorporated a slightly different pattern from BAO 1.6, since we were interested in making BAO 2.0 compatible with the existing upper-level and other domain-level ontologies. The BAO 2.0 categories also give BAO a native structure that is most useful to users, for example to annotate assays or to implement a user interface in a software application. We briefly describe the main class hierarchies of BAO 2.0 corresponding to the above components (Figure 1):
-
BAO assay bioassay component includes the bioassay subsumption tree, and several other classes to describe assays, including assay kit, bioassay type, and bioassay specification, which contains terminology trees to describe various details about a bioassay and its context. The class hierarchy bioassay includes the list of the bioassays and their formal description, e.g., cell cycle assay, enzyme activity assay. Bioassays are organized roughly by their application (what the assay is used for). The class hierarchy assay kit includes the reagents and their cocktails that are commercially available to perform the different chemical reactions that encompass an assay (i.e., out of the box, ready to run assays). The information in bioassay specification is similar to BAO 1.6.
-
BAO assay format component includes the assay format subsumption tree to describe the biological model system; a conceptualization of assays based on the biological and/or chemical features of the experimental system.
-
BAO assay method component includes terminologies to describe how the assay is performed, most importantly assay method and physical detection method. It also includes computational method, instrument, and relevant other material entity "assay ingredients". The class hierarchy assay method includes assay design method and assay supporting method; assay design method describes how a biological perturbation of the model system is translated into a detectable signal. The class hierarchy computational method contains various methods that are based on the application of information technology to chemistry and biology. The physical detection method hierarchy includes the method (technology) used to detect the signal that corresponds to the perturbagen in the assay environment and enabled by the assay design method. Class instrument consists of instruments used for detection/readout from an assay and their components, e.g., FLIPR, ViewLux plate reader, PHERAstar, etc; software lists the types of software that are used in the various instruments, e.g., image analysis software, which is a component of the high content screening (HCS) platforms.
-
BAO assay biology component includes various class hierarchies to describe the biology of the assay including biological process, biological macromolecule, cell line cell, cellular component, cell phenotype, anatomical entity, disease, function, organism. Many of these are mapped to external sources (vide infra). To describe the biology of a simple binding assay for example, a biological macromolecule protein would have the biological role target. Many other role classes exist (vide infra). The class function includes the physiological function of biological macromolecules, e.g., protein binding, kinase activity. This module was imported from the Gene Ontology (GO). The class cellular phenotype encompasses both the molecular characteristics of a cell and the (morphological) shape and structure of a cell and its parts.
-
BAO assay screened entity component includes screened entity, which is the chemical or biological entity that is tested/screened in the assay. The screened entity typically modulates the function of the (known or unknown) biological macromolecule with the role of a target. The most important screened entity for BAO is the class small molecule, which contains compounds that are tested in the process of developing chemical probes and drugs, the primary domain of BAO.
-
BAO assay endpoint component includes subsumption trees to describe the assay result or endpoint and other required information to quantitatively or qualitatively express the biological perturbation measured in a bioassay, such as units of measurement (imported from UO), and other details to interpret the results in the context of the assay methodology and the biology, such as the mode of action of the perturbagen that the endpoint characterizes, or the signal direction and endpoint action correlation of the assay. More details about the class endpoint are described below.
-
Additional classes that were not assigned to any one of the main BAO components are organization, people, role, and quality: Organization includes, for example, manufacturers of assay kits, instruments, etc., or screening centers where assays are performed. People include the individuals who are involved in performing scientific research, such as assay development, compound screening, chemical synthesis, etc. Role describes the action that an entity performs in a given context; an entity can have more than one role, e.g., target, perturbagen. BAO 2.0 has imported roles from the Chemical Entities of Biological Interest (ChEBI) ontology and we have added some missing classes. Quality lists the characteristics that inhere in an entity of biological origin (namely organism, cell, and molecule) or in a physical entity, e.g., intensity, optical quality. Most of the terms in this class were imported from the Phenotypic Quality Ontology (PATO); missing ones were added to BAO.
-
BAO properties include both the object properties and the data properties that are required to create relationships among the different concepts in BAO 2.0. These properties were either imported from the Relations Ontology (RO), where available, or created in BAO 2.0.
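As a rough illustration of how the components above can organize an annotation, the following Python sketch records one assay with one field per BAO 2.0 component. The class name, the field names, and the label "biochemical" for the format are assumptions made for this example; they are not BAO identifiers, and the sketch is not the BAO data model itself.

```python
from dataclasses import dataclass, field


@dataclass
class AssayAnnotation:
    """One assay described along the main BAO 2.0 components (illustrative only)."""
    bioassay: str                   # bioassay component
    assay_format: str               # assay format component (biological model system)
    assay_design_method: str        # assay method component
    physical_detection_method: str  # assay method component
    biology: dict = field(default_factory=dict)    # assay biology: entity -> role
    screened_entity: str = "small molecule"        # assay screened entity component
    endpoints: list = field(default_factory=list)  # assay endpoint component

    def missing_components(self):
        """Return the names of fields that were left empty."""
        return [name for name, value in vars(self).items() if not value]


# Example values taken from assay terms mentioned in the text; the
# combination shown here is hypothetical.
kinase_assay = AssayAnnotation(
    bioassay="kinase activity assay",
    assay_format="biochemical",
    assay_design_method="ATP coupled enzyme activity measurement method",
    physical_detection_method="chemiluminescence",
    biology={"kinase": "target"},
    endpoints=["IC50"],
)

incomplete = AssayAnnotation(
    bioassay="cell cycle assay",
    assay_format="cell-based",
    assay_design_method="",
    physical_detection_method="fluorescence microscopy",
)
```

Calling `incomplete.missing_components()` lists the components that still need annotation, which is the kind of check a user interface built on these categories might perform.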
Upper level ontology structure and aligning external ontologies
Since there are several advantages of using upper level ontologies (ULOs), BAO 2.0 makes use of the Basic Formal Ontology (BFO) and OBO Relations Ontology (OBO-RO) as its upper level ontologies. We have used the current release of the BFO ontology (http://purl.obolibrary.org/obo/bfo.owl), which is also tightly coupled with the OBO-RO ontology (http://purl.obolibrary.org/obo/ro.owl). Figure 2 shows the main categories of BFO and examples of corresponding BAO 2.0 classes. The BFO conceptualization abstractly represents objects, entities, and relations in our domain of discourse, and it is used more widely in biomedical ontologies than other OWL versions of ULOs such as SUMO (http://www.ontologyportal.org/SUMO.owl) or DOLCE (http://www.loa.istc.cnr.it/ontologies/DLP397.owl). The advantage of using a ULO is that it allows integration of existing domain ontologies by grounding them on a formally rigid ontological framework [23, 24]. We make available a development instance of the BAO BFO version. Figure 2 also illustrates external ontologies, components of which we currently use in BAO (see Methods). Their alignment with BAO is facilitated in part by the BFO structure [25, 26]. One important mid level ontology is the Ontology for Biomedical Investigations (OBI) [27]. We have previously outlined the different focus of BAO vs. OBI [12]. However, this is not to say that they are incompatible; alignment is one of the future tasks required to evolve BAO further. We have created a version of BAO 2.0 that contains BFO and OBO-RO as ULOs (bao_complete_bfo_dev, which is a development version) and another one without them (bao_complete, released). bao_complete_bfo_dev simplifies alignments to external ontologies and is targeted to the ontology development community, while bao_complete is targeted to the drug and probe screening community and developers of software applications (such as our BAOSearch application).
We emphasize that the BAO-to-BFO alignment is based on our knowledge and understanding of the BFO and OBO-RO structures and of BAO mechanisms. The alignment is an ongoing process, and we have a community-wide bug reporting system that users can use to provide feedback toward semantically better alignments. bao_complete_bfo_dev and bao_complete are targeted towards different user groups; the latter is more amenable to large-scale analysis of chemical biology data without the additional constraints imposed by BFO and OBO-RO.
BAO 2.0 modular architecture and implementation
The modularization implementation is described in detail in Methods. Our modularization approach is illustrated in Figure 3. The modularization framework uses a layered architecture and uses the modeling primitives, vocabularies, modules and axioms. Vocabularies only contain terms (classes with subsumption only). Module layers enable combining vocabularies in flexible ways to create desired ontology structures or subsets. Axioms are separate files that do not contain any classes or properties. Classes and relationships are imported (directly or indirectly) from module and/or vocabulary files. The above mentioned classes in BAO 2.0 were created as separate vocabulary files. They were then imported into the bao_core file. BAO core only contains axioms incorporating BAO classes and BAO properties. In our modularization approach we separate external and internal sources. External modules (compare Figure 2) are generated as described in Methods. Overlap among external and internal classes and properties (i.e., those required in BAO core) are resolved using combinator modules, that is, external classes and properties are mapped (equivalence or subsumption) to corresponding BAO classes and properties. This approach assures that BAO core remains stable and independent from external sources that may change. The complete BAO includes external axioms and imports BAO core (indirectly importing all vocabularies and properties) and external modules (bao_complete file). Using this approach we also generated the BFO version of BAO. All internal and external vocabulary, module and axiom files are available via the BAO website (http://bioassayontology.org). Figure 4 shows the current implementation of the modularization illustrating vocabularies, intermediate modules, ontology axioms, BAO internal and external sources and their mappings.
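The layering described above can be pictured as an import graph in which bao_complete reaches every vocabulary indirectly through bao_core. The Python sketch below is a minimal illustration under stated assumptions: only bao_core and bao_complete are file names from the text, while the vocabulary and external module names are hypothetical placeholders, and the real files are OWL documents rather than dictionary entries.

```python
# Hypothetical import graph: file name -> files it imports directly.
# Only "bao_core" and "bao_complete" are names used in the text.
IMPORTS = {
    "vocabulary_assay": [],
    "vocabulary_format": [],
    "vocabulary_properties": [],
    "external_module_go": [],
    "bao_core": ["vocabulary_assay", "vocabulary_format", "vocabulary_properties"],
    "bao_complete": ["bao_core", "external_module_go"],
}


def import_closure(name, imports=IMPORTS):
    """Return every file imported directly or indirectly by `name`."""
    seen = set()
    stack = list(imports[name])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(imports[current])
    return seen


core_closure = import_closure("bao_core")
complete_closure = import_closure("bao_complete")
```

In this sketch `core_closure` contains no external module, which mirrors the stated goal that BAO core stays independent of external sources, while `complete_closure` contains both the internal vocabularies and the external module.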
Modeling assays and results using BAO 2.0
In addition to the BAO modularized design and systematic construction, we also tried to make the definitions of concepts in BAO consistent. In particular, we defined bioassays with their essential components such as assay design method, endpoints, measure groups, and molecular participants. Figure 1 illustrates how assays are modeled by specifying information related to the biology (such as target and/or biological process), assay format, assay method (including assay design method and physical detection method), screened entity, and endpoint (result) as described above. The BAO 2.0 architecture allows a more flexible definition of bioassays; for example, the same biomolecule can participate in assays in different roles and functions. Important classes include:
-
target: The target concept is defined by using the relationships has participant and has role. That is because targets are biological entities (i.e., participants) of assays that are playing the role target. Assays may have single or multiple targets depending on the assay type.
-
biological process: A large number of assays are designed to measure outcomes of biological processes. Thus, depending on the assay under study, we have written axioms for this information in the assay definitions.
-
screened entity: This concept refers to a molecular entity with the role screened entity role.
-
participants: Every assay has at least one participant, usually more. While axiomizing the assays, we try to define the particular roles that these participants play in the different assays. However, when we are not certain about the roles, we choose not to put axioms in order to avoid false reasoning cases.
-
assay design method: Every assay has an assay design method as the underlying method to generate a detectable signal that can be correlated with the strength of the perturbation of the biological model system by the screened entity.
-
physical detection method: An assay design method that generates a type of signal is linked to a corresponding detection method (the physical principle of detecting the signal), which is typically performed by a detection instrument.
The concepts listed above along with various other classes are used while modeling the concepts bioassay, measure group, and endpoint.
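The role-based definition of target (a participant of the assay that plays the role target) can be illustrated with a small Python sketch. This is an informal analogy to the has participant and has role relationships, not the OWL axioms themselves; the class name, the function name, and the specific participant and role assignments are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    entity: str  # the biological or chemical entity taking part in the assay
    role: str    # the role that entity plays in this particular assay


def targets(participants):
    """Return the entities that play the role 'target' in an assay."""
    return [p.entity for p in participants if p.role == "target"]


# Hypothetical annotations: the same entity (luciferase) appears in both
# assays but plays the role target in only one of them.
luciferase_activity_assay = [
    Participant("luciferase", "target"),
    Participant("small molecule", "screened entity role"),
]
viability_assay = [
    Participant("luciferase", "assay reagent"),
    Participant("ATP", "measured entity"),
    Participant("small molecule", "screened entity role"),
]
```

Here `targets(luciferase_activity_assay)` yields luciferase, whereas `targets(viability_assay)` is empty, which reflects the practice described above of leaving a role unasserted when it is not known.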
We had previously introduced the concept measure group to link multiple endpoints to the same bioassay [28]. We have now generalized this model so that a measure group can be derived from one or more measure groups. This allows the formal and iterative construction of more complex assays and endpoints that are derived from multiple measurements (Figure 5). The axiomatization was done in a way that infers measure group as a subclass of bioassay (compare Figure 1). The axiomatization was motivated by pragmatic considerations for the workflows and perspectives for organizing and analyzing the assay results, which is the core focus of BAO. It may be argued that operationally a measure group is not formally an assay; however, that is not in conflict with the BAO perspective. It should be noted that BAO measure groups and results remain associated with their corresponding subclasses of bioassay, whose instances are procedurally, methodologically, and materially real. To understand better the relations between the concepts measure group and endpoint we explore them in more depth:
-
The class measure group is a concept to group and link one or more (different) sets of experimental results to one bioassay. A bioassay can have multiple measure groups. A measure group contains overlapping axioms with the bioassay, which allows the reasoner to infer that the measure group is acting like an equivalent class of bioassay. This equivalence cannot simply be asserted. The measure group, in addition to holding the assay component metadata for each reported endpoint, also provides flexibility to generate different derived endpoints, e.g., IC50 (generated from several response values at different concentrations, i.e., concentration-response), or profile endpoints (e.g., a kinase panel assay). This can be formally done via derived measure groups, in cases where we have multiple measure groups that vary in one parameter (such as concentration or kinase target).
-
The class endpoint, alternatively called result, is a quantitative or qualitative representation of a perturbation (change from a defined reference state of the model system) that is measured by the bioassay. An endpoint consists of a series of data points, one for each perturbing agent employed by the assay. Every endpoint is obtained by using at least one measure group. For each endpoint, there exists a unit and a value, which is a number (e.g., float, which makes this concept a data property, and the concept is axiomized using a data property as opposed to an object property). For example, for a concentration endpoint (e.g., IC50), there exists a concentration unit and a concentration value, which is a float number (data property, not functional). Assays could have single or multiple endpoints depending on the assay type.
Endpoints are not used to handle the different measurements in the same assay. That is axiomized through the measure group concept. They may vary due to parameters such as time, concentration, target, and so on, or combinations. The formal definitions allow us to create individuals for different endpoints that might be using the same measure groups, i.e., results are measured once and different methods are applied on these measurements to find different derived endpoints. We can group different measure groups to define "intermediate" results. We can create profile endpoints and we can define profiles of intermediate aggregated measure groups (Figure 5). An endpoint individual is associated with a specific measure group and a specific compound combination and has a specific value and unit.
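The relation between measure groups, derived measure groups, and endpoint individuals can be sketched as a small data structure. The Python below is an illustrative analogy under stated assumptions: the class and attribute names are hypothetical, the labels and the IC50 value are made up for the example, and the actual BAO model is expressed as OWL axioms rather than Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class MeasureGroup:
    label: str
    derived_from: list = field(default_factory=list)  # source measure groups, if any

    def primary_groups(self):
        """Return labels of the directly measured groups this group is built on."""
        if not self.derived_from:
            return [self.label]
        labels = []
        for source in self.derived_from:
            labels.extend(source.primary_groups())
        return labels


@dataclass
class Endpoint:
    name: str
    measure_group: MeasureGroup  # every endpoint uses at least one measure group
    compound: str
    value: float
    unit: str


# Two primary measure groups that differ only in screening concentration.
low = MeasureGroup("response at 1 uM")
high = MeasureGroup("response at 10 uM")

# A derived measure group aggregates them; an IC50 endpoint refers to it.
concentration_response = MeasureGroup("concentration response", [low, high])
ic50 = Endpoint("IC50", concentration_response, "compound X", 3.2, "uM")
```

Because `derived_from` may itself contain derived groups, the same structure supports the iterative construction of intermediate and profile results described above.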
In BAO 2.0, endpoints are classified into several categories; the most important ones are concentration endpoint (which includes concentration response endpoint), response endpoint, protein substrate and ligand constant, and physical property endpoint. The class mode of action defines the functional effect and physical binding characteristics of the screened entity on the target using the subclasses ligand function mode of action (inhibition, activation, etc.) and ligand binding mode of action (reversible, irreversible, competitive, etc). Each endpoint is associated with a mode of action, e.g., IC50 and percent activation have inhibition and activation as the functional mode of action, respectively. The class signal direction defines how the functional effect of the perturbation corresponds to the intensity of the detected signal, i.e., increase or decrease with activation or inhibition. This is important to identify suitable counter screens; for example if the detected perturbation results in signal decrease in a cell-based assay, cytotoxic compounds may be detected as actives. The class endpoint action correlation defines if the endpoint value corresponds to increased or decreased functional effect (inhibition, activation). Both signal direction and endpoint action correlation are required to formally interpret the results, because the same perturbation (e.g., inhibition of substrate-protein binding by a competing ligand) may be measured via a different molecular entity with the role measured entity (e.g., substrate-bound protein or ligand-bound protein) and the effect can be expressed in different ways (e.g., normalized as remaining percent activity or percent inhibition). Further, depending on the assay design method, the same perturbation in the same model system may result in increased or decreased signal.
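To make the interplay of signal direction and endpoint normalization concrete, the following Python sketch normalizes a raw signal against two controls. It uses the common control-based normalization and is offered as an illustration only; the function names and control values are assumptions, and BAO itself describes these concepts as ontology classes rather than prescribing a formula.

```python
def percent_inhibition(signal, neutral_control, inhibitor_control):
    """Normalize a raw signal to percent inhibition between two controls.

    neutral_control: signal with no perturbation (0 percent inhibition).
    inhibitor_control: signal at full inhibition (100 percent inhibition).
    The sign of the control difference carries the signal direction, so the
    same formula works whether inhibition lowers or raises the signal.
    """
    return 100.0 * (neutral_control - signal) / (neutral_control - inhibitor_control)


def percent_remaining_activity(signal, neutral_control, inhibitor_control):
    """Express the same perturbation as remaining percent activity."""
    return 100.0 - percent_inhibition(signal, neutral_control, inhibitor_control)


def signal_direction(neutral_control, inhibitor_control):
    """Report whether inhibition decreases or increases the detected signal."""
    if inhibitor_control < neutral_control:
        return "signal decrease"
    return "signal increase"


# Hypothetical raw values. In the first design inhibition lowers the signal;
# in the second (for example, residual ATP detected after a kinase reaction)
# inhibition raises it.
decreasing = percent_inhibition(250.0, neutral_control=1000.0, inhibitor_control=0.0)
increasing = percent_inhibition(800.0, neutral_control=200.0, inhibitor_control=1000.0)
```

Both calls describe the same degree of inhibition even though the raw signals move in opposite directions, which is why both signal direction and endpoint action correlation are needed to interpret a reported value.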
Application to model LINCS profiling and panel assays and results
The concepts bioassay, measure group, and endpoint as described above enable the formal definition of panel and profiling assays such as those routinely run in the LINCS program. An effective modeling solution is relevant because of the emphasis of LINCS on operating on result profiles and signatures, in contrast to individual endpoints. We define a panel assay as the parallel, spatially separate implementation of several otherwise identical assays that vary in one parameter (other than the screened entity), typically the target. A popular example is a kinase panel, for example the DiscoveRx KINOMEscan assay, which is also run at LINCS and in which compounds are screened against over 450 kinases in parallel. Similar to a panel assay, a profiling assay can generate a large number of readouts for any given tested compound, but all results are obtained from the same physical experiment, i.e., the same well. Such assays are also called multiplexed assays and rely on sophisticated assay methods and/or detection technologies that enable the detection of many signals in parallel, such as flow cytometry, mass spectrometry or imaging. One example also run at LINCS is the L1000 transcriptional profiling assay (vide supra). As illustrated in Figure 5, our approach would also allow defining concentration response (e.g., IC50) kinase profiling assays via iterative aggregation of sets of measure groups corresponding to two parameters, namely m screening concentrations (values) and n kinase targets. The first aggregation by screening concentration (e.g., via curve fitting) defines the IC50 endpoint for each kinase and the second aggregation defines an IC50 kinase profile endpoint. An actual example of such an assay is the ActivX Biosciences KiNative assay, which is also run in the LINCS program. We have modeled several LINCS assays including the KINOMEscan assay, transcriptional response profiling assay, and cell cycle state assay.
The specific instances of these assays, including hundreds of kinase targets, transcribed genes, cell lines, etc., were implemented in an application ontology, and these assays and screening results are available in our LIFE software system [7].
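The two-step aggregation over m screening concentrations and n kinase targets can be sketched as follows. This Python example is a simplified illustration: the kinase names and response values are hypothetical, and linear interpolation in log concentration is used here as a stand-in for the curve fitting mentioned in the text, not as the method used in any LINCS assay.

```python
import math


def ic50_by_interpolation(responses):
    """Estimate the concentration giving 50 percent inhibition.

    responses: dict mapping concentration -> percent inhibition.
    Uses linear interpolation in log10(concentration) between the two
    neighboring points that bracket 50 percent; returns None if no pair does.
    """
    points = sorted(responses.items())
    for (c_low, r_low), (c_high, r_high) in zip(points, points[1:]):
        if r_low <= 50.0 <= r_high and r_high > r_low:
            fraction = (50.0 - r_low) / (r_high - r_low)
            log_low, log_high = math.log10(c_low), math.log10(c_high)
            return 10 ** (log_low + fraction * (log_high - log_low))
    return None


# Hypothetical measure groups: n = 2 kinases, m = 3 concentrations (uM),
# values are percent inhibition.
measure_groups = {
    "kinase A": {0.1: 20.0, 1.0: 40.0, 10.0: 60.0},
    "kinase B": {0.1: 5.0, 1.0: 10.0, 10.0: 30.0},
}

# First aggregation (over concentration) gives one IC50 per kinase;
# second aggregation (over kinases) collects them into a profile.
ic50_profile = {
    kinase: ic50_by_interpolation(responses)
    for kinase, responses in measure_groups.items()
}
```

In this sketch a kinase whose responses never reach 50 percent inhibition gets `None` in the profile rather than an extrapolated value.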
An example of a phenotypic cell-based LINCS assay is the cell cycle state assay. It is also described in BAO 2.0. In the LINCS project, several small molecules that are known to function as kinase inhibitors were tested on cancer cell lines for their ability to arrest the mitotic cell cycle. This assay was modeled in BAO as follows: the assay design method is S phase assessment or M phase assessment method. The presence of the markers, namely, EdU and anti-MPM-2 antibody, indicates that cells have entered/completed S phase and M phase, respectively. Hoechst 33342 was used to stain nuclei from all cells to obtain the total cell count in the assay. The detection method is fluorescence microscopy and the measured entity is DNA. The assay readout parameters are intensity parameter and counting parameter. The intensity of EdU and MPM2 were measured in the nucleus and cytoplasm, respectively. The counts of Hoechst 33342, EdU and MPM2 positive cells were reported after the threshold to signal intensity of each marker was applied. The endpoint was derived from the assay readout parameters after normalizing with the assay controls. The endpoint for this assay is percent apoptotic cells, percent mitotic cells, percent interphase cells, percent DNA replicated cells, percent G2 arrested cells, and/or percent mitotic arrested cells. The cellular phenotype or its disposition is obtained by quantifying cells which are positive for each of these markers.
Categorizing mechanistically related assays by inference
BAO 2.0 contains detailed descriptions of a range of common HTS assays, including the categories: binding assay, cell cycle assay, cell viability assay, cytotoxicity assay, enzyme activity assay, gene expression assay, redistribution assay, and signal transduction assay. The essential information that was described for each assay type includes format, method (including assay design), detection, endpoint, and molecular and cellular entities and their roles, qualities and functions describing the biology of the system or which are key components involved in the assay design or detection methods. We have previously shown how promiscuous frequent hitter compounds (undesired assay artifacts) can be deconvoluted and categorized mechanistically based on detailed knowledge about the assays and their related design and detection methods [29]. However, using the previous version of BAO (1.6) these assays were not yet defined in a way that formalizes all necessary knowledge about their commonalities. This means that previously, in order to perform mechanism-based cross assay analysis, some human expert knowledge was required to identify and categorize related assays beyond their asserted annotations.
BAO 2.0 provides a framework that enables automated classification of assays into meaningful categories of interest, for example to aid in identifying common assay artifacts and their likely mechanism of action. We illustrate this using several related assays: luciferase reporter gene assay, cell viability ATP quantitation assay, cytochrome P450 enzyme activity assay, kinase activity assay, and luciferase enzyme activity assay. Of these, the reporter gene and cell viability assays are cell-based, while the others are biochemical assays. The modeling of these assays is illustrated in Figure 6. All assays use a different assay design method. Therefore they cannot be identified as mechanistically related based on that annotation alone. The physical detection method chemiluminescence is the same for all assays, but it is too generic to classify the assays by mechanisms that underlie artifacts, because luminescence can be generated by many methods. However, among these examples, all assays perform (in different ways) the luciferase-catalyzed chemical reaction of luciferin and ATP forming oxyluciferin and light (luminescence) and thus luciferase and ATP participate in all these assays, although in different roles. For example, in the reporter gene assay the amount of expressed luciferase is quantified by the intensity of light (luminescence) produced in the presence of the substrates ATP and luciferin. In the viability assays the proportion of living cells is quantified by measuring ATP content, again by the same reaction (with ATP as the limiting reagent in the role measured entity). Similarly, ATP-coupled assays measure the residual amount of ATP (e.g., after a kinase reaction) by a coupled luciferase reaction. The P450 luciferin-coupled assay mentioned above measures the amount of luciferin generated after detoxification by cytochrome P450 enzyme activity.
Luciferase enzyme activity assays quantify the biochemical luciferase enzyme activity by the intensity of light, again using the same chemical reaction. In BAO 2.0 we modeled these assays with the necessary formalism to enable the reasoning engine to categorize the assays as mechanistically related. As an example, Figure 7 shows the asserted TBox of the assay design methods ATP quantitation using luciferase and ATP coupled enzyme activity measurement method, and the inferred TBox in which the latter is classified as a subclass of the former. For illustrative purposes we defined a class of all assays with an assay design method in which luciferase participates (in any role). The axioms and the asserted and inferred hierarchies are shown in Figure 8. All assays mentioned above are inferred as assays that use luciferase, thus illustrating how BAO formal assay definitions enable a classification based on the mechanistic principle of the assay (assay design method). This in turn classifies the assays based on likely common artifacts (e.g., compounds that stabilize or inhibit luciferase) [29]. Figure 8 also shows the justification for classifying the assays mentioned above under this category.
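The effect of this inference can be imitated with a simple lookup: an assay belongs to the illustrative class if its assay design method has luciferase as a participant, whatever the role. The Python sketch below is only an analogy to what an OWL reasoner derives from the BAO axioms. Apart from "ATP quantitation using luciferase" and "ATP coupled enzyme activity measurement method", the design method labels are hypothetical, as is the fluorescence polarization assay added for contrast.

```python
# Assay design method -> entities that participate in it (any role).
# Labels marked hypothetical are placeholders, not BAO class names.
DESIGN_METHOD_PARTICIPANTS = {
    "luciferase induction (hypothetical)": {"luciferase", "ATP", "luciferin"},
    "ATP quantitation using luciferase": {"luciferase", "ATP", "luciferin"},
    "ATP coupled enzyme activity measurement method": {"luciferase", "ATP", "luciferin", "kinase"},
    "luciferin coupled P450 method (hypothetical)": {"luciferase", "ATP", "luciferin", "cytochrome P450"},
    "luciferase activity measurement (hypothetical)": {"luciferase", "ATP", "luciferin"},
    "fluorescence polarization (hypothetical)": {"fluorescent ligand", "protein"},
}

# Assay -> its asserted assay design method.
ASSAY_DESIGN_METHOD = {
    "luciferase reporter gene assay": "luciferase induction (hypothetical)",
    "cell viability ATP quantitation assay": "ATP quantitation using luciferase",
    "kinase activity assay": "ATP coupled enzyme activity measurement method",
    "cytochrome P450 enzyme activity assay": "luciferin coupled P450 method (hypothetical)",
    "luciferase enzyme activity assay": "luciferase activity measurement (hypothetical)",
    "fluorescence polarization binding assay": "fluorescence polarization (hypothetical)",
}


def assays_with_participant(entity):
    """Return assays whose design method involves `entity` in any role."""
    return sorted(
        assay
        for assay, method in ASSAY_DESIGN_METHOD.items()
        if entity in DESIGN_METHOD_PARTICIPANTS[method]
    )


luciferase_assays = assays_with_participant("luciferase")
```

None of the five luciferase-dependent assays shares a design method label with another, yet all five are retrieved through the participant, while the contrast assay is not.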
Collaborative development and application of BAO to annotate assays
We had previously annotated (using BAO 1.6) a large set of assays from PubChem [28] and made these annotations searchable in BAOSearch [13], which is a Semantic Web application. These annotations were now mapped to BAO 2.0 and expanded to include additional information such as bioassay type, cell culture conditions, DNA construct details, and the roles, functions and qualities of molecular entities participating in the assays. 297 luciferase assays (vide supra) containing 328 measure groups are among the annotated assays, including the ones mentioned above and others. As explained above, the formalization of assays in BAO allows us to retrieve these assays based on a participant such as luciferase or ATP, even though these molecular entities were not explicitly annotated. A large project in which BAO has been applied and which in turn significantly influenced the evolution of BAO is BARD [6]. In BARD, all MLP data, consisting of over 600 probe discoveries, are curated and annotated using controlled terms and organized into probe projects. BARD makes these data searchable in various ways and enables integrative analysis. During the development of BARD, data curation and annotation, the development of new terminology, and the evolution of BAO have occurred in parallel. The development of terminologies and ontologies to annotate assays at Novartis also influenced our work in the BARD and BAO projects and highlights its relevance. We also applied BAO to define LINCS assays (vide supra); we make LINCS data searchable and explorable via the LIFE software system [7], which leverages Semantic Web technologies to integrate and search diverse data types. Our RegenBase project [30] also leverages BAO. BAO is also explored in the ChEMBL and PubChem projects, where BAO endpoints are used in RDF schema [16]. As another example from the pharmaceutical industry, a research group at AstraZeneca is using BAO to annotate assays in the context of the Open PHACTS project (personal communications).