Modeling molecular functions with FueL
FueL enables the graphical modeling of functions both in a compact and in an extended form. The compact form is particularly suited for large models containing many functions, whereas the extended form is designed for visualizing the dependencies within the structure of a single function or between several functions. Figures 1 and 2 present an exemplary FueL model, depicting the structure of MFO function GO:0015144: carbohydrate transmembrane transporter activity. Figure 1 presents the compact notation, whereas the extended notation is shown in Fig. 2. The stereotypes utilized in the figures are discussed in the remainder of this section.
Functions
A function in FueL is understood as a role that an entity plays in the context of some goal achievement, e.g. in a teleological process. Put differently, a role in virtue of which the transition to a goal situation is achieved, or which contributes to such achievement, constitutes a function. An entity that plays such a role, such as the putative glucose uptake protein in Fig. 2, has that role as its function. This account of functions is similar to [20], where a biological function of a molecule is described as the role that the molecule plays in a biological process. In this sense, the function GO:0015144: carbohydrate transmembrane transporter activity, defined in GO as ‘catalysis of the transfer of carbohydrate from one side of the membrane to the other’, depicts the catalyst role in the teleological process of transferring carbohydrate from one side of the membrane to the other.
In terms of the structure we can therefore say that a function specification contains as its part a specification of a goal achievement, understood as a teleological entity which is specified in terms of a transformation from an input situation to an output situation. As presented in Figs. 1 and 2, a function is depicted by a UML classifier with a stereotype «Function». It connects to its goal achievement by an association with a stereotype «has-goal-achievement» in the extended notation, whereas the compact notation utilizes the attribute goal_achievement.
Goal achievements
In FueL, a goal achievement (GA) is defined as a teleological transition, i.e., as a transition to a certain output situation (the goal). Note that transitions further exhibit an input situation. The GA characterization applies at both the individual and categorial level. With respect to the latter, input and output are defined as follows:
- The input category x of goal achievement y is a situation category such that every instance of y is a transition starting from a situation instantiating x.
- The output category x of goal achievement y is a situation category specifying the situations in which instances of y result by transition. Every instance of y is a transition resulting in a situation instantiating x.
For example, the goal achievement (category) carbohydrate transmembrane transport establishes the input category, the instances of which are situations of carbohydrate being on one of the two sides of the membrane, and the output category, the instances of which are situations of carbohydrate being on the other side of the membrane. This means that every instance of carbohydrate transmembrane transport exhibits a transition from an instance of the input category to an instance of the output category, i.e. from individual situations of carbohydrate located on one side of the membrane, to individual situations of carbohydrate located on the other side of the membrane.
In the compact notation, the input is captured by the input attribute of a function, see Fig. 1. In contrast, Fig. 2 illustrates that an association with stereotype «has-input» is used for connecting a function with its input in the extended notation. The representation of outputs is analogous in both variants.
Typically, a transformation from an input to an output situation is a process. At the categorial level, the GA can then be understood as a process category. In the running example, the GA is a teleological process category, namely of carbohydrate transfer from one side of the membrane to the other. This process exhibits the causal transition from the situation of carbohydrate being on one side of the membrane to the situation where carbohydrate is on the other side of the membrane.
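The structure described so far can be sketched as a small data model. This is a minimal illustration, not part of FueL itself: the class names, field names and the `compact_form` helper are assumptions introduced here, and the situation strings paraphrase the running example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalAchievement:
    """A teleological transition category (hypothetical representation)."""
    name: str
    input_situation: str   # situation category every instance starts from
    output_situation: str  # situation category every instance results in (the goal)


@dataclass(frozen=True)
class Function:
    """A role played in the context of a goal achievement."""
    go_id: str
    name: str
    goal_achievement: GoalAchievement


# Running example from the text.
transport = GoalAchievement(
    name="carbohydrate transmembrane transport",
    input_situation="carbohydrate is on one side of the membrane",
    output_situation="carbohydrate is on the other side of the membrane",
)
carbohydrate_transporter = Function(
    go_id="GO:0015144",
    name="carbohydrate transmembrane transporter activity",
    goal_achievement=transport,
)


def compact_form(f: Function) -> dict:
    """Flatten a function into attributes, mirroring the compact notation."""
    return {
        "function": f.go_id,
        "goal_achievement": f.goal_achievement.name,
        "input": f.goal_achievement.input_situation,
        "output": f.goal_achievement.output_situation,
    }
```

The nested objects loosely correspond to the extended notation, where associations link a function to its goal achievement, whereas `compact_form` collapses the same information into attributes of the function.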
Mode of goal achievement
In some cases the specification of a function is not reduced to a mere input-output pair but also defines constraints on the method of function realization. For example, the molecular functions GO:0015399: primary active transmembrane transporter activity and GO:0015291: secondary active transmembrane transporter activity share the same input: solute is on one side of the membrane, and the same output: solute is on the other side of the membrane. Therefore, the pure input-output views of the functions are equal. However, they are distinct due to the way in which they achieve the goal. The former function is realized by means of some primary energy source, for instance, a chemical, electrical or solar source, whereas the latter relies on a uniporter, symporter or antiporter protein. Thus the functions provide the same answer to the question of what is to be achieved, but different answers to the question of how it is realized. To represent this distinction, FueL introduces another component of function structure, called Mode of Goal Achievement (or Mode of Realization). The mode x of the goal achievement y specifies the way in which y transforms the input to the output situation. For GO:0015399 the mode is: by some primary energy source, for instance chemical, electrical or solar source, and for GO:0015291 it is: by uniporter, symporter or antiporter protein. The mode is a constraint on the function realization, which does not affect the input or the output. For example, if one adds to the function of transmembrane transport the constraint that the transport should be realized by the uniporter protein, then the input and the output remain unchanged. However, the function as such changes in that not every transportation process realizes it, but only those that are driven by a uniporter protein.
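The distinction between what is achieved and how it is realized can be illustrated with a short sketch. The `FunctionSpec` class and both comparison helpers are hypothetical names introduced for this illustration; modes are represented as plain strings taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionSpec:
    go_id: str
    input_situation: str
    output_situation: str
    mode: Optional[str] = None  # None: no constraint on how the goal is realized


def same_io_view(a: FunctionSpec, b: FunctionSpec) -> bool:
    """Compare only what is to be achieved (input and output)."""
    return (a.input_situation, a.output_situation) == (
        b.input_situation,
        b.output_situation,
    )


def same_specification(a: FunctionSpec, b: FunctionSpec) -> bool:
    """Compare what is achieved and how it is realized."""
    return same_io_view(a, b) and a.mode == b.mode


primary = FunctionSpec(
    "GO:0015399",
    "solute is on one side of the membrane",
    "solute is on the other side of the membrane",
    mode="by some primary energy source",
)
secondary = FunctionSpec(
    "GO:0015291",
    "solute is on one side of the membrane",
    "solute is on the other side of the membrane",
    mode="by uniporter, symporter or antiporter protein",
)
```

Here `same_io_view(primary, secondary)` holds while `same_specification(primary, secondary)` does not, which mirrors the observation that the mode constrains the realization without affecting the input or the output.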
Participants
Often goal achievements are expressed by action sentences of natural language and thus the results of linguistic analysis of action sentences can be applied to the analysis of the structure of goal achievements. In linguistics, the role that a noun phrase plays with respect to the action or state described by the verb of a sentence is called a thematic role [21]. The specifications of molecular functions in MFO often contain two thematic roles – a patient (called an operand in FueL) and an actor (called a doer in FueL). An operand indicates the entity undergoing the effect of the action. At the categorial level we say that an operand y of the goal achievement x specifies a category y such that instances of x operate on instances of y. GO:0015144 operates on (transports) carbohydrate.
A doer is not as common in MFO as an operand. For example, in the discussed carbohydrate transmembrane transport function no doer is indicated. Typically, a doer is a part of the GA in cases where the mode of realization is provided. For instance, the functions GO:0015292: uniporter activity and GO:0015293: symporter activity both specify the mode of realization and each indicates its doer, namely the respective protein.
Patterns of function subsumption
Function subsumption implicitly conceals several distinct relations [14]. In this section we introduce three patterns for function subsumption that can be indicated by FueL stereotypes [19]. The subsequent “Application” section demonstrates the application of those patterns to the modeling of MFO.
In FueL the notion of function subsumption is founded on the subsumption of goal achievements. We say that the function x is subsumed by the function y if the goal achievement of x is subsumed by the goal achievement of y. Since goal achievements are quite complex entities, it is not trivial to answer the question of what it means that one goal achievement subsumes another. Here, however, it helps to analyze the GA structure, which pertains to the intensional aspects of the corresponding GA category, as discussed in the previous sections. Based on this approach one can detect various patterns of function subsumption.
Operand specialization
Since function specifications often contain operands, it is very common to construct a hierarchy of functions on the basis of the taxonomic hierarchy of their operands. In fact, this pattern is applied frequently in MFO. Consider, for instance, the functions GO:0015075: ion transmembrane transporter activity and GO:0008324: cation transmembrane transporter activity, linked by the is_a relation in GO. As presented in Fig. 3 the relation between those two functions is based on the relation of their operands, as cation is subsumed by ion.
Function subsumption by operand specialization is depicted in FueL with a specialization link with the stereotype «operand-spec». The supplier of the link is the subsumed function, the client is the subsumer.
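As a rough sketch, operand specialization can be checked by walking an operand taxonomy. The taxonomy fragment, the operand assignments and the helper names below are illustrative assumptions, not data extracted from GO.

```python
# Hypothetical fragment of an operand taxonomy: child category -> parent category.
OPERAND_PARENT = {"cation": "ion", "ion": "substance"}

# Operand of each function, read off the GO names used in the text.
OPERAND_OF = {
    "GO:0015075": "ion",     # ion transmembrane transporter activity
    "GO:0008324": "cation",  # cation transmembrane transporter activity
}


def operand_subsumed_by(child: str, parent: str) -> bool:
    """True if `child` equals `parent` or lies below it in OPERAND_PARENT."""
    current = child
    while current is not None:
        if current == parent:
            return True
        current = OPERAND_PARENT.get(current)
    return False


def is_operand_spec(sub_fn: str, super_fn: str) -> bool:
    """Check whether an «operand-spec» link from sub_fn to super_fn is justified."""
    sub_op, super_op = OPERAND_OF[sub_fn], OPERAND_OF[super_fn]
    # The operand must be strictly more specific, not merely identical.
    return sub_op != super_op and operand_subsumed_by(sub_op, super_op)
```

With these assumptions, `is_operand_spec("GO:0008324", "GO:0015075")` holds because cation is subsumed by ion, whereas the reverse direction does not.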
Mode addition
Another pattern of function subsumption, frequently met in MFO, is based on modes of goal achievement. Consider two functions presented in Fig. 4, GO:0022857: transmembrane transporter activity and GO:0022804: active transmembrane transporter activity. Both share the same operand, namely substance, as well as the same input-output pair – operand is on one side of the membrane and operand is on the other side of the membrane. In this sense those functions are equal. However, they differ in that the former does not define any mode of realization, whereas the latter has the following mode defined: the transporter binding the solute undergoes a series of conformational changes. Therefore, one can say that GO:0022804 specializes GO:0022857 by addition of a mode. We say that function x is subsumed by the function y by mode addition if x is subsumed by y and x has some mode, whereas y has no mode assigned. Function subsumption by mode addition is depicted in FueL by means of a specialization link with stereotype «mode-added». The subsumed function is the supplier of the link and the subsuming function is a client.
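The definition of mode addition translates directly into a predicate. The sketch below assumes that the subsumption links are given as explicit pairs and that a missing mode is represented by `None`; the names `IS_A`, `MODE_OF` and `is_mode_added` are introduced here for illustration only.

```python
from typing import Dict, Optional, Set, Tuple

# Asserted subsumption links as (subsumed function, subsuming function).
IS_A: Set[Tuple[str, str]] = {("GO:0022804", "GO:0022857")}

# Mode of realization per function; None means that no mode is defined.
MODE_OF: Dict[str, Optional[str]] = {
    "GO:0022857": None,
    "GO:0022804": (
        "the transporter binding the solute undergoes "
        "a series of conformational changes"
    ),
}


def is_mode_added(x: str, y: str) -> bool:
    """x is subsumed by y by mode addition: x has a mode, y has none."""
    return (
        (x, y) in IS_A
        and MODE_OF.get(x) is not None
        and MODE_OF.get(y) is None
    )
```

Under these assumptions `is_mode_added("GO:0022804", "GO:0022857")` holds, matching the «mode-added» link described above.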
Mode specialization
Subsumption of functions can be based on the mode of realization also in cases where a parent function already has a mode assigned. Consider, for instance, the function GO:0022804: active transmembrane transporter activity having the mode: transporter binds the solute and undergoes a series of conformational changes and the function GO:0015291: secondary active transmembrane transporter activity with the mode: transporter binds the solute and undergoes a series of conformational changes driven by chemiosmotic energy sources, including uniport, symport or antiport. The latter clearly characterizes particular modes of active transmembrane transport. Consequently, it seems intuitive to say that GO:0015291 specializes GO:0022804 (as is the case in GO). We call this type of function subsumption the subsumption by mode specialization and define it as follows: The function x is subsumed by the function y by mode specialization if x is subsumed by y and mode r of x specializes mode s of y. In FueL function subsumption by mode specialization is depicted with a specialization link with stereotype «mode-spec». The subsumed function is the supplier of the link and the subsuming function is the client.
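The text does not fix how mode specialization between two modes is decided. As one possible reading, the following sketch represents a mode as a set of constraints and treats mode r as specializing mode s if r contains all constraints of s plus at least one more. This set-based encoding, the constraint strings and all names are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Asserted subsumption links as (subsumed function, subsuming function).
IS_A: Set[Tuple[str, str]] = {("GO:0015291", "GO:0022804")}

# Modes encoded as sets of constraints (an assumed representation).
MODE_OF: Dict[str, FrozenSet[str]] = {
    "GO:0022804": frozenset({
        "transporter binds the solute",
        "transporter undergoes a series of conformational changes",
    }),
    "GO:0015291": frozenset({
        "transporter binds the solute",
        "transporter undergoes a series of conformational changes",
        "driven by chemiosmotic energy sources",
    }),
}


def mode_specializes(r: FrozenSet[str], s: FrozenSet[str]) -> bool:
    """r specializes s if s is non-empty and a proper subset of r."""
    return bool(s) and s < r


def is_mode_spec(x: str, y: str) -> bool:
    """x is subsumed by y by mode specialization."""
    empty: FrozenSet[str] = frozenset()
    return (x, y) in IS_A and mode_specializes(
        MODE_OF.get(x, empty), MODE_OF.get(y, empty)
    )
```

Requiring the parent mode to be non-empty keeps this pattern apart from mode addition, where the parent has no mode at all.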
Application
Objectives of applying FueL
In general, graphical modeling languages like UML are broadly applied in connection with diverse tasks, such as brainstorming, collaborative design, and the modeling of key principles of systems and subject matters. Another broad area of application concerns standardized visualization, for example, for documentation purposes.
Regarding FueL more specifically, its application to GO and MFO, in particular, pursues three objectives. The first objective is the use of FueL for establishing a semantic basis for molecular functions that supports the representation of functions in a systematic way, beyond their textual description. Moreover, the discussed patterns represent basic knowledge of the interrelations between biological processes and molecular functions. The part_of relation between biological processes and molecular functions can be mapped to the has-goal-achievement association between functions and goal achievements. Figure 2 comprises a corresponding example, where the process GO:0034219: carbohydrate transmembrane transport is modeled as a goal achievement of the function GO:0015144: carbohydrate transmembrane transporter activity.
The second, and main, objective of applying FueL to MFO is to explicitly document design choices and the subsumption patterns utilized implicitly in MFO. Figure 5 presents such a documentation of a fragment of MFO in terms of FueL. The patterns are indicated by the FueL stereotypes, which enables an easy-to-grasp visualization of the structure of MFO as well as of the underlying design choices. Stereotypes further allow for displaying multiple facets of function subsumption, as in the case of GO:0022804, which can be understood to involve mode addition as well as operand specialization. The explicit specification of design choices makes the ontology much more intelligible for human users, which is a major benefit of this approach.
Thirdly, the application of FueL reveals potential for the refactoring and revision of GO. Contributing to the latter is another important objective of our work. For instance, the application of FueL in modeling the functions GO:0022857: transmembrane transporter activity and GO:0022891: substrate-specific transmembrane transporter activity shows that both share similar goal achievements: transfer of an operand from one side of a membrane to the other, with input: operand is on one side of the membrane, and output: operand is on the other side of the membrane. Consequently, following FueL, a potential difference between GO:0022857 and GO:0022891 is to be sought in their operands. For GO:0022857 that is ‘a substance’, whereas for GO:0022891 it is ‘a specific substance or group of substances’.
Analysis of refactoring options
Let us consider the previous case in greater detail, thereby identifying three possibilities of analyzing and refactoring MFO elements based on FueL. A first FueL view on a selected set of functions that includes the two just named is depicted in Fig. 5. It rests on the assumption that ‘a specific substance or group of substances’ can be considered as a subclass of ‘a substance’. Accordingly, Fig. 5 documents explicitly the pattern of subsumption between GO:0022857 and GO:0022891, namely as a case of operand specialization. The same aspect applies to GO:0022804, the operand of which is also ‘a specific substance or group of substances’.
This straightforward approach, however, may be reconsidered, especially the question of what the actual relation between ‘a substance’ and ‘a specific substance or group of substances’ is. One indication may be derived from GO:0022892: substrate-specific transporter activity (not displayed in Fig. 5), which is another parent function of GO:0022891 in MFO. An operand of GO:0022892 is exemplified by macromolecules, small molecules or ions. If we thus interpret ‘a specific substance or group of substances’ as macromolecules, small molecules or ions, this seems to suggest that further functions such as GO:0090482: vitamin transmembrane transporter activity and GO:0015238: drug transmembrane transporter activity should also be considered as subclasses of substrate-specific transmembrane transporter activity. The latter is currently not the case in MFO, such that positioning those functions under GO:0022891 is a refactoring option, independently of adopting FueL as a representation language. If FueL is employed, these considerations yield an alternative to Fig. 5 (not shown in a separate figure), where, for instance, GO:0090482 is an operand specialization of GO:0022891 instead of GO:0022857. GO:0022804, based on its operand identical to that of GO:0022891, would turn into a specialization of the latter by mode addition.
Another possible refactoring originates from an analysis of the subclasses of GO:0022891: substrate-specific transmembrane transporter activity. Examining those subclasses we find that they differ only in their operands. Each of those functions specifies the transport of a specific kind of substance, for example, ion (GO:0015075) or carbohydrate (GO:0015144). This suggests that the distinction between the operands of GO:0022857 and GO:0022891 is only superficial. According to this interpretation, GO:0022891 is merely used for the organization of the function taxonomy, i.e., for grouping all functions that are distinguished by their operands. GO:0022891 would then be a duplication of GO:0022857, which is only introduced into MFO for structuring purposes, but which captures no distinct specification of a biological function. The introduction of such grouping artifacts is a design choice that is clearly not desirable, especially in complex ontologies like MFO or GO overall. One reason for avoiding them is that, in many cases, several steps of specialization lead to subclasses that do not, or do not exactly, match the grouping specification. For example, one may question whether GO:0005402: cation:sugar symporter activity in Fig. 5 is a (pure) substrate-specific transmembrane transporter activity, given the subsumption path via GO:0022804 involving mode addition and mode specialization.
Concerning the purpose of better organization of the taxonomy, we argue that FueL proves beneficial, not least due to its stereotyped links. As illustrated in Fig. 6, the application of FueL allows for dropping GO:0022891 (if interpreted as a grouping artifact), on the one hand, while on the other hand, FueL enables the explicit specification of design choices by stereotyped specialization links. Note that this supports the “local” grouping of the immediate, explicit subclasses of a given function based on the link stereotypes.
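Dropping a grouping artifact while keeping the stereotyped links can be sketched as a simple graph rewrite. The edge list below is a simplified, hypothetical reading of the fragment discussed in the text, not a reproduction of Fig. 5 or Fig. 6, and the function name is introduced here for illustration.

```python
from typing import List, Tuple

# (subsumed function, subsuming function, stereotype of the specialization link)
Edge = Tuple[str, str, str]

edges: List[Edge] = [
    ("GO:0022891", "GO:0022857", "operand-spec"),
    ("GO:0015075", "GO:0022891", "operand-spec"),  # ion transporter
    ("GO:0015144", "GO:0022891", "operand-spec"),  # carbohydrate transporter
    ("GO:0022804", "GO:0022857", "mode-added"),
]


def drop_grouping_node(edges: List[Edge], node: str) -> List[Edge]:
    """Remove `node` and re-attach its children to its parents.

    Each child keeps the stereotype of its own link, so the design
    choice behind the link stays explicit after the rewrite.
    """
    parents = [parent for child, parent, _ in edges if child == node]
    result: List[Edge] = []
    for child, parent, stereotype in edges:
        if child == node:
            continue  # drop the links leaving the removed node
        if parent == node:
            result.extend((child, p, stereotype) for p in parents)
        else:
            result.append((child, parent, stereotype))
    return result


def group_by_stereotype(edges: List[Edge], parent: str) -> dict:
    """Group the immediate subclasses of `parent` by link stereotype."""
    groups: dict = {}
    for child, p, stereotype in edges:
        if p == parent:
            groups.setdefault(stereotype, []).append(child)
    return groups


refactored = drop_grouping_node(edges, "GO:0022891")
```

After the rewrite, `group_by_stereotype(refactored, "GO:0022857")` recovers the “local” grouping of the immediate subclasses from the link stereotypes alone, without a dedicated grouping function.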
The decision on such refactoring options, as in any modeling enterprise, is the responsibility of the modeler(s), i.e., GO developers in our case. Regarding refactoring means and methods, however, we argue that the above analysis demonstrates how graphical languages such as FueL, much as in software and systems engineering, can drive and support the revision of biological ontologies like MFO. Although graphical modeling may not be efficient for representing the complete content of large and complex ontologies, we defend the position that graphical languages can still be extremely helpful, for example, for depicting ontology fragments that exhibit problems. Moreover, in view of ontology development as a collaborative enterprise, graphical modeling formalisms like FueL help to conduct community-based analysis in structured ways.