A neural classification method for supporting the creation of BioVerbNet

Background: VerbNet, an extensive computational verb lexicon for English, has proved useful for supporting a wide range of Natural Language Processing tasks requiring information about the behaviour and meaning of verbs. Biomedical text processing and mining could benefit from a similar resource. We take the first step towards the development of BioVerbNet: a VerbNet specifically aimed at describing verbs in the area of biomedicine. Because VerbNet-style classification is extremely time-consuming, we start from a small manual classification of biomedical verbs and apply a state-of-the-art neural representation model, specifically developed for class-based optimization, to expand the classification with new verbs, using all the PubMed abstracts and the full articles in the PubMed Central Open Access subset as data.
Results: Direct evaluation of the resulting classification against BioSimVerb (verb similarity judgement data in biomedicine) shows promising results when representation learning is performed using verb class-based contexts. Human validation by linguists and biologists reveals that the automatically expanded classification is highly accurate. Because the expanded classification includes novel, valid member verbs and classes, our method can be used to facilitate cost-effective development of BioVerbNet.
Conclusion: This work constitutes the first effort to apply a state-of-the-art architecture for neural representation learning to biomedical verb classification. While we discuss future optimization of the method, our promising results suggest that the automatic classification released with this article can be used to readily support application tasks in biomedicine.
Electronic supplementary material: The online version of this article (10.1186/s13326-018-0193-x) contains supplementary material, which is available to authorized users.


Background
The experiment aims to extend the small biomedical verb classification of Korhonen et al. (2006) [1] with a view to facilitating the creation of BioVerbNet. The small classification contains 192 verbs organized into a 3-level taxonomy consisting of 16, 34 and 50 classes at its three levels. We have now applied an automatic classification approach (described in the associated paper draft) to create an extended classification. It consists of 1,149 verbs in total (the 192 original ones plus 957 new ones), which our learning technique has grouped into the original class taxonomy based on their shared meanings and syntax.
Your task is to verify whether these new candidate verbs are really similar, in both their meanings and their syntactic patterns, to the existing verbs in the original classification. Here is our initial proposal for how the task could be conducted.
The task has the following steps (in blue, tasks to be answered in the Excel spreadsheet Answer.xlsx):
Task A: Decide whether the new verbs in each verb class share similar meanings and syntactic patterns.

Materials
You will be provided with 3 documents to support this task.

Task Description
Open the file Question.xlsx; you will see verbs grouped into classes based on their shared meanings and syntax. They are organised in five columns (see Figure 1) as follows:
Class Name: The name of each class.
Sub-class Name: The name of each sub-class.
Class index: The unique identifier (which you will need to use throughout the entire task).
Example Verbs: Example verbs for each class from the original classification of 192 verbs.
New candidates: The list of new candidate verbs for verification.

Figure 1: A screenshot of a subset of the verb classes in Question.xlsx. Class Name is the name of the top-level class. Sub-class Name is the name of each sub-class. Class index is the unique identifier of each class/sub-class. Example Verbs contains the member verbs of each sub-class. New candidates contains the verbs to be verified by annotators. They are separated from the Example Verbs by a red line.

Your task is to decide whether each new candidate verb (i.e. New candidates in Figure 1) has been assigned to the right class/sub-class, based on your interpretation of the Example Verbs in each class as well as the sentence examples we provide for each verb (in the Example folder, as described in Section 2.2.1). You should give your answers in the file we provide (Answer.xlsx, as described in Section 2.2.2).
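For readers who prefer to think of the spreadsheet as data, the five columns can be sketched as a small record type, with one decision to make per new candidate verb. This is only an illustration: the `VerbClass` type, the `decisions` helper, the class index format `"1.1"` and all sample values other than the verb increase are invented here and are not taken from Question.xlsx.

```python
from dataclasses import dataclass


@dataclass
class VerbClass:
    """One row of the question sheet (field names mirror the five columns)."""
    class_name: str
    sub_class_name: str
    class_index: str       # assumed to be a string; the real format may differ
    example_verbs: list    # verbs from the original classification
    new_candidates: list   # verbs to be verified


def decisions(classes):
    """Yield one (candidate verb, class index) pair per judgement to be made."""
    for verb_class in classes:
        for verb in verb_class.new_candidates:
            yield verb, verb_class.class_index


# Invented row for illustration only; not real task data.
SAMPLE_CLASSES = [
    VerbClass("Hypothetical class", "Hypothetical sub-class", "1.1",
              ["increase", "decrease"], ["elevate", "lower"]),
]

for verb, index in decisions(SAMPLE_CLASSES):
    print(f"Does '{verb}' belong in class {index}?")
```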

Sentence Examples
To help you understand how a verb is used in biomedical text, we provide about thirty example sentences from the corpus we used in our experiment, which illustrate the most common syntactic structures of each verb (in descending order, with the most common at the top and the least common at the bottom). They are stored in the folder Example, with the test verb as the filename. They are organized in 3 columns: the first column is the name of the dependency pattern exemplified in the sentence; the second column is the sentence example; the third column is the word in the sentence that corresponds to the syntactic pattern (see Figure 2).
Figure 2: A screenshot of the example sentences for increase (in the folder Example). The first column contains the common syntactic patterns for increase in descending order (e.g. obj = object). The second column stores the sentence example that uses the corresponding pattern. The third column stores the word in the sentence that corresponds to the pattern (e.g. strain).

Look at the sentence examples for each of the New candidates and Example Verbs in each class (as mentioned in Section 2.2) and decide whether each new candidate verb has been assigned to the right class. Give your answers in our answer template in the pre-defined format, which is described in the next section.
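If you would like a quick overview of which syntactic patterns dominate for a verb before reading its sentences, the three-column layout can be tallied with a few lines of Python. This is a minimal sketch that assumes the example file has been exported as tab-separated text; the actual file format is not specified here, the `pattern_counts` helper is hypothetical, and the sample sentences and the `nsubj` pattern name are invented for illustration rather than taken from the corpus.

```python
import csv
import io
from collections import Counter

# Invented rows in the assumed layout: pattern <TAB> sentence <TAB> word.
SAMPLE = (
    "obj\tThe treatment increased the strain.\tstrain\n"
    "obj\tHeat increased the expression level.\tlevel\n"
    "nsubj\tThe dose increased steadily.\tdose\n"
)


def pattern_counts(tsv_text):
    """Return (pattern, count) pairs, most frequent pattern first."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    # Skip any row that does not have exactly three columns.
    counts = Counter(row[0] for row in reader if len(row) == 3)
    return counts.most_common()


for pattern, count in pattern_counts(SAMPLE):
    print(pattern, count)
```

Comparing these tallies for a new candidate and for the Example Verbs of its class gives a rough first impression of syntactic similarity; it does not replace reading the sentences themselves.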

Answers Template
Open the file Answer.xlsx; you will see all the new candidates (Column 1) and the classes they are currently assigned to (Column 2). Please write down the Class Index (reference from
(c) For any verb you cannot find a good class for, please enter 0 as its class index in the Final Class column.
Give a final class index to each new candidate verb. HOWEVER, EACH VERB CAN BE ASSIGNED TO ONE CLASS/SUB-CLASS ONLY!
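Before sending your answers, the two rules above (exactly one final class per verb, and 0 for verbs with no suitable class) can be checked mechanically. The sketch below assumes the answers have already been read into a list of (verb, final class index) pairs with indices stored as strings; the `validate_answers` helper, the index values and the sample verbs are hypothetical and only illustrate the check.

```python
def validate_answers(answers, valid_indices):
    """Return a list of human-readable problems found in the answers.

    answers: list of (verb, final_class_index) pairs.
    valid_indices: set of class indices that exist in the question sheet.
    The index "0" is always accepted and means "no suitable class".
    """
    problems = []
    seen = {}
    for verb, index in answers:
        if verb in seen:
            # A verb may receive one class/sub-class only.
            problems.append(
                f"{verb}: assigned more than once ({seen[verb]} and {index})")
            continue
        seen[verb] = index
        if index != "0" and index not in valid_indices:
            problems.append(f"{verb}: unknown class index {index}")
    return problems


# Invented answers for illustration only.
SAMPLE_ANSWERS = [
    ("elevate", "1.1"),
    ("lower", "0"),       # no good class found
    ("elevate", "2.3"),   # second assignment for the same verb
    ("boost", "9.9"),     # index not present in the question sheet
]

for problem in validate_answers(SAMPLE_ANSWERS, {"1.1", "2.3"}):
    print(problem)
```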

Submission
There is not necessarily a single correct solution or a perfect grouping for this task. It is perfectly reasonable to use your intuition or gut feeling as a biologist while working on it. When you have finished, please email your completed Answer.xlsx back to Billy at hwc25@cam.ac.uk. Thank you very much for your help.