
Table 4 Examples of information about the application of information extraction rules

From: Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts

 

| Information extraction rule | Number of retrieved interactions | Number of true positives |
|---|---|---|
| AGENE AGENE the@dt response@nn | 6 | 1 |
| AGENE AGENE serine@nn | 3 | 3 |
| AGENE reveal@vvd AGENE | 0 | undefined |
| AGENE association@nn with@in AGENE | 6 | 4 |
| AGENE bind@vvz to@to AGENE | 8 | 5 |

  1. Table 4 gives an excerpt of the information provided about the patterns extracted from the PubMed corpus. The columns are: the sequential pattern, the number of interactions detected by the pattern, and the number of detected interactions that are correct with respect to the oracle, i.e. interactions that also exist in BioGRID.
  2. The first pattern can be read as “a gene followed by a gene, then by the word ‘the’ and the word ‘response’”. This pattern detects 6 interactions, of which 1 is in BioGRID. The second pattern can be read as “a gene followed by a gene, then by the word ‘serine’”. It detects 3 interactions, all of which are in BioGRID. The third pattern can be read as “a gene followed by the verb ‘reveal’ in the past tense, then by a gene”. This pattern detects no interactions in the rule validation corpus, so no information is available to evaluate it. The fourth pattern can be read as “a gene followed by the noun ‘association’, then by the word ‘with’ and a gene name”. It detects 6 interactions, of which 4 are in BioGRID. The fifth pattern can be read as “a gene followed by the verb ‘bind’ in the present tense, then by the word ‘to’ and a gene name”. This pattern detects 8 interactions, of which 5 are in BioGRID. For example, it detects that the complex sentence “Cbl is a cytosolic protein that is rapidly tyrosine phosphorylated in response to Fc receptor activation and binds to the adaptor proteins Grb2, CrkL, and Nck.” contains an association between two signalling molecules (Cbl and Grb2).
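
The reading of the patterns described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the article: a pattern is treated as an ordered subsequence of `lemma@tag` items with arbitrary gaps allowed (any gap constraint the actual method applies is ignored), the token list for the Cbl sentence is tagged by hand for illustration only, and the names `TABLE_4`, `SENTENCE`, `is_subsequence`, and `precision` are hypothetical. The ratio of true positives to retrieved interactions is computed here only as one possible way to summarize the two count columns.

```python
from typing import Optional, Sequence

# Rows of Table 4: (pattern items, retrieved interactions, true positives).
# None stands for the "undefined" entry of the third rule.
TABLE_4 = [
    (("AGENE", "AGENE", "the@dt", "response@nn"), 6, 1),
    (("AGENE", "AGENE", "serine@nn"), 3, 3),
    (("AGENE", "reveal@vvd", "AGENE"), 0, None),
    (("AGENE", "association@nn", "with@in", "AGENE"), 6, 4),
    (("AGENE", "bind@vvz", "to@to", "AGENE"), 8, 5),
]

# Hand-tagged version of the example sentence (illustrative assumption):
# gene names are replaced by AGENE, other words by lemma@tag.
SENTENCE = [
    "AGENE", "be@vbz", "a@dt", "cytosolic@jj", "protein@nn", "that@wdt",
    "be@vbz", "rapidly@rb", "tyrosine@nn", "phosphorylate@vvn", "in@in",
    "response@nn", "to@to", "Fc@np", "receptor@nn", "activation@nn",
    "and@cc", "bind@vvz", "to@to", "the@dt", "adaptor@nn", "protein@nns",
    "AGENE", "AGENE", "and@cc", "AGENE",
]


def is_subsequence(pattern: Sequence[str], tokens: Sequence[str]) -> bool:
    """Return True if the pattern items occur in tokens in order.

    Gaps of any length are allowed between items (an assumption).
    """
    remaining = iter(tokens)
    # `item in remaining` consumes the iterator up to the first match,
    # so each item must be found after the previous one.
    return all(item in remaining for item in pattern)


def precision(retrieved: int, true_positives: Optional[int]) -> Optional[float]:
    """Share of retrieved interactions confirmed by the oracle.

    Returns None when nothing was retrieved, mirroring "undefined".
    """
    if retrieved == 0 or true_positives is None:
        return None
    return true_positives / retrieved


if __name__ == "__main__":
    for pattern, retrieved, tp in TABLE_4:
        print(" ".join(pattern),
              "| matches example sentence:", is_subsequence(pattern, SENTENCE),
              "| precision:", precision(retrieved, tp))
```

Under these assumptions, only the fifth rule matches the hand-tagged sentence, and the third rule has no precision value because it retrieves no interactions. The sketch only tests whether a pattern occurs; it does not decide which gene pair is reported as interacting.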