
Table 4 Number of correct and incorrect terms for each of the rewrite and suppression rules.

From: Rewriting and suppressing UMLS terms for improved biomedical term identification

                            Most frequent          Random
Rule                        Correct   Incorrect    Correct   Incorrect

Rewrite rules
  Syntactic inversion       50        0            100       0
  Possessives               50        0            100       0
  Short/long form           49        1            98        2
  Angular brackets          50        0            97        3
  Semantic type             50        0            100       0
  Begin parentheses         1         25           -         -
  End parentheses           49        1            96        4
  Begin brackets            38        12           91        9
  End brackets              46        4            95        5

Suppression rules
  Dosages                   50        0            100       0
  Short token               50        0            100       0
  At-sign                   -         -            -         -
  EC numbers                50        0            99        0
  Any classification        50        0            100       0
  Any underspecification    50        0            100       0
  Miscellaneous             50        0            100       0
  Words > 5                 0         50           5         95
  1. For each rule, the counts are based on the 50 terms found most frequently in the corpus and on 100 randomly selected terms from the corpus (where available). The At-sign rule has no values because terms suppressed by this rule were not found in the corpus.
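
The counts above can be turned into a per-rule share of correct terms. The following is a minimal sketch, not part of the original paper: the names `COUNTS` and `correct_share` are hypothetical, only a few rows of Table 4 are copied in for illustration, and a "-" cell is represented as `None` on the assumption that it means no terms were available.

```python
from typing import Optional, Tuple

# A cell holds (correct, incorrect); None stands for a "-" cell in Table 4.
Cell = Optional[Tuple[int, int]]

# A few rows copied from Table 4 (illustrative subset, not the full table).
COUNTS = {
    "Short/long form": {"most_frequent": (49, 1), "random": (98, 2)},
    "Begin parentheses": {"most_frequent": (1, 25), "random": None},
    "Begin brackets": {"most_frequent": (38, 12), "random": (91, 9)},
    "At-sign": {"most_frequent": None, "random": None},
    "Words > 5": {"most_frequent": (0, 50), "random": (5, 95)},
}


def correct_share(cell: Cell) -> Optional[float]:
    """Return correct / (correct + incorrect), or None if no terms were evaluated."""
    if cell is None:
        return None
    correct, incorrect = cell
    total = correct + incorrect
    if total == 0:
        return None
    return correct / total


if __name__ == "__main__":
    for rule, samples in COUNTS.items():
        for sample_name, cell in samples.items():
            share = correct_share(cell)
            shown = "n/a" if share is None else f"{share:.1%}"
            print(f"{rule:20s} {sample_name:14s} {shown}")
```

Dividing by the number of terms actually evaluated, rather than by a fixed 50 or 100, matters for rows such as Begin parentheses, where fewer terms than the nominal sample size are reported.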