Table 4 Number of correct and incorrect terms for each of the rewrite and suppression rules.

From: Rewriting and suppressing UMLS terms for improved biomedical term identification

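| Rule | Most frequent: Correct | Most frequent: Incorrect | Random: Correct | Random: Incorrect |
|---|---|---|---|---|
| Rewrite rules | | | | |
| Syntactic inversion | 50 | 0 | 100 | 0 |
| Possessives | 50 | 0 | 100 | 0 |
| Short/long form | 49 | 1 | 98 | 2 |
| Angular brackets | 50 | 0 | 97 | 3 |
| Semantic type | 50 | 0 | 100 | 0 |
| Begin parentheses | 1 | 25 | - | - |
| End parentheses | 49 | 1 | 96 | 4 |
| Begin brackets | 38 | 12 | 91 | 9 |
| End brackets | 46 | 4 | 95 | 5 |
| Suppression rules | | | | |
| Dosages | 50 | 0 | 100 | 0 |
| Short token | 50 | 0 | 100 | 0 |
| At-sign | - | - | - | - |
| EC numbers | 50 | 0 | 99 | 0 |
| Any classification | 50 | 0 | 100 | 0 |
| Any underspecification | 50 | 0 | 100 | 0 |
| Miscellaneous | 50 | 0 | 100 | 0 |
| Words > 5 | 0 | 50 | 5 | 95 |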

  1. For every rule, the calculations are based on the 50 most frequently found terms in the corpus and 100 randomly selected terms from the corpus (if available). The At-sign rule has no values because no terms suppressed by this rule were found in the corpus.
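
The selection described in the note can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `select_evaluation_terms`, the `term_counts` mapping (term to corpus frequency, for one rule), the fixed seed, and the choice to draw the random sample only from terms outside the most frequent set are all assumptions made for illustration. The "(if available)" condition is modelled by returning fewer terms when a rule has too few.

```python
import random
from collections import Counter


def select_evaluation_terms(term_counts, n_frequent=50, n_random=100, seed=0):
    """Hypothetical helper: pick evaluation terms for one rule.

    term_counts maps each term affected by the rule to its corpus frequency.
    Returns (most_frequent, random_sample). Either list may be shorter than
    requested when the rule affects too few terms.
    """
    # Rank terms from most to least frequent.
    ranked = [term for term, _ in Counter(term_counts).most_common()]

    most_frequent = ranked[:n_frequent]

    # Assumption: the random set is drawn from the terms NOT already
    # included in the most frequent set.
    remainder = ranked[n_frequent:]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    random_sample = rng.sample(remainder, min(n_random, len(remainder)))

    return most_frequent, random_sample
```

Under these assumptions, a rule that affects only 26 terms would yield 26 most frequent terms and an empty random set, and a rule with no terms found in the corpus would yield two empty lists.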