Table 1 New terms generated by the rewrite rules and terms suppressed by the suppression rules.

From: Rewriting and suppressing UMLS terms for improved biomedical term identification

Rule                        Terms in thesaurus

Original                    2,696,820

Rewrite rules
  Syntactic inversion       231,976
  Possessives               10,388
  Short/long form           288
  Angular brackets          2,824
  Semantic type             7,231
  Begin parentheses         376
  End parentheses           45,265
  Begin brackets            11,402
  End brackets              17,620

Suppression rules
  Dosages                   171,369
  Short token               2,044
  At-sign                   123
  EC numbers                161
  Any classification        5,299
  Any underspecification    40,237
  Miscellaneous             37,885
  Words > 5                 653,128

  1. "Terms in thesaurus" gives, for each rewrite rule, the number of new terms it generated and, for each suppression rule, the number of terms it suppressed. The row "Original" gives the total number of terms in the thesaurus before any rewrite or suppression rule was applied.
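
To put the per-rule counts in proportion, the short Python sketch below expresses each count as a percentage of the "Original" row. The counts are transcribed from Table 1; the names `ORIGINAL_TERMS`, `REWRITE_RULES`, `SUPPRESSION_RULES`, and `share_of_original` are hypothetical and chosen only for this illustration. The table does not say whether one term can be affected by more than one rule, so the summed totals are best read as upper bounds and not as net changes to the thesaurus.

```python
# Counts transcribed from Table 1. All names here are illustrative only.
ORIGINAL_TERMS = 2_696_820

REWRITE_RULES = {
    "Syntactic inversion": 231_976,
    "Possessives": 10_388,
    "Short/long form": 288,
    "Angular brackets": 2_824,
    "Semantic type": 7_231,
    "Begin parentheses": 376,
    "End parentheses": 45_265,
    "Begin brackets": 11_402,
    "End brackets": 17_620,
}

SUPPRESSION_RULES = {
    "Dosages": 171_369,
    "Short token": 2_044,
    "At-sign": 123,
    "EC numbers": 161,
    "Any classification": 5_299,
    "Any underspecification": 40_237,
    "Miscellaneous": 37_885,
    "Words > 5": 653_128,
}


def share_of_original(count: int) -> float:
    """Return a count as a percentage of the original thesaurus size."""
    return 100.0 * count / ORIGINAL_TERMS


def report(title: str, rules: dict[str, int]) -> None:
    """Print the rules of one group, largest count first."""
    print(title)
    ranked = sorted(rules.items(), key=lambda item: item[1], reverse=True)
    for name, count in ranked:
        print(f"  {name:<24}{count:>10,}{share_of_original(count):>8.2f}%")
    # Assumption: the sum double-counts any term hit by several rules,
    # so treat it as an upper bound.
    total = sum(rules.values())
    print(f"  {'Sum (upper bound)':<24}{total:>10,}{share_of_original(total):>8.2f}%")


if __name__ == "__main__":
    report("Rewrite rules (new terms generated)", REWRITE_RULES)
    report("Suppression rules (terms suppressed)", SUPPRESSION_RULES)
```

Run as a script, this prints both rule groups ranked by count, which makes it easy to see that "Words > 5" and "Syntactic inversion" account for the largest counts in their respective groups.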