Table 1 New terms generated by the rewrite rules and terms suppressed by the suppression rules.

From: Rewriting and suppressing UMLS terms for improved biomedical term identification

Rule                        Terms in thesaurus
Original                             2,696,820
Rewrite rules
  Syntactic inversion                  231,976
  Possessives                           10,388
  Short/long form                          288
  Angular brackets                       2,824
  Semantic type                          7,231
  Begin parentheses                        376
  End parentheses                       45,265
  Begin brackets                        11,402
  End brackets                          17,620
Suppression rules
  Dosages                              171,369
  Short token                            2,044
  At-sign                                  123
  EC numbers                               161
  Any classification                     5,299
  Any underspecification                40,237
  Miscellaneous                         37,885
  Words > 5                            653,128
  1. "Terms in thesaurus" gives, for each rewrite rule, the number of new terms that rule generated and, for each suppression rule, the number of terms that rule suppressed. The row "Original" gives the total number of terms in the thesaurus before any rewrite or suppression rule was applied.
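
To put the per-rule counts in proportion, the short Python sketch below expresses each count as a percentage of the original thesaurus size and adds up the rows in each group. The numbers are copied from Table 1. The names `ORIGINAL`, `REWRITE`, `SUPPRESS`, `share_of_original`, and `report` are illustrative and do not come from the article. The table does not say whether a single term can be affected by more than one rule, so the row sums are plain additions and should not be read as net changes to the thesaurus.

```python
# Counts copied from Table 1. All names here are illustrative, not from the article.
ORIGINAL = 2_696_820  # terms in the thesaurus before any rule was applied

REWRITE = {  # new terms generated by each rewrite rule
    "Syntactic inversion": 231_976,
    "Possessives": 10_388,
    "Short/long form": 288,
    "Angular brackets": 2_824,
    "Semantic type": 7_231,
    "Begin parentheses": 376,
    "End parentheses": 45_265,
    "Begin brackets": 11_402,
    "End brackets": 17_620,
}

SUPPRESS = {  # terms suppressed by each suppression rule
    "Dosages": 171_369,
    "Short token": 2_044,
    "At-sign": 123,
    "EC numbers": 161,
    "Any classification": 5_299,
    "Any underspecification": 40_237,
    "Miscellaneous": 37_885,
    "Words > 5": 653_128,
}


def share_of_original(count: int) -> float:
    """Return `count` as a percentage of the original thesaurus size."""
    return 100.0 * count / ORIGINAL


def report(title: str, counts: dict) -> None:
    """Print the rules in one group, largest count first, then the row sum."""
    print(title)
    for rule, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  {rule:<24}{count:>10,}{share_of_original(count):>8.2f}%")
    # Assumption: rows are simply added; overlap between rules is not modelled.
    print(f"  {'Sum of rows':<24}{sum(counts.values()):>10,}")


if __name__ == "__main__":
    report("Rewrite rules (new terms generated)", REWRITE)
    report("Suppression rules (terms suppressed)", SUPPRESS)
```

Sorting by count makes it easy to see which rules account for most of the generated or suppressed terms in each group.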