Table 1 ShARE/CLEF e-health training corpus semantic types

From: Concept selection for phenotypes and diseases using learn to rank

| ID | UMLS semantic type | Freq. | Unique | Av. term length (tokens) |
|------|----------------------------------|------|-----|------|
| T047 | Disease or syndrome | 1803 | 410 | 1.97 |
| T184 | Sign or symptom | 842 | 163 | 1.56 |
| T046 | Pathologic function | 518 | 133 | 1.65 |
| T037 | Injury or poisoning | 213 | 96 | 2.00 |
| T019 | Congenital abnormality | 184 | 25 | 3.61 |
| T190 | Anatomical abnormality | 103 | 36 | 1.77 |
| T191 | Neoplastic process | 92 | 49 | 1.87 |
| T048 | Mental or behavioral dysfunction | 84 | 32 | 1.76 |
| T033 | Finding | 45 | 15 | 2.90 |
| T020 | Acquired abnormality | 40 | 17 | 1.93 |

Distribution of UMLS semantic types across the annotations: total frequency (Freq.), frequency without duplicates (Unique), and average term length in tokens.
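The three statistics reported per semantic type (frequency, frequency without duplication, and average term length) can be illustrated with a minimal sketch. The source does not describe how they were computed, so everything here is an assumption: the function name `semantic_type_stats` is hypothetical, duplicates are detected after lowercasing, tokens are split on whitespace, the average is taken over all mentions rather than over unique terms, and the toy annotations are invented and do not come from the ShARE/CLEF corpus.

```python
from collections import defaultdict


def semantic_type_stats(annotations):
    """Summarize (semantic_type_id, term_text) pairs per semantic type.

    Hypothetical helper: the paper does not specify its exact procedure.
    """
    mentions = defaultdict(list)
    for type_id, term in annotations:
        # Assumption: duplicates are compared case-insensitively.
        mentions[type_id].append(term.lower())

    stats = {}
    for type_id, terms in mentions.items():
        # Assumption: tokens are whitespace-separated, and the average
        # is taken over all mentions (not over unique terms only).
        token_counts = [len(t.split()) for t in terms]
        stats[type_id] = {
            "freq": len(terms),
            "unique": len(set(terms)),
            "avg_term_length": round(sum(token_counts) / len(token_counts), 2),
        }
    return stats


# Invented toy annotations, for illustration only.
toy = [
    ("T047", "pneumonia"),
    ("T047", "Pneumonia"),
    ("T047", "congestive heart failure"),
    ("T184", "chest pain"),
]
result = semantic_type_stats(toy)
```

On the toy input, `result["T047"]` counts three mentions but only two unique terms, which mirrors the gap between the Freq. and Unique columns in the table.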