Table 3 Descriptive statistics of the lexicon and count of part-of-speech categories

From: MedLexSp – a medical lexicon for Spanish medical natural language processing

 

| | Lemmas | Forms | CUIs |
|---|---|---|---|
| Single-words | 33 988 | 130 915 | - |
| Multi-words | 66 899 | 171 628 | - |
| Total | 100 887 | 302 543 | 42 958 |
| M per CUI | 2.35 | 7.04 | - |
| SD | 2.16 | 15.43 | - |
| Max / Min | 30 / 1 | 475 / 1 | - |

| PoS | Example | Count (%) |
|---|---|---|
| N | hígado (‘liver’) | 90 188 (89.40) |
| ADJ | hepático (‘hepatic’) | 4933 (4.89) |
| NPR | Streptococcus | 2786 (2.76) |
| ADJ/N | neonato (‘newborn’) | 1033 (1.02) |
| AFF | reno- (‘kidney’) | 913 (0.90) |
| V | sangrar (‘to bleed’) | 867 (0.86) |
| N/NPR | aspirina (‘aspirin’) | 107 (0.11) |
| ADV | levemente (‘mildly’) | 40 (0.04) |
| ADJ/ADV | in situ | 20 (0.02) |

Abbreviations: M Mean; SD Standard deviation; CUI Concept unique identifier; N Noun; ADJ Adjective; NPR Proper name; V Verb; AFF Affix; ADV Adverb; ADJ/N ‘Adjective’ or ‘noun’, depending on the context (likewise for ADJ/ADV, etc.)
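
The figures in Table 3 can be cross-checked with a little arithmetic. The sketch below is illustrative and not part of the original article. It assumes that the percentages in the Count (%) column are shares of the total lemma count, and that the "M per CUI" row is the total divided by the number of CUIs. Both readings are assumptions, although they reproduce the printed values. All variable and function names are hypothetical.

```python
# Values transcribed from Table 3.
single_word = {"lemmas": 33988, "forms": 130915}
multi_word = {"lemmas": 66899, "forms": 171628}
cuis = 42958

# Lemma counts per part-of-speech category, as listed in the table.
pos_counts = {
    "N": 90188,
    "ADJ": 4933,
    "NPR": 2786,
    "ADJ/N": 1033,
    "AFF": 913,
    "V": 867,
    "N/NPR": 107,
    "ADV": 40,
    "ADJ/ADV": 20,
}

# Totals are the sum of the single-word and multi-word rows.
total_lemmas = single_word["lemmas"] + multi_word["lemmas"]
total_forms = single_word["forms"] + multi_word["forms"]


def pos_percentages(counts):
    """Return each category's share of all lemmas, as a percentage."""
    total = sum(counts.values())
    return {tag: 100 * n / total for tag, n in counts.items()}


percentages = pos_percentages(pos_counts)

# Assumption: "M per CUI" is the overall total divided by the CUI count.
mean_lemmas_per_cui = total_lemmas / cuis
mean_forms_per_cui = total_forms / cuis

if __name__ == "__main__":
    print(f"Total lemmas: {total_lemmas}, total forms: {total_forms}")
    print(f"Lemmas per CUI: {mean_lemmas_per_cui:.2f}")
    print(f"Forms per CUI: {mean_forms_per_cui:.2f}")
    for tag, pct in percentages.items():
        print(f"{tag:8s} {pos_counts[tag]:6d} ({pct:.2f})")
```

Under these assumptions the part-of-speech counts sum to the total number of lemmas, which suggests that each lemma entry is assigned exactly one category label (including the combined labels such as ADJ/N).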