Table 1 Results of the nearest neighbors (NN) experiments with different word embedding models

From: MedLexSp – a medical lexicon for Spanish medical natural language processing

| Word embedding model | OOVs | % of NN | OOVs mapped to CUI | % of OOVs mapped to CUI |
|---|---|---|---|---|
| SBCWE, uncased, SkipGram, d=100 | 740 | 74.0% | 48 | 6.49% |
| SBCWE, uncased, CBOW, d=100 | 742 | 74.2% | 51 | 6.88% |
| SBCWE, uncased, SkipGram, d=50 | 762 | 76.2% | 45 | 5.91% |
| SBCWE, uncased, CBOW, d=50 | 732 | 73.2% | 47 | 6.42% |
| SBCWE, uncased, SkipGram, d=300 | 752 | 75.2% | 46 | 6.12% |
| SBCWE, uncased, CBOW, d=300 | 741 | 74.1% | 50 | 6.75% |
| COVID-19 corpus, uncased, SkipGram, d=100, min=5 | 677 | 67.7% | 56 | 8.27% |
| COVID-19 corpus, uncased, SkipGram, d=100, min=3 | 690 | 69.0% | 78 | 11.30% |

Abbreviations: CUI, UMLS concept unique identifier; d, embedding dimensions; NN, nearest neighbors; OOVs, out-of-vocabulary items; SBCWE, Spanish Biomedical and Clinical Word Embeddings
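
The two percentage columns can be recomputed from the raw counts, which is a quick way to sanity-check a transcription of the table. The sketch below is illustrative only: the row labels are shortened, the helper name `percentages` is hypothetical, and the total of 1,000 nearest-neighbor candidates per model is an assumption inferred from the "% of NN" column (740 OOVs correspond to 74.0%), not a figure stated in the table itself.

```python
# Rows transcribed from Table 1: (model label, OOVs, OOVs mapped to CUI).
ROWS = [
    ("SBCWE SkipGram d=100", 740, 48),
    ("SBCWE CBOW d=100", 742, 51),
    ("SBCWE SkipGram d=50", 762, 45),
    ("SBCWE CBOW d=50", 732, 47),
    ("SBCWE SkipGram d=300", 752, 46),
    ("SBCWE CBOW d=300", 741, 50),
    ("COVID-19 SkipGram d=100 min=5", 677, 56),
    ("COVID-19 SkipGram d=100 min=3", 690, 78),
]

# Assumption: 1,000 nearest-neighbor candidates per model, inferred from
# the "% of NN" column (e.g. 740 OOVs -> 74.0%); not stated in the table.
NN_TOTAL = 1000


def percentages(oovs, mapped, nn_total=NN_TOTAL):
    """Return (% of NN that are OOVs, % of OOVs mapped to a CUI)."""
    pct_nn = 100 * oovs / nn_total
    pct_mapped = 100 * mapped / oovs
    return round(pct_nn, 1), round(pct_mapped, 2)


if __name__ == "__main__":
    for model, oovs, mapped in ROWS:
        pct_nn, pct_mapped = percentages(oovs, mapped)
        print(f"{model:<32} {pct_nn:5.1f}% {pct_mapped:6.2f}%")
```

Recomputed values may differ from the published ones in the last digit: for the SBCWE CBOW d=100 row, 51/742 rounds to 6.87% rather than the reported 6.88%, presumably because of rounding in the source.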