Table 8 Results on clinical evaluation set

From: Synonym extraction and abbreviation expansion with ensembles of semantic spaces

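| Evaluation configuration | Abbr→Exp P | Abbr→Exp R | Exp→Abbr P | Exp→Abbr R | Syn P | Syn R |
|---|---|---|---|---|---|---|
| Model combination | RI_4+RP_4_sw | RI_4+RP_4_sw | RI_4+RP_4_sw | RI_4+RP_4_sw | RI_20+RP_4_sw | RI_20+RP_4_sw |
| RI Baseline | 0.04 | 0.22 | 0.03 | 0.19 | 0.07 | 0.39 |
| RP Baseline | 0.04 | 0.23 | 0.04 | 0.24 | 0.06 | 0.36 |
| Clinical Ensemble | 0.05 | 0.31 | 0.03 | 0.20 | 0.07 | 0.44 |
| +Post-Processing (Top 10) | 0.08 | 0.42 | 0.05 | 0.33 | 0.08 | 0.43 |
| +Dynamic Cut-Off (Top ≤ 10) | 0.11 | 0.41 | 0.12 | 0.33 | 0.08 | 0.42 |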

  1. Results (P = weighted precision, R = recall, top ten) of the best models with and without post-processing on the three tasks. A dynamic number of suggestions (the Dynamic Cut-Off row) allows the model to suggest fewer than ten terms in order to improve precision. The results are based on applying the model combinations to the evaluation data. The improvements in recall between the best baseline and the ensemble method are statistically significant at p < 0.05 for both the abbr→exp task (p-value = 0.022) and the synonym task (p-value = 0.002). The improvement in recall achieved by post-processing is statistically significant for both abbreviation tasks (p-value = 0.001 for abbr→exp and p-value = 0.000 for exp→abbr).
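
The trade-off behind the dynamic cut-off can be illustrated with a minimal Python sketch. It is not the authors' implementation. It uses plain (unweighted) precision rather than the weighted precision reported in the table, and the cut-off rule (keep a candidate only if its score is at least `min_ratio` times the top score) is a hypothetical stand-in for whatever criterion the paper applies. The candidate terms, scores, and gold set are made-up toy data.

```python
from typing import List, Set, Tuple

Ranked = List[Tuple[str, float]]  # (candidate term, similarity score), best first


def precision_recall(suggested: List[str], gold: Set[str]) -> Tuple[float, float]:
    """Plain (unweighted) precision and recall of a suggestion list.

    Assumption: the table reports *weighted* precision, which this does not compute.
    """
    hits = sum(1 for term in suggested if term in gold)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def top_k(ranked: Ranked, k: int = 10) -> List[str]:
    """Always suggest the k highest-ranked candidates."""
    return [term for term, _ in ranked[:k]]


def dynamic_cut_off(ranked: Ranked, k: int = 10, min_ratio: float = 0.8) -> List[str]:
    """Suggest at most k candidates, dropping those that score far below the best.

    Hypothetical rule for illustration only: keep a candidate if its score is
    at least min_ratio times the score of the top-ranked candidate.
    """
    if not ranked:
        return []
    best_score = ranked[0][1]
    return [term for term, score in ranked[:k] if score >= min_ratio * best_score]


# Toy data (invented): ten ranked candidates, two of which are correct.
ranked: Ranked = [
    ("cand1", 0.90), ("cand2", 0.85), ("cand3", 0.60), ("cand4", 0.55),
    ("cand5", 0.50), ("cand6", 0.45), ("cand7", 0.40), ("cand8", 0.35),
    ("cand9", 0.30), ("cand10", 0.25),
]
gold: Set[str] = {"cand2", "cand9"}

fixed = precision_recall(top_k(ranked), gold)
dynamic = precision_recall(dynamic_cut_off(ranked), gold)
print("top 10:          P=%.2f R=%.2f" % fixed)
print("dynamic cut-off: P=%.2f R=%.2f" % dynamic)
```

On this toy input the shorter list drops eight low-scoring candidates, one of which happens to be correct, so precision rises while recall falls. The table shows the same direction of change on the abbreviation tasks, where precision improves and recall stays the same or drops by 0.01.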