Table 2 General characteristics of the datasets used in the experiments. The number of pairs and the number of distinct items describe the size of each dataset; the "Focus of the dataset" column gives the type of relationship captured in the reference results

From: tESA: a distributional measure for calculating semantic relatedness

Dataset        No. of pairs   Distinct items   Reference   Focus of the dataset   Annotators       Scale      ICC(2,1)
umnsrsSim      566            375              [37]        Similarity             Residents        0 - 1600   0.47
umnsrsRelate   587            397              [37]        Relatedness            Residents        0 - 1600   0.5
mayo101        101            191              [36]        Relatedness            Medical coders   1 - 10     0.5
mayo29c        29             56               [16]        Relatedness            Medical coders   1 - 10     0.78
mayo29ph       29             56               [16]        Relatedness            Physicians       1 - 10     0.68

  1. ICC(2,1) denotes the intraclass correlation coefficient, which provides an objective measure of inter-annotator agreement; inter-annotator reliability is covered in more detail in the corresponding reference papers
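
As an illustration of what an ICC(2,1) value measures, the following is a minimal sketch of the coefficient in the common Shrout and Fleiss formulation (two-way random effects, absolute agreement, single rater). It assumes a complete ratings matrix with one row per term pair and one column per annotator; the function name `icc_2_1` and the toy ratings are hypothetical, and the source does not state how the reported values were actually computed.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete n x k matrix (rows: rated items, columns: annotators).

    Assumption: Shrout and Fleiss two-way random effects, absolute agreement,
    single-rater form; no missing ratings.
    """
    n = len(ratings)       # number of rated items (e.g. term pairs)
    k = len(ratings[0])    # number of annotators
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-items mean square
    msc = ss_cols / (k - 1)              # between-annotators mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical toy ratings: 4 term pairs scored by 3 annotators on a 1 - 10 scale
toy = [
    [9, 8, 9],
    [2, 3, 1],
    [6, 5, 7],
    [4, 4, 3],
]
print(round(icc_2_1(toy), 2))
```

Values close to 1 indicate that annotators assign nearly identical scores to each pair, while values near 0 indicate that the disagreement between annotators is as large as the variation between pairs.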