Table 2 The data sets used in our experiments

From: A cascade of classifiers for extracting medication information from discharge summaries

| Data Sets | # of Summaries | # of Entries | # of Fields | # of Name | # of Dose | # of Freq | # of Mode | # of Duration | # of Reason |
|---|---|---|---|---|---|---|---|---|---|
| Training set | 110 | 5970 (54.3) | 14886 (135.3) | 5684 (51.7) | 2929 (26.6) | 2740 (24.9) | 2146 (19.5) | 302 (2.7) | 1085 (9.9) |
| Dev set | 35 | 2401 (68.6) | 5988 (171.1) | 2302 (65.8) | 1163 (33.2) | 1096 (31.3) | 880 (25.1) | 111 (3.2) | 436 (12.5) |
| Test set | 251 | 8936 (35.6) | 22041 (87.8) | 8495 (33.8) | 4387 (17.5) | 3999 (15.9) | 3307 (13.2) | 511 (2.0) | 1342 (5.3) |

  1. The numbers in parentheses are the average numbers of entries or fields per discharge summary.
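
As a quick consistency check on the table, the parenthesized averages can be recomputed by dividing each count by the number of summaries in that data set. The sketch below is illustrative only: the names `TABLE`, `FIELDS`, and `per_summary` are hypothetical, and rounding to one decimal place is an assumption based on how the values are printed. It also sums the six per-field counts, which should match the "# of Fields" column.

```python
# Counts copied from Table 2: (summaries, entries, per-field counts).
# FIELDS gives the column order of the per-field counts.
FIELDS = ("Name", "Dose", "Freq", "Mode", "Duration", "Reason")
TABLE = {
    "Training set": (110, 5970, (5684, 2929, 2740, 2146, 302, 1085)),
    "Dev set": (35, 2401, (2302, 1163, 1096, 880, 111, 436)),
    "Test set": (251, 8936, (8495, 4387, 3999, 3307, 511, 1342)),
}


def per_summary(total, summaries):
    """Average count per discharge summary (one-decimal rounding is assumed)."""
    return round(total / summaries, 1)


for name, (summaries, entries, field_counts) in TABLE.items():
    total_fields = sum(field_counts)  # should equal the "# of Fields" column
    print(
        name,
        per_summary(entries, summaries),
        total_fields,
        per_summary(total_fields, summaries),
    )
```

For the training set, for example, this gives 54.3 entries and 135.3 fields per summary, with the six field counts summing to 14886.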