Table 2 The data sets used in our experiments

From: A cascade of classifiers for extracting medication information from discharge summaries

Data Sets     # of Summaries  # of Entries  # of Fields    # of Name    # of Dose    # of Freq    # of Mode    # of Duration  # of Reason
Training set                  5970 (54.3)   14886 (135.3)  5684 (51.7)  2929 (26.6)  2740 (24.9)  2146 (19.5)  302 (2.7)      1085 (9.9)
Dev set                       2401 (68.6)   5988 (171.1)   2302 (65.8)  1163 (33.2)  1096 (31.3)  880 (25.1)   111 (3.2)      436 (12.5)
Test set                      8936 (35.6)   22041 (87.8)   8495 (33.8)  4387 (17.5)  3999 (15.9)  3307 (13.2)  511 (2.0)      1342 (5.3)

  1. The numbers in parentheses are the average number of entries or fields per discharge summary.
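Per the note above, each parenthesized value is a column total divided by the number of discharge summaries in that set. A minimal Python sketch, assuming every average was rounded to the nearest 0.1, can therefore recover the summary counts that the "# of Summaries" cells leave empty, and can also check that the six per-type columns (Name, Dose, Freq, Mode, Duration, Reason) sum to "# of Fields". The names TABLE, FIELD_TYPES, implied_summary_counts and field_total_matches are hypothetical helpers for illustration, not part of the authors' method.

```python
import math

# Totals and per-summary averages transcribed from Table 2.
# Each value is (total count, average per discharge summary).
TABLE = {
    "Training set": {
        "Entries": (5970, 54.3), "Fields": (14886, 135.3),
        "Name": (5684, 51.7), "Dose": (2929, 26.6), "Freq": (2740, 24.9),
        "Mode": (2146, 19.5), "Duration": (302, 2.7), "Reason": (1085, 9.9),
    },
    "Dev set": {
        "Entries": (2401, 68.6), "Fields": (5988, 171.1),
        "Name": (2302, 65.8), "Dose": (1163, 33.2), "Freq": (1096, 31.3),
        "Mode": (880, 25.1), "Duration": (111, 3.2), "Reason": (436, 12.5),
    },
    "Test set": {
        "Entries": (8936, 35.6), "Fields": (22041, 87.8),
        "Name": (8495, 33.8), "Dose": (4387, 17.5), "Freq": (3999, 15.9),
        "Mode": (3307, 13.2), "Duration": (511, 2.0), "Reason": (1342, 5.3),
    },
}

# The six field types whose counts make up "# of Fields".
FIELD_TYPES = ("Name", "Dose", "Freq", "Mode", "Duration", "Reason")


def implied_summary_counts(row, half_step=0.05):
    """Return every summary count n such that total / n rounds to each reported average.

    Assumes each average was rounded to one decimal place, so the exact mean
    lies within [avg - half_step, avg + half_step].
    """
    candidates = None
    for total, avg in row.values():
        # Largest possible mean gives the smallest n, and vice versa.
        lo = math.ceil(total / (avg + half_step))
        hi = math.floor(total / (avg - half_step))
        column = set(range(lo, hi + 1))
        # Keep only counts consistent with every column seen so far.
        candidates = column if candidates is None else candidates & column
    return sorted(candidates)


def field_total_matches(row):
    """Check that the six per-type counts add up to the "# of Fields" total."""
    return sum(row[t][0] for t in FIELD_TYPES) == row["Fields"][0]


for name, row in TABLE.items():
    print(name, implied_summary_counts(row), field_total_matches(row))

# Expected output:
# Training set [110] True
# Dev set [35] True
# Test set [251] True
```

Intersecting the admissible ranges from all eight columns, rather than dividing a single total by a single rounded average, leaves exactly one integer per set, so the inferred counts do not hinge on any one rounded value.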