Table 1 Annotated clinical corpora and/or annotation guidelines (tks stands for tokens)

From: Design of an extensive information representation scheme for clinical narratives

| Reference | Text type/size | Entities/events | Mapping | Attributes | Relations | Temporal data |
| --- | --- | --- | --- | --- | --- | --- |
| Ogren et al. [17] | 160 clinical notes (47,975 words) | 1 (disorder) | SNOMED | 3 (context, status, flag) | | |
| CLEF [15] | Cancer patient records (50 clinical narratives, 50 histopathology and 50 imaging reports) | 6 (condition, locus, intervention, investigation, result, drug or device) | UMLS | 3 (negation, laterality, sub-location) | 5 (has target, has finding, has indication, has location, modifies) | Derived from TimeML |
| i2b2 [38–42, 46] | 2009 data: 1,243 discharge summaries; 2010 data: 1,748 discharge summaries and progress reports; 2012 data: 310 discharge summaries (178,000 tks) | 7 in the 2009 call (drug name, dosage, mode, frequency, duration, reason, list/narrative); 3 in the 2011 call (tests, treatments, problems); 6 in the 2012 call (tests, treatments, problems, clinical department, evidential, occurrence) | | 6 values for assertion in the 2011 call (present, absent, possible, conditional, hypothetical and not associated with the patient). In the 2012 call, attributes were type, polarity (positive, negative) and modality (present, proposed, conditional, possible) | 8 in the 2011 call (improves, worsens, causes, is administered for, is not administered because of, reveals, conducted, indicates) | Derived from TimeML |
| IxA-Med-GS [19] | 75 clinical reports (41,633 tks) | 3 (disorder, drug, procedure) | | 2 (negation, speculation) | 2 (caused by, related with) | |
| THYME [47–49] | 1,251 clinical notes (colon and brain cancer) | All events related to the patient’s clinical timeline (e.g. procedures, diseases, diagnoses or patient complaints) | | 7 (docTimeRel, type, polarity, degree, contextual modality, contextual aspect, permanence) | 5 types of temporal links (TLINKs) and 4 types of aspectual links (ALINKs) | Derived from TimeML |
| SHARP Annotation Templates [23] | | 7 (diseases, signs/symptoms, procedures and methods, anatomical sites, medications, devices, labs) | UMLS (SNOMED and RxNorm) | General (13), lab (7), medication (13) and relation attributes (3) | 13 (affects, causes, complicates, contraindicates, degree of, diagnoses, disrupts, is indicated for, location of, treats, manifestation of, prevents, result of) | THYME guidelines |
| MiPACQ clinical corpus [18] | Clinical narrative and pathology notes (colon cancer, 127,606 tks) | 17 (15 UMLS groups + signs/symptoms + person) | UMLS | 2 (negation, status) | | |
| ShARe/CLEF eHealth labs [51, 52] | 433 clinical reports (discharge summaries, radiology, electrocardiograms, echocardiograms) | 1 (disorder) | UMLS | 10 (negation, uncertainty, subject, course, severity, conditional, generic, body location, docTime, temporal) | | |
| Harvey [53] | 750 primary care notes (22,914 tks) | 4 (quantity, locative, temporal, on examination) |