Table 6 Error analysis of the annotation with disagreements

From: PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature

| Error | Dimension | Examples (Sentences) |
| --- | --- | --- |
| Abbreviated terms | Biomedical & Procedure | "Events that occurred during follow-up were identified from hospitalization records, and ARIC and CHS study" (PMID25104519) |
| | Standard codes | "Finally, the Apollo Data Repository provided data for ICDs" (PMID26961369) |
| | Medications | "“common” side effects, e.g. headache, to judge the relevance of side effects associated with AZA." (PMID24177317) |
| | Use of NLP | "From this cohort, we identified 15,761 patients with HPI that was processed through a natural language processing algorithm…" (PMID25567824) |
| | Data | "Cohort with HPI data" (PMID25567824) |
| | EXC1 – irrelevant evidence | "190 patients completed the SCID assessment" (PMID25827034) |
| | EXC2 – Computational and statistical evidence | "The MCMC method" (PMID21931496) |
| Missed keywords or criteria | Use of NLP | "The algorithm uses non-negated terms indicative of HF" (PMID17567225) |
| | Data | "If data on weight and height were available" (PMID21862746) |
| | EXC1 – irrelevant evidence (financial) | "until termination of insurance coverage." (PMID12952547) |
| | EXC1 – irrelevant evidence (ethical) | "To protect patient confidentiality, all personal identifiers are deleted" (PMID21051745) |
| | EXC1 – irrelevant evidence (location of the study) | "We randomly sampled outpatient clinical encounters from October 1, 2003 through March 31, 2004 at VA Maryland (VAMHCS) and at VA Salt Lake City (VASLCHCS) Health Care systems." (PMID20976281) |
| | EXC2 – Computational and statistical evidence | "Characteristics were measured during the one-year baseline period (i.e., before time zero)." (PMID20112435) |
| Without co-occurrence with a biomedical, procedure, or medication term | Use of NLP | "Humedica derives NLP items from text entries that correspond primarily to terms in two large dictionaries, SNOMED and MedDRA" (PMID26725697) |
| | Data | "If the first record for a woman was either …" (PMID22071529) |
| Term ambiguity | Biomedical & procedure events | "Only acute conditions occurring during the first 24 h of hospital admission were considered." (PMID24734124) |
| | Study design or IRB | "The nucleotide reference for this allele is guanine. 4" (PMID26221186) |
| | EXC2 – Computational and statistical evidence | "More points mean a higher risk of hyperkalemia." (PMID20112435) |
| Neither biomedical nor procedure (e.g. social status) | Biomedical & Procedure | "We created a binary variable for marital status, where “single” included those patients classified as divorced, single, widowed, or separated." (PMID25091637) |
| Unclear statement of using standard codes | Standard codes | "Outcomes were evaluated by administratively coded data" (PMID26370823) |
| Assigning terms as biomedical & procedure vs. medications (e.g. substances) | Biomedical & Procedure/Medications | "The most recent fasting lipid profile in patients with dyslipidemia and glycosylated hemoglobin level in patients with diabetes" (PMID11388131) |
| Spelling and short forms | Medications | "Asthma meds refilled regularly." (PMID12952547) |
| Without co-occurrence with supportive definition evidence | Biomedical & Procedure/Medications | "reports KD = 9100 for bupropion and KD > 10 000 for mirtazapine (vs 200 for nefazodone)." (PMID22466034) |
| “More than or less than” value, but not directly relevant to phenotyping | Clinical measurable values | "≥ 2 years of observation before the period of interest; n = 50" (PMID23449283) |
| New keywords for the dimension | EXC2 – Computational and statistical evidence | Examples of new keywords describing “EXC2” are: risk score, inter-rater variability, custom-designed data entry template, predictor variable, Tukey multiple comparison test, Web-accessible, teleconferences, propensity-matched, machine-implementable rule, Illumina Omni1-Quad, Illumina 660W, TaqMan, Illumina 660-Quad, and Illumina |