Table 2 Sentence-level annotation’s categories, dimensions, and sub-dimensions

From: PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature

Category, dimension, and sub-dimension

Description

Examples (Sentences)

1. Inclusions category (n = 5)

  1.1. Biomedical & Procedure

Evidence of defining a phenotype when biomedical and procedure entities co-occur with phenotype definition cues

“dyslipidemia was defined as total cholesterol greater than 220 mg/dl…” (PMID:20819866)

  1.2. Standard Codes

Evidence of using standard terminologies that are commonly used in a clinical setting. Examples of these standard coding classifications and/or terminologies are ICD-9/10, SNOMED CT, and CPT codes

“a primary or any secondary discharge diagnosis (International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM] code) of myoglobinuria (791.3)”. (PMID:15572716)

  1.3. Medications

Evidence of the use of medication for defining a phenotype

“The use of a lipid-lowering medication”. (PMID:20819866)

  1.4. Laboratories

Evidence of using quantitative values that reflect clinically measurable values (e.g., laboratory test values, vital signs, procedure or other clinical measurements)

“Dyslipidemia was defined as total cholesterol greater than 220 mg/dl”. (PMID:20819866)

  1.5. Use of Natural Language Processing (NLP)

Evidence of NLP use accompanied by any of the following entities: biomedical, procedure, and/or medications

“The algorithm uses nonnegated terms indicative of HF: cardiomyopathy, heart failure, congestive heart failure, pulmonary edema, decompensated heart failure, volume overload, and fluid overload”. (PMID:17567225)

2. Intermediate category (n = 2)

  2.1. Data sources

Evidence of information relevant to the data sources used in the study or in the phenotype definition. Examples include descriptions of a database used, clinical data, and/or electronic health records (EHR)

“Computerized medical and pharmacy records were reviewed”. (PMID:11388131)

  2.2. Study design or IRB

Evidence of information about the study design or the IRB. For example, evidence of the method used as the “gold standard”

“STUDY DESIGN: Retrospective chart review”. (PMID:11388131)

3. Exclusion category (n = 3)

  3.1. Exclusion 1 - Irrelative evidence:

    3.1.1. Location

    3.1.2. Ethical

    3.1.3. Financial

    3.1.4. Patient direct contact

    3.1.5. Provider or researchers (excluding patients)

    3.1.6. Performance

    3.1.7. Quality of Care

Evidence of information about other methodological details of the study that do not directly support defining a phenotype

“All patients were members of the managed care system and incurred a significant financial advantage from having their prescriptions filled within the system”. (PMID:16765240)

  3.2. Exclusion 2 - Computational and statistical evidence:

    3.2.1. Alerts

    3.2.2. Software

    3.2.3. Statistics

Evidence of computational or statistical information that does not support phenotype definitions

“We used logistic regression models with generalized estimating equations to adjust for race, year, race x year interactions, age, and sex”. (PMID:16567608)

  3.3. Exclusion 3 - Insufficient evidence

Sentences that do not show any evidence in any of the nine dimensions

“Fallon is offered by about 3,500 employers”. (PMID:12952547)
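
The scheme in Table 2 can be written down as a small data structure, which makes the counts in the table easy to check: five inclusion dimensions, two intermediate dimensions, three exclusion dimensions, and nine evidence-bearing dimensions once "Insufficient evidence" is set aside. The following is a minimal sketch only. The names `Dimension`, `SCHEME`, `evidence_dimensions`, and `category_of` are hypothetical and are not taken from the PhenoDEF corpus or its tooling; only the codes and labels come from the table.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Dimension:
    """One row of Table 2 (hypothetical representation, not the corpus format)."""
    code: str
    name: str
    sub_dimensions: Tuple[str, ...] = ()


# Codes and labels copied from Table 2; the layout is an assumption.
SCHEME: Dict[str, Tuple[Dimension, ...]] = {
    "Inclusions": (
        Dimension("1.1", "Biomedical & Procedure"),
        Dimension("1.2", "Standard Codes"),
        Dimension("1.3", "Medications"),
        Dimension("1.4", "Laboratories"),
        Dimension("1.5", "Use of Natural Language Processing (NLP)"),
    ),
    "Intermediate": (
        Dimension("2.1", "Data sources"),
        Dimension("2.2", "Study design or IRB"),
    ),
    "Exclusion": (
        Dimension("3.1", "Irrelative evidence", (
            "Location", "Ethical", "Financial", "Patient direct contact",
            "Provider or researchers (excluding patients)",
            "Performance", "Quality of Care",
        )),
        Dimension("3.2", "Computational and statistical evidence",
                  ("Alerts", "Software", "Statistics")),
        Dimension("3.3", "Insufficient evidence"),
    ),
}


def evidence_dimensions(scheme: Dict[str, Tuple[Dimension, ...]]) -> List[Dimension]:
    """Return every dimension except 3.3, which marks the absence of evidence."""
    return [d for dims in scheme.values() for d in dims if d.code != "3.3"]


def category_of(code: str, scheme: Dict[str, Tuple[Dimension, ...]] = SCHEME) -> str:
    """Look up the category that a dimension code such as '2.1' belongs to."""
    for category, dims in scheme.items():
        if any(d.code == code for d in dims):
            return category
    raise KeyError(code)


if __name__ == "__main__":
    for category, dims in SCHEME.items():
        print(category, len(dims))
    print("evidence-bearing dimensions:", len(evidence_dimensions(SCHEME)))
```

Keeping 3.3 as an ordinary entry, and filtering it out only when counting evidence-bearing dimensions, mirrors how the table defines it: a label for sentences that show no evidence in any of the other nine dimensions.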