Table 2 Sentence-level annotation’s categories, dimensions, and sub-dimensions

From: PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature

Category, dimension, and sub-dimension; Description; Examples (sentences)
1. Inclusions category (n = 5)
  1.1. Biomedical & Procedure: Evidence of defining a phenotype when biomedical and procedure entities co-occur with phenotype definition cues. “dyslipidemia was defined as total cholesterol greater than 220 mg/dl…” (PMID:20819866)
  1.2. Standard Codes: Evidence of using standard terminologies commonly used in clinical settings. Examples of such coding classifications and/or terminologies are ICD-9/10, SNOMED CT, and CPT codes. “a primary or any secondary discharge diagnosis (International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM] code) of myoglobinuria (791.3)”. (PMID:15572716)
  1.3. Medications: Evidence of medication use in defining a phenotype. “The use of a lipid-lowering medication”. (PMID:20819866)
  1.4. Laboratories: Evidence of quantitative, clinically measurable values (e.g., laboratory test values, vital signs, procedures, and other clinical values). “Dyslipidemia was defined as total cholesterol greater than 220 mg/dl”. (PMID:20819866)
  1.5. Use of Natural Language Processing (NLP): Evidence of NLP use accompanied by any of the following entities: biomedical, procedure, and/or medication. “The algorithm uses nonnegated terms indicative of HF: cardiomyopathy, heart failure, congestive heart failure, pulmonary edema, decompensated heart failure, volume overload, and fluid overload”. (PMID:17567225)
2. Intermediate category (n = 2)
  2.1. Data sources: Evidence of information about the data sources used in the study or in the phenotype definition, for example, descriptions of the database used, clinical data, and/or electronic health records (EHRs). “Computerized medical and pharmacy records were reviewed”. (PMID:11388131)
  2.2. Study design or IRB: Evidence of information about the study design or the institutional review board (IRB), for example, evidence of the method used as the “gold standard”. “STUDY DESIGN: Retrospective chart review”. (PMID:11388131)
3. Exclusion category (n = 3)
  3.1. Exclusion 1 – Irrelevant evidence:
    3.1.1. Location
    3.1.2. Ethical
    3.1.3. Financial
    3.1.4. Patient direct contact
    3.1.5. Provider or researchers (excluding patients)
    3.1.6. Performance
    3.1.7. Quality of Care
Evidence of other methodological details of the study that do not directly support defining a phenotype. “All patients were members of the managed care system and incurred a significant financial advantage from having their prescriptions filled within the system”. (PMID:16765240)
  3.2. Exclusion 2 – Computational and statistical evidence:
    3.2.1. Alerts
    3.2.2. Software
    3.2.3. Statistics
Evidence of computational or statistical information that does not support phenotype definitions. “We used logistic regression models with generalized estimating equations to adjust for race, year, race x year interactions, age, and sex”. (PMID:16567608)
  3.3. Exclusion 3 – Insufficient evidence: Sentences that show no evidence for any of the nine dimensions (1.1–1.5, 2.1–2.2, and 3.1–3.2). “Fallon is offered by about 3,500 employers”. (PMID:12952547)
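The scheme in Table 2 is a small two-level taxonomy, and encoding it as data makes its counts checkable: five inclusion dimensions, two intermediate, and three exclusion, where the "nine dimensions" referred to by 3.3 are every dimension except 3.3 itself. The sketch below is a hypothetical Python encoding under that reading; the names `Dimension`, `SCHEME`, `category_of`, and `evidence_dimensions` are illustrative and are not part of any PhenoDEF release.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical representation of one row of Table 2 (illustrative names only).
@dataclass(frozen=True)
class Dimension:
    code: str                               # dimension number, e.g. "1.1"
    name: str                               # dimension label from the table
    sub_dimensions: Tuple[str, ...] = ()    # only used by the exclusion rows 3.1 and 3.2


# Category -> dimensions, transcribed from Table 2.
SCHEME: Dict[str, List[Dimension]] = {
    "Inclusions": [
        Dimension("1.1", "Biomedical & Procedure"),
        Dimension("1.2", "Standard Codes"),
        Dimension("1.3", "Medications"),
        Dimension("1.4", "Laboratories"),
        Dimension("1.5", "Use of Natural Language Processing (NLP)"),
    ],
    "Intermediate": [
        Dimension("2.1", "Data sources"),
        Dimension("2.2", "Study design or IRB"),
    ],
    "Exclusion": [
        Dimension("3.1", "Irrelevant evidence", (
            "Location", "Ethical", "Financial", "Patient direct contact",
            "Provider or researchers (excluding patients)", "Performance",
            "Quality of Care",
        )),
        Dimension("3.2", "Computational and statistical evidence",
                  ("Alerts", "Software", "Statistics")),
        Dimension("3.3", "Insufficient evidence"),
    ],
}


def category_of(code: str) -> str:
    """Return the category containing dimension `code`, e.g. "2.1" -> "Intermediate"."""
    for category, dims in SCHEME.items():
        if any(d.code == code for d in dims):
            return category
    raise KeyError(f"unknown dimension code: {code}")


def evidence_dimensions() -> List[str]:
    """Codes of the dimensions that carry evidence: everything except 3.3."""
    return [d.code for dims in SCHEME.values() for d in dims if d.code != "3.3"]
```

With this encoding, `category_of("3.2")` returns `"Exclusion"`, and `evidence_dimensions()` lists the nine codes that a 3.3 sentence must show no evidence for. Keeping the sub-dimensions as a tuple on the parent row mirrors the table, where only the two exclusion dimensions are subdivided.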
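Some dimensions show fairly regular surface patterns in the example sentences: coding-system names for 1.2 (Standard Codes) and a numeric threshold with a unit for 1.4 (Laboratories). The sketch below is a naive regular-expression pre-screen assumed here purely for illustration; it is not the annotation method behind Table 2, and the function name `candidate_dimensions`, the patterns, and the unit list are all hypothetical and would miss many real phrasings.

```python
import re
from typing import Set

# Hypothetical cue patterns (illustration only, not part of PhenoDEF).
# 1.2: names of standard coding systems mentioned in the table description.
CODE_SYSTEM = re.compile(r"\b(?:ICD-9(?:-CM)?|ICD-10|SNOMED CT|CPT)\b")
# 1.4: a comparison phrase followed by a number and a unit, e.g. "greater than 220 mg/dl".
LAB_THRESHOLD = re.compile(
    r"\b(?:greater|less|more|higher|lower)\s+than\s+\d+(?:\.\d+)?\s*(?:mg/dl|mmol/l|%)",
    re.IGNORECASE,
)


def candidate_dimensions(sentence: str) -> Set[str]:
    """Return the dimension codes ("1.2", "1.4") whose cue pattern occurs in `sentence`."""
    dims: Set[str] = set()
    if CODE_SYSTEM.search(sentence):
        dims.add("1.2")
    if LAB_THRESHOLD.search(sentence):
        dims.add("1.4")
    return dims
```

Applied to the table's examples, the 1.4 sentence about total cholesterol is flagged as `{"1.4"}`, the ICD-9-CM discharge-diagnosis sentence as `{"1.2"}`, and the 3.3 sentence about Fallon yields an empty set. Any such flags would still need human review; dimension 1.1, for instance, depends on recognizing biomedical and procedure entities, which simple patterns cannot do reliably.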