Table 2 The target terms for PMSB and VetCN datasets

From: Exploring semantic deep learning for building reliable and reusable one health knowledge from PubMed systematic reviews and veterinary clinical notes

Target terms for this study and their concept identifiers in UMLS and SNOMED CT

| UMLS CUI | SNOMED CT identifier | VetCN dataset: n-gram (frequency count) | PMSB dataset: n-gram (frequency count) | BMJ Best Practice document |
| --- | --- | --- | --- | --- |
| C0018801 | 84114007 | heart_failure (1292) | heart_failure (4615) | Chronic congestive heart failure |
| C0004096 | 195967001 | asthma (1194) | asthma (8891) | Asthma in adults |
| C0014544 | 84757009 | epilepsy (1164) | epilepsy (3521) | Generalised seizure |
| C0017601 | 23986001 | glaucoma (1657) | glaucoma (1635) | Open-angle glaucoma |
| C1561643 | 709044004 | ckd (2698) | CKD (1550) | Chronic kidney disease |
| C0029408 | 396275006 | osteoarthritis (1765) | osteoarthritis (1991) | Osteoarthritis |
| C0002871 | 271737000 | anaemia (1414) | anaemia (1154) | Assessment of anaemia |
| C0003864 | 3723001 | arthritis (8276) | arthritis (1023) | Rheumatoid arthritis |
| C0011849 | 73211009 | diabetes (3660) | diabetes (12846) | Type 2 diabetes in adults |
| C0020538 | 38341003 | hypertension (1132) | hypertension (8365) | Essential hypertension |
| C0028754 | 414916001 | obesity (1763) | obesity (10030) | Obesity in adults |

  1. The last column contains the names and references of the BMJ Best Practice documents used for validation in Step 5 (see the section Materials and methods for details). The first column contains the UMLS CUI mapped to each target term (n-gram) with the aid of MetaMap. The second column shows the SNOMED CT identifier mapped to that UMLS CUI with the aid of the UMLS API. The third column lists the target terms from the VetCN dataset, i.e. the n-grams, with their frequency counts in the corpus in parentheses. The fourth column lists the target terms from the PMSB dataset in the same format. The target terms are identical in both datasets with one exception: chronic kidney disease (UMLS CUI C1561643) has the upper-case short form “CKD” as its n-gram in the PMSB dataset and the lower-case “ckd” in the VetCN dataset. The difference arises because, in Step 1, the VetCN corpus is converted to lower case while the PMSB corpus is not.
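
The effect of the Step 1 case handling on the “CKD”/“ckd” target terms can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `count_unigrams`, the whitespace tokenisation, and the sample sentence are assumptions made purely for illustration.

```python
from collections import Counter


def count_unigrams(text: str, lowercase: bool) -> Counter:
    """Count whitespace-separated tokens (hypothetical helper).

    lowercase=True mimics a corpus that is converted to lower case
    before counting; lowercase=False keeps the original casing.
    """
    if lowercase:
        text = text.lower()
    return Counter(text.split())


# Invented sample sentence, not taken from either corpus.
sample = "CKD was suspected and CKD staging was requested"

vetcn_style = count_unigrams(sample, lowercase=True)   # lower-cased corpus
pmsb_style = count_unigrams(sample, lowercase=False)   # case preserved

print(vetcn_style["ckd"], vetcn_style["CKD"])
print(pmsb_style["ckd"], pmsb_style["CKD"])
```

Under these assumptions the same surface form is counted under “ckd” when the text is lower-cased and under “CKD” when it is not, which is why the two datasets carry differently cased n-grams for the same UMLS concept.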