Table 2 The target terms for the PMSB and VetCN datasets

From: Exploring semantic deep learning for building reliable and reusable one health knowledge from PubMed systematic reviews and veterinary clinical notes

Target terms for this study and their concept identifiers in UMLS and SNOMED CT

| UMLS CUI | SNOMED CT identifier | VetCN dataset: n-gram (frequency count) | PMSB dataset: n-gram (frequency count) | BMJ Best Practice document |
|---|---|---|---|---|
| C0018801 | 84114007 | heart_failure (1292) | heart_failure (4615) | Chronic congestive heart failure |
| C0004096 | 195967001 | asthma (1194) | asthma (8891) | Asthma in adults |
| C0014544 | 84757009 | epilepsy (1164) | epilepsy (3521) | Generalised seizure |
| C0017601 | 23986001 | glaucoma (1657) | glaucoma (1635) | Open-angle glaucoma |
| C1561643 | 709044004 | ckd (2698) | CKD (1550) | Chronic kidney disease |
| C0029408 | 396275006 | osteoarthritis (1765) | osteoarthritis (1991) | Osteoarthritis |
| C0002871 | 271737000 | anaemia (1414) | anaemia (1154) | Assessment of anaemia |
| C0003864 | 3723001 | arthritis (8276) | arthritis (1023) | Rheumatoid arthritis |
| C0011849 | 73211009 | diabetes (3660) | diabetes (12846) | Type 2 diabetes in adults |
| C0020538 | 38341003 | hypertension (1132) | hypertension (8365) | Essential hypertension |
| C0028754 | 414916001 | obesity (1763) | obesity (10030) | Obesity in adults |
  1. The last column gives the names and references of the BMJ Best Practice documents used for validation in Step 5 (see Materials and methods). The first column contains the UMLS CUI mapped to each target term (n-gram) with MetaMap. The second column shows the SNOMED CT identifier mapped to that UMLS CUI via the UMLS API. The third column lists the target terms from the VetCN dataset, i.e. the n-grams, with their frequency counts in the corpus in brackets. The fourth column lists the target terms from the PMSB dataset in the same format as the third column. The target terms (n-grams) are identical in both datasets except one: the medical condition “chronic kidney disease” (UMLS CUI “C1561643”) appears as the n-gram “CKD” (an abbreviation written entirely in upper case) in the PMSB dataset, but as “ckd” in the VetCN dataset. This difference arises because, in Step 1, the VetCN corpus is converted to lower case, whereas the PMSB corpus is not.
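The claim that the two datasets share their target terms apart from one case difference can be checked mechanically. The sketch below is illustrative only and is not the authors' pipeline: it encodes the n-gram pairs from Table 2 in a dictionary keyed by UMLS CUI, and the helper names (`TARGET_TERMS`, `mismatched_terms`, `match_after_lowercasing`, `count_terms`) and the toy sentences are hypothetical. `count_terms` assumes simple whitespace tokenisation, which is a simplification of the actual Step 1 preprocessing.

```python
from collections import Counter

# Target terms from Table 2, keyed by UMLS CUI: (VetCN n-gram, PMSB n-gram).
TARGET_TERMS = {
    "C0018801": ("heart_failure", "heart_failure"),
    "C0004096": ("asthma", "asthma"),
    "C0014544": ("epilepsy", "epilepsy"),
    "C0017601": ("glaucoma", "glaucoma"),
    "C1561643": ("ckd", "CKD"),
    "C0029408": ("osteoarthritis", "osteoarthritis"),
    "C0002871": ("anaemia", "anaemia"),
    "C0003864": ("arthritis", "arthritis"),
    "C0011849": ("diabetes", "diabetes"),
    "C0020538": ("hypertension", "hypertension"),
    "C0028754": ("obesity", "obesity"),
}


def mismatched_terms(terms):
    """Return the CUIs whose VetCN and PMSB n-grams differ as written."""
    return [cui for cui, (vet, pmsb) in terms.items() if vet != pmsb]


def match_after_lowercasing(terms):
    """True if every pair agrees once the PMSB n-gram is lower-cased."""
    return all(vet == pmsb.lower() for vet, pmsb in terms.values())


def count_terms(docs, lowercase):
    """Hypothetical toy counter: optional case folding, then whitespace tokens."""
    counts = Counter()
    for doc in docs:
        text = doc.lower() if lowercase else doc
        counts.update(text.split())
    return counts


if __name__ == "__main__":
    print(mismatched_terms(TARGET_TERMS))         # ['C1561643']
    print(match_after_lowercasing(TARGET_TERMS))  # True
    notes = ["Dog with CKD", "CKD stage 2"]       # made-up example text
    print(count_terms(notes, lowercase=True)["ckd"])   # 2 (VetCN-style)
    print(count_terms(notes, lowercase=False)["CKD"])  # 2 (PMSB-style)
```

Only C1561643 differs as written, and lower-casing the PMSB term removes that difference, which matches the explanation that the VetCN corpus alone is case-folded in Step 1.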