Exploring semantic deep learning for building reliable and reusable one health knowledge from PubMed systematic reviews and veterinary clinical notes

Table 2 Target terms for the PMSB and VetCN datasets

Target terms for this study and their concept identifiers in UMLS and SNOMED CT (first four columns)
UMLS CUI	SNOMED CT identifier	VetCN dataset n-gram (frequency count)	PMSB dataset n-gram (frequency count)	BMJ Best Practice document
C0018801	84114007	heart_failure (1292)	heart_failure (4615)	Chronic congestive heart failure
C0004096	195967001	asthma (1194)	asthma (8891)	Asthma in adults
C0014544	84757009	epilepsy (1164)	epilepsy (3521)	Generalised seizure
C0017601	23986001	glaucoma (1657)	glaucoma (1635)	Open-angle glaucoma
C1561643	709044004	ckd (2698)	CKD (1550)	Chronic kidney disease
C0029408	396275006	osteoarthritis (1765)	osteoarthritis (1991)	Osteoarthritis
C0002871	271737000	anaemia (1414)	anaemia (1154)	Assessment of anaemia
C0003864	3723001	arthritis (8276)	arthritis (1023)	Rheumatoid arthritis
C0011849	73211009	diabetes (3660)	diabetes (12846)	Type 2 diabetes in adults
C0020538	38341003	hypertension (1132)	hypertension (8365)	Essential hypertension
C0028754	414916001	obesity (1763)	obesity (10030)	Obesity in adults

The first column contains the UMLS CUI mapped to each target term (n-gram) with the aid of MetaMap. The second column shows the SNOMED CT identifier mapped to that UMLS CUI with the aid of the UMLS API. The third column lists the target terms from the VetCN dataset, with each n-gram's frequency count in the corpus given in brackets. The fourth column lists the target terms from the PMSB dataset in the same format as the third column. The last column contains the names and references of the BMJ Best Practice documents used for validation in Step 5 (see the section Materials and methods for details). The target terms (n-grams) are identical for both datasets except one: the well-known medical condition “chronic kidney disease” (UMLS CUI “C1561643”) has the n-gram “CKD” (an abbreviation written entirely in upper case) in the PMSB dataset, whereas in the VetCN dataset it has the n-gram “ckd”. This difference between “CKD” and “ckd” arises because in Step 1 the VetCN corpus is converted to lower case, while the PMSB corpus is not.
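The case mismatch between “CKD” and “ckd” can be illustrated with a minimal Python sketch. It stores Table 2 as a lookup keyed by UMLS CUI, keeping SNOMED CT identifiers as strings (they are codes, not quantities), and counts target n-grams after a simplified stand-in for the Step 1 preprocessing. The names `TABLE_2`, `tokenise` and `target_counts` are hypothetical; the actual Step 1 pipeline is described in Materials and methods and is not reproduced here, and multi-word terms are assumed to be already joined with underscores.

```python
from collections import Counter

# Table 2 as a lookup keyed by UMLS CUI:
# (SNOMED CT identifier, VetCN n-gram, PMSB n-gram).
# SNOMED CT identifiers are stored as strings: they are codes, not quantities.
TABLE_2 = {
    "C0018801": ("84114007", "heart_failure", "heart_failure"),
    "C0004096": ("195967001", "asthma", "asthma"),
    "C0014544": ("84757009", "epilepsy", "epilepsy"),
    "C0017601": ("23986001", "glaucoma", "glaucoma"),
    "C1561643": ("709044004", "ckd", "CKD"),
    "C0029408": ("396275006", "osteoarthritis", "osteoarthritis"),
    "C0002871": ("271737000", "anaemia", "anaemia"),
    "C0003864": ("3723001", "arthritis", "arthritis"),
    "C0011849": ("73211009", "diabetes", "diabetes"),
    "C0020538": ("38341003", "hypertension", "hypertension"),
    "C0028754": ("414916001", "obesity", "obesity"),
}

# Positions of the n-gram columns inside each tuple.
VETCN, PMSB = 1, 2


def tokenise(text: str, lowercase: bool) -> list[str]:
    """Hypothetical stand-in for Step 1: optional lower-casing, then
    whitespace tokenisation. Multi-word terms are assumed to be
    pre-joined with underscores (e.g. 'heart_failure')."""
    if lowercase:
        text = text.lower()
    return text.split()


def target_counts(tokens: list[str], column: int) -> dict[str, int]:
    """Frequency count of every target n-gram from one Table 2 column."""
    counts = Counter(tokens)
    return {row[column]: counts[row[column]] for row in TABLE_2.values()}


if __name__ == "__main__":
    note = "cat CKD heart_failure CKD stable"
    vet = target_counts(tokenise(note, lowercase=True), VETCN)   # VetCN: lower-cased
    pmsb = target_counts(tokenise(note, lowercase=False), PMSB)  # PMSB: case kept
    print(vet["ckd"], pmsb["CKD"])  # prints: 2 2
    # After lower-casing, the upper-case form no longer occurs at all.
    print(Counter(tokenise(note, lowercase=True))["CKD"])  # prints: 0
```

Matching each corpus against its own column keeps the counts aligned: looking up “CKD” in the lower-cased VetCN tokens would return zero, which is why the two datasets list different n-grams for the same CUI.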

ISSN: 2041-1480