Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies

Kersloot, Martijn G.; van Putten, Florentien J. P.; Abu-Hanna, Ameen; Cornet, Ronald; Arts, Derk L.

doi:10.1186/s13326-020-00231-z

Journal of Biomedical Semantics

Table 4 Included publications and their evaluation methodologies

Author	Year	Reference standard	Validation	External validation	Generalizability ^a	Ref.
Afshar	2019	Existing EHR data	Hold-out validation (train, test, development)	No	No, validation is needed	[29]
Alnazzawi	2016	Existing annotated corpus	External	ShARe/CLEF, NCBI disease, Heart failure and pulmonary embolism corpora	Yes, achieves competitive performance on other corpora	[30]
Atutxa	2018	Manual retrospective review	Hold-out validation (train, test, development)	No	Yes, easily portable to other languages	[31]
Barrett	2013	Manual annotations	10-fold cross validation	Multiple datasets (different provider)	Yes, expect that it is generalizable	[32]
Becker	2016	Existing annotated corpus	Not used	No	Not listed	[33]
Becker	2019	Manual annotations	Hold-out validation (train, test, development)	No	Not listed	[34]
Bejan	2015	Manual annotations	External	i2b2 data (2010)	Yes, good performance on the i2b2 dataset, even though not optimized on it	[35]
Castro	2010	Manual annotations	Not used	No	Not listed	[36]
Catling	2018	Existing annotated corpus	Hold-out validation (train, test, development)	No	Not listed	[37]
Chapman	2004	Manual annotations	Not used	No	Yes, generalizable to other domains within and outside of bio surveillance	[38]
Chen	2016	Manual annotations	10-fold cross validation	No	Not listed	[39]
Chiaramello	2016	Manual annotations	Not used	No	Not listed	[40]
Chodey	2016	Existing annotated corpus	Hold-out validation (train, test)	No	Not listed	[41]
Chung	2005	Manual annotations	Hold-out validation (train, test)	Reports from a second hospital	Not listed	[42]
Combi	2018	Manual annotations	Not used	No	Not listed	[43]
deBruijn	2011	Existing annotated corpus	15-fold cross validation	No	Not listed	[44]
Deisseroth	2019	Manual annotations	Hold-out validation (train, test)	Data from a second hospital	Yes, it can be immediately incorporated into clinical practice	[45]
Demner-Fushman	2017	Existing annotated corpus	External	Multiple datasets	Not listed	[46]
Divita	2014	Manual annotations	Not used	No	Not listed	[47]
Duarte	2018	Manual annotations	Hold-out validation (train, test)	Second dataset	Not listed	[48]
Falis	2019	Existing annotated corpus	Hold-out validation (train, test, development)	No	Yes, method is not specific to an ontology, and could be used for a graph of any formation	[49]
Ferrão	2013	Existing EHR data	Hold-out validation (train, test)	No	Not listed	[50]
Gerbier	2011	Manual annotations	Hold-out validation (train, test)	No	Yes, it could also serve other types of clinical decision support systems	[51]
Goicoechea Salazar	2013	Manual annotations	Hold-out validation (train, test)	No	Not listed	[52]
Hamid	2013	Manual annotations	10-fold cross validation	No	Possible, the classifier may be applicable in academic hospital samples	[53]
Hassanzadeh	2016	Existing annotated corpus	Hold-out validation (train, test)	No	Not applicable	[54]
Helwe	2017	Existing annotated corpus	Hold-out validation (train, test, development)	No	Not listed	[55]
Hersh	2001	Manual annotations	Hold-out validation (train, test)	No	Not listed	[56]
Hoogendoorn	2015	Existing EHR data	5-fold cross validation	No	Not listed	[57]
Jindal	2013	Existing annotated corpus	Hold-out validation (train, test)	No	Yes, broad applicability	[58]
Kang	2009	Manual annotations	Hold-out validation (train, test)	No	Yes, extensible to other languages	[59]
Kersloot	2019	Manual annotations	Hold-out validation (development, test)	No	Possible, but external validation is needed	[60]
König	2019	Existing EHR data	Not used	No	Still to be tested	[61]
Li	2015	Manual annotations	10-fold cross validation	No	Not listed	[62]
Li	2019	Existing annotated corpus	Hold-out validation (train, test, development)	No	Not listed	[63]
Lingren	2016	Manual annotations	Hold-out validation (train, test, development)	No	Not listed	[12]
Liu	2019	Manual annotations	Not used	No (but multiple datasets / non-trained)	No, limited because of NYP/CUIMC and Mayo notes	[64]
Lowe	2009	Manual retrospective review	Hold-out validation (train, test)	No	Yes, has the potential to index other classes of clinical documents	[65]
Luo	2014	Existing EHR data	10-fold cross validation	No	No, challenging, not currently working on it	[66]
Meystre	2006	Manual retrospective review	Not used	No	Not listed	[67]
Meystre	2010	Existing annotated corpus	Hold-out validation (train, test)	No	Not listed	[68]
Minard	2011	Existing annotated corpus	Hold-out validation (train, test, development)	No	Not listed	[69]
Mishra	2019	Manual annotations	Not used	No	Not listed	[70]
Nguyen	2018	Existing EHR data	Not listed	No	Not listed	[71]
Oellrich	2015	Existing annotated corpus	External	Multiple datasets	Not listed	[72]
Patrick	2011	Existing annotated corpus	10-fold cross validation	No	Yes, adaptable to different requirements in clinical information extraction and classification by choosing relevant feature sets	[73]
Pérez	2018	Existing annotated corpus	Hold-out validation (train, test, development)	No	Yes, extensible to different hospital-sections and hospitals	[74]
Reátegui	2018	Existing annotated corpus	Not used	No	Not listed	[75]
Roberts	2011	Existing annotated corpus	Hold-out validation (train, test)	No	Not listed	[76]
Rousseau	2019	Manual annotations	Not used	No	Not listed	[77]
Savova	2010	Manual annotations	10-fold cross validation	No	Yes, implemented in several applications	[78]
Shivade	2015	Manual annotations	Hold-out validation (train, test)	No	Not listed	[11]
Shoenbill	2019	Manual annotations	Hold-out validation (train, test)	No	Yes, can allow further evaluation and improvement in care delivery models and treatment approaches to multiple chronic illnesses	[79]
Sohn	2014	Manual annotations	Hold-out validation (train, test, development)	No	Yes, with adaptations: create a flexible mechanism for the adaptation process	[80]
Solti	2008	Manual annotations	Hold-out validation (train, test)	No	Not listed	[81]
Soriano	2019	Manual annotations	Not listed	No	Not listed	[82]
Soysal	2018	Existing annotated corpus	Hold-out validation (train, test)	No	Yes, can be used to quickly develop customized clinical information extraction pipelines	[83]
Spasić	2015	Manual annotations	Hold-out validation (train, test)	No	Not listed	[84]
Strauss	2013	Manual annotations	Not used	No	Yes, can be shared between institutions and used to support clinical and epidemiological research	[85]
Sung	2018	Manual annotations	Not listed	No	Not listed	[86]
Tchechmedjiev	2018	Existing annotated corpus	Hold-out validation (train, test, development)	No	Yes, but not universally	[87]
Ternois	2018	Existing EHR data	5-fold cross validation + Hold-out validation (train, test)	No	Not listed	[88]
Travers	2004	Manual retrospective review	Not used	No	Not listed	[89]
Tulkens	2019	Existing annotated corpus	Hold-out validation (train, test, development)	No	Not listed	[90]
Usui	2018	Manual annotations	Not used	No	Not listed	[91]
Valtchinov	2019	Manual annotations	Not used	No	No	[92]
Wadia	2018	Manual annotations	Not used	No	Not listed	[93]
Walker	2019	Manual retrospective review	Hold-out validation (development, test)	No	Yes, it can be incorporated in institutional data warehouse	[94]
Xie	2019	Existing annotated corpus	Hold-out validation (train, test, development)	No	Not listed	[95]
Xu	2011	Manual annotations	Hold-out validation (train, test)	No	Yes, generalizable approach to combine information from heterogeneous data sources in EHRs	[96]
Yadav	2013	Manual annotations	Not used	No	Yes, should be broadly applicable to outcomes of clinical interest	[97]
Yao	2019	Existing annotated corpus	Hold-out validation (train, test)	No	Not listed	[98]
Zeng	2018	Manual annotations	5-fold cross validation + Hold-out validation (train, test)	No	Yes, potential to be replicated	[99]
Zhang	2013	Existing annotated corpus	External	Two different sets with same settings	Yes, can be adapted to different semantic categories and text genres	[100]
Zhou	2006	Manual annotations	5-fold cross validation	No	Not listed	[101]
Zhou	2011	Manual retrospective review	Hold-out validation (train, test)	No	Not listed	[102]
Zhou	2014	Manual annotations	Not used	No	Not listed	[103]

^a As reported by authors

ISSN: 2041-1480
