Table 3 Included publications and their first author, year, title, and country

From: Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies

| Author | Year | Country | Challenge | Induced objective | Data origin | Dataset | Data language | Used system | Terminology system | In use | Source code | Ref |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afshar | 2019 | USA | No | Information extraction | Clinical Data Warehouse Data | Own | English | New (+ existing) | UMLS (CPT, HCPCS, ICD-10, ICD10CM / ICD9CM, LOINC, MeSH, SNOMED-CT, RxNorm) | Not listed | No, only links to cTAKES source code | [29] |
| Alnazzawi | 2016 | UK | No | Information enrichment | PhenoCHF corpus¹ | Existing | English | Existing | UMLS | Not listed | Not applicable | [30] |
| Atutxa | 2018 | Spain | No | Information enrichment | EHR documents | Own | Spanish | New | ICD (SNOMED-CT for normalization) | Not yet, aim to embed it in human-supervised loop | Not listed | [31] |
| Barrett | 2013 | USA | No | Information extraction | Palliative care consult letters | Own | English | New | SNOMED CT | Not listed | No, but planned | [32] |
| Becker | 2016 | Germany | No | Information extraction | ShARe/CLEF corpus (2013)² | Existing | German | Existing | SNOMED CT (English), UMLS (German) | Not yet, still under development | Not applicable | [33] |
| Becker | 2019 | Germany | No | Information extraction | Clinical notes of patients with known colorectal cancer | Own | German | New (+ existing) | UMLS | Yes, led to improved quality of care for colorectal patients | Not listed | [34] |
| Bejan | 2015 | USA | No | Information extraction | Discharge summaries and i2b2/VA challenge dataset (2010)³ | Own + Existing | English | Existing | UMLS | No | Not applicable | [35] |
| Castro | 2010 | Spain | No | Information extraction | Clinical notes with ‘most relevant information’ | Own | Spanish | Existing | SNOMED CT | Not listed | Not applicable | [36] |
| Catling | 2018 | UK | No | Software development and evaluation | MIMIC-III dataset⁴ | Existing | English | New | ICD-9-CM | Not listed | Not listed | [37] |
| Chapman | 2004 | USA | No | Information extraction | Emergency department reports | Own | English | Existing | UMLS | Not listed | Not applicable | [38] |
| Chen | 2016 | USA | No | Information enrichment | Discharge summaries and progress notes | Own | English | New (+ existing) | UMLS | Not listed | Not listed | [39] |
| Chiaramello | 2016 | Italy | No | Information extraction | Clinical notes (cardiology, diabetology, hepatology, nephrology, and oncology) | Own | Italian | Existing | UMLS | Not listed | Not applicable | [40] |
| Chodey | 2016 | USA | SemEval (2014) | Information extraction | ICU Data: Discharge summaries, ECG, echo, and radiology | Existing | English | New (+ existing) | UMLS | Not listed | Not listed | [41] |
| Chung | 2005 | USA | No | Information extraction | Echocardiogram reports | Own | English | New (+ existing) | UMLS | Not yet, it will be used to populate a registry | Not listed | [42] |
| Combi | 2018 | Italy | No | Information extraction | VigiSegn (adverse drug reactions) reports | Own | Italian + English | New | MedDRA | Yes, implemented in VigiFarmaco | Pseudocode | [43] |
| De Bruijn | 2011 | Canada | i2b2/VA (2010) | Information extraction | Hospital discharge summaries and progress reports | Existing | English | New (+ existing) | UMLS | Not listed | Not listed | [44] |
| Deisseroth | 2019 | USA | No | Information extraction | Six sets of real patient data from four different medical centers | Own | English | New | HPO | Not listed | Yes | [45] |
| Demner-Fushman | 2017 | USA | No | Software development and evaluation | BioScope⁵, NCBI disease corpus⁶, i2b2/VA challenge corpus (2010)³, ShARe corpus⁷, LHC test collection (biological/clinical journal abstracts) | Existing | English | New (+ existing) | UMLS | Yes, used in other papers identified in literature search | Yes | [46] |
| Divita | 2014 | USA | Parts: i2b2/VA (2010) | Software development and evaluation | Randomly selected clinical records from the most frequent document types | Own | English | New | UMLS (level 0 + 9) | Yes, used by VA Informatics and Computing Infrastructure | Yes | [47] |
| Duarte | 2018 | Portugal | No | Information enrichment | Death certificates, clinical bulletins, and autopsy reports | Own | Portuguese | New | ICD-10 | Yes, used by Portuguese Ministry of Health for near real-time death cause surveillance | Not listed | [48] |
| Falis | 2019 | UK | No | Information extraction | MIMIC-III dataset⁴ | Existing | English | New | ICD-9 | Not listed | Not listed | [49] |
| Ferrão | 2013 | Portugal | No | Information enrichment | Inpatient adult episodes from the EHR | Own | Portuguese | New | ICD-9-CM | Not listed | Not listed | [50] |
| Gerbier | 2011 | France | No | Information extraction | Computerized emergency department medical records | Own | French | New | ICD-10, CCAM, SNOMED CT, ATC, MeSH, ICPC-2, DCR | Not yet, will be integrated into a CDSS | Not listed | [51] |
| Goicoechea Salazar | 2013 | Spain | No | Information enrichment | Diagnostic text from patient records | Own | Spanish | New | ICD-9-CM | Not listed | Not listed | [52] |
| Hamid | 2013 | USA | No | Classification | Notes of Iraq and Afghanistan veterans from the VA national clinical database | Own | English | Existing | UMLS | Not listed | Not applicable | [53] |
| Hassanzadeh | 2016 | Australia | No | Information extraction | ShARe/CLEF corpus (2013)² | Existing | English | Existing | UMLS, SNOMED CT | Not applicable | Not applicable | [54] |
| Helwe | 2017 | Lebanon | No | Computer-assisted coding | MIMIC-III dataset | Existing | English | New | UMLS, ICD | Not listed | Not listed | [55] |
| Hersh | 2001 | USA | No | Information enrichment | Radiology image reports | Own | English | Existing | UMLS | No, still in development/testing | Pseudocode | [56] |
| Hoogendoorn | 2015 | Netherlands | No | Prediction | Consultation notes of patients in a primary care setting | Own | Dutch | New | SNOMED-CT, UMLS, ICPC | Not listed | Not listed | [57] |
| Jindal | 2013 | USA | i2b2 (2012) | Information extraction | i2b2 challenge corpus (2012)⁸ | Existing | English | New (+ existing) | UMLS, SNOMED CT, MeSH | Not listed | Not listed | [58] |
| Kang | 2009 | Korea | No | Information extraction | Discharge summaries | Own | Korean | New | KOMET, UMLS | Not listed | Not listed | [59] |
| Kersloot | 2019 | Netherlands | No | Information extraction | (Non-small cell) Lung cancer charts | Own | English | New (+ existing) | SNOMED CT | Not listed | Yes | [60] |
| König | 2019 | Germany | No | Software development and evaluation | Discharge letters from BASE-II study | Own | German | New (+ existing) | Wingert-Nomenclature | No, still has to prove its value | Not listed | [61] |
| Li | 2015 | USA | No | Information comparison | Clinical notes and discharge prescription lists | Own | English | New (+ existing) | UMLS, SNOMED CT, RxNorm | Not yet, plans to move to production | Pseudocode | [62] |
| Li | 2019 | USA | No | Information extraction | EHR notes | Own | English | New (+ existing) | UMLS, SNOMED CT, MedDRA | Not listed | Not listed | [63] |
| Lingren | 2016 | USA | No | Classification | Structured and unstructured data from two EHR databases | Own | English | New (+ existing) | UMLS, ICD-9, RxNorm | Not listed | Not listed | [12] |
| Liu | 2019 | USA | No | Information extraction | Clinical notes from different institutions + PubMed Case report abstracts | Own + Existing | English | Existing | HPO | Not listed | Not applicable | [64] |
| Lowe | 2009 | USA | No | Information extraction | Single-specimen pathology reports | Own | English | Existing | UMLS, SNOMED CT | Not listed | Not applicable | [65] |
| Luo | 2014 | USA | No | Information extraction | Pathology reports | Own | English | New (+ existing) | UMLS, SNOMED CT | Yes, currently working on project in multiple hospitals | Not listed | [66] |
| Meystre | 2006 | USA | No | Information enrichment | Clinical documents from adult inpatients in a cardiovascular unit | Own | English | New (+ existing) | UMLS (level 0), SNOMED CT | Not yet, testing in practice | Not listed | [67] |
| Meystre | 2010 | USA | i2b2 (2009) | Information extraction | i2b2 challenge dataset (2009)⁹ | Existing | English | New | UMLS | Not yet, possible integration in research infrastructure | Not listed | [68] |
| Minard | 2011 | France | i2b2/VA (2010) | Information extraction | i2b2/VA challenge corpus (2010)³ | Existing | English | New (+ existing) | UMLS | Not listed | Not listed | [69] |
| Mishra | 2019 | USA | No | Information extraction | Clinical notes from NIH Clinical Center data warehouse | Own | English | Existing | UMLS, HPO | Not listed | Not applicable | [70] |
| Nguyen | 2018 | Australia | No | Computer-assisted coding | Hospital progress notes | Own | English | New (+ existing) | SNOMED CT, ICD-10-AM | Not listed | Not listed | [71] |
| Oellrich | 2015 | UK | No | Information extraction | PubMed abstracts, clinical trial information, i2b2/VA challenge corpus (2010)³, ShARe/CLEF (2013)² | Existing | English | Existing | UMLS | Not listed | Not applicable | [72] |
| Patrick | 2011 | Australia | i2b2/VA (2010) | Information extraction | i2b2/VA challenge corpus (2010)³ | Existing | English | New | UMLS, SNOMED CT | Not listed | Not listed | [73] |
| Pérez | 2018 | Spain | No | Text processing | Spontaneous DTs randomly selected entries | Own | Spanish | New | ICD | Not listed | Not listed | [74] |
| Reátegui | 2018 | Canada | No | Information extraction | i2b2 challenge corpus (2008)¹⁰ | Existing | English | New (+ existing) | UMLS, SNOMED CT, RxNorm | Not listed | Not listed | [75] |
| Roberts | 2011 | USA | i2b2/VA (2010) | Information extraction | i2b2/VA challenge corpus (2010)³ | Existing | English | New (+ existing) | UMLS, ICD-9 | Not listed | Not listed | [76] |
| Rousseau | 2019 | USA | No | Information comparison | ED encounters for patients with headaches who received head CT | Own | English | Existing | UMLS: SNOMED CT, RadLex | Not listed | Not applicable | [77] |
| Savova | 2010 | USA | i2b2 (2006, 2008) | Information extraction | Subset of clinical notes from the EMR | Own | English | New (+ existing) | UMLS, SNOMED CT, RxNorm | Yes, used in other papers identified in literature search | Yes | [78] |
| Shivade | 2015 | USA | i2b2/UTHealth (2014) | Classification | i2b2 challenge corpus (2014)¹¹ | Existing | English | Existing | UMLS | Not listed | Not applicable | [11] |
| Shoenbill | 2019 | USA | No | Information extraction | EHR notes from hypertension patients | Own | English | Existing | UMLS, SNOMED CT | Not listed | Not applicable | [79] |
| Sohn | 2014 | USA | No | Information extraction | Clinical notes with medication mentions | Own | English | New | RxNorm | Not listed | Yes | [80] |
| Solti | 2008 | USA | No | Information enrichment | Cardiology ambulatory progress notes | Own | English | Existing | UMLS | Not listed | Not applicable | [81] |
| Soriano | 2019 | Spain | No | Information extraction | Clinical emergency discharge reports | Own | Spanish | New | SNOMED CT | Not yet | Yes | [82] |
| Soysal | 2018 | USA | Parts: i2b2 (2009 + 2010), ShARe/CLEF (2013), SemEval (2014) | Software development and evaluation | Discharge summaries from the i2b2/VA challenge corpus (2010)³, outpatient clinic visit notes, mock clinical documents | Own + Existing | English | New | UMLS | Yes, used by various institutions and industrial entities | Yes | [83] |
| Spasić | 2015 | UK | No | Information extraction | MRI reports of patients | Own | English | New (+ existing) | TRAK, UMLS, MEDCIN, RadLex | Not listed | Yes | [84] |
| Strauss | 2013 | USA | No | Information extraction | Pathology reports of breast and prostate cancer patients | Own | English | New | SNOMED CT | Not listed | Yes | [85] |
| Sung | 2018 | Taiwan | No | Information extraction | Cases of adult patients with AIS | Own | English | Existing | UMLS | Not listed | Not applicable | [86] |
| Tchechmedjiev | 2018 | France | No | Information extraction | Quaero (French MEDLINE abstract titles + EMEA drug labels) + CépiDC (ICD-10 coding of death certificates) | Existing | French | New (+ existing) | UMLS terminologies (ICD-10) | Yes, available in SIFR BioPortal | Yes | [87] |
| Ternois | 2018 | France | No | Classification | Endoscopy reports written between 2015 and 2016 | Own | French | New | CCAM | Not listed | Not listed | [88] |
| Travers | 2004 | USA | No | Information extraction | Chief complaint text entries for all emergency department visits | Own | English | New | UMLS | Not listed | Not listed | [89] |
| Tulkens | 2019 | Belgium | No | Information extraction | i2b2/VA challenge corpus (2010)³ | Existing | English | New (+ existing) | UMLS | Not listed | Yes | [90] |
| Usui | 2018 | Japan | No | Prediction | Electronic medication history data from pharmacy | Own | Japanese | New | ICD-10 | Not yet, expect to use it | Not listed | [91] |
| Valtchinov | 2019 | USA | No | Classification | Radiology reports, emergency department notes + other clinical reports | Own | English | Existing | SNOMED CT, RadLex | Not listed | Not applicable | [92] |
| Wadia | 2018 | USA | No | Classification | Chest CT reports | Own | English | Existing | SNOMED CT, UMLS | Not listed | Not applicable | [93] |
| Walker | 2019 | USA | No | Information extraction | Treatment sites from EMR | Own | English | New | UMLS | Not listed | Not listed | [94] |
| Xie | 2019 | China | No | Information extraction | MIMIC-III dataset⁴ | Existing | English | New | ICD-9-CM, ICD-10 | Not listed | Not listed | [95] |
| Xu | 2011 | USA | No | Classification | CRC patient cases from the Synthetic Derivative database | Own | English | Existing | UMLS | No, still under development | Not applicable | [96] |
| Yadav | 2013 | USA | No | Prediction | Emergency department CT imaging reports | Own | English | Existing | UMLS | Not listed | Yes, command line command | [97] |
| Yao | 2019 | USA | No | Prediction | i2b2 challenge corpus (2008)¹⁰ | Existing | English | New (+ existing) | UMLS | Not listed | Part (Sorl) | [98] |
| Zeng | 2018 | USA | No | Classification | Progress notes and breast cancer surgical pathology reports | Own | English | New (+ existing) | UMLS | Not listed | Not listed | [99] |
| Zhang | 2013 | USA | No | Information extraction | i2b2/VA challenge corpus (2010)³ and GENIA corpus (MEDLINE abstracts) | Existing | English | New | UMLS | Not listed | Not listed | [100] |
| Zhou | 2006 | USA | No | Information extraction | Records of patients with breast complaints | Own | English | New | UMLS | No, still under development | Not listed | [101] |
| Zhou | 2011 | USA | No | Software development and evaluation | COPD and CAD patients | Own | English | New | SNOMED CT, RxNorm, UMLS, PPL, MDD, HL7 value sets | Yes, described in other paper ([103]) | Not listed | [102] |
| Zhou | 2014 | USA | No | Information extraction | Admission notes and discharge summaries | Own | English | Existing | SNOMED CT, HL7 RoleCodes | Not listed | Not applicable | [103] |

1. PhenoCHF corpus: narrative reports from electronic health records (EHRs) and literature articles
2. ShARe/CLEF corpus (2013): narrative clinical reports
3. i2b2/VA challenge dataset (2010): discharge summaries and progress reports
4. MIMIC-III dataset: demographics, vital sign measurements, laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality
5. BioScope corpus: medical free texts, biological full papers and biological scientific abstracts
6. NCBI disease corpus: PubMed abstracts
7. ShARe corpus: deidentified clinical free-text notes from the MIMIC II database
8. i2b2 challenge corpus (2012): discharge summaries
9. i2b2 challenge dataset (2009): de-identified hospital discharge summaries
10. i2b2 challenge corpus (2008): discharge summaries of overweight and diabetic patients
11. i2b2 challenge corpus (2014): longitudinally ordered clinical notes from three cohorts of diabetic patients