Table 3 Included publications and their first author, year, title, and country

From: Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies

| Author | Year | Country | Challenge | Induced objective | Data origin | Dataset | Data language | Used system | Terminology system | In use | Source code | Ref |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Afshar | 2019 | USA | No | Information extraction | Clinical Data Warehouse Data | Own | English | New (+ existing) | UMLS (CPT, HCPCS, ICD-10, ICD10CM / ICD9CM, LOINC, MeSH, SNOMED-CT, RxNorm) | Not listed | No, only links to cTAKES source code | [29] |
| Alnazzawi | 2016 | UK | No | Information enrichment | PhenoCHF corpus¹ | Existing | English | Existing | UMLS | Not listed | Not applicable | [30] |
| Atutxa | 2018 | Spain | No | Information enrichment | EHR documents | Own | Spanish | New | ICD (SNOMED-CT for normalization) | Not yet, aim to embed it in human-supervised loop | Not listed | [31] |
| Barrett | 2013 | USA | No | Information extraction | Palliative care consult letters | Own | English | New | SNOMED CT | Not listed | No, but planned | [32] |
| Becker | 2016 | Germany | No | Information extraction | ShARe/CLEF corpus (2013)² | Existing | German | Existing | SNOMED CT (English), UMLS (German) | Not yet, still under development | Not applicable | [33] |
| Becker | 2019 | Germany | No | Information extraction | Clinical notes of patients with known colorectal cancer | Own | German | New (+ existing) | UMLS | Yes, led to improved quality of care for colorectal patients | Not listed | [34] |
| Bejan | 2015 | USA | No | Information extraction | Discharge summaries and i2b2/VA challenge dataset (2010)³ | Own + Existing | English | Existing | UMLS | No | Not applicable | [35] |
| Castro | 2010 | Spain | No | Information extraction | Clinical notes with ‘most relevant information’ | Own | Spanish | Existing | SNOMED CT | Not listed | Not applicable | [36] |
| Catling | 2018 | UK | No | Software development and evaluation | MIMIC-III dataset⁴ | Existing | English | New | ICD-9-CM | Not listed | Not listed | [37] |
| Chapman | 2004 | USA | No | Information extraction | Emergency department reports | Own | English | Existing | UMLS | Not listed | Not applicable | [38] |
| Chen | 2016 | USA | No | Information enrichment | Discharge summaries and progress notes | Own | English | New (+ existing) | UMLS | Not listed | Not listed | [39] |
| Chiaramello | 2016 | Italy | No | Information extraction | Clinical notes (cardiology, diabetology, hepatology, nephrology, and oncology) | Own | Italian | Existing | UMLS | Not listed | Not applicable | [40] |
| Chodey | 2016 | USA | SemEval (2014) | Information extraction | ICU Data: Discharge summaries, ECG, echo, and radiology | Existing | English | New (+ existing) | UMLS | Not listed | Not listed | [41] |
| Chung | 2005 | USA | No | Information extraction | Echocardiogram reports | Own | English | New (+ existing) | UMLS | Not yet, it will be used to populate a registry | Not listed | [42] |
| Combi | 2018 | Italy | No | Information extraction | VigiSegn (adverse drug reactions) reports | Own | Italian + English | New | MedDRA | Yes, implemented in VigiFarmaco | Pseudocode | [43] |
| De Bruijn | 2011 | Canada | i2b2/VA (2010) | Information extraction | Hospital discharge summaries and progress reports | Existing | English | New (+ existing) | UMLS | Not listed | Not listed | [44] |
| Deisseroth | 2019 | USA | No | Information extraction | Six sets of real patient data from four different medical centers | Own | English | New | HPO | Not listed | Yes | [45] |
| Demner-Fushman | 2017 | USA | No | Software development and evaluation | BioScope⁵, NCBI disease corpus⁶, i2b2/VA challenge corpus (2010)³, ShARe corpus⁷, LHC test collection (biological/clinical journal abstracts) | Existing | English | New (+ existing) | UMLS | Yes, used in other papers identified in literature search | Yes | [46] |
| Divita | 2014 | USA | Parts: i2b2/VA (2010) | Software development and evaluation | Randomly selected clinical records from the most frequent document types | Own | English | New | UMLS (level 0 + 9) | Yes, used by VA Informatics and Computing Infrastructure | Yes | [47] |
| Duarte | 2018 | Portugal | No | Information enrichment | Death certificates, clinical bulletins, and autopsy reports | Own | Portuguese | New | ICD-10 | Yes, used by Portuguese Ministry of Health for near real-time death cause surveillance | Not listed | [48] |
| Falis | 2019 | UK | No | Information extraction | MIMIC-III dataset⁴ | Existing | English | New | ICD-9 | Not listed | Not listed | [49] |
| Ferrão | 2013 | Portugal | No | Information enrichment | Inpatient adult episodes from the EHR | Own | Portuguese | New | ICD-9-CM | Not listed | Not listed | [50] |
| Gerbier | 2011 | France | No | Information extraction | Computerized emergency department medical records | Own | French | New | ICD-10, CCAM, SNOMED CT, ATC, MeSH, ICPC-2, DCR | Not yet, will be integrated into a CDSS | Not listed | [51] |
| Goicoechea Salazar | 2013 | Spain | No | Information enrichment | Diagnostic text from patient records | Own | Spanish | New | ICD-9-CM | Not listed | Not listed | [52] |
| Hamid | 2013 | USA | No | Classification | Notes of Iraq and Afghanistan veterans from the VA national clinical database | Own | English | Existing | UMLS | Not listed | Not applicable | [53] |
| Hassanzadeh | 2016 | Australia | No | Information extraction | ShARe/CLEF corpus (2013)² | Existing | English | Existing | UMLS, SNOMED CT | Not applicable | Not applicable | [54] |
| Helwe | 2017 | Lebanon | No | Computer-assisted coding | MIMIC-III dataset | Existing | English | New | UMLS, ICD | Not listed | Not listed | [55] |
| Hersh | 2001 | USA | No | Information enrichment | Radiology image reports | Own | English | Existing | UMLS | No, still in development/testing | Pseudocode | [56] |
| Hoogendoorn | 2015 | Netherlands | No | Prediction | Consultation notes of patients in a primary care setting | Own | Dutch | New | SNOMED-CT, UMLS, ICPC | Not listed | Not listed | [57] |
| Jindal | 2013 | USA | i2b2 (2012) | Information extraction | i2b2 challenge corpus (2012)⁸ | Existing | English | New (+ existing) | UMLS, SNOMED CT, MeSH | Not listed | Not listed | [58] |
| Kang | 2009 | Korea | No | Information extraction | Discharge summaries | Own | Korean | New | KOMET, UMLS | Not listed | Not listed | [59] |
| Kersloot | 2019 | Netherlands | No | Information extraction | (Non-small cell) Lung cancer charts | Own | English | New (+ existing) | SNOMED CT | Not listed | Yes | [60] |
| König | 2019 | Germany | No | Software development and evaluation | Discharge letters from BASE-II study | Own | German | New (+ existing) | Wingert-Nomenclature | No, still has to prove its value | Not listed | [61] |
| Li | 2015 | USA | No | Information comparison | Clinical notes and discharge prescription lists | Own | English | New (+ existing) | UMLS, SNOMED CT, RxNorm | Not yet, plans to move to production | Pseudocode | [62] |
| Li | 2019 | USA | No | Information extraction | EHR notes | Own | English | New (+ existing) | UMLS, SNOMED CT, MedDRA | Not listed | Not listed | [63] |
| Lingren | 2016 | USA | No | Classification | Structured and unstructured data from two EHR databases | Own | English | New (+ existing) | UMLS, ICD-9, RxNorm | Not listed | Not listed | [12] |
| Liu | 2019 | USA | No | Information extraction | Clinical notes from different institutions + PubMed Case report abstracts | Own + Existing | English | Existing | HPO | Not listed | Not applicable | [64] |
| Lowe | 2009 | USA | No | Information extraction | Single-specimen pathology reports | Own | English | Existing | UMLS, SNOMED CT | Not listed | Not applicable | [65] |
| Luo | 2014 | USA | No | Information extraction | Pathology reports | Own | English | New (+ existing) | UMLS, SNOMED CT | Yes, currently working on project in multiple hospitals | Not listed | [66] |
| Meystre | 2006 | USA | No | Information enrichment | Clinical documents from adult inpatients in a cardiovascular unit | Own | English | New (+ existing) | UMLS (level 0), SNOMED CT | Not yet, testing in practice | Not listed | [67] |
| Meystre | 2010 | USA | i2b2 (2009) | Information extraction | i2b2 challenge dataset (2009)⁹ | Existing | English | New | UMLS | Not yet, possible integration in research infrastructure | Not listed | [68] |
| Minard | 2011 | France | i2b2/VA (2010) | Information extraction | i2b2/VA challenge corpus (2010)³ | Existing | English | New (+ existing) | UMLS | Not listed | Not listed | [69] |
| Mishra | 2019 | USA | No | Information extraction | Clinical notes from NIH Clinical Center data warehouse | Own | English | Existing | UMLS, HPO | Not listed | Not applicable | [70] |
| Nguyen | 2018 | Australia | No | Computer-assisted coding | Hospital progress notes | Own | English | New (+ existing) | SNOMED CT, ICD-10-AM | Not listed | Not listed | [71] |
| Oellrich | 2015 | UK | No | Information extraction | PubMed abstracts, clinical trial information, i2b2/VA challenge corpus (2010)³, ShARe/CLEF (2013)² | Existing | English | Existing | UMLS | Not listed | Not applicable | [72] |
| Patrick | 2011 | Australia | i2b2/VA (2010) | Information extraction | i2b2/VA challenge corpus (2010)³ | Existing | English | New | UMLS, SNOMED CT | Not listed | Not listed | [73] |
| Pérez | 2018 | Spain | No | Text processing | Spontaneous DTs randomly selected entries | Own | Spanish | New | ICD | Not listed | Not listed | [74] |
| Reátegui | 2018 | Canada | No | Information extraction | i2b2 challenge corpus (2008)¹⁰ | Existing | English | New (+ existing) | UMLS, SNOMED CT, RxNorm | Not listed | Not listed | [75] |
| Roberts | 2011 | USA | i2b2/VA (2010) | Information extraction | i2b2/VA challenge corpus (2010)³ | Existing | English | New (+ existing) | UMLS, ICD-9 | Not listed | Not listed | [76] |
| Rousseau | 2019 | USA | No | Information comparison | ED encounters for patients with headaches who received head CT | Own | English | Existing | UMLS: SNOMED CT, RadLex | Not listed | Not applicable | [77] |
| Savova | 2010 | USA | i2b2 (2006, 2008) | Information extraction | Subset of clinical notes from the EMR | Own | English | New (+ existing) | UMLS, SNOMED CT, RxNorm | Yes, used in other papers identified in literature search | Yes | [78] |
| Shivade | 2015 | USA | i2b2/UTHealth (2014) | Classification | i2b2 challenge corpus (2014)¹¹ | Existing | English | Existing | UMLS | Not listed | Not applicable | [11] |
| Shoenbill | 2019 | USA | No | Information extraction | EHR notes from hypertension patients | Own | English | Existing | UMLS, SNOMED CT | Not listed | Not applicable | [79] |
| Sohn | 2014 | USA | No | Information extraction | Clinical notes with medication mentions | Own | English | New | RxNorm | Not listed | Yes | [80] |
| Solti | 2008 | USA | No | Information enrichment | Cardiology ambulatory progress notes | Own | English | Existing | UMLS | Not listed | Not applicable | [81] |
| Soriano | 2019 | Spain | No | Information extraction | Clinical emergency discharge reports | Own | Spanish | New | SNOMED CT | Not yet | Yes | [82] |
| Soysal | 2018 | USA | Parts: i2b2 (2009 + 2010), ShARe/CLEF (2013), SemEval (2014) | Software development and evaluation | Discharge summaries from the i2b2/VA challenge corpus (2010)³, outpatient clinic visit notes, mock clinical documents | Own + Existing | English | New | UMLS | Yes, used by various institutions and industrial entities | Yes | [83] |
| Spasić | 2015 | UK | No | Information extraction | MRI reports of patients | Own | English | New (+ existing) | TRAK, UMLS, MEDCIN, RadLex | Not listed | Yes | [84] |
| Strauss | 2013 | USA | No | Information extraction | Pathology reports of breast and prostate cancer patients | Own | English | New | SNOMED CT | Not listed | Yes | [85] |
| Sung | 2018 | Taiwan | No | Information extraction | Cases of adult patients with AIS | Own | English | Existing | UMLS | Not listed | Not applicable | [86] |
| Tchechmedjiev | 2018 | France | No | Information extraction | Quaero (French MEDLINE abstract titles + EMEA drug labels) + CépiDC (ICD-10 coding of death certificates) | Existing | French | New (+ existing) | UMLS terminologies (ICD-10) | Yes, available in SIFR BioPortal | Yes | [87] |
| Ternois | 2018 | France | No | Classification | Endoscopy reports written between 2015 and 2016 | Own | French | New | CCAM | Not listed | Not listed | [88] |
| Travers | 2004 | USA | No | Information extraction | Chief complaint text entries for all emergency department visits | Own | English | New | UMLS | Not listed | Not listed | [89] |
| Tulkens | 2019 | Belgium | No | Information extraction | i2b2/VA challenge corpus (2010)³ | Existing | English | New (+ existing) | UMLS | Not listed | Yes | [90] |
| Usui | 2018 | Japan | No | Prediction | Electronic medication history data from pharmacy | Own | Japanese | New | ICD-10 | Not yet, expect to use it | Not listed | [91] |
| Valtchinov | 2019 | USA | No | Classification | Radiology reports, emergency department notes + other clinical reports | Own | English | Existing | SNOMED CT, RadLex | Not listed | Not applicable | [92] |
| Wadia | 2018 | USA | No | Classification | Chest CT reports | Own | English | Existing | SNOMED CT, UMLS | Not listed | Not applicable | [93] |
| Walker | 2019 | USA | No | Information extraction | Treatment sites from EMR | Own | English | New | UMLS | Not listed | Not listed | [94] |
| Xie | 2019 | China | No | Information extraction | MIMIC-III dataset⁴ | Existing | English | New | ICD-9-CM, ICD-10 | Not listed | Not listed | [95] |
| Xu | 2011 | USA | No | Classification | CRC patient cases from the Synthetic Derivative database | Own | English | Existing | UMLS | No, still under development | Not applicable | [96] |
| Yadav | 2013 | USA | No | Prediction | Emergency department CT imaging reports | Own | English | Existing | UMLS | Not listed | Yes, command line command | [97] |
| Yao | 2019 | USA | No | Prediction | i2b2 challenge corpus (2008)¹⁰ | Existing | English | New (+ existing) | UMLS | Not listed | Part (Sorl) | [98] |
| Zeng | 2018 | USA | No | Classification | Progress notes and breast cancer surgical pathology reports | Own | English | New (+ existing) | UMLS | Not listed | Not listed | [99] |
| Zhang | 2013 | USA | No | Information extraction | i2b2/VA challenge corpus (2010)³ and GENIA corpus (MEDLINE abstracts) | Existing | English | New | UMLS | Not listed | Not listed | [100] |
| Zhou | 2006 | USA | No | Information extraction | Records of patients with breast complaints | Own | English | New | UMLS | No, still under development | Not listed | [101] |
| Zhou | 2011 | USA | No | Software development and evaluation | COPD and CAD patients | Own | English | New | SNOMED CT, RxNorm, UMLS, PPL, MDD, HL7 value sets | Yes, described in other paper [103] | Not listed | [102] |
| Zhou | 2014 | USA | No | Information extraction | Admission notes and discharge summaries | Own | English | Existing | SNOMED CT, HL7 RoleCodes | Not listed | Not applicable | [103] |

  1. PhenoCHF corpus: narrative reports from electronic health records (EHRs) and literature articles
  2. ShARe/CLEF corpus (2013): narrative clinical reports
  3. i2b2/VA challenge dataset (2010): discharge summaries and progress reports
  4. MIMIC-III dataset: demographics, vital sign measurements, laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality
  5. BioScope corpus: medical free texts, biological full papers, and biological scientific abstracts
  6. NCBI disease corpus: PubMed abstracts
  7. ShARe corpus: de-identified clinical free-text notes from the MIMIC II database
  8. i2b2 challenge corpus (2012): discharge summaries
  9. i2b2 challenge dataset (2009): de-identified hospital discharge summaries
  10. i2b2 challenge corpus (2008): discharge summaries of overweight and diabetic patients
  11. i2b2 challenge corpus (2014): longitudinally ordered clinical notes from three cohorts of diabetic patients