Table 5 Characteristics of the included studies

From: Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies

Description n (%) References
Main objective
Information extraction 45 (58%) [29, 32, 33, 34, 35, 36, 38, 40, 41, 42, 43, 44, 45, 49, 51, 58, 59, 60, 63, 64, 65, 66, 68, 69, 70, 72, 73, 75, 76, 78, 79, 80, 82, 84, 85, 86, 87, 89, 90, 94, 95, 100, 101, 103, 104]
Information enrichment 9 (12%) [30, 31, 39, 48, 50, 52, 56, 67, 81]
Classification 8 (10%) [11, 12, 53, 88, 92, 93, 96, 99]
Software development and evaluation 6 (7.8%) [37, 46, 47, 61, 83, 102]
Prediction 4 (5.2%) [57, 91, 97, 98]
Information comparison 2 (2.6%) [62, 77]
Computer-assisted coding 2 (2.6%) [55, 71]
Text processing 1 (1.3%) [74]
Part of challenge
i2b2 (Informatics for Integrating Biology and the Bedside) 10 (13%) [11, 44, 47, 58, 68, 69, 73, 76, 78, 83]
Entire system 8 (10%) [11, 44, 58, 68, 69, 73, 76, 78]
Parts of the system 2 (2.6%) [47, 83]
SemEval (Semantic Evaluation) 2 (2.6%) [41, 83]
Entire system 1 (1.3%) [41]
Parts of the system 1 (1.3%) [83]
ShARe/CLEF (Shared Annotated Resources/Conference and Labs of the Evaluation Forum) 1 (1.3%) [83]
Parts of the system 1 (1.3%) [83]
Dataset: language
English 60 (78%) [11, 12, 29, 30, 32, 35, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 49, 53, 55, 56, 58, 60, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 75, 76, 77, 78, 79, 80, 81, 83, 84, 85, 86, 89, 90, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104]
Spanish 5 (6.5%) [31, 36, 52, 74, 82]
French 3 (3.9%) [51, 87, 88]
German 3 (3.9%) [33, 34, 61]
Italian 2 (2.6%) [40, 43]
Portuguese 2 (2.6%) [48, 50]
Dutch 1 (1.3%) [57]
Japanese 1 (1.3%) [91]
Korean 1 (1.3%) [59]
Dataset: origin
Data present in institute 55 (71%) [12, 29, 31, 32, 34, 35, 36, 38, 39, 40, 42, 43, 45, 47, 48, 50, 51, 52, 53, 56, 57, 59, 60, 61, 62, 63, 64, 65, 66, 67, 70, 71, 74, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 88, 89, 91, 92, 93, 94, 96, 97, 99, 101, 102, 103]
Existing dataset 25 (33%) [11, 30, 33, 35, 37, 41, 44, 46, 49, 55, 58, 64, 68, 69, 72, 73, 75, 76, 83, 87, 90, 95, 98, 100, 104]
Included reference to dataset 21 (27%) [11, 30, 35, 37, 41, 44, 46, 49, 55, 58, 64, 72, 75, 76, 83, 87, 90, 95, 98, 100, 104]
Training of algorithm
Trained 47 (61%) [11, 12, 29, 31, 32, 34, 37, 39, 41, 42, 44, 45, 48, 49, 50, 51, 52, 53, 55, 56, 57, 58, 59, 62, 63, 65, 66, 68, 69, 73, 74, 76, 78, 79, 80, 81, 82, 83, 84, 87, 88, 90, 95, 96, 98, 99, 104]
Not listed 3 (3.9%) [30, 101, 102]
Development of algorithm
Use of development set 16 (21%) [12, 29, 31, 34, 37, 49, 55, 60, 63, 69, 74, 80, 87, 90, 94, 95]
Not listed 4 (5.2%) [30, 82, 83, 101]
Used NLP system or algorithm
New NLP system or algorithm 29 (38%) [31, 32, 37, 43, 45, 47, 48, 49, 50, 51, 52, 55, 57, 59, 68, 73, 74, 80, 82, 83, 85, 88, 89, 91, 94, 95, 100, 101, 102]
New NLP system or algorithm with existing components 25 (33%) [12, 29, 34, 39, 41, 42, 44, 46, 58, 60, 61, 62, 63, 66, 67, 69, 71, 75, 76, 78, 84, 87, 90, 98, 99]
Existing NLP system or algorithm 23 (30%) [11, 30, 33, 35, 36, 38, 40, 53, 56, 64, 65, 70, 72, 77, 79, 81, 86, 93, 96, 97, 103, 104]
Use in practice
Plans to implement / still under development and testing 12 (16%) [31, 33, 51, 56, 62, 66, 67, 68, 82, 91, 96, 101]
Implemented in practice 10 (13%) [34, 42, 43, 46, 47, 48, 78, 83, 87, 102]
Availability of code
Published algorithm or source code 15 (20%) [31, 45, 46, 47, 60, 78, 80, 82, 83, 84, 85, 87, 90, 97, 98]
Pseudocode in manuscript 3 (3.9%) [43, 56, 62]
Planning to publish algorithm or source code 1 (1.3%) [32]
Not applicable, used an existing system 20 (26%) [11, 30, 33, 35, 36, 38, 40, 53, 64, 65, 70, 72, 77, 79, 81, 86, 93, 96, 103, 104]