Table 10 Recommendations regarding the evaluation and validation of Natural Language Processing algorithms

From: Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies

1. Evaluate the algorithm using generic performance measures (i.e., precision, recall, and F-score) and cover the appropriate aspects of evaluation, including discrimination, calibration, and preferably the accuracy of predictions (e.g., AUC, calibration graphs, and the Brier score).

 1. Include a motivation for the choice of measures, with references to existing literature where appropriate (e.g., Sokolova and Lapalme’s analysis of performance measures [108]).

2. Perform an error analysis and discuss the errors in the Discussion section of the paper. Include possible changes to the algorithm that could improve its performance for these specific errors.

3. When using a non-probabilistic NLP method, determine the cut-off value for a ‘good’ test result a priori, before evaluating the algorithm. Explain why this cut-off value was chosen.
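
As an illustration of the measures named in these recommendations, the following is a minimal sketch in plain Python, not a method taken from the review. The labels, the scores, and the value of `CUTOFF` are invented toy values, and all function names are hypothetical. Treating the cut-off as a decision threshold on a score is only one possible reading of recommendation 3. The sketch fixes the cut-off before any evaluation, computes precision, recall, and F-score, adds AUC (discrimination) and the Brier score (accuracy of predictions), and lists the misclassified items as a starting point for an error analysis.

```python
# Assumption: cut-off chosen a priori, before any evaluation data is inspected.
CUTOFF = 0.5


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = concept present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (pairwise Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, scores):
    """Mean squared difference between predicted score and true label."""
    return sum((s - t) ** 2 for t, s in zip(y_true, scores)) / len(y_true)


def misclassified(y_true, y_pred):
    """(index, true label, predicted label) for every error, for manual review."""
    return [(i, t, p) for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


# Invented toy data: gold-standard labels and algorithm scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

# Apply the a priori cut-off to turn scores into predicted labels.
y_pred = [1 if s >= CUTOFF else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
print(f"AUC={auc(y_true, scores):.3f} Brier={brier_score(y_true, scores):.3f}")
print("errors to inspect:", misclassified(y_true, y_pred))
```

Calibration graphs are not covered here. In practice an established library would normally replace these hand-written functions, and the error list would be reviewed case by case for the Discussion section.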