1. Evaluate the algorithm using generic performance measures (i.e., precision, recall, and F-score) and cover the appropriate aspects of evaluation, including discrimination, calibration, and, preferably, the overall accuracy of predictions (e.g., AUC, calibration graphs, and the Brier score). 1. Motivate the choice of measures, with references to existing literature where appropriate (e.g., Sokolova and Lapalme’s analysis of performance measures [108]). 2. Perform an error analysis and discuss the errors in the Discussion section of the paper, including possible changes to the algorithm that could improve its performance on these specific errors. 3. When using a non-probabilistic NLP method: determine the cut-off value for a ‘good’ test result a priori (i.e., before evaluating the algorithm) and explain why this cut-off value was chosen. |
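
The measures named in the recommendation above can be illustrated with a minimal sketch in plain Python. The labels, predicted probabilities, and the decision threshold of 0.5 are made-up assumptions for illustration only; they are not data or settings from any reviewed study, and the helper names (`precision_recall_f1`, `auc`, `brier_score`) are hypothetical. In practice an established library implementation would likely be preferred.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Generic measures computed from binary gold labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: chance that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Overall accuracy of probabilistic predictions: mean squared error against the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Illustrative (made-up) gold labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1]

# Assumption: a decision threshold fixed before looking at any results.
THRESHOLD = 0.5
y_pred = [1 if p >= THRESHOLD else 0 for p in y_prob]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc_value = auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
print(f"AUC={auc_value:.3f} Brier={brier:.3f}")
```

Precision, recall, and F-score depend on the chosen threshold, whereas AUC and the Brier score are computed from the predicted probabilities directly. This is one reason to report both kinds of measure and to fix any threshold before evaluation.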