Comparison of experimental settings. The graph shows the AUC (upper pair of lines) and the F-score (lower pair of lines) for two experimental setups. In the first setup (AUC1, F1), all sentences from a given abstract fall entirely within either the training data or the test data of each cross-validation fold. In the second setup, the sentence vectors are shuffled first and then cut into cross-validation folds, so sentences from the same abstract may appear in both training and test data. Each setup was run as 10 repetitions of randomised 10-fold cross-validation; the vertical bars mark the boundaries between runs. For the first setup F1 = 0.7300 ± 0.0058 and AUC = 0.8902 ± 0.0033, while for the second setup F1 = 0.6808 ± 0.0051 and AUC = 0.8886 ± 0.0023. A t-test shows no significant difference between the two setups (p = 0.9640 for the F-scores and p = 0.5467 for the AUC).
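The difference between the two fold constructions can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data (20 abstracts of 5 sentences each), and the round-robin fold assignment are all assumptions made for illustration. The sketch only shows how grouping by abstract keeps every abstract inside one fold, whereas shuffling sentences first generally does not.

```python
import random
from collections import defaultdict


def grouped_folds(abstract_ids, k, seed=0):
    """Setup 1 (assumed form): assign whole abstracts to folds, so all
    sentences of an abstract end up in the same fold."""
    rng = random.Random(seed)
    unique_ids = sorted(set(abstract_ids))
    rng.shuffle(unique_ids)
    fold_of_abstract = {a: i % k for i, a in enumerate(unique_ids)}
    folds = [[] for _ in range(k)]
    for idx, a in enumerate(abstract_ids):
        folds[fold_of_abstract[a]].append(idx)
    return folds


def shuffled_folds(n_sentences, k, seed=0):
    """Setup 2 (assumed form): shuffle sentence indices, then cut into k folds."""
    rng = random.Random(seed)
    indices = list(range(n_sentences))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def split_abstracts(folds, abstract_ids):
    """Count abstracts whose sentences land in more than one fold."""
    folds_seen = defaultdict(set)
    for fold_no, fold in enumerate(folds):
        for idx in fold:
            folds_seen[abstract_ids[idx]].add(fold_no)
    return sum(1 for seen in folds_seen.values() if len(seen) > 1)


# Hypothetical toy data: 20 abstracts with 5 sentences each.
abstract_ids = [a for a in range(20) for _ in range(5)]
grouped = grouped_folds(abstract_ids, k=10, seed=1)
shuffled = shuffled_folds(len(abstract_ids), k=10, seed=1)

print("abstracts split across folds, setup 1:", split_abstracts(grouped, abstract_ids))
print("abstracts split across folds, setup 2:", split_abstracts(shuffled, abstract_ids))
```

In the grouped construction the count of split abstracts is zero by design. In the shuffled construction it depends on the random seed, but most abstracts will typically be spread over several folds. Repeating either construction with 10 different seeds would correspond to the 10 runs described in the caption.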