
Table 9 Recommendation regarding the use of data

From: Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies

1. To ensure that new algorithms can be compared against your system, aim to publish the training, development, and validation data you used in a data repository.

 1. If the data cannot be published, determine whether the data can be accessed on request or used in a federated learning approach (i.e., a learning process in which the data owners collaboratively train a model without any owner exposing its data to the others [107]).

2. In case a reference standard is used, include information about the origin of the data (external dataset, subset of the dataset) and the characteristics of the data in the dataset. If possible, reference the dataset using a DOI or URL.

3. If an external dataset is used, give a short description of the data present in the dataset and reference the source of the dataset.
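
To make the federated learning option in recommendation 1 more concrete, the following is a minimal sketch of one common scheme, federated averaging, on a toy one-dimensional linear model. It is not taken from the source or from [107]; the function names (`local_update`, `federated_average`, `train`), the learning rate, and the toy data are all illustrative assumptions. The point it shows is that each data owner trains locally and only model parameters, never the raw records, reach the coordinating server.

```python
from typing import List, Tuple

Params = Tuple[float, float]          # (weight, bias) of a 1-D linear model
Dataset = List[Tuple[float, float]]   # private (x, y) pairs held by one data owner


def local_update(params: Params, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Params:
    """Hypothetical client step: a few gradient-descent passes on private data."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error of y ~ w * x + b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Params], sizes: List[int]) -> Params:
    """Hypothetical server step: average parameters, weighted by sample count."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def train(clients: List[Dataset], rounds: int = 200) -> Params:
    """Alternate local training and server-side averaging for a fixed number of rounds."""
    params: Params = (0.0, 0.0)
    for _ in range(rounds):
        # Each owner sees only the shared parameters and its own data.
        updates = [local_update(params, data) for data in clients]
        # The server sees only parameters and sample counts, not the data.
        params = federated_average(updates, [len(d) for d in clients])
    return params


# Toy data (an assumption for illustration): both owners sample y = 2x + 1.
client_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
client_b = [(3.0, 7.0), (4.0, 9.0)]
w, b = train([client_a, client_b])
print(f"w={w:.3f}, b={b:.3f}")
```

In this sketch the shared model approaches the line that generated both owners' data, even though neither dataset leaves its owner. Exchanged parameters can still leak information about the underlying records, so a real deployment would typically add safeguards such as secure aggregation or differential privacy.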