Table 1 Importance of each feature for the prediction according to the Information Gain measure

From: Large scale biomedical texts classification: a kNN and an ESA-based approaches

| Feature   | Description                                                                                                   | Information gain |
|-----------|---------------------------------------------------------------------------------------------------------------|------------------|
| Feature 1 | Number of neighbours to which the label is assigned                                                           | 0.16             |
| Feature 2 | Sum of the similarity scores between the document and each neighbour document in which the label appears     | 0.17             |
| Feature 3 | Whether all constituent tokens of the label appear in the target document                                     | 0.01             |
| Feature 4 | Whether at least one of the label's entries appears in the target document                                    | 0.03             |
| Feature 5 | Frequency of the label in the document, if the label occurs in it                                             | 0.03             |
| Feature 6 | Whether the label appears in the document title                                                               | 0.02             |
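The sketch below illustrates how the six features in Table 1 might be computed for one (document, candidate label) pair, and how an information gain score is obtained for a discrete feature. It is a minimal illustration under assumptions, not the authors' implementation. The neighbour representation (a list of `(similarity, label_set)` pairs), the lower-cased substring and token matching, and every function name (`knn_label_features`, `lexical_features`, `entropy`, `information_gain`) are hypothetical. Features 1, 2 and 5 are numeric, so they would need discretizing (for example by binning) before `information_gain` is applied. The table does not state which discretization was used.

```python
import re
from collections import Counter
from math import log2


def knn_label_features(label, neighbours):
    """Features 1 and 2 for one candidate label (hypothetical helper).

    `neighbours` is assumed to be a list of (similarity, labels) pairs, one per
    nearest-neighbour document, where `labels` is that document's label set.
    """
    hits = [sim for sim, labels in neighbours if label in labels]
    return len(hits), sum(hits)  # (feature 1: neighbour count, feature 2: summed similarity)


def lexical_features(label, entries, title, abstract):
    """Features 3-6, assuming naive lower-cased token and substring matching."""
    text = f"{title} {abstract}".lower()
    doc_tokens = set(re.findall(r"\w+", text))
    label_lc = label.lower()
    f3 = int(all(tok in doc_tokens for tok in re.findall(r"\w+", label_lc)))  # all label tokens present
    f4 = int(any(entry.lower() in text for entry in entries))  # some entry string occurs
    f5 = text.count(label_lc)  # label frequency; 0 when the label does not occur
    f6 = int(label_lc in title.lower())  # label occurs in the title
    return f3, f4, f5, f6


def entropy(ys):
    """Shannon entropy, in bits, of a sequence of class values."""
    n = len(ys)
    return -sum(c / n * log2(c / n) for c in Counter(ys).values())


def information_gain(xs, ys):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v), for a discrete feature X."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return entropy(ys) - sum(len(g) / n * entropy(g) for g in groups.values())


if __name__ == "__main__":
    neighbours = [(0.75, {"Humans", "Neoplasms"}), (0.5, {"Humans"}), (0.25, {"Mice"})]
    print(knn_label_features("Humans", neighbours))  # (2, 1.25)
    print(lexical_features(
        "Breast Neoplasms", ["Breast Cancer"],
        "Breast cancer screening in older women",
        "Breast neoplasms are common. We study breast neoplasms screening.",
    ))  # (1, 1, 2, 0)
    # A feature that perfectly predicts a balanced binary label carries 1 bit ...
    print(information_gain([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
    # ... while a feature independent of the label carries none.
    print(information_gain([1, 0, 1, 0], [1, 1, 0, 0]))  # 0.0
```

Information gain is computed independently for each feature against the binary "label assigned / not assigned" outcome. Under that reading, the higher scores of features 1 and 2 indicate that the neighbour-based evidence reduces uncertainty about the outcome more than the lexical-match features do.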