Large scale biomedical texts classification: a kNN and an ESA-based approaches

Table 1 Importance of each feature for label prediction, as measured by information gain

Feature	Description	Information gain
Feature 1	Number of neighbours in which the label is assigned	0.16
Feature 2	Sum of the similarity scores between the target document and each neighbour document in which the label appears	0.17
Feature 3	Check whether all constituent tokens of the label appear in the target document	0.01
Feature 4	Check whether any of the label's entry terms appears in the target document	0.03
Feature 5	Frequency of the label if it is contained in the document	0.03
Feature 6	Check if the label is contained in the document title	0.02
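The Python sketch below shows one way to compute the six features in Table 1 for a single candidate label. It is not the authors' implementation. It assumes that each neighbour is represented as a similarity score plus a set of labels, and that the target document is its title concatenated with its abstract. Tokenization is a simple lowercase regex, and phrases are matched as whole words. The names `Neighbour` and `label_features` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Neighbour:
    """Hypothetical neighbour record: similarity to the target and its assigned labels."""
    similarity: float
    labels: set[str]


def _tokens(text: str) -> list[str]:
    # Simple lowercase alphanumeric tokenizer (an assumption; the source does not specify one)
    return re.findall(r"[a-z0-9]+", text.lower())


def _count_phrase(phrase: str, text: str) -> int:
    # Count whole-word, case-insensitive occurrences of a phrase in the text
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def label_features(label: str, entry_terms: list[str], neighbours: list[Neighbour],
                   title: str, abstract: str) -> dict[str, float]:
    """Compute Features 1-6 of Table 1 for one candidate label (illustrative sketch)."""
    text = f"{title} {abstract}"
    doc_tokens = set(_tokens(text))
    # Neighbours whose label set contains the candidate label
    carrying = [n for n in neighbours if label in n.labels]
    return {
        "f1_neighbour_count": len(carrying),
        "f2_similarity_sum": sum(n.similarity for n in carrying),
        "f3_all_tokens_present": int(all(t in doc_tokens for t in _tokens(label))),
        "f4_entry_term_present": int(any(_count_phrase(e, text) > 0 for e in entry_terms)),
        "f5_label_frequency": _count_phrase(label, text),
        "f6_label_in_title": int(_count_phrase(label, title) > 0),
    }


if __name__ == "__main__":
    neighbours = [
        Neighbour(0.75, {"Neoplasms", "Humans"}),
        Neighbour(0.25, {"Humans"}),
        Neighbour(0.5, {"Neoplasms"}),
    ]
    print(label_features("Neoplasms", ["tumor", "cancer"], neighbours,
                         "Gene expression in breast cancer",
                         "We profile neoplasms from 40 patients. Neoplasms were graded."))
```

In this example, two neighbours carry "Neoplasms", so Feature 1 is 2 and Feature 2 is 0.75 + 0.5 = 1.25. The entry term "cancer" appears in the text, which sets Feature 4 to 1. The label itself occurs twice in the abstract but not in the title. The features would then be passed to a learned ranking or classification model. That model is outside the scope of this sketch.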

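The information gain values in Table 1 follow the standard definition IG(Y; X) = H(Y) − Σ_v P(X = v) · H(Y | X = v). Here Y records whether a candidate label is correct, and X is the feature value. The sketch below computes this quantity for a discrete feature. The source does not say how continuous features such as Feature 2 were discretized, so binning them before calling `information_gain` is an assumption. Both function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values: list, targets: list) -> float:
    """IG(Y; X) = H(Y) - sum over v of P(X=v) * H(Y | X=v), for a discrete feature X."""
    n = len(targets)
    # Group target values by the feature value they co-occur with
    groups: dict = {}
    for x, y in zip(feature_values, targets):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(targets) - conditional


if __name__ == "__main__":
    targets = [1, 1, 0, 0]
    print(information_gain([1, 1, 0, 0], targets))  # perfectly predictive feature
    print(information_gain([0, 1, 0, 1], targets))  # uninformative feature
```

If a feature perfectly separates correct from incorrect labels, it recovers the full entropy of Y (1 bit for a balanced binary target). If a feature is independent of Y, it scores 0. The values in Table 1 fall between these extremes.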
ISSN: 2041-1480