Table 3 Feature blocks used to build the ensemble model

From: Concept selection for phenotypes and diseases using learn to rank

| Feature block | Description |
| --- | --- |
| FB1 | A Boolean set of features for the system identifiers (i.e. M1 … M9). |
| FB2 | A Boolean set of features for the semantic types that the system predicts to appear and not to appear in the sentence (e.g. T047, T184). |
| FB3 | A set of integer-valued features for the counts of vocabulary terms appearing in UMLS concepts that the systems predict to appear in the sentence. In total, the set consisted of 1,008 UMLS CUIs. |
| FB4 | A set of integer-valued features for the counts of vocabulary terms appearing in the sentence. The vocabulary consisted of 13,565 terms. |
| FB5 | A set of integer-valued features for the ‘45 cluster’ distributed semantic classes which match FB3. The 45 cluster classes, derived from PubMed by Richard Socher and Christopher Manning, are available at http://nlp.stanford.edu/software/bionlp2011-distsim-clusters-v1.tar.gz |
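
As an illustration of how feature blocks of this kind can be combined, the following minimal sketch concatenates a Boolean block for the system identifier (FB1), a Boolean block for predicted semantic types (FB2) and an integer count block for sentence vocabulary terms (FB4) into one flat vector. It is not the authors' implementation. The function names, the two-item semantic type list, the three-word vocabulary and the whitespace tokenisation are all assumptions made for the example, and FB2 is simplified to mark only types predicted to appear.

```python
from collections import Counter

# Hypothetical, tiny stand-ins for the inventories described in the table.
SYSTEMS = [f"M{i}" for i in range(1, 10)]    # FB1: system identifiers M1 ... M9
SEMANTIC_TYPES = ["T047", "T184"]            # FB2: illustrative subset only
VOCABULARY = ["fever", "cough", "syndrome"]  # FB4: toy vocabulary, not the 13,565 terms


def boolean_block(active, inventory):
    """Return one 0/1 feature per inventory item: 1 if the item is active."""
    active = set(active)
    return [1 if item in active else 0 for item in inventory]


def count_block(tokens, inventory):
    """Return one integer feature per inventory item: its count in tokens."""
    counts = Counter(tokens)
    return [counts[item] for item in inventory]


def build_features(system_id, predicted_types, sentence):
    """Concatenate FB1, FB2 and FB4 into a single flat feature vector."""
    tokens = sentence.lower().split()  # naive whitespace tokenisation (assumption)
    fb1 = boolean_block([system_id], SYSTEMS)
    fb2 = boolean_block(predicted_types, SEMANTIC_TYPES)
    fb4 = count_block(tokens, VOCABULARY)
    return fb1 + fb2 + fb4


features = build_features("M3", ["T184"], "Fever and cough with fever")
print(features)
```

The count-based blocks FB3 and FB5 would follow the same pattern as count_block, with the inventory replaced by the relevant term or cluster list.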