Table 1 Detailed information about the datasets used in the methodology

From: SIENA: Semi-automatic semantic enhancement of datasets using concept recognition

| Data | About | Subset | Size | Column Number | Column Name |
|---|---|---|---|---|---|
| HGNC | Standardized nomenclature to human genes | Subset of complete HGNC | 382 KB | 49 | symbol, locus type, ena |
| CTD | Manually curated information | Chemical–gene interactions set | 326 MB | 11 | Chemical ID, Gene Forms, PubMed IDs |
| PGKB | Information about how human genetic variation affects response to medications | Summary of the gene information | 13.6 MB | 17 | Ensemble Id, Chromosome, Cross-references |