Table 1 Statistics of the cohorts before processing - Stanford US dataset and EUH MRI dataset

From: Transfer language space with similar domain adaptation: a case study with hepatocellular carcinoma

 

| | Stanford US dataset | EUH MRI dataset |
|---|---|---|
| Cohort level | | |
| Number of unique words | 17194 | 19828 |
| Common words in two domains | 1790 (both datasets) | |
| Report level | | |
| Average number of words (+/- std) | 167 (+/- 39) | 197 (+/- 47) |
| Average number of sentences (+/- std) | 27 (+/- 7) | 32 (+/- 8) |
| Number of unique words in templated reports | | 2774 |
| Number of unique words in reports without template | | 7930 |
| Average number of words (+/- std) describing liver related finding in templated reports | 36 (+/- 25) | 109 (+/- 63) |
| Average number of words (+/- std) describing liver related finding in reports without template | 47 (+/- 27) | 104 (+/- 52) |
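
As an illustration of how statistics of this kind could be computed, the following is a minimal sketch, not the authors' pipeline. The tokenizer (lowercased alphanumeric runs), the sentence splitter (splitting on `.`, `!`, `?`), the function names, and the toy reports are all assumptions made for this example; the source does not describe its preprocessing.

```python
import re
from statistics import mean, stdev


def tokenize(text):
    """Hypothetical tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Hypothetical sentence splitter: split on ., ! or ? and drop empties."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def cohort_stats(reports):
    """Return vocabulary and per-report word/sentence statistics."""
    tokens_per_report = [tokenize(r) for r in reports]
    word_counts = [len(t) for t in tokens_per_report]
    sent_counts = [len(split_sentences(r)) for r in reports]
    many = len(reports) > 1  # stdev needs at least two data points
    return {
        "vocab": set().union(*tokens_per_report),
        "avg_words": mean(word_counts),
        "std_words": stdev(word_counts) if many else 0.0,
        "avg_sentences": mean(sent_counts),
        "std_sentences": stdev(sent_counts) if many else 0.0,
    }


# Toy reports invented for this example (not taken from either dataset).
us_reports = [
    "Liver is normal. No focal lesion.",
    "Liver shows a small lesion.",
]
mri_reports = [
    "Liver lesion with arterial enhancement.",
    "No lesion. Liver is normal in size.",
]

us = cohort_stats(us_reports)
mri = cohort_stats(mri_reports)

# "Common words in two domains" is the intersection of the two vocabularies.
common_words = us["vocab"] & mri["vocab"]

print(len(us["vocab"]), len(mri["vocab"]), len(common_words))
print(us["avg_words"], us["std_words"])
```

Real reports would likely need a more careful tokenizer and sentence splitter (abbreviations, measurements, and section headers all break naive splitting), so counts from a sketch like this would not be expected to match the table exactly.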