Transfer language space with similar domain adaptation: a case study with hepatocellular carcinoma

Table 1 Statistics of the cohorts before processing - Stanford US dataset and EUH MRI dataset

	Stanford US dataset	EUH MRI dataset
Cohort level
Number of unique words	17194	19828
Common words in two domains	1790
Report level
Average number of words (+/- std)	167 (+/- 39)	197 (+/- 47)
Average number of sentences (+/- std)	27 (+/- 7)	32 (+/- 8)
Number of unique words in templated reports		2774
Number of unique words in reports without template		7930
Average number of words (+/- std) describing liver-related findings in templated reports	36 (+/- 25)	109 (+/- 63)
Average number of words (+/- std) describing liver-related findings in reports without template	47 (+/- 27)	104 (+/- 52)

ISSN: 2041-1480