Table 1 Dataset characteristics

From: De-identifying free text of Japanese electronic health records

| Dataset name | MedNLP | Dummy-EHRs | Pathology Reports |
| --- | --- | --- | --- |
| # of documents | 50 reports | 32 pairs of records and summaries | 1000 reports |
| # of sentences | 2244 | 8183 | 3012 |
| # of tokens | 42,621 | 154,132 | 194,449 |
| # of all tags | 490 | 3017 | 295 |
| # of age tags | 56 | 39 | 0 |
| # of hospital tags | 75 | 170 | 31 |
| # of person tags | 0 | 135 | 224 |
| # of sex tags | 4 | 16 | 0 |
| # of time tags | 355 | 2657 | 40 |
| Example in original Japanese text | 工場に勤めている <a>64歳</a> <x>男性</x>。 | 施設入所中で寝たきり <a>86歳</a> <x>女性</x>。全介助 | <<院外標本 <h>静大皮フ科クリニック</h>、<p>桑田 智</p> |
| Example translated into English | A <a>64-year-old</a> <x>man</x> works in a factory | An <a>86-year-old</a> <x>woman</x> bedridden in a nursing home. Total assistance required | <<Ex-hospital sample <h>Shizudai Dermatology Clinic</h>, <p>Satoshi Kuwata</p> |
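The per-tag counts in Table 1 could in principle be reproduced from text annotated with inline tags like those in the example rows. The following is a minimal sketch, not the authors' tooling: the function names `count_tags` and `mask_tags` are hypothetical, the letter-to-category mapping is inferred from the examples (`a` age, `x` sex, `h` hospital, `p` person), and `t` for time is an assumption because no time tag appears in the examples.

```python
import re
from collections import Counter

# Mapping inferred from the example rows; "t" for time is an assumption.
TAG_NAMES = {"a": "age", "x": "sex", "h": "hospital", "p": "person", "t": "time"}

# Matches <a>...</a> style spans and tolerates stray spaces inside the
# brackets, such as "< h >" or "</h >".
TAG_SPAN = re.compile(r"<\s*([a-z])\s*>(.*?)<\s*/\s*\1\s*>", re.DOTALL)


def count_tags(text):
    """Count tagged spans per category in an annotated string."""
    counts = Counter()
    for match in TAG_SPAN.finditer(text):
        letter = match.group(1)
        counts[TAG_NAMES.get(letter, letter)] += 1
    return counts


def mask_tags(text):
    """Replace each tagged span with a placeholder such as [AGE]."""
    def placeholder(match):
        letter = match.group(1)
        return "[" + TAG_NAMES.get(letter, letter).upper() + "]"
    return TAG_SPAN.sub(placeholder, text)


if __name__ == "__main__":
    example = "A <a>64-year-old</a> <x>man</x> works in a factory"
    print(count_tags(example))
    print(mask_tags(example))
```

Summing `count_tags` over every sentence of a corpus would give the kind of totals listed in the "# of ... tags" rows, provided the corpus uses this inline tag format.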