Table 1 Dataset characteristics

From: De-identifying free text of Japanese electronic health records

| Dataset name | MedNLP | Dummy-EHRs | Pathology Reports |
| --- | --- | --- | --- |
| # of documents | 50 reports | 32 pairs of records and summaries | 1000 reports |
| # of sentences | 2244 | 8183 | 3012 |
| # of tokens | 42,621 | 154,132 | 194,449 |
| # of all tags | 490 | 3017 | 295 |
| # of age tags | 56 | 39 | 0 |
| # of hospital tags | 75 | 170 | 31 |
| # of person tags | 0 | 135 | 224 |
| # of sex tags | 4 | 16 | 0 |
| # of time tags | 355 | 2657 | 40 |
| Example in original Japanese text | 工場に勤めている <a>64歳</a> <x>男性</x> 。 | 施設入所中で寝たきり <a>86歳</a> <x>女性</x> 。全介助 | <<院外標本 <h>静大皮フ科クリニック</h> 、 <p>桑田 智</p> |
| Example translated into English | A <a>64-year-old</a> <x>man</x> works in a factory | An <a>86-year-old</a> <x>woman</x> bedridden in a nursing home. Total assistance required | <<Ex-hospital sample <h>Shizudai Dermatology Clinic</h>, <p>Satoshi Kuwata</p> |
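
The per-tag counts in the table can be thought of as tallies of inline annotations like those in the example rows. The following is a minimal sketch of such a tally, not the authors' tooling. It assumes single-letter, non-nested tags (`a` for age, `x` for sex, `h` for hospital, `p` for person, as seen in the examples); the names `TAG_PATTERN`, `count_tags`, and `strip_tags` are hypothetical.

```python
import re
from collections import Counter

# Assumption: annotations are single lowercase-letter tags, never nested,
# e.g. <a>64-year-old</a>. Stray "<" characters in the text are ignored.
TAG_PATTERN = re.compile(r"<([a-z])>(.*?)</\1>", re.DOTALL)


def count_tags(text: str) -> Counter:
    """Return how many annotated spans each tag name has in the text."""
    return Counter(match.group(1) for match in TAG_PATTERN.finditer(text))


def strip_tags(text: str) -> str:
    """Remove the tag markup but keep the annotated text itself."""
    return TAG_PATTERN.sub(r"\2", text)


mednlp_example = "A <a>64-year-old</a> <x>man</x> works in a factory"
pathology_example = (
    "<<Ex-hospital sample <h>Shizudai Dermatology Clinic</h>, "
    "<p>Satoshi Kuwata</p>"
)

mednlp_counts = count_tags(mednlp_example)
pathology_counts = count_tags(pathology_example)
plain_text = strip_tags(mednlp_example)
```

Summing such counters over all documents of a dataset would give totals comparable to the "# of all tags" row, provided the real annotation format matches the assumptions above.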