From: MedEval — A Swedish medical test collection with doctors and patients user groups
Measure | Entire collection | Assessed documents | Doctors assessed | Patients assessed | Common files | Doctors relevant | Patients relevant |
---|---|---|---|---|---|---|---|
Number of documents | 42,250 | 7,044 | 3,272 | 4,334 | 562 | 1,233 | 1,654 |
Tokens | 12,991,157 | 5,034,323 | 3,232,772 | 2,431,160 | 629,609 | 1,361,700 | 988,236 |
Tokens/document | 307 | 715 | 988 | 561 | 1,120 | 1,104 | 596 |
Average word length | 5.75 | 6.04 | 6.29 | 5.73 | 6.16 | 6.33 | 5.63 |
Full form types | 334,559 | 181,354 | 154,901 | 92,803 | 50,961 | 87,814 | 43,825 |
Lemma types | 267,892 | 146,631 | 126,217 | 73,121 | 40,857 | 71,974 | 34,263 |
Lemma type token ratio (tokens per lemma type) | 48.5 | 34.3 | 25.6 | 33.2 | 15.4 | 18.9 | 28.8 |
Compound tokens | 1,273,874 | 573,625 | 412,475 | 237,267 | 76,117 | 179,580 | 92,420 |
Full form compound types | 187,904 | 99,614 | 83,846 | 47,387 | 24,083 | 45,257 | 20,157 |
Lemma compound types | 144,159 | 78,508 | 66,907 | 37,151 | 19,685 | 36,867 | 16,006 |
Ratio of compounds | 0.098 | 0.114 | 0.128 | 0.098 | 0.120 | 0.132 | 0.094 |
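
The derived rows of the table appear to follow from the raw counts by simple division. The sketch below is a minimal illustration under that assumption: the formulas (tokens divided by documents, tokens divided by lemma types, compound tokens divided by tokens) are inferred from the numbers and are not stated in the source, and the names `COUNTS` and `derived_rows` are hypothetical. Only two columns are included, with counts copied from the table.

```python
# Raw counts copied from the table for two of its columns.
COUNTS = {
    "Entire collection": {
        "documents": 42_250,
        "tokens": 12_991_157,
        "lemma_types": 267_892,
        "compound_tokens": 1_273_874,
    },
    "Assessed documents": {
        "documents": 7_044,
        "tokens": 5_034_323,
        "lemma_types": 146_631,
        "compound_tokens": 573_625,
    },
}


def derived_rows(c):
    """Recompute the derived rows from raw counts.

    Assumption: these formulas are inferred from the table values,
    not taken from the source text.
    """
    return {
        # "Tokens/document" row, rounded to a whole number
        "tokens_per_document": round(c["tokens"] / c["documents"]),
        # "Lemma type token ratio" row, read here as tokens per lemma type
        "tokens_per_lemma_type": round(c["tokens"] / c["lemma_types"], 1),
        # "Ratio of compounds" row: share of tokens that are compounds
        "ratio_of_compounds": round(c["compound_tokens"] / c["tokens"], 3),
    }


for name, counts in COUNTS.items():
    print(name, derived_rows(counts))
```

For these two columns the recomputed values agree with the table (307, 48.5, 0.098 and 715, 34.3, 0.114), which supports reading the ratio rows this way.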