From: FlexiTerm: a flexible term recognition method
Data set  Size (KB)  Documents  Sentences  Tokens  Distinct tokens  Distinct stems
1         145        100        906        24,096  3,430            2,720
2         150        100        949        26,174  3,837            3,049
3         169        100        1,949      40,461  4,404            3,422
4         300        100        3,022      55,845  5,402            4,504
5         73         100        960        13,093  946              824