EURASIP Journal on Wireless Communications and Networking

Table 2 Statistics of the corpora used

From: A multi-task learning framework for efficient grammatical error correction of textual messages in mobile communications

Corpus	Sentence number	Avg. token number	Avg. error ratio	Error number	Avg. patch length
Training sets
FCE	28,350	16.0	62.47	2.20	2.17
Lang-8	1,037,561	11.4	47.98	1.94	2.36
NUCLE	57,151	20.3	37.87	1.97	2.12
W+L	34,308	18.3	66.26	2.46	2.20
Development sets
CoNLL-13 [24]	1,381	21.1	81.4	2.60	2.13
BEA-19 (dev)	4,384	19.8	64.3	2.38	2.21
Evaluation sets
CoNLL-14 A1	1,312	23.0	72.2	2.21	2.11
CoNLL-14 A2	1,312	23.0	86.1	2.68	2.14
BEA-19 (test)	4,477	19.1	–	–	–
