
Table 2 Statistics of the corpora used

From: A multi-task learning framework for efficient grammatical error correction of textual messages in mobile communications

| Corpus | Sentence number | Avg. token number | Avg. error ratio | Error number | Avg. patch length |
|---|---|---|---|---|---|
| Training sets | | | | | |
| FCE | 28,350 | 16.0 | 62.47 | 2.20 | 2.17 |
| Lang-8 | 1,037,561 | 11.4 | 47.98 | 1.94 | 2.36 |
| NUCLE | 57,151 | 20.3 | 37.87 | 1.97 | 2.12 |
| W+L | 34,308 | 18.3 | 66.26 | 2.46 | 2.20 |
| Development sets | | | | | |
| CoNLL-13 [24] | 1,381 | 21.1 | 81.4 | 2.60 | 2.13 |
| BEA-19 (dev) | 4,384 | 19.8 | 64.3 | 2.38 | 2.21 |
| Evaluation sets | | | | | |
| CoNLL-14 A1 | 1,312 | 23.0 | 72.2 | 2.21 | 2.11 |
| CoNLL-14 A2 | 1,312 | 23.0 | 86.1 | 2.68 | 2.14 |
| BEA-19 (test) | 4,477 | 19.1 | – | – | – |