Corpus | Sentence number | Avg. token number | Avg. error ratio (%) | Error number | Avg. patch length |
---|---|---|---|---|---|
Training sets | | | | | |
FCE | 28,350 | 16.0 | 62.47 | 2.20 | 2.17 |
Lang-8 | 1,037,561 | 11.4 | 47.98 | 1.94 | 2.36 |
NUCLE | 57,151 | 20.3 | 37.87 | 1.97 | 2.12 |
W+L | 34,308 | 18.3 | 66.26 | 2.46 | 2.20 |
Development sets | | | | | |
CoNLL-13 [24] | 1,381 | 21.1 | 81.4 | 2.60 | 2.13 |
BEA-19 (dev) | 4,384 | 19.8 | 64.3 | 2.38 | 2.21 |
Evaluation sets | | | | | |
CoNLL-14 A1 | 1,312 | 23.0 | 72.2 | 2.21 | 2.11 |
CoNLL-14 A2 | 1,312 | 23.0 | 86.1 | 2.68 | 2.14 |
BEA-19 (test) | 4,477 | 19.1 | – | – | – |