Next we investigated various ngram models, which give the likelihood of all sequences of breaks and non-breaks up to length N. The following table shows the effect on performance of varying N.
These experiments suggest that the value of N is not critical so long as it is above 2, i.e. so long as some context is used. We tried a variety of standard ngram smoothing techniques, but none had any significant effect on performance.
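As an illustration of the kind of model discussed here, the following is a minimal sketch of an ngram model over break/non-break label sequences. It is not the implementation used in these experiments: the label names `B` and `NB`, the padding symbol `<s>`, the function names `train_ngram` and `prob`, and the choice of add-k smoothing are all assumptions made for the example.

```python
from collections import Counter

def train_ngram(sequences, n, pad="<s>"):
    """Count ngrams of order n over sequences of break labels.

    Each sequence is padded on the left with n-1 pad symbols (an
    assumption of this sketch) so every label has a full context.
    """
    ngram_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        padded = [pad] * (n - 1) + list(seq)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    return ngram_counts, context_counts

def prob(label, context, ngram_counts, context_counts,
         labels=("B", "NB"), k=1.0):
    """Add-k smoothed estimate of P(label | context).

    Add-k is used only as one example of a standard smoothing
    technique; it is not claimed to be the one used in the text.
    """
    context = tuple(context)
    numerator = ngram_counts[context + (label,)] + k
    denominator = context_counts[context] + k * len(labels)
    return numerator / denominator

# Toy data: two utterances labelled with non-breaks (NB) and breaks (B).
sequences = [["NB", "NB", "B"], ["NB", "B"]]
ngram_counts, context_counts = train_ngram(sequences, n=2)
p_break = prob("B", ["NB"], ngram_counts, context_counts)
p_non_break = prob("NB", ["NB"], ngram_counts, context_counts)
```

Varying `n` in `train_ngram` changes how much preceding break/non-break context conditions each prediction, which is the quantity N varied in the experiments.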