Stefan Evert - 2017-07-01

I would strongly argue in favour of the BNCweb solution! I think that corpus designers should be encouraged to preserve the original whitespace in the tokenization phase – good tokenizers have an option to do this.
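
To illustrate what preserving whitespace at tokenization time could look like, here is a minimal Python sketch. It is not how BNCweb or any particular tokenizer works: the token pattern and the function names `tokenize_keep_space` and `detokenize` are made up for this example. Each token is stored together with the whitespace that followed it, so the original text can be rebuilt exactly.

```python
import re

# Assumed, deliberately naive token pattern: a run of word characters or a
# single punctuation/symbol character, followed by any amount of whitespace.
TOKEN_RE = re.compile(r"(\w+|[^\w\s])(\s*)")


def tokenize_keep_space(text):
    """Return a list of (token, trailing_whitespace) pairs.

    Whitespace before the first token is not recorded in this sketch.
    """
    return [(m.group(1), m.group(2)) for m in TOKEN_RE.finditer(text)]


def detokenize(pairs):
    """Rebuild the text by joining each token with its stored whitespace."""
    return "".join(token + space for token, space in pairs)


text = 'He said:  "don\'t go".'
pairs = tokenize_keep_space(text)
restored = detokenize(pairs)
```

Because the whitespace is stored rather than guessed, a search interface can display the text as it was written, without needing detokenization heuristics.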