{SG,X}ML tags not properly recognised
By default, your tokeniser only recognises XML or SGML tags of the form <xxxx>. Tags that contain spaces, such as <tag attr="foo">, are not recognised as tags, so the tokeniser attempts to tokenise them like ordinary text. This is almost certainly not what an application wants, and it is not what the current release of treeTagger does. Tags containing spaces should be treated just the same as any other tag, and should not be tokenised.
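To illustrate the behaviour I would expect, here is a minimal sketch in Python. It is not taken from your tokeniser or from treeTagger; the names TAG_OR_TOKEN and tokenise are hypothetical, and the pattern is only a rough approximation of what counts as a tag (a "<", an optional "/", a letter, then anything up to the next ">").

```python
import re

# Hypothetical pattern, for illustration only:
#   1. a whole tag, with or without attributes, kept as one token
#   2. a run of characters containing no whitespace and no "<"
#   3. a stray "<" that does not start a tag
TAG_OR_TOKEN = re.compile(r"</?[A-Za-z][^<>]*>|[^\s<]+|<")

def tokenise(text):
    """Split text on whitespace, but keep each SGML/XML tag intact."""
    return TAG_OR_TOKEN.findall(text)

print(tokenise('<tag attr="foo">Hello world</tag>'))
# ['<tag attr="foo">', 'Hello', 'world', '</tag>']
```

With this approach a tag containing spaces comes out as a single token, exactly like <xxxx> does. The sketch ignores comments, CDATA sections and attribute values that contain ">", all of which a real implementation would probably need to handle.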
Apologies if this is a configuration option which I missed!