#4 issues with tokenizer

open
nobody
None
5
2011-04-06
2011-04-06
No

Some glitches were found in 0.3rc2:

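1) Adjacent punctuation tokens are coalesced into a single token, e.g. <orth>",</orth> instead of one token for " and one for ,
2) Number ranges are split into two numbers, with the hyphen read as a minus sign, e.g. 12-18 is tagged as <orth>12</orth><orth>-18</orth>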
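
For reference, here is a minimal sketch of the behaviour I would expect. This is not the project's tokenizer: the regex, the `tokenize` and `to_orth` names, and the choice to emit the hyphen in a range as its own token are all my assumptions, so treat it as an illustration of the two cases above rather than a proposed patch.

```python
import re
from xml.sax.saxutils import escape

# Assumed rule: a token is either a run of word characters (letters, digits)
# or exactly one non-space, non-word character. Punctuation is therefore
# never merged with its neighbours, and "-" never attaches to a number.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word/number tokens and single punctuation tokens."""
    return TOKEN_RE.findall(text)


def to_orth(tokens):
    """Wrap each token in an <orth> element (hypothetical output helper)."""
    return "".join("<orth>%s</orth>" % escape(tok) for tok in tokens)


if __name__ == "__main__":
    print(to_orth(tokenize('",')))    # case 1: quote and comma stay separate
    print(to_orth(tokenize("12-18")))  # case 2: range is not read as 12 and -18
```

Whether a range such as 12-18 should end up as three tokens or as a single token is a design decision for the maintainers; the sketch only shows that it should not come out as 12 followed by -18.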
