
#5 improve sentence splitter

New
nobody
None
Medium
Defect
2013-11-01
2013-11-01
Anonymous
No

Originally created by: christian.ledermann (code.google.com)

What steps will reproduce the problem?

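The current regex SplitSentences = re.compile(u'[.!?]') treats every '.', '!' or '?' as a sentence boundary, so it breaks expressions like '4.5 $US' apart at the decimal point (a plain split yields ['4', '5 $US']).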

split_sentences = re.compile(r'[.!?]\s+')

is more precise, because it only splits when whitespace follows the sentence-ending mark
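A quick way to compare the two patterns is to run both on the same text. The sketch below is only an illustration: the names old_splitter, new_splitter and the sample sentence are made up for this example and are not taken from the project's code.

```python
import re

# Pattern currently used: splits on every '.', '!' or '?'.
old_splitter = re.compile(r'[.!?]')

# Proposed pattern: splits only when whitespace follows the mark.
new_splitter = re.compile(r'[.!?]\s+')

# Hypothetical sample text containing a decimal number.
text = 'It costs 4.5 $US. Is that ok? Yes!'

old_parts = old_splitter.split(text)
new_parts = new_splitter.split(text)

print(old_parts)  # ['It costs 4', '5 $US', ' Is that ok', ' Yes', '']
print(new_parts)  # ['It costs 4.5 $US', 'Is that ok', 'Yes!']
```

With the proposed pattern the decimal number stays intact. Two side effects are worth noting: a mark at the very end of the text is not followed by whitespace, so it stays attached to the last sentence, and abbreviations followed by a space (for example 'e.g. ') would still be split.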
