Change SENTENCE_TOKENIZATION_REGEXP so that it does not affect decimal numbers.
Add the README.txt file.
Fix some help messages.
Fix the preprocessing progress and some regexps.
Add the stat function and the license.
Initial commit