For historical reasons, OmegaT includes a custom English stop words list. This list was originally intended for terminology extraction, so it contains a very large number of words, many of which are not trivial.
OmegaT uses stop words for two main purposes: fuzzy matching and dictionary lookup.
StopList_en.txt is ill suited to both of these tasks, causing confusion when seemingly quite different sentences appear to be 100% matches, and excluding some non-trivial words from dictionary lookup.
Lucene now includes a reasonable set of English stop words that better align with our goals:
"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
Thus I have removed StopList_en.txt so that we fall back to Lucene instead.
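To illustrate why an overly large stop list can make different sentences look identical, here is a minimal sketch in plain Java. It is not OmegaT's or Lucene's actual tokenizer code: the class StopWordDemo, the method contentTokens, and the HYPOTHETICAL_AGGRESSIVE set are invented for this example, and the two extra words in that set are assumptions, not the real contents of StopList_en.txt. Only the LUCENE_DEFAULT words are taken from the list quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordDemo {
    // The Lucene English stop words quoted in this ticket.
    static final Set<String> LUCENE_DEFAULT = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these", "they", "this",
        "to", "was", "will", "with"));

    // Hypothetical over-broad list: the Lucene words plus two meaningful
    // words. These extras are assumptions made for this illustration only.
    static final Set<String> HYPOTHETICAL_AGGRESSIVE = new HashSet<>(LUCENE_DEFAULT);
    static {
        HYPOTHETICAL_AGGRESSIVE.add("never");
        HYPOTHETICAL_AGGRESSIVE.add("always");
    }

    // Lowercase the sentence, split on anything that is not a letter or
    // digit, and drop every token found in the given stop list.
    static List<String> contentTokens(String sentence, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String first = "Never open the file";
        String second = "Always open the file";

        // With the short list the two sentences keep distinct tokens.
        System.out.println(contentTokens(first, LUCENE_DEFAULT));
        System.out.println(contentTokens(second, LUCENE_DEFAULT));

        // With the over-broad list both reduce to the same tokens, so a
        // comparison based on these tokens would treat them as identical.
        System.out.println(contentTokens(first, HYPOTHETICAL_AGGRESSIVE)
            .equals(contentTokens(second, HYPOTHETICAL_AGGRESSIVE)));
    }
}
```

With the short list, "never" and "always" survive filtering and the sentences stay distinguishable. With the over-broad list they are dropped, which is the kind of false 100% match described above.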
Implemented in the released version 4.0 of OmegaT.
Didier