I've used the configuration below:
config.setConfigurationRule(WVTConfiguration.STEP_INPUT_FILTER, new WVTConfigurationFact(new XMLInputFilter()));
config.setConfigurationRule(WVTConfiguration.STEP_WORDFILTER, new WVTConfigurationFact(new StopWordsWrapper()));
config.setConfigurationRule(WVTConfiguration.STEP_STEMMER, new WVTConfigurationFact(new PorterStemmerWrapper()));
config.setConfigurationRule(WVTConfiguration.STEP_VECTOR_CREATION, new WVTConfigurationFact(new TFIDF()));
config.setConfigurationRule(WVTConfiguration.STEP_OUTPUT, new WVTConfigurationFact(new WordVectorWriter(new FileWriter("docs.txt"), true)));
Then I used the following statements to create the output:
wlista = wvtTool.createWordList(lista, config);
wvtTool.createVectors(lista, config, wlista);
The problem is:
If I have 20 files, the output contains vectors for only 18 of them.
If I have 200 files, the output contains vectors for only 195 of them.
And so on.
I'm sure that all files are being read. The file words.txt contains words from all files, but docs.txt contains fewer documents than the total.
What have I done wrong?
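One thing that may be worth ruling out: a Java Writer buffers its output, so text written just before the program ends can be lost if the writer is never flushed or closed. Whether that is what happens with the FileWriter passed to WordVectorWriter above is only a guess. The sketch below uses plain Java only (no WVTool calls) to show a writer being closed with try-with-resources and a way to count how many entries actually reached the file. The class and method names (OutputCheck, writeAll, countNonBlankLines) are hypothetical, and the check assumes docs.txt holds one document vector per line.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of WVTool.
class OutputCheck {

    // Counts the non-blank lines in a file.
    // Assumption: the output file holds one document vector per line.
    static long countNonBlankLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }

    // Writes one line per entry. try-with-resources closes the writer,
    // which flushes any text still held in its buffer.
    static void writeAll(Path file, List<String> entries) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file)) {
            for (String entry : entries) {
                out.write(entry);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("docs", ".txt");
        // Made-up sample lines standing in for document vectors.
        writeAll(tmp, Arrays.asList("doc1 0.1 0.2", "doc2 0.3 0.0", "doc3 0.0 0.5"));
        System.out.println(countNonBlankLines(tmp) + " lines written");
        Files.delete(tmp);
    }
}
```

In the code above, this would mean keeping a reference to the FileWriter created for "docs.txt" and closing it after createVectors returns, then comparing the line count of docs.txt with the number of input files.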