Create Vector

Help
2008-09-01
2013-04-08
  • I've used the configuration below:
            config.setConfigurationRule(WVTConfiguration.STEP_INPUT_FILTER, new WVTConfigurationFact(new XMLInputFilter()));
            config.setConfigurationRule(WVTConfiguration.STEP_WORDFILTER, new WVTConfigurationFact(new StopWordsWrapper()));
            config.setConfigurationRule(WVTConfiguration.STEP_STEMMER, new WVTConfigurationFact(new PorterStemmerWrapper()));
            config.setConfigurationRule(WVTConfiguration.STEP_VECTOR_CREATION, new WVTConfigurationFact(new TFIDF()));
            config.setConfigurationRule(WVTConfiguration.STEP_OUTPUT, new WVTConfigurationFact(new WordVectorWriter(new FileWriter("docs.txt"), true)));

    Then I used the following statements to create the output:
          wlista = wvtTool.createWordList(lista, config);
          wlista.store(new FileWriter("words.txt"));
          wvtTool.createVectors(lista, config, wlista);

    The problem is:
    If I have 20 files, the output contains vectors for only 18 of them.
    If I have 200 files, the output contains vectors for only 195 of them.
    And so on.
    I'm sure that all files are being read: words.txt contains words from all of the files, but docs.txt has fewer entries than the total number of files.

    What have I done wrong?