Erik Ward - 2012-08-02

Hi!

First off, I want to thank the creators of Wikipedia Miner. I did some investigating and realized that indexing and processing a Wikipedia dump with my own software, or with some of the other software packages, is a daunting task indeed! The plug-and-play feel of the Java API is amazing!

The problem is that the Javadoc is not very easy to read. In my research I would like to access term statistics, i.e. a tf (term frequency) vector for individual Wikipedia pages and also for Wikipedia as a whole. Where should I start looking for this? Should I read more about Berkeley DB and go that route?
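
To make the request concrete, here is a rough sketch of what I mean by a tf vector, computed from plain text only. It does not use any Wikipedia Miner calls, since I don't know which ones would apply. The names (TermFrequency, termFrequencies, addTo) are made up for illustration, and the tokenization is a naive lowercase split that a real experiment would probably replace.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Wikipedia Miner.
final class TermFrequency {

    // Count how often each term occurs in the plain text of one page.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new TreeMap<>();
        // Naive tokenization (an assumption): lowercase, then split on
        // anything that is not a letter or a digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Add the counts of one page into a running total for the whole collection.
    static void addTo(Map<String, Integer> total, Map<String, Integer> page) {
        page.forEach((term, count) -> total.merge(term, count, Integer::sum));
    }

    public static void main(String[] args) {
        // Two toy "pages" standing in for the plain text of Wikipedia articles.
        Map<String, Integer> page1 = termFrequencies("The cat sat on the mat.");
        Map<String, Integer> page2 = termFrequencies("The dog sat.");

        Map<String, Integer> collection = new TreeMap<>();
        addTo(collection, page1);
        addTo(collection, page2);

        System.out.println("page 1:     " + page1);
        System.out.println("collection: " + collection);
    }
}
```

What I am missing is the step before this: how to get at the text (or ready-made counts) of each page through the API or the underlying database.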

I am not very interested in the high-level functions of Wikipedia Miner because I am doing research and need control over which algorithms are used.

Best regards,
Erik