We recommend using only features that occur with fewer than 1000 words and keeping only positive significance scores. We also recommend the LMI significance measure, keeping only the 1000 features with the highest significance scores per term. Keeping the 200 most similar terms for each term is normally sufficient. This is achieved with:
python generateHadoopScript.py dataset 1000 0 0 1000 LMI 200
That error is caused by a wrong parameter passed when generating the script that runs the Hadoop pipeline (generateHadoopScript.py): a number was given where a significance measure was expected. The following parameters can be used to generate the script:
python generateHadoopScript.py dataset 1000 0 0 1000 LMI 200
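To catch this mistake before the pipeline runs, the command line can be checked up front. The following is only a minimal sketch and not part of generateHadoopScript.py: the helper names `looks_numeric` and `check_measure_argument` are hypothetical, and the sketch assumes that the script takes exactly seven arguments with the significance measure in sixth position, as in the example command above.

```python
import shlex


def looks_numeric(value: str) -> bool:
    """Return True if the string parses as a number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def check_measure_argument(command: str) -> str:
    """Return the sixth script argument, or raise if it is a number.

    Assumption: the layout matches the example command, i.e.
    python generateHadoopScript.py <7 arguments>, measure in sixth place.
    """
    parts = shlex.split(command)
    args = parts[2:]  # skip "python" and the script name
    if len(args) != 7:
        raise ValueError(f"expected 7 arguments, got {len(args)}")
    measure = args[5]
    if looks_numeric(measure):
        raise ValueError(
            f"argument 6 is {measure!r}, but a significance measure "
            "such as LMI is expected"
        )
    return measure
```

Called on the example command, `check_measure_argument` returns `"LMI"`. If the measure is replaced by a number, it raises a `ValueError` instead.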
If the error persists, you may be using a version older than 0.0.6 and should upgrade to the latest version. If you do not want to upgrade, follow the documentation found here.