Hi, is there an importable list of stopwords for German?
I am new to POPFile and I speak Portuguese. I believe I also have the same problem as you.
I am looking at a message that I have just manually tagged as SPAM. Its HTML body contains this:
"Cobran=E7a de Inadimplentes pelo Telefone: Efic=E1cia e T=E9cnicas de Negocia=E7=E3o"
If correctly unescaped, this should look like:
"Cobrança de Inadimplentes pelo Telefone: Eficácia e Técnicas de Negociação"
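For reference, the unescaping can be reproduced with a minimal Python sketch. This is not POPFile's own code (POPFile is written in Perl), and the ISO-8859-1 charset is an assumption, since the message headers are not shown here; the byte values E7, E1, E9 and E3 are consistent with it.

```python
import quopri

# Raw phrase exactly as it appears in the message source (quoted-printable).
raw = b"Cobran=E7a de Inadimplentes pelo Telefone: Efic=E1cia e T=E9cnicas de Negocia=E7=E3o"

# quopri.decodestring turns each =XX escape back into a single byte.
decoded_bytes = quopri.decodestring(raw)

# Assumption: the message declares charset ISO-8859-1 (headers not shown).
text = decoded_bytes.decode("iso-8859-1")

# text is now the accented phrase:
# Cobrança de Inadimplentes pelo Telefone: Eficácia e Técnicas de Negociação
```

A tokenizer that sees the raw form instead of the decoded one would find the accented words broken up by the =XX escapes, which would fit with only the unaccented words being recorded.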
From that phrase, POPFile only recorded the following words: "Inadimplentes", "pelo" and "Telefone". It seems to have ignored the other words.
Is this correct?
Thanks, Henrique.
Right, and I have dealt with this problem in v0.18.0 which will be out shortly.
John.
Sorry to the original poster: I completely misunderstood your first post, so my reply makes no sense. Please ignore it. Other readers: please help.
Henrique.
Thanks John!
Henrique Pantarotto
Still no "Ignored Words" for the German language?
No. Don't hold your breath. Stopwords have proved to be quite superfluous. POPFile's accuracy doesn't increase or decrease if you purge the complete list. They are still around because users insisted on having stopwords and the ability to edit the list. But I don't think that any more work will go into stopwords.
Regards,
Manni
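For readers unsure what a stopword list actually does: it is a set of words dropped before the remaining words are counted. The sketch below only illustrates that idea. It is not POPFile's code, and `GERMAN_STOPWORDS` and `count_words` are hypothetical names with a made-up sample list, not something shipped with POPFile.

```python
from collections import Counter

# Hypothetical stopword list for illustration only; not shipped with POPFile.
GERMAN_STOPWORDS = {"der", "die", "das", "und", "ist", "nicht", "ein", "eine"}

def count_words(text, stopwords=frozenset()):
    """Count lowercase word occurrences, skipping anything in `stopwords`."""
    words = text.lower().split()
    return Counter(w for w in words if w not in stopwords)

sample = "Das ist ein Angebot und die Rechnung ist nicht bezahlt"

with_list = count_words(sample, GERMAN_STOPWORDS)  # only content words remain
without_list = count_words(sample)                 # every word is counted
```

With the list, only "angebot", "rechnung" and "bezahlt" are counted. Without it, the common words are counted too, but because they appear in every kind of message they should carry little weight in the classification.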
>> Stopwords have proved to be quite superfluous. <<
I agree. Almost 3 years ago I stopped using stopwords. My current statistics show 15,302 messages classified with 74 errors, yielding a classification accuracy of 99.51%. This is about the same accuracy that I had when I was using stopwords.
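As a quick check of that figure, here is a small sketch of the arithmetic. The variable names are illustrative, and whether POPFile truncates or rounds the displayed percentage is an assumption.

```python
import math

classified = 15_302  # messages classified
errors = 74          # messages that had to be reclassified

# Accuracy as a percentage of correctly classified messages.
accuracy = (classified - errors) / classified * 100

# The exact value is a little over 99.516, so the displayed figure
# depends on whether it is truncated or rounded to two decimals.
truncated = math.floor(accuracy * 100) / 100  # 99.51
rounded = round(accuracy, 2)                  # 99.52
```

The quoted 99.51% matches the truncated value; rounding would show 99.52%.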
>> POPFile's accuracy doesn't increase or decrease if you purge the complete list. <<
This is not what I found when I deleted my stopwords file.
I expected that I would have to do more reclassifying than normal for a while but hoped that things would soon get back to normal. Two months later I was still having to reclassify messages which POPFile had been able to classify reliably up until I removed the stopwords, and my database was getting bigger and bigger without any increase in POPFile's accuracy.
I decided to just throw away my POPFile database and start again, without any stopwords.
To my surprise this turned out to be a very good idea because I ended up doing fewer reclassifications this way. POPFile is a quick learner but I was still surprised at how few messages I had to reclassify when starting from scratch again.
Brian