Stop Words in German?

Help
Anonymous
2003-01-21
2013-04-15
  • Anonymous

    Anonymous - 2003-01-21

    Hi, is there a list of importable stop words for German?

     
    • Nobody/Anonymous

      I am new to POPFile and I speak Portuguese.  I believe I also have the same problem as you.

      I am looking at a new message, manually tagged as SPAM, whose HTML body contains this:

      "Cobran=E7a de Inadimplentes pelo Telefone: Efic=E1cia e T=E9cnicas de Negocia=E7=E3o"

      If correctly unescaped, this should look like this:

      "Cobrança de Inadimplentes pelo Telefone:
      Eficácia e Técnicas de Negociação"
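
For readers who want to reproduce the unescaping, here is a minimal Python sketch. The `=E7` style sequences are quoted-printable escapes. The charset is not stated in the post, so ISO-8859-1 is an assumption here (it is the one in which E7, E1, E9 and E3 map to ç, á, é and ã). The helper name `unescape_quoted_printable` is made up for illustration and says nothing about how POPFile handles this internally.

```python
import quopri


def unescape_quoted_printable(raw: bytes, charset: str = "iso-8859-1") -> str:
    """Decode quoted-printable escapes such as =E7, then decode the bytes.

    The default charset is an assumption; a real message declares its
    charset in the Content-Type header.
    """
    return quopri.decodestring(raw).decode(charset)


subject = (
    b"Cobran=E7a de Inadimplentes pelo Telefone: "
    b"Efic=E1cia e T=E9cnicas de Negocia=E7=E3o"
)
print(unescape_quoted_printable(subject))
```

Words that contain no escapes ("Inadimplentes", "pelo", "Telefone") come through unchanged with or without this step.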

      From that phrase, POPFile only recorded the following words: "Inadimplentes", "pelo" and "Telefone".  It seemed to ignore the other words.

      Is this correct?

      Thanks, Henrique.

       
      • John Graham-Cumming

        Right, and I have dealt with this problem in v0.18.0 which will be out shortly.

        John.

         
    • Nobody/Anonymous

      SORRY to original topic poster: I completely misunderstood your first post, so my reply makes no sense.  Please ignore.  Other readers: please help.

      Henrique.

       
    • Nobody/Anonymous

      Thanks John!

      Henrique Pantarotto

       
    • Morgenstern72

      Morgenstern72 - 2007-12-26

      Still no "Ignored Words" for the German language?

       
      • Manni

        Manni - 2008-02-04

        No. Don't hold your breath. Stopwords have proved to be quite superfluous. POPFile's accuracy doesn't increase or decrease if you purge the complete list. They are still around because users insisted on stopwords and the possibility to edit the list. But I don't think that any more work will go into stopwords.

        Regards,
        Manni
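
For context, stop-word filtering in general just drops very common words before the remaining tokens are counted. The sketch below is generic Python, not POPFile's code, and the handful of German words in `GERMAN_STOP_WORDS` is an illustrative, hypothetical sample rather than a list shipped with or recommended for POPFile.

```python
# Hypothetical sample of common German function words (illustration only).
GERMAN_STOP_WORDS = frozenset(
    {"der", "die", "das", "und", "ist", "nicht", "ein", "eine"}
)


def remove_stop_words(tokens, stop_words=GERMAN_STOP_WORDS):
    """Return the tokens whose lowercase form is not in stop_words."""
    return [token for token in tokens if token.lower() not in stop_words]


tokens = "Das ist ein Angebot und nicht die Rechnung".split()
print(remove_stop_words(tokens))  # ['Angebot', 'Rechnung']
```

Purging the list, as described above, corresponds to passing an empty set so that every token is kept.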

         
        • Brian Smith

          Brian Smith - 2008-02-05

          >> Stopwords have proved to be quite superfluous. <<

          I agree. Almost 3 years ago I stopped using stopwords. My current statistics show 15,302 messages classified with 74 errors, yielding a classification accuracy of 99.51%. This is about the same accuracy that I had when I was using stopwords.
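
The arithmetic behind that figure can be checked with a short Python sketch. Accuracy here is taken as (messages - errors) / messages. The exact value is about 99.516%, which matches the quoted 99.51% if the percentage is truncated to two decimals rather than rounded. That truncation is only a guess about how the figure was produced, and both function names are made up for this example.

```python
def accuracy_percent(classified: int, errors: int) -> float:
    """Share of correctly classified messages, as a percentage."""
    return 100.0 * (classified - errors) / classified


def truncated_percent(classified: int, errors: int) -> float:
    """Same percentage, truncated (not rounded) to two decimals.

    Integer arithmetic avoids floating-point surprises in the truncation.
    """
    hundredths = (classified - errors) * 10000 // classified
    return hundredths / 100


print(round(accuracy_percent(15302, 74), 2))  # 99.52 (rounded)
print(truncated_percent(15302, 74))           # 99.51 (truncated)
```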

          >> POPFile's accuracy doesn't increase or decrease if you purge the complete list. <<

          This is not what I found when I deleted my stopwords file.

          I expected that I would have to do more reclassifying than normal for a while but hoped that things would soon get back to normal. Two months later I was still having to reclassify messages which POPFile had been able to classify reliably up until I removed the stopwords, and my database was getting bigger and bigger without any increase in POPFile's accuracy.

          I decided to just throw away my POPFile database and start again, without any stopwords.

          To my surprise this turned out to be a very good idea because I ended up doing fewer reclassifications this way. POPFile is a quick learner but I was still surprised at how few messages I had to reclassify when starting from scratch again.

          Brian

           
