Welcome to Open Discussion

2005-02-18
2013-04-08
  • Nobody/Anonymous

    Welcome to Open Discussion

     
    • H.X.T

      H.X.T - 2009-05-05

      Hi. A few days ago, I found this framework for creating the VSM.

      These days I have been working on a simple Chinese text classification (TC) system.

      I used WVTool in my system. However, I found a problem.

      When I used WVTool for TC in English, it worked fine.

      But when it comes to Chinese, there is a problem.

      You assume that the length of a word should be greater than 2.

      That's OK for English.

      In Chinese, a word is composed of characters.

      A character is like a letter.

      Most Chinese words are only 2 characters long.

      You put this assumption in the class AbstractStemmer, so even the DummyStemmer class, which is supposed to do nothing when it is set in the config, in fact filters out every word whose length is less than 3.

      So when I used this library in my system, I got nothing in the word list.

      I was confused until I saw the source code of AbstractStemmer.

      P.S. WordFilter contains this assumption too.

      So, for a universal library that is not only for English, do you think something should be done about this problem?

       
      • H.X.T

        H.X.T - 2009-05-05

        Sorry, I made a mistake: the class that makes that assumption is AbstractStopWordFilter.
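
        The effect described in this thread can be reproduced with a small standalone sketch. It is not WVTool code: the class `MinLengthFilterSketch` and its `filter` method are hypothetical names made up for illustration, and the only assumption taken from the posts above is that tokens shorter than 3 characters are dropped. Making the minimum length a parameter shows one possible way such a filter could serve both English and Chinese.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not part of the WVTool API.
public class MinLengthFilterSketch {

    // Keep only the tokens that contain at least minLength Unicode code points.
    public static List<String> filter(List<String> tokens, int minLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // Count code points rather than UTF-16 units, so that characters
            // outside the Basic Multilingual Plane count as one character.
            int length = token.codePointCount(0, token.length());
            if (length >= minLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> english = Arrays.asList("text", "classification", "is", "fun");
        // Three two-character Chinese words: 文本 (text), 分类 (classification),
        // 系统 (system), written as escapes to avoid source-encoding issues.
        List<String> chinese = Arrays.asList(
                "\u6587\u672C", "\u5206\u7C7B", "\u7CFB\u7EDF");

        // With a minimum length of 3, English loses only the short word "is" ...
        System.out.println(filter(english, 3).size()); // prints 3
        // ... but every two-character Chinese word is removed.
        System.out.println(filter(chinese, 3).size()); // prints 0
        // Lowering the threshold to 2 keeps all three Chinese words.
        System.out.println(filter(chinese, 2).size()); // prints 3
    }
}
```

        With the threshold fixed at 3, the Chinese word list comes out empty, which matches the symptom reported above. Whether a configurable threshold is the right fix for WVTool itself is left open here.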

         
