
2 byte character recognition

Help
2003-03-13
2003-03-14
  • Nobody/Anonymous

    I am new to regular expressions.
    Can you point me to the Javadoc for jregex?
    I want to identify 2-byte Japanese characters in a String and replace them with 1-byte characters. Is this possible using jregex?
    Could you please tell me which Pattern and classes to use for doing so?
    Regards
    Devi

     
    • Sergey A. Samokhodkin

      > I want to identify 2 byte Japanese characters in a String

      The only problem you may encounter is getting the correct String; jregex itself should handle it with no problems.
      You work with a Japanese string just like with any other string:

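      String jpString=...;                    // the Japanese text to search
      String jpPattern=...;                   // the regular expression to look for
      Pattern myPat=new Pattern(jpPattern);   // jregex.Pattern
      Matcher m=myPat.matcher(jpString);      // jregex.Matcher
      while(m.find()){ ... }                  // handle each match here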

      As I said above, the most problematic parts are the ones denoted by "...". Getting them working is too big a topic to cover here, so please post the specific problems you have.
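For comparison, the same find() loop can be sketched with the JDK's built-in java.util.regex package, swapped in here for jregex (whose classes are similar in shape but differ in details). The sketch assumes "Japanese characters" means characters of the Hiragana, Katakana and Han (kanji) Unicode scripts; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch with the JDK's java.util.regex, not jregex.
class FindJapanese {
    // One or more Hiragana, Katakana or Han (kanji) characters in a row.
    // \p{IsXxx} names a Unicode script (supported since Java 7).
    static final Pattern JAPANESE =
            Pattern.compile("[\\p{IsHiragana}\\p{IsKatakana}\\p{IsHan}]+");

    // Returns every run of Japanese characters found in text, in order.
    static List<String> findJapanese(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = JAPANESE.matcher(text);
        while (m.find()) {          // same loop shape as the jregex example
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "abc wakarimashita 123 nihongo", written with escapes so the
        // source file's encoding does not matter
        String text = "abc \u308F\u304B\u308A\u307E\u3057\u305F 123 \u65E5\u672C\u8A9E";
        System.out.println(findJapanese(text).size()); // 2
    }
}
```

Matching whole runs (the trailing `+`) returns each Japanese word once instead of one match per character.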

      > and replace them with 1 byte characters. Is this possible using jregex?

      There is no such thing as a 1-byte character in Java (a char is always 2 bytes). What do you mean?
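To illustrate the point about 2-byte chars: inside a Java String every char is a 16-bit UTF-16 code unit, whether it holds "a" or a Japanese letter, and a byte count only exists once the text is encoded with some charset. A minimal sketch; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharWidth {
    // Number of bytes the text occupies once encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "a";
        String japanese = "\u308F"; // HIRAGANA LETTER WA

        // Inside a String both are exactly one 16-bit char.
        System.out.println(Character.BYTES);     // 2
        System.out.println(ascii.length());      // 1
        System.out.println(japanese.length());   // 1

        // Byte counts depend on the encoding chosen.
        System.out.println(byteLength(ascii, StandardCharsets.UTF_8));       // 1
        System.out.println(byteLength(japanese, StandardCharsets.UTF_8));    // 3
        System.out.println(byteLength(japanese, StandardCharsets.UTF_16BE)); // 2
    }
}
```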

      Regards

       
    • Nobody/Anonymous

      Hi Sergey,
         Thank you for the response.
         I created a bit of confusion when I said 1-byte chars. What I meant was characters in the 0-255 range (ASCII plus the Latin-1 extension). Sorry about that.
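Under that reading, the characters to replace are those above U+00FF, and a negated character class finds them. A minimal sketch with java.util.regex rather than jregex; the thread does not say what each Japanese character should become, so the replacement here is a hypothetical caller-chosen placeholder, and the names are hypothetical too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OutsideLatin1 {
    // A run of chars outside U+0000..U+00FF (ASCII plus Latin-1).
    static final Pattern NOT_LATIN1 = Pattern.compile("[^\\u0000-\\u00FF]+");

    // Replaces each such run with the given placeholder text.
    // quoteReplacement stops '$' or '\' in the placeholder being
    // treated as group references.
    static String replaceNonLatin1(String text, String replacement) {
        return NOT_LATIN1.matcher(text)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "abc \u308F\u304B\u308A\u307E\u3057\u305F def";
        System.out.println(replaceNonLatin1(text, "?")); // abc ? def
    }
}
```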

      I tried a small program and I am not getting the desired output. Can you tell me where I am making a mistake?
      **************************************************
          Pattern p = new Pattern("\"\\w+\"");
          Replacer replacer=p.replacer("\"illegal token\"");

          String str1 = "'102', '102', '102', '102', \"わかりました\"";
          String str2 = "'102', '102', '102', '102', \"abcdegf\"";
          out.println("<br><br>output of replacer str is : "+replacer.replace(str1));
          out.println("<br><br>output of replacer str is : "+replacer.replace(str2));
      *****************************************************

      The output for str2 is correct: "abcdegf" gets replaced. But str1 contains Japanese characters, and they do not get replaced.
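One likely cause is the `\w` class. In the JDK's java.util.regex, `\w` means only `[a-zA-Z_0-9]` unless the `Pattern.UNICODE_CHARACTER_CLASS` flag is set, so a quoted Japanese word never matches `"\w+"`. Whether jregex's `\w` is likewise ASCII-only by default is an assumption here; its REFlags.UNICODE option (the "u" flag) is worth checking in the jregex Javadoc. The sketch below reproduces the post's test with java.util.regex; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordClassDemo {
    // Same pattern as in the post: a double-quoted run of \w characters.
    static String replaceQuotedWord(String text, boolean unicodeClasses) {
        // Without UNICODE_CHARACTER_CLASS, \w means [a-zA-Z_0-9] only.
        int flags = unicodeClasses ? Pattern.UNICODE_CHARACTER_CLASS : 0;
        Pattern p = Pattern.compile("\"\\w+\"", flags);
        return p.matcher(text)
                .replaceAll(Matcher.quoteReplacement("\"illegal token\""));
    }

    public static void main(String[] args) {
        // str1 from the post, with the Japanese word written as escapes
        String str1 = "'102', '102', '102', '102', \"\u308F\u304B\u308A\u307E\u3057\u305F\"";
        String str2 = "'102', '102', '102', '102', \"abcdegf\"";
        System.out.println(replaceQuotedWord(str1, false)); // unchanged
        System.out.println(replaceQuotedWord(str2, false)); // ends in "illegal token"
        System.out.println(replaceQuotedWord(str1, true));  // ends in "illegal token"
    }
}
```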

       
