From: Panayotis K. <pan...@pa...> - 2010-02-25 23:25:57
|
Hello, I tried to create a String with UTF-8 characters inside, but instead the parser produces a series of escaped numbers, none of which makes any sense. It is not the unicode hex equivalent, or the decimal equivalent. Is there any way to properly do this? |
From: Sascha H. <sa...@xm...> - 2010-02-26 08:24:08
|
Hi Panayotis, can you give me a concrete example on how to reproduce this? Thank you // Sascha > ------------------------------------------------------------------------------ > Download Intel® Parallel Studio Eval > Try the new software tools for yourself. Speed compiling, find bugs > proactively, and fine-tune applications for parallel performance. > See why Intel Parallel Studio got high marks during beta. > http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > xmlvm-users mailing list > xml...@li... > https://lists.sourceforge.net/lists/listinfo/xmlvm-users > |
From: Panayotis K. <pan...@pa...> - 2010-02-26 11:43:31
Attachments:
Main.java
|
On 26 Feb 2010, at 10:23 AM, Sascha Haeberling wrote: > can you give me a concrete example on how to reproduce this? Yes, just put any UTF-8 string in a System.out call. For simplicity, I've attached a demo Java source file (it is under package "test"). Under Java it properly displays "Δοκιμή", while after the conversion the output is something like r4w7w2w1w4u6, and the generated source code contains @"\1624\1677\1672\1671\1674\1656". I really don't know why it is like this, but my rough guess is that the characters are being escaped, just not with something like "\u". |
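One possible reading of the numbers above (an observation, not something confirmed in the thread): 1624, 1677, 1672, 1671, 1674 and 1656 are the octal values of the UTF-16 code units of "Δοκιμή" (Δ is U+0394, which is 1624 in octal). A C or Objective-C octal escape consumes at most three octal digits, so a compiler would read "\1624" as "\162" (the letter r) followed by a literal 4, which would account for the r4w7w2w1w4u6 output. The sketch below reproduces both strings. The class and method names are hypothetical and are not XMLVM code.

```java
// Hypothetical illustration; not taken from XMLVM.
final class OctalEscapeDemo {

    // Writes every char as a backslash followed by its value in octal,
    // with no limit on the number of digits.
    static String toOctalEscapes(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            out.append('\\').append(Integer.toOctalString(s.charAt(i)));
        }
        return out.toString();
    }

    // Decodes octal escapes the way a C compiler does: at most three
    // octal digits belong to the escape, anything after that is literal
    // text. Only octal escapes are handled in this sketch.
    static String decodeLikeC(String literal) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < literal.length()) {
            char c = literal.charAt(i++);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            int value = 0;
            int digits = 0;
            while (i < literal.length() && digits < 3
                    && literal.charAt(i) >= '0' && literal.charAt(i) <= '7') {
                value = value * 8 + (literal.charAt(i++) - '0');
                digits++;
            }
            out.append((char) value);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Greek test word, written with Unicode escapes so this file stays ASCII.
        String greek = "\u0394\u03bf\u03ba\u03b9\u03bc\u03ae";
        String escaped = toOctalEscapes(greek);
        System.out.println(escaped);              // \1624\1677\1672\1671\1674\1656
        System.out.println(decodeLikeC(escaped)); // r4w7w2w1w4u6
    }
}
```

If this reading is right, the converter is emitting unbounded octal escapes where the target language only accepts up to three digits per escape.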
From: Stand T. <sta...@gm...> - 2010-02-26 14:42:35
|
So there are a few issues going on here.

The first one is that Java will not escape those characters when compiling. _You_ must do so, either by going online and doing it on a website, by running your file through native2ascii, or by writing a converter that reads the file and converts the Unicode characters to ASCII escape sequences.

The second problem is: what exactly do you mean by "it displays properly"? What OS are you using? Standard System.out.println will not display any UTF-8 characters in a Windows cmd. There are flavors of Linux, and even some Mac users, that tout xterm or a custom version of the terminal that displays Unicode characters. However, going back to the Java class, it must still be compiled with the escaped characters to begin with.

Java natively handles everything in Unicode, but the compiler only compiles against ASCII, even if the Java file itself is Unicode (UTF-8 or UTF-16).

So you would need to take your class and run "native2ascii Main.java Main2.java" (or whatever you want to call the output file), and it will then take the Greek text below and transform it to the escaped sequence.

System.out.println("\u00ce\u201d\u00ce\u00bf\u00ce\u00ba\u00ce\u00b9\u00ce\u00bc\u00ce\u00ae");

This still will not display in the System.out output as anything but garbage.

To help me debug what issues you might be seeing and how to reproduce them, please include the following:
1) OS: Windows (mention if it's a special version, e.g. the Japanese OS, or what have you), Mac and the version, or flavor of Linux.
2) How you're viewing the output: is it in an xterm you're running your code with? A Windows cmd? An IDE debugger pane?
3) Which version of Java you're compiling and running against. This part isn't as important, since Unicode in Java will run just about the same on all of the JDKs since 1.1, but it might help.

thx
timo |
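For reference, a small sketch of the kind of escaping native2ascii performs: every char outside 7-bit ASCII becomes a \uXXXX escape. When the source bytes are decoded as UTF-8, "Δοκιμή" becomes \u0394\u03bf\u03ba\u03b9\u03bc\u03ae. The sequence quoted in the message above (\u00ce\u201d\u00ce\u00bf...) is what comes out when the same UTF-8 bytes are decoded as windows-1252 instead, so the encoding the tool assumes (its -encoding option, or the platform default) matters. The class and method names below are hypothetical, and the sketch assumes the windows-1252 charset is available in the running JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of backslash-u escaping; not the native2ascii source.
final class UnicodeEscapeDemo {

    // Replaces every char outside 7-bit ASCII with a backslash-u escape
    // made of four lowercase hex digits.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Greek test word, written with escapes so this file stays ASCII.
        String greek = "\u0394\u03bf\u03ba\u03b9\u03bc\u03ae";
        byte[] utf8 = greek.getBytes(StandardCharsets.UTF_8);

        // Same bytes, decoded with the right and with the wrong charset.
        String right = new String(utf8, StandardCharsets.UTF_8);
        String wrong = new String(utf8, Charset.forName("windows-1252"));

        System.out.println(escapeNonAscii(right)); // six escapes, one per Greek letter
        System.out.println(escapeNonAscii(wrong)); // twelve escapes, one per UTF-8 byte
    }
}
```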
From: Sascha H. <sa...@xm...> - 2010-02-26 14:47:01
|
Thank you Timo for the good tips, I think you are right. However, there is one thing I am curious about: would this example work in Objective-C if you put this string in an .m file? Because we could do some conversion when we take the XMLVM file and convert it to Objective-C. The question is, should it be UTF-8 in the .m file or should it be escaped there? // Sascha |
From: Panayotis K. <pan...@pa...> - 2010-02-26 15:10:51
|
On 26 Feb 2010, at 4:35 PM, Stand Trooper wrote:
> To help me debug what issues you might be seeing and how to reproduce the issue, please include the following:
> 1) OS [...]
> 2) how you're viewing the sout [...]
> 3) Which version of Java you're compiling and running against. [...]

I know all the problems of UTF-8 in Java files, and I know these (and other) conversion solutions. The problem is not a display issue but a bad conversion by XMLVM. I am a long-time supporter of UTF-8 and I have helped a lot with proper handling of UTF-8, and especially Greek encodings, under Linux and X11. Just for reference though, since you asked, I am on Mac OS X.

On 26 Feb 2010, at 4:46 PM, Sascha Haeberling wrote:
> However, one thing I am curious about: Would this example work in Objective-C if you put this string in an .m file? Because we could do some conversion when we take the XMLVM file and convert it to Objective-C. The question is, should it be UTF-8 in the .m file or should it be escaped there?

That is exactly my point. The Java compiler (with a proper -encoding option) has been able to handle non-ASCII source files for some time now. So the characters inside the class are correct (and display correctly when run as a Java application). The conversion, though, has problems and produces erroneous output. Sascha, as you asked: if I use the same UTF-8 characters in a .m file (i.e. if I go to the generated source code and replace the string), then the file compiles correctly. |
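Given the observation above that raw UTF-8 characters in a .m file compile correctly, one option for the converter would be to escape only the characters that are special inside a C string literal, pass everything else through unchanged, and write the generated file as UTF-8. The sketch below illustrates that idea under those assumptions. ObjCLiteralDemo and toObjCLiteral are hypothetical names, not XMLVM's actual generator.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; a sketch of one approach, not XMLVM code.
final class ObjCLiteralDemo {

    // Builds an Objective-C string literal. Quotes, backslashes and
    // control characters are escaped; non-ASCII chars are left as-is
    // and rely on the file being written as UTF-8.
    static String toObjCLiteral(String s) {
        StringBuilder out = new StringBuilder("@\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20 || c == 0x7f) {
                        // Exactly three octal digits, the longest octal
                        // escape C accepts, so a following digit cannot
                        // be absorbed into the escape.
                        out.append(String.format("\\%03o", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // The Greek test word and the euro sign, written with escapes
        // so this Java file itself stays ASCII.
        String literal = toObjCLiteral("\u0394\u03bf\u03ba\u03b9\u03bc\u03ae \u20ac");
        // These are the bytes that would go into the generated .m file.
        byte[] bytes = literal.getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length + " bytes of UTF-8");
    }
}
```

Whether raw UTF-8 or an escaped form is the better choice for the generated code is the open question in this thread; the sketch only shows the pass-through variant.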
From: Panayotis K. <pan...@pa...> - 2010-03-05 12:35:33
|
So, any idea how to correctly pass a Unicode character to Objective-C? Take the euro symbol "€" as an example: how can I create an application that will display it? |
From: Gergely K. <ger...@ma...> - 2010-03-05 13:34:44
|
Hi, I don't have much time, so just a quick FYI: I think we included some UTF-8 fixes in our patch. I am sorry that I cannot be more specific now; I would need to check what exactly we did, but I am sure that our application loads and displays UTF-8 text correctly (German characters, the copyright symbol, etc.). I will try to sum it up when I get the time. Best Regards, Gergely -- Kis Gergely MattaKis Consulting Email: ger...@ma... Web: http://www.mattakis.com Phone: +36 70 408 1723 Fax: +36 27 998 622 |
From: Sascha H. <sa...@xm...> - 2010-03-21 12:48:03
|
Hi guys, do any of you have an update on UTF-8? I am planning to look into this problem now in order to solve it as soon as possible, but I don't want to duplicate any of your work. // Sascha |