From: Ian L. <dr...@gm...> - 2010-05-02 18:50:20
Even if the UTF-8 functions were slower, I think simplicity would still be more of a factor than speed in this case. How often is performance dependent on manipulating strings in a tight loop? Better a performance hit than having to explain Unicode to a new programmer.

-Ian

On Sun, May 2, 2010 at 1:42 PM, james reneau <ji...@re...> wrote:
> Wow,
>
> You are right, Ian. I just ran the following benchmark and the U functions were faster (9 seconds versus 16 seconds). I ran it multiple times (and in reverse order) and got the same results. I will change all of the existing string functions to the new Unicode-safe code, remove my test/working U functions, and commit later.
>
> I am glad you were there to tell me to keep it simple.
>
> Jim
>
> a$ = "abcdef ghijklmno pqrstuvwxyzabcd efghijklmnopqr stuvwxyzabcd efghijklmnopq rs tuvwxyza bcdefghijklm nopqrstuvwxyza bcdefghijklmno pqrstuvwxyzabcde fghijklmnopqrstuvwxy zabcdefghijklmnopq rstuvwxyzabcdefgh ijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwx yzabcdefghijklmnopqrstuvwxyza bcdefghijklmnopq rstuvwxyzabcdefghi jklmnopqrstuvwxyz"
>
> rem time the Unicode-aware U functions
> start = hour * 60 * 60 + minute * 60 + second
> for r = 1 to 1000
>    for n = 1 to ulength(a$)
>       c = useq(umid(a$,n,1))
>       c$ = uchr(c)
>       b$ = uleft(a$,n)
>       b$ = uright(a$,n)
>    next n
> next r
> finish = hour * 60 * 60 + minute * 60 + second
> print finish - start
>
> rem time the original byte-based functions
> start = hour * 60 * 60 + minute * 60 + second
> for r = 1 to 1000
>    for n = 1 to length(a$)
>       c = asc(mid(a$,n,1))
>       c$ = chr(c)
>       b$ = left(a$,n)
>       b$ = right(a$,n)
>    next n
> next r
> finish = hour * 60 * 60 + minute * 60 + second
> print finish - start
>
> On Sun, May 2, 2010 at 1:26 PM, james reneau <ji...@re...> wrote:
>>
>> Ian,
>>
>> The U functions do a lot of additional conversion, from the UTF-8 char* to a QString and then back again to a UTF-8 char*. For those of us in the ASCII/English world, I thought they would be slower. I haven't done a benchmark.
>> Let me do one before I push.
>>
>> Jim
>>
>> On Sun, May 2, 2010 at 12:51 PM, Ian Larsen <dr...@gm...> wrote:
>>>
>>> For the sake of simplicity, why don't you just have your U* functions replace the defaults? For ASCII strings they should do the same thing as the regular ones.
>>>
>>> -Ian
>>>
>>> On Sun, May 2, 2010 at 12:45 PM, james reneau <ji...@re...> wrote:
>>>> Guys,
>>>>
>>>> I have gotten save and load to work with UTF-8, and I am adding string functions to handle Unicode (ULENGTH, USEQ, UCHR, UMID, ULEFT, URIGHT, UINSTR). I should have them committed later today.
>>>>
>>>> Jim
>>>>
>>>> On Sat, May 1, 2010 at 5:39 PM, Ian Larsen <dr...@gm...> wrote:
>>>>>
>>>>> Here's a screenshot of the result.
>>>>>
>>>>> On Sat, May 1, 2010 at 5:34 PM, Ian Larsen <dr...@gm...> wrote:
>>>>>> It's even simpler than that. You just have to change most of the QString::toAscii calls to QString::toUtf8.
>>>>>>
>>>>>> My changes are committed. I've tried to test everything I could, but more extensive testing would ensure I've got everything. If you see question marks instead of extended characters anywhere, please let me know.
>>>>>>
>>>>>> -Ian
>>>>>>
>>>>>> On Sat, May 1, 2010 at 4:57 PM, james reneau <ji...@re...> wrote:
>>>>>>> Ian,
>>>>>>>
>>>>>>> That was my thought, too. I was going to email you to see if we could change all the char* stuff in the stack and interpreter to QStrings.
>>>>>>>
>>>>>>> Looking forward to your commit.
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>>> On Fri, Apr 30, 2010 at 9:59 PM, Ian Larsen <dr...@gm...> wrote:
>>>>>>>>
>>>>>>>> All,
>>>>>>>>
>>>>>>>> I was wrong about Flex; it handles UTF-8 just fine. The problem was with the way QStrings were being converted.
>>>>>>>> I have a working version that I'm going to test some more and commit tomorrow.
>>>>>>>>
>>>>>>>> -Ian
>>>>>>>>
>>>>>>>> On Fri, Apr 30, 2010 at 5:53 PM, Ian Larsen <dr...@gm...> wrote:
>>>>>>>>> All,
>>>>>>>>>
>>>>>>>>> I believe the reason you're seeing question marks in the output is that GNU Flex and Bison, which the BASIC256 parser is written in, don't support Unicode at all.
>>>>>>>>>
>>>>>>>>> There are no simple fixes for this, unfortunately. Here are some possibilities:
>>>>>>>>>
>>>>>>>>> 1) Encode ALL strings in a program's source code using base64 and then decode them prior to pushing them onto the operand stack. This is an ugly hack, but right now it would be the path of least resistance.
>>>>>>>>> 2) Find a drop-in replacement for Flex and Bison that supports Unicode.
>>>>>>>>> 3) Write a custom parser that supports Unicode. This would be a *lot* of work, but would be a lot of fun for someone interested in learning compiler design.
>>>>>>>>>
>>>>>>>>> If anyone has any other ideas, please let me know.
>>>>>>>>>
>>>>>>>>> -Ian
>>>>>>>>>
>>>>>>>>> On Fri, Apr 30, 2010 at 11:06 AM, <web...@bi...> wrote:
>>>>>>>>>> Ian,
>>>>>>>>>>
>>>>>>>>>> I am very glad that you have returned to developing BASIC256! I just wanted to tell you about a serious problem that affects users of the Russian language. A screenshot is attached.
>>>>>>>>>>
>>>>>>>>>> I made a patch for version 0.9.5 (released in December 2009) for the ALT Linux distribution.
>>>>>>>>>> Of course, this patch is not urgent, since you have made a lot of changes. Could I ask you to make the necessary changes (I have little experience), or is supporting Russian-speaking users only my problem? :-)
>>>>>>>>>>
>>>>>>>>>>> On this list about two weeks ago we got a French translation, if anyone would like to add that in. If not, I'll get around to it eventually.
>>>>>>>>>>
>>>>>>>>>> I have little experience, so it would be better if you did it.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Blessing,
>>>>>>>>>> Sergei Irupin
>>>>>>>>>> http://rnd-lug.blogspot.com/
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> My PGP Public Key:
>>>>>>>>> http://www.scrapshark.com/pubkey.txt
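Ian's point that the U* functions should behave like the regular ones for ASCII, but not for Cyrillic text, can be illustrated with a rough Python analogy. It assumes, as the thread describes, that the regular functions work on raw UTF-8 bytes (the char* strings) while the U* functions work on characters after conversion to QString; `byte_len`, `char_len`, `byte_mid` and `char_mid` are hypothetical stand-ins, not BASIC256 code.

```python
# Hypothetical stand-ins for BASIC256's LENGTH/MID (byte-based, char*)
# and ULENGTH/UMID (character-based, via QString). Python counts code
# points; QString counts UTF-16 code units, which agree for Cyrillic text.

def byte_len(s: str) -> int:
    """Length in bytes of the UTF-8 encoding, like a char*-based LENGTH."""
    return len(s.encode("utf-8"))

def char_len(s: str) -> int:
    """Length in characters, like ULENGTH."""
    return len(s)

def byte_mid(s: str, start: int, count: int) -> bytes:
    """1-based MID over raw UTF-8 bytes; can cut a character in half."""
    data = s.encode("utf-8")
    return data[start - 1:start - 1 + count]

def char_mid(s: str, start: int, count: int) -> str:
    """1-based MID over characters, like UMID."""
    return s[start - 1:start - 1 + count]

if __name__ == "__main__":
    for text in ("hello", "Привет"):
        print(text, byte_len(text), char_len(text),
              byte_mid(text, 1, 1), char_mid(text, 1, 1))
```

For "hello" both pairs agree, which is why replacing the defaults is safe for ASCII programs. For "Привет" the byte version reports a length of 12 instead of 6, and its first "character" is the single byte 0xD0, only half of the letter П.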
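The fix Ian describes, changing QString::toAscii calls to QString::toUtf8, can be mimicked in Python to show where the question marks came from. This is an analogy only and does not call Qt: it assumes the 8-bit conversion replaced every character it could not represent with '?', which is what Python's ASCII codec does with `errors="replace"`, while UTF-8 keeps every character. `to_ascii_bytes`, `to_utf8_bytes` and `round_trip` are hypothetical names.

```python
# Analogy for the QString <-> char* conversions discussed in the thread.
# Assumption: the old conversion was lossy and substituted '?' for
# unrepresentable characters; UTF-8 is lossless.

def to_ascii_bytes(text: str) -> bytes:
    """Lossy conversion: each unrepresentable character becomes b'?'."""
    return text.encode("ascii", errors="replace")

def to_utf8_bytes(text: str) -> bytes:
    """Lossless conversion: every character survives as 1 to 4 UTF-8 bytes."""
    return text.encode("utf-8")

def round_trip(text: str, encode) -> str:
    """Convert text to bytes with `encode`, then decode it back for display."""
    # ASCII output, including the '?' substitutes, is also valid UTF-8.
    return encode(text).decode("utf-8")

if __name__ == "__main__":
    sample = 'print "Привет"'
    print(round_trip(sample, to_ascii_bytes))
    print(round_trip(sample, to_utf8_bytes))
```

Plain ASCII text survives either path unchanged, so English-only programs never showed the bug; only non-ASCII text such as Russian turned into question marks.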
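Option 1 from Ian's April 30 message (base64-encoding every string literal so the Flex/Bison parser never sees non-ASCII bytes inside strings, then decoding before the value goes onto the operand stack) might look roughly like the Python sketch below. It assumes string literals are double-quoted with no escaped quotes inside; `encode_literals`, `decode_literal` and `STRING_LITERAL` are hypothetical names. As the thread later notes, the hack turned out to be unnecessary because Flex handles UTF-8 fine.

```python
import base64
import re

# Assumption: string literals are double-quoted and contain no escaped quotes.
STRING_LITERAL = re.compile(r'"([^"]*)"')

def encode_literals(source: str) -> str:
    """Rewrite every "..." literal as base64 so string contents reach the lexer as ASCII."""
    def to_base64(match):
        raw = match.group(1).encode("utf-8")
        return '"' + base64.b64encode(raw).decode("ascii") + '"'
    return STRING_LITERAL.sub(to_base64, source)

def decode_literal(token: str) -> str:
    """Undo the encoding just before the string is pushed onto the operand stack."""
    return base64.b64decode(token).decode("utf-8")

if __name__ == "__main__":
    program = 'print "Привет"'
    encoded = encode_literals(program)
    print(encoded)
    literal = STRING_LITERAL.search(encoded).group(1)
    print(decode_literal(literal))
```

The base64 alphabet contains no double quote, so the encoded literal still lexes as an ordinary string token; the cost is an extra decode for every string literal at run time.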