The compiler appears to sometimes fall through the end of an ELSE IF block into the ELSE, around line 82 of this program. I'm using the 1.26-55-unstable.apk. The "DEAD CODE" is printed without the "*" being printed first or the Curr variable being incremented. Weird.
#!/usr/bin/xbasicProgramIncrementalSieveShowKInput"How many primes? ",N%HideKStart=TimerDimPrimes(N%)DimPrNext(N%)ClrPrCount%,Curr,Temp,i%,j%,k%Curr=1WhilePrCount%<N%IncCurrIfPrCount%=0ThenPrintCurr;" is our first prime"Primes(PrCount%)=CurrPrNext(PrCount%)=Curr*CurrIncPrCount%ElseIfCurr<PrNext(0)ThenPrimes(PrCount%)=CurrPrNext(PrCount%)=Curr*CurrIncPrCount%PrintCurr;" is our ";SelectPrCount%Mod100Case1,21,31,41,51,61,71,81,91PrintPrCount%;"st prime"Case2,22,32,42,52,62,72,82,92PrintPrCount%;"nd prime"Case3,23,33,43,53,63,73,83,93PrintPrCount%;"rd prime"DefaultPrintPrCount%;"th prime"EndSelect' No sift-up should ever be needed, no matter whati%=PrCount%-1IfPrNext(i%)<PrNext(ShR(i%-1,1))ThenPrint"Heap error!"StopEndIfElseIfCurr=PrNext(0)ThenPrintCurr;" is composite:";' Look for its prime factorsWhileCurr=PrNext(0)Print" ";Primes(0);' Be on lookout for the next multipleTemp=Primes(0)' Print Temp, Primes(0), PrNext(0),AddPrNext(0),Temp!+Temp' PrNext(0) = PrNext(0) + Temp' Print PrNext(0)' Now sift it down the heapClri%,j%,k%Repeatj%=i%Fork%=i%*2+1Toi%*2+2IfNot((i%<=j%)And(j%<k%))ThenStopElseIfNot(k%<PrCount%)Then' Skip, no such child!No-opElseIfPrNext(k%)<PrNext(j%)Then' Child is smallerj%=k%EndIfNextk%Ifi%<j%ThenSwapPrimes(i%),Primes(j%)SwapPrNext(i%),PrNext(j%)i%=j%Else' We're doneClri%,j%,k%EndIfUntili%=0WendPrint" / Next is ";PrNext(0);PrintElse' This should never happenPrint"DEAD CODE",Curr,PrNext(0),Curr-PrNext(0)' StopEndIfPrint"* ";WendPrintPrCount%;" primes found in ";Round(Timer-Start,2)Pause20' Heap integrity checkFori%=PrCount%-1DownTo1IfPrNext(i%)<PrNext(ShR(i%-1,1))ThenPrint"Heap error!"StopEndIfNexti%' Currently not in Primes%() order, heap sort is an optionFori%=0ToPrCount%-1Printi%,Primes(i%),PrNext(i%)Nexti%Print"OK"End
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Confirmed. This is a bad thing and I need to look into it. My suspicion at the moment is, that something is wrong with the SELECT CASE ENDSELECT statements. Compiler messes this up... Will see, be patient.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
It is not SELECT / ENDSELECT. Here is a smaller program reproducing this error:
a=1
b=3
if a=0
print "a0"
ELSE if a<b
print "a<b"
if c>0
print "c>0"
ENDIF
ELSE if a=b
print "a=b"
ELSE
PRINT "DEAD code"
ENDIF
quit
The problem is the inner IF block. If c>0 is flase it jumps to the ELSE IF a=b, which is wrong. If one adds an empty line between the ENDIF and ELSE if a=b, then it works.
Last edit: Markus Hoffmann 2019-06-11
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thank you very much for tracking this down. Am I the only one beating on xbasic like this? Are others just working around bugs and not reporting them? For over ten years?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
In the last week I only got one crash report via email and your detected errors here in this forum. And one issue on the issue tracker of github. So thank you all.
The user base of X11-Basic is small, I think, however the X11-Basic app for Android had 50000+ installs (before it was removed from the G* play store).
The number of installs are not counted by F-droid, so I dont know how popular that app actually is. I am happy that X11-Basic found its way into the OPenSuse and Fedora linux distribution recently. Also some seem to use it on the Raspberry-Pi.
I am not sure, if the user base has been larger in the past than it is today.
BASIC is not really tought anymore.
Nowerdays there is python, which is a de-facto standard and has taken over.
Only very few people helped be find and fix bugs. Maybe the users live with them if they find some, maybe they do workarounds or give up. Or maybe not every feature will be used. (Since I use X11-BASIC still a lot, I have found most of the bugs myself, but only for a very narrow programming style.)
You have used X11-Basic in very interesting ways.
I knew, that the compiler is still not perfect. But the interpreter should be in better shape. So I was surprized, that you have found these quirks. But such are things, debugging is a never ending story....
open source projects are most often one- or few-men-shows. Especially old-school BASIC is something the younger people do not like anymore. They want object oriented stuff.
Last edit: Markus Hoffmann 2019-06-11
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The trouble with Python, like many scripting languages and the funkier features of RFO Basic, is unexpected complexity, like C's strlen(), which can surprise people with its need to scan the entire string just to find its length, and who knows what Java and C++ are doing under the hood. At least most modern dialects of BASIC seem to hit a sweet spot between hand-holding like bounds checking and automatic memory management and predictability. Things like Len(S$) that one would expect to take constant time even for Space$(4000000000) should. I have not tested this in particular with xbasic, though my longest palindrome substrings scanner seemed fast enough with strings of length 40 000 000 or so even on a low end device. The longest palindromic subsequences scanner is a different story. It can find one of the longest palindromic subsequences in reasonable time, but it cannot yet find all of them in reasonable time while verifying the correctness of the original calculation.
Last edit: Yet Another Troll 2019-06-11
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Great! Is there a speed advantage to using it, perhaps when adding to a very long string? When A$ might be very long, in VB it seemed to help to parenthesize multiple string catenations like this, A$ = A$ + (B$ + C$ + D$)
Is that how ADD A$, B$ + C$ + D$ will work? Maybe with some luck realloc() can expand A$ in place?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Wut. X11 BASIC strings are clearly not just C ASCIIZ, but what are they? Did I resort to PEEK too soon, before I really dug into xbasic strings? Shortly after this screenshot, xbasic aborted and dumped me back to my home screen.
As you can see, the excecution time of the LEN() operation is independant of the length of the string. on the other hand, dealing with longer and longer Strings will
1. fragment the systems memory, so the opertion system has a lot to do (allocating and freeing memory blocks), thats why your loop with the LEN() takes different times each time, depends on background operations.
2. Eventually all of the RAM is eaten up and then, first the operating system slows down and finally the app crashes or is stooed by the operating system to free the memory.
3. Sting assignment and evaluation expressions ist costly: A copy of S$ will be made, maybe even more copys when adding something to it and finally the old content of s$ is freed and then the new is put to the variable.
A string in X11-Basic is a memory buffer of a certain lenght, It can be as large as 4 Gigabytes, which is the limit for 32bit lengths.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Anonymous
Anonymous
-
2019-06-12
Oh, I was wrong. The LEN() operation is indeed dependent of the string content lenght, because the string will be duplicated before the length is evaluated. And the duplication involves allocating new memory and copying the whole content into it. This is not necessarx in this particular case, but what about: a=LEN(b$+s$) ?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
In this particular case, it could be optimized to a = len(b$) + len(c$) but I don't know if you want to open that door. Next you'll be doing loop unrolling and array bounds checking elimination in FOR loops when the lower and upper bounds of the loop are between 0 and ubound - 1, where does it end? Where is the string length kept anyway?
Could the string be passed to LEN and other string commands by VAR, and expressions stuffed in an anonymous temporary stack variable which is cleared afterwards? I think that's what I did back in my compilers class.
Last edit: Yet Another Troll 2019-06-13
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The string length is kept in an internal structure, not officially accessable. There are other information stored as well, like the type of the variable, its name, if they are local or not etc, static or dynamic... the adress of this structure could be returned by arrptr(), but arrptr() only allows arrays as argument.
Last edit: Markus Hoffmann 2019-06-13
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
len does not operate on variables, but on content. a string itself need not be linked to a variable, it can as well be an intermediate result in an expression.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Aha. The door is thus opened to refcounting, weakrefs, copy-on-write (or call to VarPtr) magic memory, potentially making passing even very long strings or huge arrays by value in recursive subroutines nearly as fast as by VAR, arbitrary lower and upper bounds in arrays, bounds checked read-only or write-through aliased sub-ranges of arrays for sorting experiments, what else? Much of the needed infrastructure is already in place.
VarPtr should force a copy of a string or array if its hypothetical refcount is greater than one, plus flagging it somehow to prevent new shared references, preventing the scenario I presented before,
a$ = "foo"
a% = varptr(a$)
b$ = a$ ! do both a$ and b$ now share the same refcounted buffer?
poke a%, asc("b")
print b$
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Are other string operations, such as MID$, LEFT$, RIGHT$, also affected? There are cases where I figured MID$(Huge$, i%, 1) would be faster than
CHR$(PEEK(VPHuge% + i% - 1) AND 0xFF) but I could have done the test instead of typing this BRB
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
No, the string content of Huge$ would get copied by the parser. So a PEEK(VP) is faster in nearly any case. On the other hand the eypression (CHR$, PEEK, AND, +, .) is more complicated, so for small strings probably the MID$() would be faster.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Yes, the break-even point is somewhere between 32KB and 64KB strings. I could simplify the expression some, but Peek sometimes sign-extends, and I don't want to get too clever with the VP or i%, lose easy eyeball correspondence between the Peek and its Asc Mid equivalent, and end up turning a difficult debugging task impossible. I've done it to myself before.
Should this be broken out into a new topic, MIDnight madness, Loony bin LENacy?
Too many edits: Both Peek and Asc will return negative numbers. Is this documented? The example in the help for Peek hints at it, but there's nothing in the help for Asc about it, even though Asc(Chr$(254)) returns -2.
Last edit: Yet Another Troll 2019-06-13
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Yes, ASC returns a signed byte, which is later converted to int and then double. Do you think it is a problem? Should it be like this or would the user prefer unsigned values in any case?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Does existing code depend on this? I don't mind peppering And 0xFF everywhere or using a DefFn UAsc(S$) = Asc(S$) And 0xFF but the current behavior is bad news if arrays can't have negative lower bounds.
I cannot post a new topic for some reason. No error message, it just fails. Strange.
Sample code time. Asc and Peek could return -128..127 if Dim Alpha%(-128..127) could be possible, no problem. Otherwise, this is a bug,
' Belt, suspenders, drones, crane, cargo helicopter, orbital skyhookDimAlpha%(256)ClrAlpha%()Fori%=1ToLen(Alpha$)' Parallelizable, but would it be worth it?' Don't convert this to PEEK, just in caseAlpha%(Asc(Mid$(Alpha$,i%,1)))=True!Oops,noAnd255Nexti%
Last edit: Yet Another Troll 2019-06-14
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I think I see the problem. LEN works reasonably for reasonably sized strings, but my tests started off unreasonably and only got worse. I do tend to play in the corner cases, don't I. Where is the string length stored in memory? You don't seem to play exactly the same pointer offset games Microsoft did with their BASICs. Maybe you store it past the end of the string and use the malloc buffer size to find it, but for very long strings, this blows up somehow. Black box testing is fun.
I think I'm seeing it in the interpreter too in the new Windows build, X11-BASIC Version: 1.27 Jul 1 2019 15:30:25, Windows version 10.0.17763.593 32-bit. I'll try to confirm.
Never mind, it was a bug in my own code. I miss VB/VBA's Option Explicit.
Last edit: Yet Another Troll 2019-07-02
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The compiler appears to sometimes fall through the end of an ELSE IF block into the ELSE, around line 82 of this program. I'm using the 1.26-55-unstable.apk. The "DEAD CODE" is printed without the "*" being printed first or the Curr variable being incremented. Weird.
Confirmed. This is a bad thing and I need to look into it. My suspicion at the moment is, that something is wrong with the SELECT CASE ENDSELECT statements. Compiler messes this up... Will see, be patient.
It is not SELECT / ENDSELECT. Here is a smaller program reproducing this error:
The problem is the inner IF block. If c>0 is flase it jumps to the ELSE IF a=b, which is wrong. If one adds an empty line between the ENDIF and ELSE if a=b, then it works.
Last edit: Markus Hoffmann 2019-06-11
Thank you very much for tracking this down. Am I the only one beating on xbasic like this? Are others just working around bugs and not reporting them? For over ten years?
In the last week I only got one crash report via email and your detected errors here in this forum. And one issue on the issue tracker of github. So thank you all.
The user base of X11-Basic is small, I think, however the X11-Basic app for Android had 50000+ installs (before it was removed from the G* play store).
The number of installs are not counted by F-droid, so I dont know how popular that app actually is. I am happy that X11-Basic found its way into the OPenSuse and Fedora linux distribution recently. Also some seem to use it on the Raspberry-Pi.
I am not sure, if the user base has been larger in the past than it is today.
BASIC is not really tought anymore.
Nowerdays there is python, which is a de-facto standard and has taken over.
Only very few people helped be find and fix bugs. Maybe the users live with them if they find some, maybe they do workarounds or give up. Or maybe not every feature will be used. (Since I use X11-BASIC still a lot, I have found most of the bugs myself, but only for a very narrow programming style.)
You have used X11-Basic in very interesting ways.
I knew, that the compiler is still not perfect. But the interpreter should be in better shape. So I was surprized, that you have found these quirks. But such are things, debugging is a never ending story....
open source projects are most often one- or few-men-shows. Especially old-school BASIC is something the younger people do not like anymore. They want object oriented stuff.
Last edit: Markus Hoffmann 2019-06-11
The trouble with Python, like many scripting languages and the funkier features of RFO Basic, is unexpected complexity, like C's strlen(), which can surprise people with its need to scan the entire string just to find its length, and who knows what Java and C++ are doing under the hood. At least most modern dialects of BASIC seem to hit a sweet spot between hand-holding like bounds checking and automatic memory management and predictability. Things like Len(S$) that one would expect to take constant time even for Space$(4000000000) should. I have not tested this in particular with xbasic, though my longest palindrome substrings scanner seemed fast enough with strings of length 40 000 000 or so even on a low end device. The longest palindromic subsequences scanner is a different story. It can find one of the longest palindromic subsequences in reasonable time, but it cannot yet find all of them in reasonable time while verifying the correctness of the original calculation.
Last edit: Yet Another Troll 2019-06-11
String concatenation is often slow in Basic. Can ADD be extended to help in future versions?
OOps, it looks like, i have broken something with the recent fixes... Need to work on the error.
Error is fixed. ADD with strings should be fine in the next release 1.26-57.
Great! Is there a speed advantage to using it, perhaps when adding to a very long string? When A$ might be very long, in VB it seemed to help to parenthesize multiple string catenations like this, A$ = A$ + (B$ + C$ + D$)
Is that how ADD A$, B$ + C$ + D$ will work? Maybe with some luck realloc() can expand A$ in place?
yes exactly. a copy of the content of a$ is not needed when adding b$+c$+d$ to it.
Wut. X11 BASIC strings are clearly not just C ASCIIZ, but what are they? Did I resort to PEEK too soon, before I really dug into xbasic strings? Shortly after this screenshot, xbasic aborted and dumped me back to my home screen.
As you can see, the excecution time of the LEN() operation is independant of the length of the string. on the other hand, dealing with longer and longer Strings will
1. fragment the systems memory, so the opertion system has a lot to do (allocating and freeing memory blocks), thats why your loop with the LEN() takes different times each time, depends on background operations.
2. Eventually all of the RAM is eaten up and then, first the operating system slows down and finally the app crashes or is stooed by the operating system to free the memory.
3. Sting assignment and evaluation expressions ist costly: A copy of S$ will be made, maybe even more copys when adding something to it and finally the old content of s$ is freed and then the new is put to the variable.
A string in X11-Basic is a memory buffer of a certain lenght, It can be as large as 4 Gigabytes, which is the limit for 32bit lengths.
Oh, I was wrong. The LEN() operation is indeed dependent of the string content lenght, because the string will be duplicated before the length is evaluated. And the duplication involves allocating new memory and copying the whole content into it. This is not necessarx in this particular case, but what about: a=LEN(b$+s$) ?
In this particular case, it could be optimized to a = len(b$) + len(c$) but I don't know if you want to open that door. Next you'll be doing loop unrolling and array bounds checking elimination in FOR loops when the lower and upper bounds of the loop are between 0 and ubound - 1, where does it end? Where is the string length kept anyway?
Could the string be passed to LEN and other string commands by VAR, and expressions stuffed in an anonymous temporary stack variable which is cleared afterwards? I think that's what I did back in my compilers class.
Last edit: Yet Another Troll 2019-06-13
The string length is kept in an internal structure, not officially accessable. There are other information stored as well, like the type of the variable, its name, if they are local or not etc, static or dynamic... the adress of this structure could be returned by arrptr(), but arrptr() only allows arrays as argument.
Last edit: Markus Hoffmann 2019-06-13
len does not operate on variables, but on content. a string itself need not be linked to a variable, it can as well be an intermediate result in an expression.
Aha. The door is thus opened to refcounting, weakrefs, copy-on-write (or call to VarPtr) magic memory, potentially making passing even very long strings or huge arrays by value in recursive subroutines nearly as fast as by VAR, arbitrary lower and upper bounds in arrays, bounds checked read-only or write-through aliased sub-ranges of arrays for sorting experiments, what else? Much of the needed infrastructure is already in place.
VarPtr should force a copy of a string or array if its hypothetical refcount is greater than one, plus flagging it somehow to prevent new shared references, preventing the scenario I presented before,
a$ = "foo"
a% = varptr(a$)
b$ = a$ ! do both a$ and b$ now share the same refcounted buffer?
poke a%, asc("b")
print b$
Are other string operations, such as MID$, LEFT$, RIGHT$, also affected? There are cases where I figured MID$(Huge$, i%, 1) would be faster than
CHR$(PEEK(VPHuge% + i% - 1) AND 0xFF) but I could have done the test instead of typing this BRB
No, the string content of Huge$ would get copied by the parser. So a PEEK(VP) is faster in nearly any case. On the other hand the eypression (CHR$, PEEK, AND, +, .) is more complicated, so for small strings probably the MID$() would be faster.
Yes, the break-even point is somewhere between 32KB and 64KB strings. I could simplify the expression some, but Peek sometimes sign-extends, and I don't want to get too clever with the VP or i%, lose easy eyeball correspondence between the Peek and its Asc Mid equivalent, and end up turning a difficult debugging task impossible. I've done it to myself before.
Should this be broken out into a new topic, MIDnight madness, Loony bin LENacy?
Too many edits: Both Peek and Asc will return negative numbers. Is this documented? The example in the help for Peek hints at it, but there's nothing in the help for Asc about it, even though Asc(Chr$(254)) returns -2.
Last edit: Yet Another Troll 2019-06-13
Yes, ASC returns a signed byte, which is later converted to int and then double. Do you think it is a problem? Should it be like this or would the user prefer unsigned values in any case?
Does existing code depend on this? I don't mind peppering And 0xFF everywhere or using a DefFn UAsc(S$) = Asc(S$) And 0xFF but the current behavior is bad news if arrays can't have negative lower bounds.
I cannot post a new topic for some reason. No error message, it just fails. Strange.
Sample code time. Asc and Peek could return -128..127 if Dim Alpha%(-128..127) could be possible, no problem. Otherwise, this is a bug,
Last edit: Yet Another Troll 2019-06-14
I think I see the problem. LEN works reasonably for reasonably sized strings, but my tests started off unreasonably and only got worse. I do tend to play in the corner cases, don't I. Where is the string length stored in memory? You don't seem to play exactly the same pointer offset games Microsoft did with their BASICs. Maybe you store it past the end of the string and use the malloc buffer size to find it, but for very long strings, this blows up somehow. Black box testing is fun.
I think I'm seeing it in the interpreter too in the new Windows build, X11-BASIC Version: 1.27 Jul 1 2019 15:30:25, Windows version 10.0.17763.593 32-bit. I'll try to confirm.
Never mind, it was a bug in my own code. I miss VB/VBA's Option Explicit.
Last edit: Yet Another Troll 2019-07-02