but my data corruption paranoia seems to have slowed this palindrome finder down at least as much as replacing MID$ with PEEK and POKE has speeded it up. At least it no longer uses four bytes per character with tens of millions of characters like the version using a Corpus%() array instead of Corpus$ did.
This uses a naive O(N * N) palindrome matching algorithm instead of Manacher's or one of the parallelizable variants, but it's much simpler to analyze. There's a SPAWN command I'm looking at, but how can SPAWNed threads communicate results back without stepping all over each other?
Be on the lookout for lingering debugging cruft,
ProgramPalindromesInput"How long? ",N%Start=TimerAlpha$="GATTACA"LenA%=Len(Alpha$)' Belt, suspenders, drones, crane, cargo helicopter, orbital skyhookDimAlpha%(256)ClrAlpha%()Fori%=1ToLen(Alpha$)' Parallelizable, but would it be worth it?Alpha%(Asc(Mid$(Alpha$,i%,1)))=TrueNexti%Corpus$=Space$(N%+2)!MakeroomforSentinelsatfrontandendPrint"Allocated ";Len(Corpus$);" in ";Timer-StartClri%,j%,k%' Randomize' Watch out on 64-bitVPA%=VarPtr(Alpha$)VPC%=VarPtr(Corpus$)PokeVPC%,Asc("{")!InsistongenuineTraskIndustriesSentinelsPokeVPC%+N%+1,Asc("}")Fori%=1ToN%' Parallelizable, I guessj%=Random(LenA%)' k% = Asc(Mid$(Alpha$, j% + 1, 1))k%=Peek(VPA%+j%)' Caution, this sanity check adds significant overheadIfNotAlpha%(k%)!InStr(Alpha$,Chr$(k%))=0ThenPrint"Invalid character at 0x";Hex$(VPA%);" + ";j%,k%StopEndIfPokeVPC%+i%,k%Nexti%Print"Corpus$ initialized in ";Timer-StartIfN%<=600ThenPrint"'";Corpus$;"'"EndIfClrB0rk3d%IfLeft$(Corpus$,1)<>"{"ThenPrint"Left sentinel missing!"B0rk3d%=TrueEndIfIfRight$(Corpus$,1)<>"}"ThenPrint"Right sentinel missing!"B0rk3d%=TrueEndIfIfB0rk3d%ThenPrint"Where are my dragons?!"StopEndIf' Corpus$ = Corpus$ + "}"IfGlob(Corpus$,"{*[!"+Alpha$+"]*}")ThenPrint"Invalid characters!"StopEndIfPrint"Sanity checked in ";Timer-StartClri%,j%,k%,l%,Lo%,Hi%' Hi% = 1 ! Skip uninteresting one or two character 'palindromes'VPC%=VarPtr(Corpus$)Forj%=N%DownTo1!1ToN%!Eitherwayworksfine' Consider divide and conquer recursive thread launcherClri%,k%,l%' Caution, this sanity check adds significant overheadk%=Peek(VPC%+j%)IfNotAlpha%(k%)!InStr(Alpha$,Chr$(k%))=0ThenPrint"Invalid character at 0x";Hex$(VPC%);" + ";j%,k%StopEndIfForl%=0To1' I can haz free thread?' Naive check for palindrome, will blow up on certain long patterns' When l% = 0, checks for odd length palindrome centered at j%' When l% = 1, checks for even length palindrome centered at j%-1,j%i%=j%k%=j%-l%' Caution, loop depends upon the sentinels to stopWhilePeek(VPC%+i%-1)=Peek(VPC%+k%+1)Deci%Inck%Wend' Synchronize Swatches hereIf0AndNot((i%<j%)And(j%<k%))Then' Ignore one or two character 'palindromes'!No-opElseIf(Hi%-Lo%)<=(k%-i%)Then' This would be the critical sectionLo%=i%Hi%=k%' Offsets for PEEK are zero based but strings are one based.' Palindrome display includes surrounding characters, possibly' even a sentinel. Delete the '- 1' and '+ 2' to elide them.Printi%,k%,"'";Mid$(Corpus$,i%+1-1,k%-i%+1+2);"'"EndIfNextl%Nextj%Print"Done in ";Timer-StartPrint"OK"End
Last edit: Yet Another Troll 2019-05-31
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
A first try at cloning MS-style lvalue MID. I prioritize safety over speed or replicating the overlapping range behavior. Hopefully xbasic strings aren't copy on write reference counted magic memory, or if they are, VARPTR triggers a string copy if the refcount is greater than one.
VARPTR() just returns the current address of a string. It does not trigger any other action. The strings adress stays the same, as long as it will not have been assigned any new value.
In case a string is passed by reference, the receiving variable gets the same memory pointer than the calling varible (same VARPTR). Insteadof passing a string by reference, You could also have passd a pointer variable (adr%=VARPTR(s$)) by value, and then do the PEEK/POKE. This way, the original variable (s$) cannot change inside the procedure, and the adress is garanteed to stay the same.
More interesting would be a test, if the content of a variable can change if changed inside a procedure where it has been passed to by reference and where it was assigned a new value.
Last edit: Markus Hoffmann 2019-06-06
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
By default, the assignment operator ("=") makes a copy of the content of the variable. For strings this means, the memory area is duplicated. If you do not want this, you may use the ABSOLUTE command. This can link two Variables together so that they have the same content (and the same VARPTR). I have never used this feature. It might however be usefull when using the SPAWN command, or EVERY and AFTER. ABSOLUTE is more useful when accessing a (foreign) memory location like a variable.
You could p.ex. allocate a shared memory segment and put a variable into it by using the ABSOLUTE command. This way it can be access by another process.
Last edit: Markus Hoffmann 2019-06-06
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Could be very useful for bottom-up approximate string matching dynamic programming algorithms like Smith-Waterman DNA alignment that build up many copies of large strings very quickly, but ABSOLUTE looks very dangerous. I went with a separate string table array that's resized as needed, with the ginormous matrix of strings converted to a matrix of integers pointing into the string array. It cut down the memory use by nearly two orders of magnitude! It's a good thing I remembered that trick from sixth grade and fiddling around with old 8-bit BASIC.
but my data corruption paranoia seems to have slowed this palindrome finder down at least as much as replacing MID$ with PEEK and POKE has speeded it up. At least it no longer uses four bytes per character with tens of millions of characters like the version using a Corpus%() array instead of Corpus$ did.
This uses a naive O(N * N) palindrome matching algorithm instead of Manacher's or one of the parallelizable variants, but it's much simpler to analyze. There's a SPAWN command I'm looking at, but how can SPAWNed threads communicate results back without stepping all over each other?
Be on the lookout for lingering debugging cruft,
Last edit: Yet Another Troll 2019-05-31
A first try at cloning MS-style lvalue MID. I prioritize safety over speed or replicating the overlapping range behavior. Hopefully xbasic strings aren't copy on write reference counted magic memory, or if they are, VARPTR triggers a string copy if the refcount is greater than one.
Last edit: Yet Another Troll 2019-06-06
VARPTR() just returns the current address of a string. It does not trigger any other action. The strings adress stays the same, as long as it will not have been assigned any new value.
In case a string is passed by reference, the receiving variable gets the same memory pointer than the calling varible (same VARPTR). Insteadof passing a string by reference, You could also have passd a pointer variable (adr%=VARPTR(s$)) by value, and then do the PEEK/POKE. This way, the original variable (s$) cannot change inside the procedure, and the adress is garanteed to stay the same.
More interesting would be a test, if the content of a variable can change if changed inside a procedure where it has been passed to by reference and where it was assigned a new value.
Last edit: Markus Hoffmann 2019-06-06
I was mainly afraid of an optimization for large strings, seen in languages like Java and C# and some C++ string classes, causing an issue like this,
A$ = Space$(128 * 4096)
A% = VarPtr(A$)
B$ = A$
Poke A%, Asc("A")
Print Left$(A$, 1), Left$(B$, 1)
Both A$ and B$ may see the change, depending upon fiddly language implementation details.
Is this kind of what you mean? It appears to work as intended, at least in Windows and Android
By default, the assignment operator ("=") makes a copy of the content of the variable. For strings this means, the memory area is duplicated. If you do not want this, you may use the ABSOLUTE command. This can link two Variables together so that they have the same content (and the same VARPTR). I have never used this feature. It might however be usefull when using the SPAWN command, or EVERY and AFTER. ABSOLUTE is more useful when accessing a (foreign) memory location like a variable.
You could p.ex. allocate a shared memory segment and put a variable into it by using the ABSOLUTE command. This way it can be access by another process.
Last edit: Markus Hoffmann 2019-06-06
Could be very useful for bottom-up approximate string matching dynamic programming algorithms like Smith-Waterman DNA alignment that build up many copies of large strings very quickly, but ABSOLUTE looks very dangerous. I went with a separate string table array that's resized as needed, with the ginormous matrix of strings converted to a matrix of integers pointing into the string array. It cut down the memory use by nearly two orders of magnitude! It's a good thing I remembered that trick from sixth grade and fiddling around with old 8-bit BASIC.
https://en.m.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm