I'd like to know the technical details of why the 2.0 branch is slower. I've already read about deprecated libraries such as pygame.
The fact is that 1.2.14 is far faster than the 2.0 branch.
For example, pasting a 60 KB program into 1.2 happens almost instantly; in 2.0 it takes a minute or so.
This is neither an issue report nor a complaint; I use both versions depending on my needs, I'm just curious.
Keep up the good work.
Regards,
Stefano
Hi, thanks for your message. It's true that the 2.x branch is slower. The 1.x branch had become unmaintainable due to circular dependencies; to disentangle the code I made large changes to its structure, including using threading to separate the interface from the engine. Unfortunately, the new code is not as highly optimised as the 1.x branch was.
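A minimal sketch of the general idea of separating engine and interface into threads that communicate only through a queue. This is not PC-BASIC's actual code; the function names, the message format and the `STOP` sentinel are hypothetical.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel: tells the interface thread the engine has finished


def engine(to_interface, statements):
    """Hypothetical engine thread: 'runs' statements and emits screen text.

    It never draws anything itself; it only puts messages on the queue.
    """
    for statement in statements:
        to_interface.put(f"executed {statement}")
    to_interface.put(STOP)


def interface(from_engine, screen):
    """Hypothetical interface thread: the only code that touches the 'screen'."""
    while True:
        message = from_engine.get()
        if message is STOP:
            break
        screen.append(message)


def run(statements):
    channel = queue.Queue()
    screen = []
    threads = [
        threading.Thread(target=engine, args=(channel, statements)),
        threading.Thread(target=interface, args=(channel, screen)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return screen


print(run(["PRINT 1", "PRINT 2"]))
```

Because the two sides share nothing but the queue, neither needs to import the other; the cost is an extra hand-off for every screen update.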
I know pygame and numpy have a reputation for slowness (even though they have some tricks for optimisation), so I would have expected the speed difference between the two branches to be the other way round. Now I know why, thanks.
I also noticed you compiled every .py file into .pyc, which should give a slight speed increase, though I don't know how noticeable it is in a program as complex as yours (a complexity that never ceases to amaze me, sincerely).
Could you enlighten me on what problems the circular dependencies actually caused in the programming flow of the 1.2 branch?
Do you have plans to migrate to Python 3.9? I see 2.0.x still uses version 2.7, but I don't know whether the code would benefit from it or not.
Thank you for your patience.
Stefano
Actually, numpy is quite well optimized, and faster if you have the sort of task it is meant for, i.e. vectorised calculations. I removed it not so much for performance but to get rid of a large dependency (it was steadily ballooning and added something like 10 MiB to the package) and for maintainability: I want the basic functionality to be available using the Python standard library only, but that meant maintaining separate code for numpy and non-numpy installs, and that was just not doable anymore. For similar reasons I'll drop the 1.x branch (i.e. it will become unsupported) and Python 2.x; I just don't have the time to fix bugs and test on all the different platforms. In the end, removing numpy didn't really hurt performance, as list comprehensions can be really fast if well written.
Re pygame, the bits I used were just an API on top of SDL, so I removed the middle man. It was fine while it lasted, but it depended on the old SDL 1.2, and the SDL 2.0 version of pygame lacked some of the 8-bit features that I used heavily, so migrating wasn't worth it for me.
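To illustrate the maintenance burden of a dual numpy/non-numpy code path: every such function ends up with two implementations behind an optional import, and both have to be kept in sync and tested. The function below (masking a row of pixel attributes) is a hypothetical example, not taken from PC-BASIC; the stdlib-only comprehension is what remains once numpy is dropped.

```python
try:
    import numpy as np  # optional dependency
except ImportError:
    np = None


def mask_row_stdlib(row, mask):
    """Pure-Python version: a single list comprehension over the row."""
    return [pixel & mask for pixel in row]


def mask_row_numpy(row, mask):
    """numpy version: vectorised, but only usable when numpy is installed."""
    return (np.asarray(row, dtype=np.uint8) & mask).tolist()


def mask_row(row, mask):
    # The dispatch that forces every feature to be written and tested twice.
    if np is not None:
        return mask_row_numpy(row, mask)
    return mask_row_stdlib(row, mask)


print(mask_row([0x0F, 0xF0, 0xFF], 0x0F))
```

Both paths return `[15, 0, 15]` for this input; the point is that only the first function is needed once the standard library is the sole requirement.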
The screen update speed is not CPU-bound; it's slower because of the update cycle in the thread, which the back end now has to wait for. You'll find, e.g., that in 1.x a large listing is nearly immediate, but you only see the last few lines. In 2.x you will see it scrolling, which is closer to what GW-BASIC did.
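A toy model (not PC-BASIC's code) of how output can be slow without being CPU-bound: if the engine must hand each screen update to an interface thread that only takes one per refresh tick, throughput is set by the tick rate rather than by how fast the engine computes. The tick length and the one-slot queue are assumptions chosen to make the effect visible.

```python
import queue
import threading
import time

TICK = 0.02  # hypothetical refresh interval of the interface thread, in seconds


def interface(updates, shown, count):
    """Take at most one update per tick, like a fixed-rate refresh loop."""
    for _ in range(count):
        time.sleep(TICK)
        shown.append(updates.get())


def engine_output(count):
    updates = queue.Queue(maxsize=1)  # the engine can run at most one update ahead
    shown = []
    ui = threading.Thread(target=interface, args=(updates, shown, count))
    ui.start()
    start = time.perf_counter()
    for n in range(count):
        updates.put(f"line {n}")  # blocks until the interface has caught up
    ui.join()
    return shown, time.perf_counter() - start


lines, elapsed = engine_output(10)
print(len(lines), f"{elapsed:.2f} s")  # roughly 10 * TICK, however cheap each update is
```

Making each update cheaper would not change the elapsed time here; only a shorter tick or a larger batch per tick would.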
The dirty secret is that GW-BASIC under DOSBox is much faster than PC-BASIC, so if performance is your concern, you'd be better off using that. Python is just so much slower than compiled C.
Re maintainability, 1.2 is just a total mess, seriously. You'd notice once you try to make a change. The circular dependencies are just one example: module X imports module Y, while module Y in turn imports modules that depend on module X. That works, but as soon as you change something the import order can change and things may break, and it becomes horrible to debug.
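A self-contained reproduction of that failure mode, using two throwaway modules whose names (`mod_x`, `mod_y`) are hypothetical: `mod_x` imports `mod_y`, and `mod_y` imports a name from `mod_x`. Whether it works depends only on which module happens to be imported first.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MOD_X = """\
import mod_y                  # X needs Y ...
CONSTANT = 42

def use():
    return mod_y.helper()
"""

MOD_Y = """\
from mod_x import CONSTANT    # ... and Y needs a name defined later in X

def helper():
    return CONSTANT
"""


def import_in_order(first):
    """Write mod_x/mod_y to a fresh directory and import `first`."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "mod_x.py").write_text(MOD_X)
        Path(tmp, "mod_y.py").write_text(MOD_Y)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(first)
            return "ok"
        except ImportError as exc:
            return f"ImportError: {exc}"
        finally:
            sys.path.remove(tmp)
            for name in ("mod_x", "mod_y"):
                sys.modules.pop(name, None)


print(import_in_order("mod_y"))  # Y first: X finishes before Y asks for CONSTANT
print(import_in_order("mod_x"))  # X first: Y sees a half-initialised X and fails
```

Nothing in either module changes between the two calls; moving one import elsewhere in a program is enough to flip from the first case to the second.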
Including .pyc files only makes a difference on the first run: when working from source, Python produces .pyc files automatically and only rebuilds them when the source file changes.
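How Python decides to rebuild: a .pyc written in the default timestamp mode stores the source file's modification time and size in its 16-byte header (the layout used since Python 3.7), and the importer recompiles when either no longer matches. This sketch uses only the standard library (`py_compile`, `importlib.util.MAGIC_NUMBER`) to show those fields; the file name `example.py` is arbitrary.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile
from pathlib import Path


def read_pyc_header(source):
    """Byte-compile `source` and return the header fields the importer checks."""
    pyc_path = py_compile.compile(
        str(source),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    data = Path(pyc_path).read_bytes()
    flags, mtime, size = struct.unpack("<III", data[4:16])
    return {
        "pyc": pyc_path,
        "magic_matches": data[:4] == importlib.util.MAGIC_NUMBER,
        "flags": flags,         # 0 means timestamp-based invalidation
        "source_mtime": mtime,  # whole seconds, truncated to 32 bits
        "source_size": size,    # bytes, truncated to 32 bits
    }


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "example.py")
    src.write_text("print('hello')\n")
    header = read_pyc_header(src)
    st = os.stat(src)
    # The cache is stale as soon as either recorded value differs from the source.
    print(header["source_mtime"] == (int(st.st_mtime) & 0xFFFFFFFF))
    print(header["source_size"] == st.st_size)
```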
The 2.0 branch uses Python 3.x in the packages and when installed with pip3, though I haven't tested on 3.9 yet. I think 3.7 and 3.8 should work fine. 2.7 will be phased out at some point.
FWIW, though I didn't benchmark anything, I've had good results using PyPy (https://www.pypy.org/) to run PC-BASIC, with great speedups. It might be worth a shot depending on your needs.
Last edit: Justin R. Miller 2021-08-12
Thanks, Rob, for your reply; much appreciated and clear.
I tried your suggestion and used Python 3.9; even though the program seemed to run a bit faster, there was a sort of lag, probably something to do with the way 3.9 handles timing.
I didn't try PyPy, but if ctypes doesn't cause issues, integrating the PyPy libraries into the Windows package should help the project, I would say.
I also noticed that memory management differs between 2.0 and 1.2.
With the same 600-line, 50 KB program, running FRE(1) after RUN and Ctrl-C, 2.0 reports 3215 bytes free (exactly the same as GW-BASIC v3.23), while 1.2 reports 3034 bytes free.
Thanks for checking; not sure what could cause the lag. Regarding memory management, thanks for reporting; it's more or less expected that 1.2 behaves differently. One of the big changes in the 2.0 branch is the way arrays and (in particular) strings are handled, in order to follow the behaviour of the original more closely. It's far from perfect, though: there are parts of how GW-BASIC's memory allocation behaves that I just haven't been able to understand so far.
Just for fun, I benchmarked three versions with a simple FOR...NEXT loop, no maths involved, just waiting:
10 FOR A=1 TO 10
20 T1=TIMER
30 FOR X=1 TO 100000:NEXT
40 T2=TIMER
50 T(A)=T2-T1:?T(A)
60 NEXT:?"Average="(T(1)+T(2)+T(3)+T(4)+T(5)+T(6)+T(7)+T(8)+T(9)+T(10))/10
Same hardware: Win10 x64, i7-6700 @ 3.4 GHz, 16 GB RAM, SSD.
The average results are:
v1.2.14, Windows package: 3.625 s
v2.0.3, Windows package: 4.234 s
v2.0.3, Python 3.9 x64: 2.554 s
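For comparison, the same measurement can be taken directly in the host Python, which gives a rough sense of the host interpreter's own loop speed and makes it easy to compare CPython and PyPy on the same machine. This is an illustrative sketch, not part of PC-BASIC; `time.perf_counter()` is used because it is a monotonic, high-resolution clock.

```python
import time
from statistics import mean


def time_empty_loop(iterations=100_000, repeats=10):
    """Python analogue of the BASIC benchmark: time an empty loop `repeats` times."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(iterations):
            pass
        timings.append(time.perf_counter() - start)
    return timings


timings = time_empty_loop()
print("Average =", mean(timings))
```

Running the same file with `python` and with `pypy3` shows how much of a difference the host interpreter alone makes.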
Unfortunately, I was not able to run it on PyPy due to a dependency failure with winsound. For the record:
Traceback (most recent call last):
  File "d:\utility\pypy3.7-v7.3.5-win64\lib-python\3\runpy.py", line 196, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\utility\pypy3.7-v7.3.5-win64\lib-python\3\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "d:\utility\pypy3.7-v7.3.5-win64\Scripts\pcbasic.exe\__main__.py", line 4, in <module>
    from pcbasic import main
  File "d:\utility\pypy3.7-v7.3.5-win64\site-packages\pcbasic\__init__.py", line 18, in <module>
    from .main import run, main
  File "d:\utility\pypy3.7-v7.3.5-win64\site-packages\pcbasic\main.py", line 22, in <module>
    from .interface import Interface, InitFailed
  File "d:\utility\pypy3.7-v7.3.5-win64\site-packages\pcbasic\interface\__init__.py", line 22, in <module>
    from .audio_beep import AudioBeep
  File "d:\utility\pypy3.7-v7.3.5-win64\site-packages\pcbasic\interface\audio_beep.py", line 18, in <module>
    import winsound # pylint: disable=import-error
ModuleNotFoundError: No module named 'winsound'
Last edit: Theruler76 2021-08-26
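The traceback shows the failure comes from an unconditional `import winsound`; `winsound` is a Windows-only standard module and, as the traceback shows, was not available in this PyPy build. A common way to make such a dependency optional is to guard the import and fall back to something portable. This is a hypothetical sketch, not PC-BASIC's actual audio code; the terminal-bell fallback ignores pitch and duration.

```python
import sys


def load_beeper():
    """Return (backend_name, beep) where beep(frequency_hz, duration_ms) makes a sound."""
    try:
        import winsound  # Windows-only; missing in the PyPy build from the traceback
    except ImportError:
        def beep(frequency_hz, duration_ms):
            # Portable fallback: ring the terminal bell; pitch and length are ignored.
            sys.stdout.write("\a")
            sys.stdout.flush()
        return "terminal-bell", beep
    return "winsound", winsound.Beep


backend, beep = load_beeper()
print(backend)
```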
Things change a little when it comes to filling the screen with characters.
Same machine, same versions under test, but with this similar program:
10 FOR A=1 TO 10
20 T1=TIMER
30 FOR YD=1 TO 23:FOR XD=1 TO 80:LOCATE YD,XD:?CHR$(129+A);:NEXT:NEXT
40 T2=TIMER
50 T(A)=T2-T1
60 NEXT:?"Average="(T(1)+T(2)+T(3)+T(4)+T(5)+T(6)+T(7)+T(8)+T(9)+T(10))/10
v1.2.14, Windows package: 0.614 s
v2.0.3, Windows package: 1.256 s
v2.0.3, Python 3.9 x64: 0.875 s
The compiled version actually displays the characters changing, but the pure Python one shows nothing and then displays the screen filled with the last characters, along with the average result.
Hi Everton, I don't have any experience with your compiler, though I knew it existed, and that's a great thing!
It could be handy for some quick tests I have to do.
I am making some changes to a game I wrote in 2018 (MANOR 2, with the permission of Leon Baradat; version 2.22 can be downloaded from his page https://peyre.x10.mx/GWBASIC/index.htm), namely a formula that calculates random natural events and determines how much grain you lose, based on a parameter that mitigates the loss (SI).
I test my formulas with as many iterations as I can, and it would be nice to have a compiled version that runs millions in a few seconds.
At the moment, 50000 iterations in pure Python take 155 seconds.
Here is the test (with 1 million iterations of the I loop):
5 RANDOMIZETIMER:CLS:?"Statistical distribution of random natural events (SI=0 TO 4) ongoing...":ONERRORGOTO25
6 ?:?" SI":?" 0%":?" 10%":?" 20%":?" 30%":?" 40%":?" 50%":?" 60%":?" 70%":?" 80%":?" 90%":?"100%"
7 T1=TIMER:FORSI=0TO4:G10=0:G20=0:G30=0:G40=0:G50=0:G60=0:G70=0:G80=0:G90=0:G100=0:FORI=1TO1000000
10 EV=INT(RND*5)+1:G=INT(RND*10000):RA=INT((RND*G/2)+(RND*EV*250))/((RND*SI)+1):PERC=RA*100/(G+0.1)
11 IFPERC<=0THENG0=G0+1
12 IFPERC>0ANDPERC<=10THENG10=G10+1
13 IFPERC>10ANDPERC<=20THENG20=G20+1
14 IFPERC>20ANDPERC<=30THENG30=G30+1
15 IFPERC>30ANDPERC<=40THENG40=G40+1
16 IFPERC>40ANDPERC<=50THENG50=G50+1
17 IFPERC>50ANDPERC<=60THENG60=G60+1
18 IFPERC>60ANDPERC<=70THENG70=G70+1
19 IFPERC>70ANDPERC<=80THENG80=G80+1
20 IFPERC>80ANDPERC<=90THENG90=G90+1
21 IFPERC>90ANDPERC<=100THENG100=G100+1
22 NEXT:LOCATE3,(SI+1)*12:?SI:LOCATE4,(SI+1)*12:?G0:LOCATE5,(SI+1)*12:?G10:LOCATE6,(SI+1)*12:?G20:LOCATE7,(SI+1)*12:?G30:LOCATE8,(SI+1)*12:?G40:LOCATE9,(SI+1)*12:?G50
23 LOCATE10,(SI+1)*12:?G60:LOCATE11,(SI+1)*12:?G70:LOCATE12,(SI+1)*12:?G80:LOCATE13,(SI+1)*12:?G90:LOCATE14,(SI+1)*12:?G100:NEXT
24 T2=TIMER:?"Execution time for"I*5"iterations: ";:?INT(T2-T1):END
25 LOCATE15+SI,1:?"SI="SI" EV="EV" G="G" RA="RA" PERC="PERC:RESUMENEXT
Last edit: Theruler76 2021-08-31
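For quick experiments outside BASIC, line 10 and the bucketing in lines 11 to 21 can be ported almost one-to-one. This sketch assumes Python's `random.random()` as a stand-in for RND (so the random sequence differs from GW-BASIC's) and uses double rather than single precision, so values right at a bucket edge may land differently. Counters start at zero for each SI value and, as in the listing, PERC values above 100 are not counted. The function name is hypothetical.

```python
import math
import random


def loss_distribution(si, iterations, rng):
    """Count how often the grain loss PERC falls into each 10% bucket for one SI value."""
    counts = {limit: 0 for limit in range(0, 101, 10)}  # keys 0, 10, ..., 100
    for _ in range(iterations):
        # Line 10: every RND is a fresh draw in [0, 1).
        ev = int(rng.random() * 5) + 1
        g = int(rng.random() * 10000)
        ra = int(rng.random() * g / 2 + rng.random() * ev * 250) / (rng.random() * si + 1)
        perc = ra * 100 / (g + 0.1)
        # Lines 11-21: PERC <= 0 goes to G0; (k-10, k] goes to Gk; above 100 is ignored.
        if perc <= 0:
            counts[0] += 1
        elif perc <= 100:
            counts[math.ceil(perc / 10) * 10] += 1
    return counts


rng = random.Random(2021)
for si in range(5):
    print(si, loss_distribution(si, 10_000, rng))
```

Still interpreted, but it avoids the PC-BASIC layer entirely; raising the iteration count trades time for smoother distributions.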
Due to limitations in the compiler, I had to tweak the code in order to compile it.
This is the final code I compiled.
However, I am afraid I broke the math in line 10 when inserting parentheses for RND(). The compiler is unable to parse RND*number; it requires RND(number).
I think I might have mistranslated RND*G/2, RND*EV*250, and PERC=RA*100/(G+0.1).
Can you help me fix line 10 properly?
1 screen0
5 RANDOMIZETIMER:CLS:?"Statistical distribution of random natural events (SI=0 TO 4) ongoing...":'ONERRORGOTO25
6 ?:?" SI":?" 0%%":?" 10%%":?" 20%%":?" 30%%":?" 40%%":?" 50%%":?" 60%%":?" 70%%":?" 80%%":?" 90%%":?"100%%"
7 T1=TIMER:FORSI=0TO4:G10=0:G20=0:G30=0:G40=0:G50=0:G60=0:G70=0:G80=0:G90=0:G100=0:FORI=1TO1000000
10 EV=INT(RND(5))+1:G=INT(RND(10000)):RA=INT((RND(G)/2)+(RND(EV)*250))/((RND(SI))+1):PERC=RA*100/(G+0.1)
11 IFPERC<=0THENG0=G0+1
12 IFPERC>0ANDPERC<=10THENG10=G10+1
13 IFPERC>10ANDPERC<=20THENG20=G20+1
14 IFPERC>20ANDPERC<=30THENG30=G30+1
15 IFPERC>30ANDPERC<=40THENG40=G40+1
16 IFPERC>40ANDPERC<=50THENG50=G50+1
17 IFPERC>50ANDPERC<=60THENG60=G60+1
18 IFPERC>60ANDPERC<=70THENG70=G70+1
19 IFPERC>70ANDPERC<=80THENG80=G80+1
20 IFPERC>80ANDPERC<=90THENG90=G90+1
21 IFPERC>90ANDPERC<=100THENG100=G100+1
22 NEXT:LOCATE3,12:?SI:LOCATE4,12:?G0:LOCATE5,12:?G10:LOCATE6,12:?G20:LOCATE7,12:?G30:LOCATE8,12:?G40:LOCATE9,12:?G50
23 LOCATE10,12:?G60:LOCATE11,12:?G70:LOCATE12,12:?G80:LOCATE13,12:?G90:LOCATE14,12:?G100:NEXT
24 T2=TIMER:?"Execution time for"I5"iterations: ";:?T2-T1
25 whilei$<>"q":?"hit q to exit":i$=input$(1):wend:END
250 LOCATE15+SI,1:?"SI="SI" EV="EV" G="G" RA="RA" PERC="PERC:RESUMENEXT

1 screen0
5 RANDOMIZETIMER:CLS:?"Statistical distribution of random natural events (SI=0 TO 4) ongoing...":'ONERRORGOTO25
6 ?:?" SI":?" 0%%":?" 10%%":?" 20%%":?" 30%%":?" 40%%":?" 50%%":?" 60%%":?" 70%%":?" 80%%":?" 90%%":?"100%%"
7 T1=TIMER:FORSI=0TO4:G10=0:G20=0:G30=0:G40=0:G50=0:G60=0:G70=0:G80=0:G90=0:G100=0:FORI=1TO1000000
10 EV=INT(RND*5)+1:G=INT(RND*10000):RA=INT((RND*G/2)+(RND*EV*250))/((RND*SI)+1):PERC=RA*100/(G+0.1)
11 IFPERC<=0THENG0=G0+1
12 IFPERC>0ANDPERC<=10THENG10=G10+1
13 IFPERC>10ANDPERC<=20THENG20=G20+1
14 IFPERC>20ANDPERC<=30THENG30=G30+1
15 IFPERC>30ANDPERC<=40THENG40=G40+1
16 IFPERC>40ANDPERC<=50THENG50=G50+1
17 IFPERC>50ANDPERC<=60THENG60=G60+1
18 IFPERC>60ANDPERC<=70THENG70=G70+1
19 IFPERC>70ANDPERC<=80THENG80=G80+1
20 IFPERC>80ANDPERC<=90THENG90=G90+1
21 IFPERC>90ANDPERC<=100THENG100=G100+1
22 NEXT:LOCATE3,(SI+1)*12:?SI:LOCATE4,(SI+1)*12:?G0:LOCATE5,(SI+1)*12:?G10:LOCATE6,(SI+1)*12:?G20:LOCATE7,(SI+1)*12:?G30:LOCATE8,(SI+1)*12:?G40:LOCATE9,(SI+1)*12:?G50
23 LOCATE10,(SI+1)*12:?G60:LOCATE11,(SI+1)*12:?G70:LOCATE12,(SI+1)*12:?G80:LOCATE13,(SI+1)*12:?G90:LOCATE14,(SI+1)*12:?G100:NEXT
24 T2=TIMER:?"Execution time for"I*5"iterations: ";:?T2-T1
25 whilei$<>"q":?"hit q to exit":i$=input$(1):wend:END
26 LOCATE15+SI,1:?"SI="SI" EV="EV" G="G" RA="RA" PERC="PERC:RESUMENEXT
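On the RND question: in GW-BASIC the argument of RND is not a scale factor. RND(x) with x > 0 (or RND with no argument) returns the next number in [0, 1), RND(0) repeats the last number, and a negative x restarts the sequence. So RND(5) is just another number in [0, 1), and for a compiler that insists on parentheses, RND*5 would become RND(1)*5 (likewise RND(1)*G/2, RND(1)*EV*250 and RND(1)*SI). The small Python model below mimics only those argument rules; it uses Python's `random` module, not GW-BASIC's actual generator, and the class name `BasicRnd` is hypothetical.

```python
import random


class BasicRnd:
    """Hypothetical model of GW-BASIC's RND argument rules (not its real generator)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._last = self._rng.random()

    def __call__(self, x=1):
        if x < 0:
            self._rng.seed(x)                # negative argument: restart the sequence
            self._last = self._rng.random()
        elif x > 0:
            self._last = self._rng.random()  # positive argument: next number in [0, 1)
        # x == 0: fall through and return the previous number again
        return self._last


RND = BasicRnd(seed=42)
print(0 <= RND(5) < 1)      # True: RND(5) is NOT a number between 0 and 5
print(0 <= RND(1) * 5 < 5)  # True: RND(1)*5 is what RND*5 means
print(RND(0) == RND(0))     # True: RND(0) repeats the last value
```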
My Windows environment is temporarily broken. When it gets fixed, I will be able to send you the compiled code for Windows, or the directions to run the compiler on Windows by yourself.
Everton
2.0.x uses Python 2.7 if you install it for 2.7. Just install it for Python 3.
Great tip, thanks! Did everything work or are there gaps? I seem to recall there used to be issues with ctypes under PyPy, but that was a while ago...
I experimented with compiling your benchmark code to native code with my toy BASIC compiler: https://github.com/udhos/basgo
Without filling the screen, your code took: Average= 0.00011061930126743391
Filling the screen, it took: Average= 0.1786323386011645
Compiled to a native executable, under Linux, x86_64, i7-8750H CPU @ 2.20GHz.
Hi,
This is the output:
Everton
Yeah, sorry. I should have used a code box like you did (I've edited my last post).
There is a multiplication symbol between RND and the G variable.
This is the output from compiled code:
This is the code with minor changes:
Does it read right? Less than half a second? The power of machine code.
Not a problem, Everton, but thanks. I'd like to be able to compile it myself.
Cheers
Yes, it reads right.
I have sketched this recipe for compiling on Windows:
https://github.com/udhos/basgo/blob/master/README-windows.md
If you face difficulties with the recipe, you can reach me at: everton.marques(at)gmail.com
Everton
A very comprehensive guide, Everton. Thank you so much.