There are many ways to loop a program in Great Cow BASIC:
for next
do loop
repeat end repeat
These loops all do similar things, but the question was: 'what is the performance of each loop?'
It is a really complex answer. Each type of loop has many parameters that impact performance, and the compiler optimises the assembly based on the specific conditions in the definition of each loop.
A simple example showing all three types of loops follows. This uses the LGT chipset because I have the greatest control over the frequency, but any chip will work.
Code
    #option Explicit
    #chip LGT8F328P, 1
    #include <millis.h>     ' Include the Library

    'USART settings for USART1
    #define USART_BAUD_RATE 9600
    #define USART_TX_BLOCKING
    #define USART_DELAY OFF

    Dim count as byte
    Dim CurMs, LstMs as word    ' declare working variables

    ' Main
    ' This loop runs over and over forever.
    LstMs = 0
    CurMs = 0

    Wait 2 s
    HSerPrintCRLF

    LstMs = millis()
    For count = 0 to 254
        'wait 1 ms
    Next
    CurMs = millis()
    HSerPrint CurMs - LstMs
    HSerPrintCrlf

    LstMs = millis()
    count = 0
    do until count = 254
        count++
        'wait 1 ms
    loop
    CurMs = millis()
    HSerPrint CurMs - LstMs
    HSerPrintCrlf

    LstMs = millis()
    count = 255
    Repeat count
        'wait 1 ms
    End Repeat
    CurMs = millis()
    HSerPrint CurMs - LstMs
    HSerPrintCrlf
Yields
8:17:22.869> 2
8:17:22.869> 3
8:17:22.869> 2
The output shows the timestamp and the number of milliseconds taken by each loop type: 2 ms for the for-next loop and the repeat loop, with the do-loop taking 3 ms.
Note the frequency: the chip is clocked at 1 MHz, which is very low, to get a meaningful result.
Consider the code.
Remember: each loop type is handled by a different section of the compiler. The methods used are specific to PIC or AVR/LGT, and there is heavy optimisation of the generated assembly per chip family, depending on the variables/constants used, how well the chip handles specific assembly instructions (the performance of the LGT compared to an AVR is very different), and the memory access methods of each chip family.
Hugh wrote the compiler sections many years ago, and I have maintained the level of optimisation as I have revised the compiler. Hugh's work is stunningly good.
Oh boy .. this is complex.
for-next loop. This example uses For count = 0 to 254, where the start value and end value are constants. The compiler handles this by using the constants directly in the assembly. If the start value and end value were not constants but variables (byte, word, long), the time to complete the loop would increase, and by a different amount for each type of variable. If STEP is included then the loop timing increases even more.
The fastest for-next loop? Use constants.
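To see the constants-versus-variables difference on your own chip, something like the sketch below could be used. It reuses the timing approach from the test program above. The names startValue and endValue are made up for illustration, and no timings are quoted here because they have not been measured.

```basic
#option Explicit
#chip LGT8F328P, 1
#include <millis.h>

'USART settings for USART1
#define USART_BAUD_RATE 9600
#define USART_TX_BLOCKING
#define USART_DELAY OFF

Dim count As Byte
Dim startValue, endValue As Byte    ' hypothetical names, for illustration only
Dim CurMs, LstMs As Word

startValue = 0
endValue = 254

Wait 2 s
HSerPrintCRLF

' Case 1: constant bounds, as in the original test
LstMs = millis()
For count = 0 To 254
Next
CurMs = millis()
HSerPrint CurMs - LstMs
HSerPrintCRLF

' Case 2: the same range, but the bounds are held in byte variables
LstMs = millis()
For count = startValue To endValue
Next
CurMs = millis()
HSerPrint CurMs - LstMs
HSerPrintCRLF
```

Changing the Dim of startValue and endValue to Word or Long, or adding a STEP, should show how much each variation costs on a given chip family.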
do-loop. The user program looks simple but, on test, this is the slowest. Why? I am very surprised too. The assembly the compiler generates for this chip uses different branches, and this must account for the performance decrease. I would be interested for others to test this on other chip families, as these branches will perform differently there.
repeat-end repeat. The tests show this performs as well as the for-next loop. The assembly is very different from the other two loops. However, this is the least optimised loop: it has little optimisation for variable types or constants. So you can rely on this loop in terms of performance, as it will use the same approach every time.
So, now consider a for-next loop with the range of 0 to 1024. Which is faster?
1.
for loopvar = 0 to 1024
or
2.
for loopvar1 = 0 to 255
for loopvar2 = 0 to 3
or
3.
do ... loop while loopvar1 < 1025
Well, it is 2, the multiple for-next loops: the loop variables are bytes and the limits are constants, so they are highly optimised. Hence, the GLCD code uses nested for-next loops to increase performance.
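A rough way to check options 1 and 2 on your own hardware is sketched below, again using millis() as in the test program. Treat it as a sketch only: the variable names follow the question above, the chip and frequency are assumptions copied from the earlier example, and the two options are not quite the same amount of work (0 to 1024 is 1025 passes, while 4 x 256 is 1024 passes).

```basic
#option Explicit
#chip LGT8F328P, 1
#include <millis.h>

'USART settings for USART1
#define USART_BAUD_RATE 9600
#define USART_TX_BLOCKING
#define USART_DELAY OFF

Dim loopvar As Word                 ' 0 to 1024 does not fit in a byte
Dim loopvar1, loopvar2 As Byte
Dim CurMs, LstMs As Word

Wait 2 s
HSerPrintCRLF

' Option 1: a single loop with a word counter
LstMs = millis()
For loopvar = 0 To 1024
Next
CurMs = millis()
HSerPrint CurMs - LstMs
HSerPrintCRLF

' Option 2: nested loops with byte counters and constant limits
LstMs = millis()
For loopvar2 = 0 To 3
    For loopvar1 = 0 To 255
    Next
Next
CurMs = millis()
HSerPrint CurMs - LstMs
HSerPrintCRLF
```

If the loop body needs the full 0 to 1023 position, it has to be rebuilt from the two byte counters, which adds its own cost, so the gain depends on what the body does.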
Hope this helps. An interesting but complex subject.
And, I hope the analysts out there wade into the specifics of each chip to dig even deeper into the performance of each instruction.
Last edit: Anobium 2022-02-19
Same program, same frequency, different chip: an 18F27Q84
6ms
7ms
3ms
Interesting.
Same program, same frequency, different chip: an 18F25K50
4ms
4ms
2ms
Even more interesting.
Same program, same frequency, different chip: an 18F24K42
6ms
8ms
3ms
But I cannot try this using an Arduino... it is stuck at 16 MHz!
Last edit: Anobium 2022-02-19
Same program, same frequency, different chip: a 16F1778. Go figure this one!!
3ms
4ms
1ms
Last edit: Anobium 2022-02-19
I tried this and the for-next took 2233 ms;
the do-loop took 2075 ms.
Do-loop was always faster.
I'll try repeat.
I tried repeat and it took 566 ms ... 4 times faster?!
Can repeat be nested?
Does the compiler check whether the repeat count is <= 255 and use a byte, and otherwise use a word?
Last edit: stan cartwright 2022-02-19
I tried this and it took 2228 ms.
I should try byte values for count1 and count2.
Using Repeat seems a clear winner for speed from what I found.
I've never used Repeat but think I will as sometimes I need code to run as fast as possible.
Thinking about it though, only for-next lets you use the counter value to index arrays.
Do-loop and repeat would need an extra variable as a pointer, and it would need incrementing, so in practical terms it depends.
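To make that concrete, here is a small sketch of the same array fill written both ways. The array name, its size and the chip line are assumptions for illustration, not taken from any code in this thread.

```basic
#option Explicit
#chip mega328p, 16

Dim buffer(16) As Byte      ' hypothetical array, for illustration only
Dim index As Byte

' For-Next: the loop counter doubles as the array index
For index = 1 To 16
    buffer(index) = 0
Next

' Repeat: the loop has no counter you can read, so a separate
' index has to be set up and incremented by hand
index = 1
Repeat 16
    buffer(index) = 0
    index++
End Repeat
```

Whether the extra increment cancels out the speed advantage of repeat would need timing on the target chip.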
Told you it was complex!
Nothing's complex in gcb ;) That's why we use it!
No-one knows what the compiler does with our code.
I stick with the Arduino 328 for consistency, so I have a working (I hope) target model.
It's fast enough for most stuff, but I sometimes try to get it to be as efficient/fast as possible.
And oh dear, I did use repeat, I just found out. I thought I'd optimise the ili9341 code because it has lots of for-next loops. This was ripped from the glcd.
Make sprite_size a constant, not a var.
Last edit: stan cartwright 2022-02-19
That is interesting. I would expect all of them to produce pretty much the same timing.
The K50 is a "classic 18F", and seems to be on par with the 16F1778 timing, give or take.
For simple loop code that's probably to be expected.
The K42 and Q84 are the new "xv18" core devices, and appear to be almost twice as slow!
Is the asm radically different than the K50? In theory it shouldn't be...
Jerry, it would take a much larger analysis to compare. I was only testing the results.
Many things are different, interrupt caching being one of the major differences. The newer chips have smart interrupt caching, but this may come at the cost of performance?
If anything I would think that should make them faster since all of the cpu regs are saved in a single cycle. Plus, doesn't the millis() tick operate at 1ms? If so, that would only be a handful of interrupts during the test.
Do the new chips use a lot of MOVFFL instructions? That might account for the difference, but I wouldn't think that code would need to use it since everything should be within the MOVFF range.
Re MOVFFL: nope... some new chips always need this. Found this last week.
It would be a good project for an Intern!
Only if the src/dest is outside the 4K range of MOVFF. MOVFFL is larger and slower, so I prefer to only use it as req'd.
The initial xv18 chips (ie K42, K83) put the SFR registers way at the top of memory, so MOVFFL was needed. The newer ones moved them down to bank 0 where MOVFF can get to them. Much better IMHO.
Any interns hanging around??
Q43 needs it regardless.
I have a PIC 18F25K22 to test, as it's 64 MHz, and don't PICs do an instruction every 4 clocks?
That's the nearest to a mega328p at 16 MHz, which does an instruction every clock,
or is there no comparison?
Just wondered how much the 328p is optimised. GCB was always PIC orientated imho.
Was GCB ported from PIC to 328p or was it ground up?
AVR is optimised, and so is the PIC.
I do not know which came first. Back in 2009 the code supported AVR and PIC... so my guess is from day one.
Good. Arduino came out in 2005, so I was hoping the 328p was taken seriously: that when GCB came out Arduino was already established, so someone thought "let's write for that", rather than "oh well, we'll have to support it" without putting in the same effort as for PICs.
I like 328p and gcb, no probs, well happy, not moaning.
No moaning, no probs.
Uber fast UNO coding..... the new IDE
YouTube for you Stan
https://youtu.be/095AIvr7b_A
Cool, looks interesting. thanks. When I installed it I backed up gcb.
I will now need the latest version. Which version of GCB am I getting, please?
I don't have a clue about this. I thought it was about using an alternate IDE like Geany... wrong!
In the 1st video the 328p needed no ",16", i.e. not "328p,16".
In the 2nd video it's "328p,16".
Evan, is there some unpublished errata/special conditions for this? I've used the Q43 w/MOVFF and it seems to work fine.
Really sorry, I was incorrect: it is the Q40 and Q41.