
Loop Performance

Anobium
2022-02-19
2022-02-26
  • Anobium

    Anobium - 2022-02-19

    There are many ways to loop a program in Great Cow BASIC:

    • for next
    • do loop
    • repeat end repeat

    These loops all do similar things, but the question was: what is the performance of each loop?


    It is a really complex answer. Each type of loop has many parameters that affect its performance, and the compiler optimises the assembly based on the specific conditions in the definition of each loop.

    A simple example showing all three types of loops follows. This uses the LGT chipset because I have the greatest control over the frequency but any chip will work.

    Code

    #option Explicit
    #chip LGT8F328P, 1        
    '
    #include <millis.h>       ' Include the Library
    
        'USART settings for USART1
        #define USART_BAUD_RATE 9600
        #define USART_TX_BLOCKING
        #define USART_DELAY OFF
    
        Dim count as byte
        Dim CurMs, LstMs as word  ' declare working variables
        ' Main                    ' This loop runs over and over forever.
        LstMs = 0
        CurMs = 0
    
    
        Wait 2 s
    
        HSerPrintCRLF
    
        LstMs = millis()
        For count = 0 to 254
          'wait 1 ms
        Next
    
        CurMs = millis()
        HSerPrint CurMs - LstMs
        HSerPrintCrlf
    
        LstMs = millis()
        count=0
        do until count=254
          count++
          'wait 1 ms
        loop
        CurMs = millis()
        HSerPrint CurMs - LstMs
        HSerPrintCrlf
    
        LstMs = millis()
        count=255
        Repeat count
          'wait 1 ms
        End Repeat
        CurMs = millis()
        HSerPrint CurMs - LstMs
        HSerPrintCrlf
    

    Yields

    8:17:22.869> 2
    8:17:22.869> 3
    8:17:22.869> 2

    Each line shows the timestamp and the number of milliseconds taken by one loop type: 2 ms for the for-next loop, 3 ms for the do-loop, and 2 ms for the repeat loop.

    Note the frequency: it is set very low (1 MHz in the #chip line) to get a meaningful result.
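
    As a rough worked example of why the low clock matters (assuming the 1 in the #chip line selects 1 MHz, that millis() ticks once per ms, and ignoring the time spent in the millis() interrupt itself):

        1 MHz                 = 1,000 clocks per ms
        for-next, 2 ms        ≈ 2,000 clocks / 255 passes ≈ 8 clocks per pass
        same loop at 16 MHz   ≈ 2,000 clocks / 16,000 clocks per ms ≈ 0.125 ms

    At 16 MHz all three loops would finish inside a single millis() tick, so the differences would not show. With a 1 ms tick each reading is only good to about ±1 ms, so the per-pass figure is an estimate, not a measurement.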


    Consider the code.

    Remember: each loop type is handled by a different section of the compiler. The methods used are specific to PIC or AVR/LGT, and the generated assembly is heavily optimised per chip family. The optimisation depends on the variables and constants used, on how well the chip handles specific assembly instructions (the performance of the LGT is very different from that of an AVR), and on the memory access methods of each chip family.

    Hugh wrote these compiler sections many years ago, and I have maintained the level of optimisation as I have revised the compiler. Hugh's work is stunningly good.

    Oh boy .. this is complex.

    for-next loop. This example uses For count = 0 to 254, where the start value and end value are constants. The compiler handles this by using the constants directly in the assembly. If the start value and end value were variables (byte, word, long) rather than constants, the time to complete the loop would increase, and it would increase with each larger variable type. If STEP is included, the loop timing increases even more.

    The fastest for-next loop? Use constants.
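
    A minimal sketch of the two cases, for anyone who wants to time them with the millis() pattern used in the program above. The names idx and lastIdx are made up for illustration, and how much slower the second form is will depend on the chip and the compiler version.

    ```basic
    Dim idx As Byte
    Dim lastIdx As Byte

    ' Case 1: constant start and end values, known at compile time
    For idx = 0 To 254
    Next

    ' Case 2: end value held in a variable, so it has to be read at run time
    lastIdx = 254
    For idx = 0 To lastIdx
    Next
    ```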

    do-loop. The user program looks simple, but on test this is the slowest. Why? I am very surprised too. The assembly the compiler generates for this chip uses different branches, and this must account for the performance decrease. I would be interested for others to test this on other chip families, as these branches will perform differently there.

    repeat-end repeat. The tests show this performs as well as the for-next loop. The assembly is very different from the other two loops. However, this is the least optimised loop: it has little optimisation for variable types or constants. So you can rely on this loop in terms of performance, because it will use the same approach every time.


    So, now consider a for-next loop with the range of 0 to 1024. Which is faster?

    1.
    for loopvar = 0 to 1024

    or

    2.
    for loopvar1 = 0 to 255
    for loopvar2 = 0 to 3

    or

    3.
    do while loopvar1 < 1025

    Well, it is option 2, the nested for-next loops. The loop variables are bytes and the bounds are constants, so the code is highly optimised. This is why the GLCD code uses nested for-next loops to increase performance.
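
    A sketch of option 2 written out in full, with a word counter added only to confirm the number of passes. The names outerIdx, innerIdx and passes are illustrative. Note that 4 x 256 gives 1024 passes, while 0 to 1024 in option 1 is 1025 passes, so adjust the ranges if the exact count matters.

    ```basic
    Dim outerIdx As Byte
    Dim innerIdx As Byte
    Dim passes As Word

    passes = 0
    For outerIdx = 0 To 3          ' 4 outer passes, byte variable, constant bounds
      For innerIdx = 0 To 255      ' 256 inner passes, byte variable, constant bounds
        passes = passes + 1        ' remove this line when timing the bare loops
      Next
    Next
    ' passes should now hold 1024 (4 x 256)
    ```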


    Hope this helps. An interesting but complex subject.

    And, I hope the analysts out there wade into the specifics of each chip to dig even deeper into the performance of each instruction.

     

    Last edit: Anobium 2022-02-19
  • Anobium

    Anobium - 2022-02-19

    Same program, same frequency, different chip: an 18F27Q84

    6ms
    7ms
    3ms

    Interesting.

     
  • Anobium

    Anobium - 2022-02-19

    Same program, same frequency, different chip: an 18F25K50

    4ms
    4ms
    2ms

    Even more interesting

    Same program, same frequency, different chip: an 18F24K42

    6ms
    8ms
    3ms

    But I cannot try this using an Arduino... stuck at 16 MHz!

     

    Last edit: Anobium 2022-02-19
    • Anobium

      Anobium - 2022-02-19

      Same program, same frequency, different chip: a 16F1778. Go figure this one!!

      3ms
      4ms
      1ms

       

      Last edit: Anobium 2022-02-19
  • stan cartwright

    stan cartwright - 2022-02-19

    I tried this and the for-next took 2233 ms,
    the do-loop took 2075 ms.
    Do-loop was always faster.
    I'll try repeat.

    #chip mega328p, 16
    dim count1,count2 as Word
    Dim CurMs, LstMs,totalms as word
    CurMs = millis()
    for count1=0 to 1000
      for count2=0 to 1000
      next count2
    next count1
    LstMs=millis()
    totalms=LstMs-CurMs
    GLCDPrint 0,0,str(totalms)
    ;
    count1=0:count2=0
    CurMs = millis()
    do until count1=1000
      count1++
      count2=0
      do until count2=1000
        count2++
      loop
    loop
    LstMs=millis()
    totalms=LstMs-CurMs
    GLCDPrint 0,24,str(totalms)
    
     
  • stan cartwright

    stan cartwright - 2022-02-19

    I tried repeat and it took 566 ms ... 4 times faster?!
    Can repeat be nested?
    Does the compiler see if the repeat count is <= 255 and use a byte otherwise it uses a word?

    CurMs = millis()
    Repeat 1000
      Repeat 1000
      end repeat
    end repeat
    LstMs=millis()
    totalms=LstMs-CurMs
    GLCDPrint 0,48,str(totalms)
    
     

    Last edit: stan cartwright 2022-02-19
  • stan cartwright

    stan cartwright - 2022-02-19

    I tried this and it took 2228 ms.

    count1=0
    CurMs = millis()
    loop1:
    count2=0
    loop2:
    count2++
    if count2<1000 then goto loop2
    count1++
    if count1<1000 then goto loop1
    LstMs=millis()
    totalms=LstMs-CurMs
    GLCDPrint 0,64,str(totalms)
    
     
  • stan cartwright

    stan cartwright - 2022-02-19

    I should try byte values for count1 and count2.
    Using Repeat seems a clear winner for speed from what I found.
    I've never used Repeat but think I will as sometimes I need code to run as fast as possible.

     
  • stan cartwright

    stan cartwright - 2022-02-19

    Thinking about it, though, only for-next gives you a counter value you can use directly as an array index.
    Do-loop and repeat need a separate variable that you increment yourself, so in practical terms it depends.
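
    A small sketch of the difference, assuming a 16-element byte array called dataList and an index called idx (both names made up here):

    ```basic
    Dim dataList(16) As Byte
    Dim idx As Byte

    ' for-next: the loop counter is also the array index
    For idx = 1 To 16
      dataList(idx) = 0
    Next

    ' repeat: no counter is exposed, so keep and increment a separate index
    idx = 1
    Repeat 16
      dataList(idx) = 0
      idx = idx + 1
    End Repeat
    ```

    The extra increment inside the repeat body may cost some of the speed gained, so it would need timing on the target chip.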

     
    • Anobium

      Anobium - 2022-02-19

      Told you it was complex!

       
  • stan cartwright

    stan cartwright - 2022-02-19

    Nothing's complex in GCB ;) That's why we use it!
    No-one knows what the compiler does with our code.
    I stick with the Arduino 328 for consistency, so I have a working (I hope) target model.
    It's fast enough for most stuff, but I sometimes try to get it to be as efficient and fast as possible.
    And oh dear, I did use repeat, I just found out. I thought to optimise the ILI9341 code because it has lots of for-next loops. This was ripped from the GLCD code.
    Make sprite_size a constant, not a variable.

    sub erase_sprite (sprite_x,sprite_y,sprite_width,sprite_height,sprite_size) ;fills window background colour
      SetAddressWindow_ILI9341 ( sprite_x,sprite_y,sprite_x +sprite_width-1,sprite_y +sprite_height-1 )
      repeat sprite_size
        SendWord_ILI9341 GLCDBackground
      end repeat
    end sub
    
     

    Last edit: stan cartwright 2022-02-19
  • Jerry Messina

    Jerry Messina - 2022-02-21

    Same program, same frequency, different chip: an 18F27Q84
    Same program, same frequency, different chip: an 18F25K50
    Same program, same frequency, different chip: an 18F24K42

    That is interesting. I would expect all of them to produce pretty much the same timing.

    The K50 is a "classic 18F", and seems to be on par with the 16F1778 timing, give or take.
    For simple loop code that's probably to be expected.

    The K42 and Q84 are the new "xv18" core devices, and appear to be almost twice as slow!
    Is the asm radically different than the K50? In theory it shouldn't be...

     
    • Anobium

      Anobium - 2022-02-21

      Jerry it would take a much larger analysis to compare - I was testing the results.

      Many things are different, interrupt cache being one of the major differences. The newer chips have smart interrupt caching, but, this may come at the cost of performance?

       
  • Jerry Messina

    Jerry Messina - 2022-02-21

    The newer chips have smart interrupt caching, but, this may come at the cost of performance?

    If anything I would think that should make them faster since all of the cpu regs are saved in a single cycle. Plus, doesn't the millis() tick operate at 1ms? If so, that would only be a handful of interrupts during the test.

    Do the new chips use a lot of MOVFFL instructions? That might account for the difference, but I wouldn't think that code would need to use it since everything should be within the MOVFF range.

     
    • Anobium

      Anobium - 2022-02-21

      Re MOVFFL nope... some new chips need this always. Found this last week.


      It would be a good project for an Intern!

       
  • Jerry Messina

    Jerry Messina - 2022-02-21

    some new chips need this always.

    Only if the src/dest is outside the 4K range of MOVFF. MOVFFL is larger and slower, so I prefer to only use it as req'd.

    The initial xv18 chips (ie K42, K83) put the SFR registers way at the top of memory, so MOVFFL was needed. The newer ones moved them down to bank 0 where MOVFF can get to them. Much better IMHO.

    Any interns hanging around??

     
    • Anobium

      Anobium - 2022-02-21

      Q43 needs it regardless.

       
  • stan cartwright

    stan cartwright - 2022-02-21

    I have a PIC 18F25K22 to test, as it runs at 64 MHz, and don't PICs execute an instruction every 4 clocks?
    That is the nearest to a mega328p at 16 MHz, which executes an instruction every clock.
    Or is there no comparison?
    Just wondered how much the 328p is optimised. GCB was always PIC-oriented, IMHO.
    Was GCB ported from PIC to 328p, or was it written from the ground up?

     
    • Anobium

      Anobium - 2022-02-21

      AVR is optimised, and so is the PIC.

      I do not know which came first. Back in 2009 the code supported AVR and PIC.... so, my guess from day one.

       
  • stan cartwright

    stan cartwright - 2022-02-21

    Good. Arduino came out in 2005, so I'm hoping the 328p was taken seriously. By the time GCB came out, Arduino was established, so I hope someone thought "let's write for that" rather than "oh well, we'll have to support it" without putting in the same effort as for PICs.
    I like 328p and gcb, no probs, well happy, not moaning.

     
    • Anobium

      Anobium - 2022-02-21

      No moaning, no probs.

      Uber fast UNO coding..... the new IDE

      YouTube for you Stan

      https://youtu.be/095AIvr7b_A

       
      • stan cartwright

        stan cartwright - 2022-02-21

        Cool, looks interesting. thanks. When I installed it I backed up gcb.
        I will now need latest version. What version of GCB am I getting please.
        I don't have a clue about this. I thought it was about using alternate ide like Geany... wrong!

         
      • stan cartwright

        stan cartwright - 2022-02-21

        In the 1st video 328p needed no ,16 ie 328p,16.
        in 2nd video it's 328p,16.

         
  • Jerry Messina

    Jerry Messina - 2022-02-21

    Q43 needs it regardless.

    Evan, is there some unpublished errata/special conditions for this? I've used the Q43 w/MOVFF and it seems to work fine.

     
    • Anobium

      Anobium - 2022-02-21

      Really sorry, I was incorrect: it is the Q40 and Q41.

       