GCBASIC / Discussion / Open Discussion: Produce "DMA optimized" with new 18FXXQ43 PICs

ikonsgr74 - 2021-05-11

Well, as they say: GIVE IT A TRY AND WE'LL SEE :-)
Perhaps setting some prerequisites (such as using only byte tables/arrays) would ease the task, at least in the first stages of development, so that we find out sooner how much actual benefit DMA brings! :-)

Last edit: ikonsgr74 2021-05-11


ikonsgr74 - 2021-05-11

I agree. The DMA example above is very good for getting a first "taste" of how to use DMA in general, but in "real life" projects, where you don't have simple fixed addresses but variables, tables and arrays, things become much more complicated. I came to the same conclusion: this can't be done "manually", but needs fundamental changes to important GCB commands so that they support DMA "internally"!
For example, if your source is an array or a table, you need an extra pointer variable that indicates which element you want to read for the DMA transfer. Moreover, if a byte array variable is used as a UART input ring buffer, you need to advance the pointer to the next element, check whether it has passed the last element and, if so, reset it to the first element!
Normally all this is done in a rather small routine like:

Sub readUSART
    buffer(next_in) = HSerReceive
    next_in = ( next_in + 1 )
    IF (NEXT_IN>BUFFER_SIZE) Then
        NEXT_IN=1
    END IF
End Sub

This routine is called asynchronously by enabling an "On Interrupt UsartRXReady" event handler.
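To make the pointer bookkeeping concrete, here is a host-side Python model of the readUSART routine above (a PC simulation, not code for the PIC). It is a sketch under stated assumptions: BUFFER_SIZE and the RingBuffer class are hypothetical names, the buffer is only 8 bytes so the wrap happens quickly, and index 0 is left unused to mimic the 1-based array in the GCBASIC routine.

```python
BUFFER_SIZE = 8  # hypothetical; the real project uses a ~3000-byte buffer


class RingBuffer:
    """Host-side model of the readUSART ring buffer (1-based, like the GCBASIC array)."""

    def __init__(self, size):
        self.size = size
        self.data = [0] * (size + 1)  # index 0 unused, to mimic 1-based indexing
        self.next_in = 1

    def on_rx(self, byte):
        """What readUSART does for each received byte."""
        self.data[self.next_in] = byte
        self.next_in += 1
        if self.next_in > self.size:  # past the last element: back to the first
            self.next_in = 1


buf = RingBuffer(BUFFER_SIZE)
for b in range(10):  # 10 bytes into an 8-byte buffer forces one wrap
    buf.on_rx(b)
print(buf.next_in, buf.data[1:])  # 3 [8, 9, 2, 3, 4, 5, 6, 7]
```

The `if` that resets next_in is the step the post argues a DMA channel cannot perform on its own.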
Obviously, the same thing can't be done with DMA just by setting the various DMA registers and enabling a DMA event handler; for one thing, there is no way to handle the ring buffer pointer! So even with DMA you will probably still need an interrupt routine, which means that many CPU cycles will still be lost saving and restoring the CPU state (which somewhat cancels the MAJOR advantage of DMA, i.e. that no interrupt routine is needed and the task is done "invisibly" to the CPU...).
The same goes for reading a specific byte from a byte table, or for using an array/table as the destination of a DMA transfer.
From a more general perspective, almost any variable assignment, down to simple assignments like:
var1=10
could be done using DMA! But of course this can't be done "manually"; it must be "fundamentally supported" by the compiler itself. I understand that this could be a HUGE task, but as more and more PIC MCUs support DMA in the future, it might be worth the trouble, as it would most likely give a big performance boost! ;-)

Last edit: ikonsgr74 2021-05-11

Anobium - 2021-05-12

@ikonsgr You highlight the issue: the use case.

Use of DMA is use-case specific. Take this use case:

Data stored in TABLEs is to be sent via the microcontroller Serial port to a serially attached device.

Non DMA solution

do until end of table data
byte read table element into ram variable
send ram variable via serial port
loop

DMA solution

do until end of table data
read multiple byte table elements into ram (multiple ram locations) using DMA transfer
send ram (multiple ram locations) via serial port
loop
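The two pseudocode loops above can be modelled on a PC to make the structural difference concrete. This Python sketch is illustrative only: TABLE, CHUNK and the list standing in for the serial port are assumptions, a slice copy stands in for the DMA block move, and it says nothing about real timing on the PIC.

```python
TABLE = bytes(range(200))  # hypothetical table contents (200 bytes)
CHUNK = 64                 # hypothetical RAM buffer size per DMA block


def send_non_dma(table):
    """Non-DMA route: read one table element into a RAM variable, send it, repeat."""
    serial_out = []
    reads = 0
    for i in range(len(table)):
        ram_var = table[i]          # one table read per byte
        reads += 1
        serial_out.append(ram_var)  # send the RAM variable
    return serial_out, reads


def send_dma(table, chunk=CHUNK):
    """DMA route: move a block of table elements into RAM, then send the block."""
    serial_out = []
    moves = 0
    ram_buf = bytearray(chunk)
    for start in range(0, len(table), chunk):
        block = table[start:start + chunk]
        ram_buf[:len(block)] = block             # stands in for one DMA block transfer
        moves += 1
        serial_out.extend(ram_buf[:len(block)])  # send the filled part of the buffer
    return serial_out, moves


bytes_a, reads = send_non_dma(TABLE)
bytes_b, moves = send_dma(TABLE)
print(bytes_a == bytes_b, reads, moves)  # True 200 4
```

Both routes deliver the same bytes; what changes is that the DMA version performs one block move per CHUNK bytes (4 moves for this 200-byte table) instead of one table read per byte, which is where any saving would have to come from.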

The difference looks superficial, but at the low level it is very different.
1. The reading of the table locations is direct to the table (resolvable on the 18F, not practical on other PICs). I think this is resolvable using indirect addressing.
2. The destination RAM location could be the Buffer RAM (see Figure 9-3, Data Memory Map, in the datasheet) or another RAM buffer. We would need to determine whether the Buffer RAM is affected by the DMA operation; we have already excluded this area from Great Cow BASIC memory to prevent Buffer RAM corruption by 'standard' variables.
3. Sending from the destination RAM locations (multiple byte values) is done in an optimised manner, sending the destination RAM locations once the DMA has completed (interrupt).

We would need to build a prototype to examine the time saved, as the real performance gain can only come from step #2. Step #1 is the same, and there may be a saving at step #3 by optimising the send operation.

As I do not think there is a generic use case, can you give me an overview (three or four lines long) of your use case? Or is my assumed use case (shown above) what you need to achieve?


ikonsgr74 - 2021-05-12

The use cases that can have a MAJOR performance impact in my project are:

Read from a byte table (large ones, up to 16KB, using an integer index variable) in program memory, and send it to a PIC 8-bit port (which is actually connected to the Amstrad CPC's 8-bit data bus).

Read a byte from a PIC 8-bit port and send it to the HW UART port.

Read a byte from the HW UART port and store it in a ring input buffer array variable (~3000 bytes) in the PIC's RAM.

Read a byte from the ring input buffer array variable (~3000 bytes) and send it to a PIC 8-bit port.

PIC for testing would be a 18F47Q43.
- Anobium - 2021-05-12
  
  Read from a byte table (large ones, up to 16KB, using an integer index variable) in program memory, and send it to a PIC 8-bit port (which is actually connected to the Amstrad CPC's 8-bit data bus).
  
  Sounds DMAable.
  
  We would have to determine the table addressing via indirect addressing, using the same approach as detailed in the appnote. I would have thought a new serial buffer write method will be needed to clear down the buffer (send the data to the serial port).
  
  It will be a balancing act to right-size the buffer with respect to the total available RAM, in the context of the rest of your program.
  
  Read a byte from a PIC 8-bit port and send it to the HW UART port.
  
  Question: is there a DMA source module/trigger event for an 8-bit PIC port? If not, this would be a no-go in terms of a DMA source.
  
  This would not prevent using the same serial clear-down approach.
  
  Read a byte from the HW UART port and store it in a ring input buffer array variable (~3000 bytes) in the PIC's RAM.
  
  Sounds DMAable, but it would need a new serial read method to move the data from the incoming serial input to the buffer.
  
  Read a byte from the ring input buffer array variable (~3000 bytes) and send it to a PIC 8-bit port.
  
  Sounds DMAable, but it would need a new write method to move the data from the buffer to the port.
  
  Assuming you have the RAM to support a 3000-byte ring buffer, random access to the ring buffer may not be possible. We may be able to use two DMA channel controllers that point to the same memory space to put and get data using DMA. Even more complex.
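The two-channel idea can be pictured with a small Python model: one pointer for the channel that puts bytes into the ring buffer and one for the channel that gets them out, both wrapping at the end of the same buffer. SharedRing, SIZE and the full-buffer check are assumptions for illustration; a free-running DMA channel would not perform that check itself, which is part of what makes this approach "even more complex".

```python
SIZE = 16  # hypothetical; the thread talks about a ~3000-byte buffer


class SharedRing:
    """One buffer, two independent pointers (one per hypothetical DMA channel)."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.put = 0  # next slot the "input" channel writes
        self.get = 0  # next slot the "output" channel reads

    def level(self):
        """Number of bytes written but not yet read."""
        return (self.put - self.get) % self.size

    def write(self, byte):
        # Software guard: refuse when full (one slot is kept empty so that
        # full and empty can be told apart). Real DMA would not check this.
        if self.level() == self.size - 1:
            return False
        self.buf[self.put] = byte
        self.put = (self.put + 1) % self.size  # wrap at the end of the buffer
        return True

    def read(self):
        if self.level() == 0:
            return None  # nothing new to send to the port
        byte = self.buf[self.get]
        self.get = (self.get + 1) % self.size  # wrap at the end of the buffer
        return byte


ring = SharedRing(SIZE)
for b in b"HELLO":
    ring.write(b)
sent = bytes(iter(ring.read, None))  # drain until read() returns None
print(sent)  # b'HELLO'
```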
  
  I would start with the table read to a new serial out method.
  
  - ikonsgr74 - 2021-05-12
    
    Great!
    I'm really looking forward to this, keep us updated! ;-)
    
    - Anobium - 2021-05-12
      
      It will be the summer, as I am away from home and have a pipeline of work to do in June.
      

William Roth - 2021-05-12

Is there a way to determine where in flash memory the table begins? Is there a method to tell the compiler where to start the table? If this could be sorted, then...

The Q43 supports page reads of flash memory, where a 256-byte page of flash can be read directly into "buffer RAM", which is located at address 0x2500 on a 47Q43.

So a select portion of the table can then be placed in RAM for access via indirect addressing or for use with DMA operations bypassing Readtable.
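Once the table's start address is known, finding which 256-byte page to read into buffer RAM is simple arithmetic. In this Python sketch PAGE_SIZE and the 0x2500 buffer RAM address come from the post, while TABLE_START is a hypothetical address (where the compiler actually places a table is exactly the open question here) and index is treated as a 0-based byte offset from the start of the table.

```python
PAGE_SIZE = 256       # bytes per flash page read, per the post
BUFFER_RAM = 0x2500   # buffer RAM base on the 18F47Q43, per the post
TABLE_START = 0x4000  # hypothetical flash address where the table begins


def locate(index, table_start=TABLE_START):
    """Return (flash page base, buffer RAM address) for table byte `index`."""
    addr = table_start + index
    page_base = addr & ~(PAGE_SIZE - 1)  # address to hand to the page read
    offset = addr & (PAGE_SIZE - 1)      # position of the byte inside that page
    return page_base, BUFFER_RAM + offset


page, ram = locate(1000)
print(hex(page), hex(ram))  # 0x4300 0x25e8
```

Consecutive indices stay in the same page until the offset rolls over, so a new page read is only needed every 256 bytes.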

- Anobium - 2021-05-12
  
  I don't know the answers, but I would approach this by examining the MPLAB X libraries.
  
  I know the serial write is totally different (via a buffer), so I would expect to see buffers used throughout the chip libraries.
  
- Bogdan Zavate - 2021-06-15
  
  You can do this "manually" by editing the hex file and adding the table values at the end of the flash memory; then you know exactly where the table starts and ends, assuming the generated hex code doesn't extend into the flash area where you put the values. The compiler should have an option to put a constant or a table at a specific flash address, at least at the end of flash memory (this should be very simple), but I guess it is a linker problem, because if you choose an address where the linker would normally put code, it must be clever enough to work "around" that memory area.
  
  Last edit: Bogdan Zavate 2021-06-15
  
  - Anobium - 2021-09-11
    
    Easily done. No need to edit HEX. See GITHUB DMA demos.
    

ikonsgr74 - 2021-09-11

Any news on the subject?

- Anobium - 2021-09-11
  
  Have a look at https://github.com/Anobium/Great-Cow-BASIC-Demonstration-Sources/tree/master/DMA_Solutions
  
  I have added some DMA examples. :-)
  

Anobium - 2021-09-12

@ikonsgr74 I had not added the DMA serial demo. I have now.

The DMA demo sends a table of data directly to the USART. You will see a table of 'Lorem ipsum' (five paragraphs of text, but this can be any data). The DMA is set up to point the source at the table and the destination at the USART. The table is sent once, because upon completion the ISR is called to disable the DMA. You can use this ISR method to cascade many tables (to be sent): simply change the source addresses and restart the DMA transfer.
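The cascade described above (ISR fires on completion, loads the next source and restarts the transfer) can be modelled on a PC. This Python sketch is not the demo's code: DmaModel, dma_isr and the three short tables are hypothetical, and a whole block is "transferred" in one step rather than byte by byte as the USART trigger would do.

```python
from collections import deque


class DmaModel:
    """Rough model of one DMA channel whose destination is the USART."""

    def __init__(self, on_complete):
        self.source = b""
        self.enabled = False
        self.on_complete = on_complete  # plays the role of the completion ISR
        self.usart_out = bytearray()

    def start(self, source):
        self.source = source  # stands in for loading the source address/size
        self.enabled = True   # stands in for the 'Start DMA' sequence

    def run(self):
        while self.enabled:
            self.usart_out += self.source  # whole block "sent" in one step
            self.enabled = False           # complete: disabled (the demo does this in its ISR)
            self.on_complete(self)         # ISR may load a new source and restart


pending = deque([b"Lorem ", b"ipsum ", b"dolor"])  # hypothetical tables


def dma_isr(dma):
    """Completion ISR: restart with the next table, or leave the DMA disabled."""
    if pending:
        dma.start(pending.popleft())


dma = DmaModel(dma_isr)
dma.start(pending.popleft())
dma.run()
print(dma.usart_out.decode())  # Lorem ipsum dolor
```

When the queue is empty the ISR does nothing, so the channel stays disabled, which corresponds to the demo's behaviour of sending a single table once.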

Enjoy

Last edit: Anobium 2021-09-12



Anobium - 2021-09-15

I have just added a new demo to the DMA suite. This demo shows how to send massive tables of data to the serial port via DMA.

The DMA can only handle 4096 bytes per transfer, so the demo indexes through the table of data, creating page addresses for the DMA transfers. All very easy really.
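The paging the demo does can be expressed as a small planning function that splits a large table into per-transfer (source address, size) pairs. This Python sketch uses the 4096-byte limit quoted above as MAX_XFER (check the width of the DMA size registers in the datasheet and lower it if needed); plan_transfers and the 10,000-byte table at 0x4000 are hypothetical.

```python
MAX_XFER = 4096  # per-transfer limit quoted in the post


def plan_transfers(table_start, table_len, max_xfer=MAX_XFER):
    """Split one large table into (source address, size) pairs, one per DMA transfer."""
    plan = []
    offset = 0
    while offset < table_len:
        size = min(max_xfer, table_len - offset)
        plan.append((table_start + offset, size))
        offset += size
    return plan


# Hypothetical 10,000-byte table starting at flash address 0x4000.
for addr, size in plan_transfers(0x4000, 10_000):
    print(hex(addr), size)
# 0x4000 4096
# 0x5000 4096
# 0x6000 1808
```

Each pair would be loaded as the new source address and size before restarting the channel, in the same way as the ISR cascade in the earlier demo.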


ikonsgr74 - 2021-09-29

Nice! But how exactly is this executed? Does it send the whole table to the serial port at once?
What if you want to trigger a byte transfer from a table to the serial port on demand, inside "real life" code?
Also, did you make any performance tests to see how much faster (if at all) this code is compared to the usual code without DMA?

Btw, it would be most useful if you could offer an example with a ring input buffer in RAM as the source and an 8-bit PORT as the destination, using DMA.
Also one with the serial port as the source and a ring input buffer in RAM as the destination.

Last edit: ikonsgr74 2021-09-29

- Anobium - 2021-09-29
  
  Nice! But how exactly is this executed? Does it send the whole table to the serial port at once?
  
  It was not too hard. :-) Nice.
  
  How is it executed? By the start trigger, i.e. the 'Start DMA' sequence. The end trigger is not set in the init section of the code but in the ISR. This means the whole table is sent properly.
  
  What if you want to trigger a byte transfer from a table to the serial port on demand, inside "real life" code?
  
  Then you would change the init section (only the parameters that need to change, like the table name and length), then repeat the 'Start DMA' sequence.
  
  Also, did you make any performance tests to see how much faster (if at all) this code is compared to the usual code without DMA?
  
  Serial is serial... but I did not analyse the serial line. I was focused on the outcome.
  
  Btw, it would be most useful if you could offer an example with a ring input buffer in RAM as the source and an 8-bit PORT as the destination, using DMA.
  
  I do not understand the use case. A ring buffer is FIFO, so there is no length as such. If you have one byte in and then one byte out, use the demo for memory DMA transfers: just get the ring buffer address and the other parameters, and run the 'Start DMA' sequence.
  
  Also one with the serial port as the source and a ring input buffer in RAM as the destination.
  
  See above.
  
  It sounds like you would set up three DMAs: one for tables out, one for serial byte out and one for serial byte in, but that sounds a mess. Instead: one for tables out, one for serial byte out, and an ISR for serial in to load the ring buffer.
  

ikonsgr74 - 2021-09-29

Ok, if i get it right, any REAL benefit from using DMA approach would be mainly for BATCH transfers of data.
Unfortunately this is not the case for my huge PIC 18F47Q10 project: https://sourceforge.net/p/gcbasic/discussion/projects%26guides/thread/a5705ad282/

Most data transfers are triggered by the Amstrad CPC reading/writing specific I/O ports.
When this happens, ONLY ONE byte is transferred between the input ring buffer in RAM, a table in flash memory and the PIC's port (which is connected to the computer's 8-bit data bus).
So, I guess no performance benefit would exist if we have to trigger a DMA transfer for only one byte each time, right?

The ONLY exception is the byte reception from the hardware serial port into the input ring buffer in RAM, which is triggered asynchronously by enabling a USART receive interrupt.
So I was wondering: would it be possible to implement at least this procedure using DMA?
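One way the receive path might look without a per-byte ISR, sketched as a Python model rather than PIC code: the DMA channel writes each received byte into a RAM buffer, and the main loop works out how far it has got from the channel's remaining destination count. This relies on an assumption to verify in the 18F47Q43 datasheet, namely that the channel reloads its destination pointer and count when the count reaches zero and keeps running, so the buffer wraps by itself. CircularRxModel and its members are hypothetical names.

```python
class CircularRxModel:
    """Model of a DMA channel storing each received UART byte into a RAM buffer.

    Assumption (check the datasheet): when the destination count reaches zero
    the channel reloads its destination pointer/count and keeps running, so the
    buffer wraps without an ISR. Software only looks at the remaining count.
    """

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.remaining = size  # stands in for the destination count register
        self.tail = 0          # software read position

    def dma_byte(self, byte):
        """Hardware side: one UART RX trigger moves one byte."""
        self.buf[self.size - self.remaining] = byte
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self.size  # pointer/count reload = wrap

    def read_new(self):
        """Software side: fetch everything written since the last call."""
        head = self.size - self.remaining
        out = bytearray()
        while self.tail != head:
            out.append(self.buf[self.tail])
            self.tail = (self.tail + 1) % self.size
        return bytes(out)


rx = CircularRxModel(8)
for b in b"ABCDEF":
    rx.dma_byte(b)
first = rx.read_new()
for b in b"GHIJ":  # these wrap past the end of the 8-byte buffer
    rx.dma_byte(b)
second = rx.read_new()
print(first, second)  # b'ABCDEF' b'GHIJ'
```

The catch is that nothing signals an overrun: if more than the buffer size arrives between two calls to read_new(), older bytes are overwritten silently, so the ~3000-byte buffer and the polling rate have to be sized together.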

Last edit: ikonsgr74 2021-09-29

- Anobium - 2021-09-29
  
  You can send one byte in response to an incoming byte event. Just try to code it based on the demos. You may need the ISR to handle the I/O ports to start the DMA channel.
  
  You get a performance increase, but you will have to examine how much.
  
