I was taking a look at the features of the new 18FXXQ43 family, and one that looks very promising for boosting performance is the existence of 6 DMA controllers, which can be used for data transfers to SFR/GPR spaces from Program Flash Memory, Data EEPROM, or SFR/GPR spaces.
I suppose that Cow Basic functions like readtable or hsersend/hserreceive could benefit a lot from using DMA.
So I was wondering: is this new feature utilized by the compiler, and if not, is there any schedule for adding it in the future, so that Cow Basic produces "DMA optimized" code whenever a PIC equipped with it is used?
Last edit: ikonsgr74 2020-12-27
As the new release candidates support the Q43s you can investigate the methods that need to be adapted to support DMA.
Bill Roth is working on DMA in a project, and he will be sharing his insights in the coming days. But the essentials are already in the compiler; we may need to tweak it to enable DMA.
Do you think that "DMA supported" methods will be available with the coming release of Cow Basic then? I'm heavily using the readtable method in my project (btw, I suppose that access to large table variables in RAM could also benefit from DMA, right?), so if DMA can offer a significant speed gain, it will surely boost performance a lot!
Take a look here, if you want, to see a small presentation I made.
Last edit: ikonsgr74 2020-12-27
Thanks my friend! True, the low-level emulation of the 765 Floppy Disk Controller especially was a really tough job, but thanks to the covid19 quarantines and plenty of free time, I've managed to make it work! :-)
I may also upload the CB code for this project in a new topic! ;-)
@Anobium, I took a more thorough look at the DMA details in the 18F47Q43 datasheet. I don't know if I've got this right, but it seems that using the DMA controllers can have a MAJOR impact on performance!
For example, I took a look at the asm code that Cow Basic generated for my project, regarding the interrupt trigger on receiving a byte from the hardware UART module (On Interrupt UsartRX2Ready Call readUSART ):
As you can see, dozens of instructions are needed to save the CPU state before executing the interrupt routine (which moves a byte from the UART input buffer to a large buffer table variable in RAM) and to restore it afterwards.
Is it correct to assume that, if a DMA controller is used to service the interrupt routine, there is no need to save/restore the CPU state?
Moreover, since the actual code of the interrupt routine:
will be modified to use DMA, maybe it will be faster to execute too?
(From what I read in the datasheet, you only need to set a bunch of DMA registers, and then the actual DMA transfer of 1 byte takes only 2 instructions!)
Btw, I just ordered a PICKIT4 and a couple of 18F47Q43s from Microchip Direct, so when I get them, I might be able to give you extra feedback on DMA testing! ;-)
Last edit: ikonsgr74 2020-12-29
And here is some example code I found in the datasheet:
This code example illustrates using DMA1 to transfer 10 bytes of data from 0x1000 in Flash memory to the UART transmit buffer.

```c
void initializeDMA() {
    // Select DMA1 by setting DMASELECT register to 0x00
    DMASELECT = 0x00;
    // DMAnCON1 - DPTR remains, Source Memory Region PFM, SPTR increments, SSTP
    DMAnCON1 = 0x0B;
    // Source registers
    // Source size
    DMAnSSZH = 0x00;
    DMAnSSZL = 0x0A;
    // Source start address, 0x1000
    DMAnSSAU = 0x00;
    DMAnSSAH = 0x10;
    DMAnSSAL = 0x00;
    // Destination registers
    // Destination size
    DMAnDSZH = 0x00;
    DMAnDSZL = 0x01;
    // Destination start address
    DMAnDSA = &U1TXB;
    // Start trigger source U1TX. Refer the datasheet for the correct code
    DMAnSIRQ = 0xnn;
    // Change arbiter priority if needed and perform lock operation
    DMA1PR = 0x01;              // Change the priority only if needed
    PRLOCK = 0x55;              // This sequence
    PRLOCK = 0xAA;              // is mandatory
    PRLOCKbits.PRLOCKED = 1;    // for DMA operation
    // Enable the DMA & the trigger to start DMA transfer
    DMAnCON0 = 0xC0;
}
```
So, it seems that any routine implemented with DMA is practically just a bunch of DMA register writes! ;-)
This is pretty simple to write in Great Cow BASIC. A few changes are needed to support the word pointer addresses (by using the alias).
```
#chip 18f26Q43

initializeDMA

'do stuff

end

sub initializeDMA

    'create a word alias to support the
    dim DMAnDSAWord as word alias DMAnDSAH, DMAnDSAL

    //Select DMA1 by setting DMASELECT register to 0x00
    DMASELECT = 0x00;
    //DMAnCON1 - DPTR remains, Source Memory Region PFM, SPTR increments, SSTP
    DMAnCON1 = 0x0B;
    //Source registers
    //Source size
    DMAnSSZH = 0x00;
    DMAnSSZL = 0x0A;
    //Source start address, 0x1000
    DMAnSSAU = 0x00;
    DMAnSSAH = 0x10;
    DMAnSSAL = 0x00;
    //Destination registers
    //Destination size
    DMAnDSZH = 0x00;
    DMAnDSZL = 0x01;
    //Destination start address,
    'change to word pointer - as this would have only pointed to the lower address byte of U1TXB
    DMAnDSAWord = @U1TXB;
    //Start trigger source U1TX. Refer the datasheet for the correct code
    DMAnSIRQ = 0xnn;
    //Change arbiter priority if needed and perform lock operation
    DMA1PR = 0x01; // Change the priority only if needed
    PRLOCK = 0x55; // This sequence
    PRLOCK = 0xAA; // is mandatory
    'remove PRLOCKbits.PRLOCKED = 1; // for DMA operation
    //Enable the DMA & the trigger to start DMA transfer
    DMAnCON0 = 0xC0;

end sub
```
I used the latest RC candidate (RC34) and PICInfo to figure out that I needed to create the alias. The pointer assignment was

```
//Destination start address,
DMAnDSA = &U1TXB;
```

This would fail as & is invalid, and the assignment would only move the low address byte of U1TXB, because in Great Cow BASIC DMAnDSA is a byte (address 240/0x00F0). With & changed to @, the assembly shows exactly that.
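To illustrate why the byte-wide assignment loses half of the address, here is a minimal host-side C sketch. It is not PIC code: the struct merely stands in for the DMAnDSAH/DMAnDSAL register pair, the function names are made up for this illustration, and 0x02A3 is an arbitrary example address rather than the real location of U1TXB.

```c
#include <stdint.h>

/* Hypothetical model of the destination address register pair.
   On the chip these are SFRs; here they are plain struct fields. */
typedef struct {
    uint8_t high; /* stands in for DMAnDSAH */
    uint8_t low;  /* stands in for DMAnDSAL */
} dsa_regs_t;

/* Byte-wide assignment: only the low byte of the address is stored,
   the high byte keeps whatever it held before. */
dsa_regs_t assign_byte_only(dsa_regs_t regs, uint16_t address)
{
    regs.low = (uint8_t)(address & 0xFFu);
    return regs;
}

/* Word alias over (high, low): both bytes of the address are stored. */
dsa_regs_t assign_word_alias(dsa_regs_t regs, uint16_t address)
{
    regs.low = (uint8_t)(address & 0xFFu);
    regs.high = (uint8_t)(address >> 8);
    return regs;
}

/* The 16-bit address the DMA controller would see. */
uint16_t effective_address(dsa_regs_t regs)
{
    return (uint16_t)(((uint16_t)regs.high << 8) | regs.low);
}
```

Starting from cleared registers, assign_byte_only with the example address 0x02A3 leaves an effective address of 0x00A3, while assign_word_alias gives the full 0x02A3. That is the difference the word alias makes.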
Thanks for the "insight" Evan!
So it seems that modifying the various Cow Basic routines to use DMA (whenever the selected PIC supports it) would be rather easy and simple after all!
Can you make a rough estimate of the performance increase when using DMA?
For example, how much faster would an "on interrupt" HW UART byte read or a readtable byte read be on, say, an 18FXXQ43, compared to the current routines used for the 18FXXQ10 (e.g. the ones I posted earlier)?
Last edit: ikonsgr74 2020-12-30
In my experience using DMA (on a PIC32 device, not the 18f26Q43), trying to estimate a performance improvement is a moot point, as DMA is effectively hardware multitasking.
On the PIC32 at least, when you executed the DMA transfer it was fire and forget, working in the background whilst the user program continued at full speed in the foreground. It was several years ago and my memory is not what it was, so I don't recall any of the C++ code that I used, but it was fast.
Depending on the system arbitration used, this is true for the 18FXXQ43 family too. Quoted from the datasheet:
Depending on the priority of the DMA with respect to CPU execution (Refer to section “Memory Access Scheme” in the “PIC18 CPU” chapter for more information), the DMA Controller can move data through two methods:
• Stalling the CPU execution until it has completed its transfers (DMA has higher priority over the CPU in this mode of operation).
• Utilizing unused CPU cycles for DMA transfers (CPU has higher priority over the DMA in this mode of operation). Unused CPU cycles are referred to as bubbles, which are instruction cycles available for use by the DMA to perform read and write operations. In this way, the effective bandwidth for handling data is increased; at the same time, DMA operations can proceed without causing a processor stall.
If you use the 2nd method, it practically executes the DMA transfer without any speed penalty for the CPU.
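The two methods quoted above can be compared with a toy model in C that runs on a PC. This is only a sketch under simplifying assumptions: one bus access per instruction cycle, a made-up trace of which cycles the CPU needs the bus, and hypothetical function names. It is not a cycle-accurate model of the Q43 arbiter.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t cpu_stall_cycles; /* cycles the CPU lost waiting for the DMA */
    size_t dma_cycles_done;  /* bus cycles the DMA actually obtained */
} arbitration_result_t;

/* Method 1: DMA has priority. It takes every cycle it needs and the
   CPU is stalled for exactly that many cycles. */
arbitration_result_t run_dma_priority(size_t dma_cycles_needed)
{
    arbitration_result_t result = { dma_cycles_needed, dma_cycles_needed };
    return result;
}

/* Method 2: CPU has priority. The DMA may only use "bubbles", i.e.
   cycles where cpu_needs_bus[i] is 0, so the CPU never stalls, but the
   DMA may not finish if the trace holds too few bubbles. */
arbitration_result_t run_cpu_priority(const uint8_t *cpu_needs_bus,
                                      size_t trace_len,
                                      size_t dma_cycles_needed)
{
    arbitration_result_t result = { 0, 0 };
    for (size_t i = 0;
         i < trace_len && result.dma_cycles_done < dma_cycles_needed;
         i++) {
        if (!cpu_needs_bus[i]) {
            result.dma_cycles_done++;
        }
    }
    return result;
}
```

In this model the second method costs the CPU nothing, but the transfer only completes once enough bubbles have occurred, so "no speed penalty" comes at the price of a less predictable completion time.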
I never developed any code using MPLAB, only Cow Basic :-)
I do have MPLAB X IDE 5.20 installed, but I mainly use it for the MCC code configurator (to configure the various CLCs needed for my project). I see that the new 18FXXQ43 parts are not supported, so maybe I need to install a newer version.
Last edit: ikonsgr74 2020-12-30
Any news about DMA support?
I received a couple of 18F47Q43s and have a PICkit 4 programmer too, so I'm really looking forward to testing some... "DMA optimized" code! :-)
Ok then, maybe you can write a "how to" code guide (based on the 18F4XQ43, as this is the 1st PIC family supporting DMA, and most probably all that follow, like Q83/Q84 and future PICs, will use the same methods too), with specific DMA examples like:
- Read from the HW UART and place the byte in a single variable/array variable
- Read a byte from a single variable/array variable/table and write it to the HW UART/PIC port
Then I will try to incorporate this code into my GCB code and run tests to see whether it works correctly and what impact it has on performance.
Last edit: ikonsgr74 2021-05-10
I was wondering, is there a way to access a table element directly, without using the "readtable" command? Reading bytes from byte tables and placing them on a PIC PORT or in a variable is done all the time in my project, but in order to use DMA for that, I need a way to read a specific element without calling readtable, as this command does the transfer directly, but without using DMA....
When a table is defined with TABLE, the compiler looks for a related Readtable command.
If there is none, the table is never written to memory. So this is a "NO".
However, when a readtable is executed, even if only to initialize the table in memory, the table will then be written to program memory.
But where in memory is the question.
There is no way to tell the compiler where in memory to put the table;
the compiler decides based upon how much memory the rest of the code uses. I cannot tell by looking at the ASM where in memory the table begins. Someone else might.
However, if you know the data you are looking for, you can look at the hex and see where the first byte of the table is located. But if your code changes, this memory address will change as well.
But for the sake of argument, let's say the code never changes and the table never changes. You could then read the data directly via the TBLRD* command as described in the chip's datasheet. See the section on the Nonvolatile Memory (NVM) module.
Not worth the trouble IMO
Last edit: William Roth 2021-05-11
My name was mentioned somewhere in regards to adding DMA support to GCB for chips that support it. To be clear, I have no plans now or in the future to do so.
It would be a rather huge, time consuming effort that in the end would likely only be utilized by a handful of advanced users.
I am not saying that it will not be done eventually, just that I will not be the one doing it.
As far as an estimated time for adding DMA support, Anobium or Hugh can answer that better than I can. However, I would not think it would be any sooner than 6 months if not a year or more.
Bill
Last edit: William Roth 2021-05-11
If the "Table" was written to storage area flash, could the location in the PIC be specified and therefore be a known value? The storage area flash looks to be limited to 128 words, which might restrain the size of any table.
We can look into this soon. It looks rather simple to use, but it would require a fundamental change to the serial write (in this example).
But, there is nothing to stop you from using the code shown in the DMA posts (above) in the latest release candidate.
I read AppNote TB3164 today. This AppNote lays out the basics in a total vacuum of other practices used with an overall solution.
Using DMA requires a total architectural approach/impact analysis. Example: moving data from a table to serial looks easy. But what is the data to be moved to the serial, and in what format (byte or word data)? If byte data, then it may work; if word data, then the table data in Progmem would need to be formatted (laid out) so the DMA is usable.
Then, assuming the data is byte data, moving the data out of the serial would still be one byte at a time. So, what is the time advantage of a RAM buffer read (loaded by the DMA activity) versus the existing Table read? Is it really a huge benefit?
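A rough way to put numbers on that question is to count how many instruction cycles pass while the UART shifts out a single byte. The C sketch below does the arithmetic on a PC. The 64 MHz clock, 115200 baud, and 10-bit frame (8N1) are assumptions picked for illustration and are not figures from this thread; the function names are hypothetical. The only chip fact used is that a PIC18 instruction cycle takes four oscillator clocks.

```c
#include <stdint.h>

/* PIC18 instruction cycles per second: one instruction cycle is
   four oscillator clocks (Fosc / 4). */
uint32_t instruction_cycles_per_second(uint32_t fosc_hz)
{
    return fosc_hz / 4u;
}

/* Instruction cycles that elapse while one UART frame is on the wire.
   bits_per_frame is 10 for 8N1 (start + 8 data + stop).
   The result is rounded down. */
uint32_t cycles_per_uart_frame(uint32_t fosc_hz,
                               uint32_t baud,
                               uint32_t bits_per_frame)
{
    uint64_t cycles = (uint64_t)instruction_cycles_per_second(fosc_hz)
                      * bits_per_frame;
    return (uint32_t)(cycles / baud);
}
```

Under those assumptions, cycles_per_uart_frame(64000000, 115200, 10) comes to 1388 instruction cycles per byte. The wire time dominates either way, so any gain from DMA would show up as CPU time freed for other work rather than as a faster serial transfer.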
Great! If you remember, I'm the guy who had some issues with large tables on a rather big project (currently using an 18F47Q10):
https://sourceforge.net/p/gcbasic/discussion/596084/thread/5398dd8bc1/?limit=25&page=0
That disk interface you have made is incredible. Congratulations, I'm very, very impressed.
This is pretty simple to write in Great Cow BASIC. A few changes are needed to support the word pointer addresses (by using the alias).
I used the latest RC candidate (RC34) and PICInfo to figure out that I needed to create the alias. The pointer assignment, DMAnDSA = &U1TXB, would fail as & is invalid, and the assignment would only move the low address byte of U1TXB, because in Great Cow BASIC DMAnDSA is a byte (address 240/0x00F0). With & changed to @, the assembly shows exactly that.
So, the changes:
Create a word alias and then use a similar assignment. The alias (dim DMAnDSAWord as word alias DMAnDSAH, DMAnDSAL) creates the word at the correct address, and the assignment (DMAnDSAWord = @U1TXB) yields assembly that shows the low and high address being loaded into the correct DMA addresses.
Enjoy. Hope this makes sense.
Evan
PICInfo shows the addresses to cross-reference to the alias addresses.
Last edit: Anobium 2020-12-29
I would have to test.
Do you have any MPLAB-X code as a baseline?
The post https://sourceforge.net/p/gcbasic/discussion/579125/thread/b0baec8294/#6acb shows the method. Unless someone writes a DMA editor (like PPSTool or the PICINFO tool), you will have to hack through the datasheet to set up the registers.
And the Q43 is supported by PICKit2 & 3, using PICKitPlus. :-)
To answer your question, Yes and No
My error in attributing some DMA work to you. I don't know what I was thinking.
Not fully understanding the concept, but...