From: Philipp K. K. <pk...@sp...> - 2012-04-20 15:12:49
|
Dear users of the hc08 port, Erik and I have been working on some improvements in the hc08 code generation and register allocator. The work can be found in the optralloc-hc08 branch: svn co https://sdcc.svn.sourceforge.net/svnroot/sdcc/branches/optralloc-hc08/sdcc sdcc-ohc08 All regression tests pass for me, and in the regression tests I currently see a code size reduction of about 6% compared to sdcc from trunk. You might want to try this branch and see if it improves code size for you, if compilation speed is ok, and if there are any bugs. After some more testing and polishing this work will probably make it to trunk, and then appear in the 3.2.0 or 3.3.0 release. Philipp |
From: Philipp K. K. <pk...@sp...> - 2012-05-05 16:48:07
|
Dear users of the hc08 port, as of revision #7668, there is a new register allocator used in the hc08 port. It is enabled by default (the old allocator can be used by --oldralloc). There also were many changes to code generation. This results in the following: * Better code being generated: Faster, smaller code. * Probably some new, not yet discovered bugs. * Somewhat slower compilation. Please test this with your code (using sdcc from svn or snapshots from http://sdcc.sourceforge.net/snap.php dated 6-5-2012 or later). If there are any minor issues (compilation speed too low or code size increasing compared to older sdcc), please report them here or at the sourceforge tracker. For any major issues, such as wrong behaviour of the generated code, please report them at the sourceforge bug tracker, so they can be fixed before the 3.2.0 release. Philipp |
From: Maarten B. <sou...@ds...> - 2012-04-21 11:04:54
|
Hello Philipp and Erik, Have you tried to enable the disabled regression tests for the hc08? Do they still fail or do they now pass with the new register allocator? Just curious. Greets, Maarten |
From: Philipp K. K. <pk...@sp...> - 2012-04-21 13:24:08
|
On 21.04.2012 12:37, Maarten Brock wrote: > Hello Philipp and Erik, > > Have you tried to enable the disabled regression tests > for the hc08? Do they still fail or do they now pass > with the new register allocator? Just curious. AFAIK all tests disabled on hc08 are disabled due to one of - Restricted support for function pointers - Lack of memory - Bug #3514097 related to long long variables which won't be placed into registers either way. There were a few others, but I was able to just enable them in trunk without failing. Philipp |
From: Maarten B. <sou...@ds...> - 2012-04-21 15:00:07
|
> On 21.04.2012 12:37, Maarten Brock wrote: > > Hello Philipp and Erik, > > > > Have you tried to enable the disabled regression tests > > for the hc08? Do they still fail or do they now pass > > with the new register allocator? Just curious. > > AFAIK all tests disabled on hc08 are disabled due to one of > > - Restricted support for function pointers > - Lack of memory > - Bug #3514097 related to long long variables which won't be placed into > registers either way. > > There were a few others, but I was able to just enable them in trunk > without failing. I seem to remember there were also some trigonometric floating point tests disabled. They are in float_single.c and even generate a warning about being skipped (unlike all new skipped gcc-torture tests). Maarten |
From: Philipp K. K. <pk...@sp...> - 2012-04-21 15:12:50
|
On 21.04.2012 17:00, Maarten Brock wrote: >> On 21.04.2012 12:37, Maarten Brock wrote: >> > Hello Philipp and Erik, >> > >> > Have you tried to enable the disabled regression tests >> > for the hc08? Do they still fail or do they now pass >> > with the new register allocator? Just curious. >> >> AFAIK all tests disabled on hc08 are disabled due to one of >> >> - Restricted support for function pointers >> - Lack of memory >> - Bug #3514097 related to long long variables which won't be placed into >> registers either way. >> >> There were a few others, but I was able to just enable them in trunk >> without failing. > > I seem to remember there were also some trigonometric floating point > tests disabled. They are in float_single.c and even generate a warning > about being skipped (unlike all new skipped gcc-torture tests). Now that you mention it, I must've overlooked that one. The expf test is still failing in trunk, tanf isn't and I just enabled it. I'll have a look at expf in the branch later. The new gcc-torture tests don't have warnings, since I thought that having loads of warnings for all the ports that don't support long long yet (or, such as mcs51, can't handle large arrays) would be too distracting; for the other issues I just filed bug reports (which reference the failing tests). Should I add warnings for those in the regression tests, too? Philipp |
From: Philipp K. K. <pk...@sp...> - 2012-04-21 16:10:19
|
On 21.04.2012 17:12, Philipp Klaus Krause wrote: > On 21.04.2012 17:00, Maarten Brock wrote: >>> On 21.04.2012 12:37, Maarten Brock wrote: >>>> Hello Philipp and Erik, >>>> >>>> Have you tried to enable the disabled regression tests >>>> for the hc08? Do they still fail or do they now pass >>>> with the new register allocator? Just curious. >>> >>> AFAIK all tests disabled on hc08 are disabled due to one of >>> >>> - Restricted support for function pointers >>> - Lack of memory >>> - Bug #3514097 related to long long variables which won't be placed into >>> registers either way. >>> >>> There were a few others, but I was able to just enable them in trunk >>> without failing. >> >> I seem to remember there were also some trigonometric floating point >> tests disabled. They are in float_single.c and even generate a warning >> about being skipped (unlike all new skipped gcc-torture tests). expf fails for the optralloc-hc08 branch just like it does for trunk. Philipp |
From: Maarten B. <sou...@ds...> - 2012-04-21 16:22:10
|
> On 21.04.2012 17:00, Maarten Brock wrote: > >> On 21.04.2012 12:37, Maarten Brock wrote: > >> > Hello Philipp and Erik, > >> > > >> > Have you tried to enable the disabled regression tests > >> > for the hc08? Do they still fail or do they now pass > >> > with the new register allocator? Just curious. > >> > >> AFAIK all tests disabled on hc08 are disabled due to one of > >> > >> - Restricted support for function pointers > >> - Lack of memory > >> - Bug #3514097 related to long long variables which won't be placed into > >> registers either way. > >> > >> There were a few others, but I was able to just enable them in trunk > >> without failing. > > > > I seem to remember there were also some trigonometric floating point > > tests disabled. They are in float_single.c and even generate a warning > > about being skipped (unlike all new skipped gcc-torture tests). > > Now that you mention it, I must've overlooked that one. The expf test is > still failing in trunk, tanf isn't and I just enabled it. I'll have a > look at expf in the branch later. One down, one to go ;-) > The new gcc-torture tests don't have warnings, since I thought that > having loads of warnings for all the ports that don't support long long yet > (or, such as mcs51, can't handle large arrays) would be too distracting; > for the other issues I just filed bug reports (which reference the > failing tests). Should I add warnings for those in the regression tests, > too? I don't think we need warnings for tests that are just impossible, like large local arrays in reentrant functions for mcs51 with its small stack. And maybe bug reports are just as good as or even better than warnings in the regression results. So no need to add them. > expf fails for the optralloc-hc08 branch just like it does for > trunk. Too bad. OTOH it's probably not a register allocation problem then. Maarten |
From: Erik P. <epe...@iv...> - 2012-04-21 18:25:35
|
On Sat, 21 Apr 2012, Maarten Brock wrote: >> The new gcc-torture tests don't have warnings, since I thought that >> having loads of warnings for all the ports that don't support long long yet >> (or, such as mcs51, can't handle large arrays) would be too distracting; >> for the other issues I just filed bug reports (which reference the >> failing tests). Should I add warnings for those in the regression tests, >> too? > > I don't think we need warnings for tests that are just impossible, > like large local arrays in reentrant functions for mcs51 with its > small stack. And maybe bug reports are just as good as or even > better than warnings in the regression results. So no need to add > them. > >> expf fails for the optralloc-hc08 branch just like it does for >> trunk. > > Too bad. OTOH it's probably not a register allocation problem then. I prefer the bug reports too; I had been assuming the expf warning was like the setjmp/longjmp warning and that it simply had no hc08 implementation yet. I'll take a closer look at what's failing with expf. Erik |
From: Erik P. <epe...@iv...> - 2012-04-21 23:35:01
|
On Sat, 21 Apr 2012, Erik Petrich wrote: >>> expf fails for the optralloc-hc08 branch just like it does for >>> trunk. >> >> Too bad. OTOH it's probably not a register allocation problem then. > > I prefer the bug reports too; I had been assuming the expf warning was > like the setjmp/longjmp warning and that it simply had no hc08 > implementation yet. I'll take a closer look at what's failing with expf. For some reason, most, but not all, functions declared in math.h are marked as reentrant. The float_single regression test invokes all of the tested functions using a function pointer; for this to work the parameter needs to either be passed in registers or on the stack. In the case of hc08, there aren't enough registers to pass any floats in registers so the parameter to expf is passed on the stack. However, expf is declared and defined as non-reentrant so it is using the value in its globally allocated parameter variable. If the regression test is changed to call a wrapper function: float _expf(float a) __reentrant {return expf(a);} then the test passes. We've had some discussions in the past about changing the hc08 backend so that --stack-auto is the default and thus all functions are automatically reentrant (so better compliance with the C standard too). I think the switchover to the new register allocator might be a good time to change the reentrancy policy as well. Erik |
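The wrapper pattern Erik describes can be sketched as follows. This is only an illustration, not the actual regression test: `lib_half` is a hypothetical stand-in for a library routine such as expf that is not declared reentrant, and `REENTRANT`, `float_fn` and `call_through_pointer` are names invented here. The sketch assumes SDCC predefines `__SDCC` (or `SDCC` in older releases); on any other compiler the qualifier expands to nothing, so the code only shows the shape of the fix, not the hc08 calling convention itself.

```c
/* Assumption: SDCC predefines __SDCC (older releases: SDCC). Elsewhere the
   qualifier expands to nothing, so the sketch builds with any C compiler. */
#if defined(__SDCC) || defined(SDCC)
#define REENTRANT __reentrant
#else
#define REENTRANT
#endif

/* Hypothetical stand-in for a library routine that, like expf in the
   thread, is NOT declared reentrant. */
float lib_half(float a)
{
    return a * 0.5f;
}

/* Reentrant wrapper in the style of Erik's _expf: it receives its
   parameter on the stack and makes an ordinary direct call to the
   library routine. */
float lib_half_wrapper(float a) REENTRANT
{
    return lib_half(a);
}

/* The test harness reaches every tested function through a pointer.
   The pointer type is left unqualified here for simplicity. */
typedef float (*float_fn)(float);

float call_through_pointer(float_fn fn, float x)
{
    return fn(x);
}
```

Calling `call_through_pointer(lib_half_wrapper, 3.0f)` goes through the wrapper, so the indirect call and the callee agree on where the argument lives.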
From: Maarten B. <sou...@ds...> - 2012-04-22 09:04:11
|
Hi all, > For some reason, most, but not all, functions declared in math.h are > marked as reentrant. The float_single regression test invokes all of the > tested functions using a function pointer; for this to work the parameter > needs to either be passed in registers or on the stack. In the case of > hc08, there aren't enough registers to pass any floats in registers so the > parameter to expf is passed on the stack. However, expf is declared and > defined as non-reentrant so it is using the value in its globally > allocated parameter variable. So the hc08 backend is not testing if the callee is reentrant when it cannot pass everything in registers. If that test was right it would have been obvious what the bug was. See E_NONRENT_ARGS in SDCCast.c where SPEC_REGPARM is probably wrong. > If the regression test is changed to call a wrapper function: > float _expf(float a) __reentrant {return expf(a);} > then the test passes. That's fine by me. > We've had some discussions in the past about changing the hc08 backend so > that --stack-auto is the default and thus all functions are automatically > reentrant (so better compliance with the C standard too). I think the > switchover to the new register allocator might be a good time to change > the reentrancy policy as well. This is a choice that should not be based on this bug! If you have other good reasons then do so, but this should be fixed by adding an extra test. For automatic reentrancy you need a large stack and efficient access to it (not sure about the hc08 for these). Otherwise it is not worth it IMO because it is seldom necessary. Maarten |
From: Philipp K. K. <pk...@sp...> - 2012-04-22 09:26:04
|
On 22.04.2012 11:04, Maarten Brock wrote: > > For automatic reentrancy you need a large stack and > efficient access to it (not sure about the hc08 for > these). Otherwise it is not worth it IMO because it is > seldom necessary. The hc08 does have a large stack. Most instructions have variants for stack-pointer-relative operands, but they typically are one byte longer than the memory-access variants and take one cycle longer to execute. E.g. a dbnz instruction is 3 bytes long with operand in memory, and takes 5 cycles to execute. With operand on stack it is 4 bytes long and takes 6 cycles to execute. So there will be a cost associated with making --stack-auto the default. I do not know how often this is necessary. AFAIK, MISRA-C explicitly forbids recursion, both direct and indirect. On the other hand the benefits are: * ISO C standard allows recursion, and IMO, sdcc should be as standard-compliant as we can reasonably get, unless the user uses special options to get some optimization that breaks standard-compliance. * No need to worry about some statement in an interrupt handler calling a support function resulting in hard-to-find (for the normal user) bugs. * Reduction in RAM consumption: No need to reserve space for all local variables at all times, we now only need as much space as is really needed for local variables. There also could be a kind of compromise: Make everything --stack-auto except for user-written functions that call no other user-written functions: It would get most of the benefits (the only exception being worries about calling user-written functions both in normal code and the interrupt handler). Philipp |
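To put the dbnz figures from the message above into proportion, here is a small arithmetic sketch. The byte and cycle counts are the ones Philipp quotes; the struct and function names are made up for illustration. The percentages apply to this one instruction only, so the effect on a whole program would be smaller, since not every instruction touches a local variable.

```c
/* Size and timing of one instruction variant. */
struct insn_cost {
    unsigned bytes;
    unsigned cycles;
};

/* Figures quoted in the thread for the hc08 dbnz instruction. */
static const struct insn_cost DBNZ_DIRECT = {3, 5}; /* operand in memory */
static const struct insn_cost DBNZ_STACK  = {4, 6}; /* operand on stack  */

/* Extra cost of `with_stack` relative to `direct`, in percent,
   rounded down by integer division. */
unsigned overhead_percent(unsigned direct, unsigned with_stack)
{
    return (with_stack - direct) * 100u / direct;
}
```

With these numbers, `overhead_percent(DBNZ_DIRECT.bytes, DBNZ_STACK.bytes)` gives 33 and `overhead_percent(DBNZ_DIRECT.cycles, DBNZ_STACK.cycles)` gives 20.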
From: Maarten B. <sou...@ds...> - 2012-04-22 09:54:10
|
Hi, > On 22.04.2012 11:04, Maarten Brock wrote: > > > > For automatic reentrancy you need a large stack and > > efficient access to it (not sure about the hc08 for > > these). Otherwise it is not worth it IMO because it is > > seldom necessary. > > The hc08 does have a large stack. Most instructions have variants for > stack-pointer-relative operands, but they typically are one byte longer > than the memory-access variants and take one cycle longer to execute. > E.g. a dbnz instruction is 3 bytes long with operand in memory, and > takes 5 cycles to execute. With operand on stack it is 4 bytes long and > takes 6 cycles to execute. > So there will be a cost associated with making --stack-auto the default. Only +1 byte and +1 cycle is efficient in my view. It will always cost a little unless the core has no efficient direct access at all. Compare this to mcs51 which cannot access the stack without first copying SP (or BP) to A, adding the offset and copying again to Ri before doing the actual access. And the stack is also limited to less than 256 bytes. > I do not know how often this is necessary. AFAIK, MISRA-C explicitly > forbids recursion, both direct and indirect. On the other hand the > benefits are: > * ISO C standard allows recursion, and IMO, sdcc should be as > standard-compliant as we can reasonably get, unless the user uses special > options to get some optimization that breaks standard-compliance. > * No need to worry about some statement in an interrupt handler calling > a support function resulting in hard-to-find (for the normal user) bugs. > * Reduction in RAM consumption: No need to reserve space for all local > variables at all times, we now only need as much space as is really > needed for local variables. 
> > There also could be a kind of compromise: Make everything --stack-auto > except for user-written functions that call no other user-written > functions: It would get most of the benefits (the only exception being > worries about calling user-written functions both in normal code and the > interrupt handler). > > Philipp I see little benefit in this compromise. Maarten |
From: Jan W. <we...@ef...> - 2012-04-22 11:36:24
|
>> * ISO C standard allows recursion, and IMO, sdcc should be as >> standard-compliant as we can reasonably get Striving to be standard-compliant is a noble goal, but the specifics of SDCC have to be kept in mind, too. Besides, I doubt standard-compliance means "without command-line switches". Having a well-documented option(s) to allow recursion, and also having well documented the fact that it is not default, is as standard compliant as a limited-resource targeting C compiler should ever get, IMHO. Jan Waclawek |
From: Philipp K. K. <pk...@sp...> - 2012-04-23 08:55:29
|
On 22.04.2012 12:23, Jan Waclawek wrote: > > Having a well-documented option(s) to allow recursion, and also > having well documented the fact that it is not default, is as > standard compliant as a limited-resource targeting C compiler should > ever get, IMHO. Please consider the situation of new users of sdcc: A) Standard-compliant (by default) sdcc: Code the user compiles will just work (apart from bugs, device-specific stuff, etc). This includes both the user's code and code from third-party libraries. Maybe the code is a few percent larger or slower than what it could be. But if that happens the user will look up the documentation of the optimization options, and if one of the optimization options breaks their code they are more likely to notice, since it worked before with default settings. B) Non-compliant (by default) sdcc: Often, code compiled by the user will just fail to work (maybe there is recursion used in some third-party library the user compiled, or whatever). This makes it much harder for new users to get started, and leaves a worse impression of sdcc. Remember when sdcc had the "data" keyword, and new users always wondered why their code wouldn't compile (due to some variable named "data")? With recursion it is much worse, since the problem would not be compilation failure, but broken code (we cannot detect recursion before link time). Disallowing recursion by default feels like making int 8-bit by default to me. Philipp P.S.: Not placing local variables on the stack has costs other than disallowing recursion: When local variables are placed on the stack they consume at most as much memory as all local variables in a longest path in the call graph do. Not placing them on the stack consumes as much memory as all local variables in all functions do. This easily doubles or triples memory usage. 
If the z80 port would not place local variables on the stack many of my programs would be a few percent smaller in terms of ROM size, but they simply wouldn't work because they would run out of RAM. |
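The memory argument in the P.S. can be made concrete with a small host-side sketch. It compares the two allocation schemes on a call graph: static allocation needs the sum of all functions' locals, while stack allocation needs only the heaviest call chain. The graph, the byte counts and all names below are hypothetical; return addresses and parameters are ignored, and the graph is assumed to be free of recursion.

```c
#include <stddef.h>

#define MAX_FUNCS 8

struct call_graph {
    size_t count;                               /* number of functions      */
    unsigned locals[MAX_FUNCS];                 /* bytes of locals per func */
    unsigned char calls[MAX_FUNCS][MAX_FUNCS];  /* calls[i][j]: i calls j   */
};

/* Static allocation: every function keeps its locals for the whole run,
   so the requirement is the sum over all functions. */
unsigned static_bytes(const struct call_graph *g)
{
    unsigned total = 0;
    for (size_t i = 0; i < g->count; i++)
        total += g->locals[i];
    return total;
}

/* Stack allocation: the worst case is the heaviest call chain starting
   at fn. Assumes an acyclic graph (no recursion in the analysed program). */
unsigned stack_bytes(const struct call_graph *g, size_t fn)
{
    unsigned deepest = 0;
    for (size_t j = 0; j < g->count; j++) {
        if (g->calls[fn][j]) {
            unsigned below = stack_bytes(g, j);
            if (below > deepest)
                deepest = below;
        }
    }
    return g->locals[fn] + deepest;
}

/* Hypothetical program: function 0 (main) calls three independent helpers. */
struct call_graph example_graph(void)
{
    struct call_graph g = {0};
    g.count = 4;
    g.locals[0] = 4;
    g.locals[1] = 20;
    g.locals[2] = 24;
    g.locals[3] = 16;
    g.calls[0][1] = 1;
    g.calls[0][2] = 1;
    g.calls[0][3] = 1;
    return g;
}
```

For this example graph, static allocation needs 64 bytes while the stack peaks at 28 bytes, which is in line with the "doubles or triples" estimate in the message.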
From: Jan W. <we...@ef...> - 2012-04-23 11:57:40
|
>Please consider the situation of new users of sdcc: There is absolutely no excuse to a "new user" for not reading the fine manual, which is IMHO crystal clear in this regard. Using recursion in the 8-bit world is grossly inappropriate in all but a very small and limited number of cases, and users must understand that and all the implications before they attempt to use it. Accordingly, libraries and other "reusable" code targeted at 8-bitters must clearly document any such "feature". The standard is big-computer-centric ever since (and it was and in certain details it still is PDP-11-centric). This is easy to understand: the standard-makers have little to zero experience with the 8-bitters. The constrained-resource 8-bitters are a marginal area with specifics, which are not and will never be captured by the standard. Thus, I believe, there's no reason to bow to the standard and, more importantly, to uneducated users, more than through concise documentation of the deviations and the related switches - which I believe is of much more importance than the discussion we conduct right now :-) >P.S.: Not placing local variables on the stack has costs other than >disallowing recursion: When local variables are placed on the stack they >consume at most as much memory as all local variables in a longest path >in the call graph do. This should be resolved by thorough link-time optimisation, which has a better potential for optimisation of various resources, not only the local/auto variables space. I do understand the amount of work this requires. Until then, users should be (and to certain extent they already are) advised through the manual in this regard. 
Jan |
From: Maarten B. <sou...@ds...> - 2012-04-23 21:39:00
|
Hi, > >Please consider the situation of new users of sdcc: > > There is absolutely no excuse to a "new user" for not reading the fine > manual, which is IMHO crystal clear in this regard. > > Using recursion in the 8-bit world is grossly inappropriate in all but > a very small and limited number of cases, and users must understand > that and all the implications before they attempt to use it. > Accordingly, libraries and other "reusable" code targeted at 8-bitters > must clearly document any such "feature". > > The standard is big-computer-centric ever since (and it was and in > certain details it still is PDP-11-centric). This is easy to > understand: the standard-makers have little to zero experience with > the 8-bitters. The constrained-resource 8-bitters are a marginal area > with specifics, which are not and will never be captured by the > standard. Thus, I believe, there's no reason to bow to the standard > and, more importantly, to uneducated users, more than through concise > documentation of the deviations and the related switches - which I > believe is of much more importance than the discussion we conduct > right now :-) I agree with Jan on this one. > > P.S.: Not placing local variables on the stack has costs other than > > disallowing recursion: When local variables are placed on the stack > > they consume at most as much memory as all local > > variables in a longest path in the call graph do. > > This should be resolved by thorough link-time optimisation, which has > a better potential for optimisation of various resources, not only the > local/auto variables space. I do understand the amount of work this > requires. Until then, users should be (and to certain extent they > already are) advised through the manual in this regard. There already is a little bit of overlaying in SDCC for local variables. But it only works for functions that are leaves in the call tree. Due to this limited overlaying it is not as bad as you describe. 
> > Not placing them on the stack consumes as much > > memory as all local variables in all functions do. > > This easily doubles or triples memory usage. If the > > z80 port would not place local variables on the > > stack many of my programs would be a few percent > > smaller in terms of ROM size, but they simply > > wouldn't work because they would run out of RAM. But placing all locals on stack also has its drawbacks. The compiler will not warn you about out of memory problems a.k.a. stack overflow. And the 8-bitter usually does not have memory protection. If you want to predict stack usage you need to parse the full call graph for that. If we could do that we could also implement full overlaying resulting in identical memory usage (ignoring recursion here). Maarten |
From: Philipp K. K. <pk...@sp...> - 2012-04-24 08:16:20
|
On 23.04.2012 23:38, Maarten Brock wrote: > Hi, > >>> Please consider the situation of new users of sdcc: >> >> There is absolutely no excuse to a "new user" for not reading the fine >> manual, which is IMHO crystal clear in this regard. >> >> Using recursion in the 8-bit world is grossly inappropriate in all but >> a very small and limited number of cases, Why? The cost is small (I haven't tested it, but I expect the increase in code size for hc08 due to local variables on stack to be less than the decrease due to the new register allocator). A few % in code size doesn't seem like "grossly inappropriate" to me. >> and users must understand >> that and all the implications before they attempt to use it. >> Accordingly, libraries and other "reusable" code targeted at 8-bitters >> must clearly document any such "feature". >> >> The standard is big-computer-centric ever since (and it was and in >> certain details it still is PDP-11-centric). This is easy to >> understand: the standard-makers have little to zero experience with >> the 8-bitters. The constrained-resource 8-bitters are a marginal area >> with specifics, which are not and will never be captured by the >> standard. Thus, I believe, there's no reason to bow to the standard >> and, more importantly, to uneducated users, more than through concise >> documentation of the deviations and the related switches - which I >> believe is of much more importance than the discussion we conduct >> right now :-) > > I agree with Jan on this one. Where do you stop? In the end this is just inventing a new language by claiming that the standard is not suited for 8-bit platforms. 
You can argue the same way for making --short-is-8bits the default, or making int 8 bits by default (many people are using int for loop counters, this would give a significant benefit in code size and speed), replace float by signed char or convert division by powers of two to simple right shift even for signed operands (the latter AFAIK being allowed by C90, but not C99). If you really insist on having sdcc non-compliant by default even for platforms where it would be easy to do otherwise, at least make making sdcc compliant easy, e.g. just --std-c99 instead of --std-c99 --stack-auto --int-is-16-bits --double-is-48-bits --std-string-functions --compliant-keyword-namespace --std-division-rounding or whatever. > >>> P.S.: Not placing local variables on the stack has costs other than >>> disallowing recursion: When local variables are placed on the stack >>> they consume at most as much memory as all local >>> variables in a longest path in the call graph do. >> >> This should be resolved by thorough link-time optimisation, which has >> a better potential for optimisation of various resources, not only the >> local/auto variables space. I do understand the amount of work this >> requires. Until then, users should be (and to certain extent they >> already are) advised through the manual in this regard. > > There already is a little bit of overlaying in SDCC for local > variables. But it only works for functions that are leaves in the > call tree. Due to this limited overlaying it is not as bad as you > describe. > That's one of the reasons I proposed that "compromise", where only leaf functions do not have their variables allocated on the stack. >>> Not placing them on the stack consumes as much >>> memory as all local variables in all functions do. >>> This easily doubles or triples memory usage. 
If the >>> z80 port would not place local variables on the >>> stack many of my programs would be a few percent >>> smaller in terms of ROM size, but they simply >>> wouldn't work because they would run out of RAM. > > But placing all locals on stack also has its drawbacks. The compiler > will not warn you about out of memory problems a.k.a. stack > overflow. And the 8-bitter usually does not have memory protection. > If you want to predict stack usage you need to parse the full call > graph for that. If we could do that we could also implement full > overlaying resulting in identical memory usage (ignoring recursion > here). Consider functions main, a, b, c, d. main calls a and b. a and b call c. c calls d (conditionally). For simplicity assume that only a and d use stack space, but a calls c with arguments that result in c not calling d, while b calls c with arguments that result in c calling d. Just from the call graph one would overestimate the stack space usage a lot. Essentially you'll have to solve the halting problem to predict stack usage. And I hope to see variable length arrays in sdcc one day, which makes all this even more complicated. Philipp |
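The main/a/b/c/d example above can be sketched in C. This is only an illustration under stated assumptions: the frame sizes (10 bytes for a, 20 bytes for d) and all names are hypothetical, and the graph is assumed to be acyclic. The function computes the bound one gets from the call graph alone, which counts a and d together even though, in the scenario described, they are never live at the same time.

```c
/* Hypothetical functions main, a, b, c, d, indexed 0..4. */
enum { F_MAIN, F_A, F_B, F_C, F_D, F_COUNT };

/* Assumed frame sizes in bytes; only a and d use stack space. */
static const unsigned frame_size[F_COUNT] = { 0, 10, 0, 0, 20 };

/* calls[i][j] != 0 means function i may call function j. */
static const unsigned char calls[F_COUNT][F_COUNT] = {
    /* main */ { 0, 1, 1, 0, 0 },
    /* a    */ { 0, 0, 0, 1, 0 },
    /* b    */ { 0, 0, 0, 1, 0 },
    /* c    */ { 0, 0, 0, 0, 1 },  /* c calls d only conditionally */
    /* d    */ { 0, 0, 0, 0, 0 },
};

/* Upper bound on stack usage derived from the call graph alone:
   the frame of f plus the deepest chain of callees below it.
   Assumes there is no recursion (the graph has no cycles). */
unsigned call_graph_bound(int f)
{
    unsigned deepest = 0;
    for (int callee = 0; callee < F_COUNT; callee++) {
        if (calls[f][callee]) {
            unsigned depth = call_graph_bound(callee);
            if (depth > deepest)
                deepest = depth;
        }
    }
    return frame_size[f] + deepest;
}
```

With these assumed sizes, `call_graph_bound(F_MAIN)` follows the path main, a, c, d and reports a + d, while the paths that can actually occur need only the frame of a or the frame of d. Telling the two apart requires knowing which arguments make c call d, which the call graph does not record.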
From: Maarten B. <sou...@ds...> - 2012-04-24 09:42:28
|
Hi again, >>> Using recursion in the 8-bit world is grossly inappropriate in all but >>> a very small and limited number of cases, > >>> The standard is big-computer-centric ever since (and it was and in >>> certain details it still is PDP-11-centric). This is easy to >>> understand: the standard-makers have little to zero experience with >>> the 8-bitters. The constrained-resource 8-bitters are a marginal area >>> with specifics, which are not and will never be captured by the >>> standard. Thus, I believe, there's no reason to bow to the standard >>> and, more importantly, to uneducated users, more than through concise >>> documentation of the deviations and the related switches - which I >>> believe is of much more importance than the discussion we conduct >>> right now :-) >> >> I agree with Jan on this one. > > Where do you stop? In the end this is just inventing a new language by > claiming that the standard is not suited for 8-bit platforms. You can > argue the same way for making --short-is-8bits the default, or making > int 8 bits by default (many people are using int for loop counters, this > would give a significant benefit in code size and speed), replacing float > by signed char or converting division by powers of two to a simple right shift > even for signed operands (the latter AFAIK being allowed by C90, but not > C99). I will not argue so, but stop right here. If the target cannot efficiently handle stack access, like the mcs51/ds390 and I guess also pic, then it should not use it by default. If the target can, however, I have no objection to using the stack by default (z80, hc08). The reason I claim that reentrancy is not often used in small embedded applications is the need for determinism. That is much more important for embedded applications than it is for applications running on an OS backed by hardware memory protection. A big OS will stop the application with an error message on stack overflow / out of memory. 
A small device will just crash right through and set your dish-washer on fire when the moon happens to be blue and the stack overflows. > If you really insist on having sdcc non-compliant by default even for > platforms where it would be easy to do otherwise, at least make making > sdcc compliant easy, e.g. just --std-c99 instead of --std-c99 > --stack-auto --int-is-16-bits --double-is-48-bits --std-string-functions > --compliant-keyword-namespace --std-division-rounding or whatever. I have no problem with making --std-c99 default to --stack-auto, but then we probably also need the opposite switch and the __nonreentrant keyword. >>>> P.S.: Not placing local variables on the stack has costs other than >>>> disallowing recursion: When local variables are placed on the stack >>>> they consume at most as much memory as all local >>>> variables in a longest path >in the call graph do. >>> >>> This should be resolved by thorough link-time optimisation, which has >>> a better potential for optimisation of various resources, not only the >>> local/auto variables space. I do understand the amount of work this >>> requires. Until then, users should be (and to certain extent they >>> already are) advised through the manual in this regard. >> >> There already is a little bit of overlaying in SDCC for local >> variables. But it only works for functions that are leafs in the >> call tree. Due to this limited overlaying it is not as bad as you >> describe. > > That's one of the reasons I proposed that "compromise", where only leaf > functions do not have their variables allocated on the stack. > >>>> Not placing them on the stack consumes as much >>>> memory as all local variables in all functions do. >>>> This easily doubles or triples memory usage. If the >>>> z80 port would not place local variables on the >>>> stack many of my programs would be a few percent >>>> smaller in terms of ROM size, but they simply >>>> wouldn't work because they would run out of RAM. 
>> >> But placing all locals on stack also has its drawbacks. The compiler >> will not warn you about out of memory problems a.k.a. stack >> overflow. And the 8-bitter usually does not have memory protection. >> If you want to predict stack usage you need to parse the full call >> graph for that. If we could do that we could also implement full >> overlaying resulting in identical memory usage (ignoring recursion >> here). > > Consider functions main, a, b, c, d. main calls a and b. a and b call c. > c calls d (conditionally). For simplicity assume that only a and d use > stack space, but a calls c with arguments that result in c not calling > d, while b calls c with arguments that result in c calling d. Just from > the call graph one would overestimate the stack space usage a lot. > Essentially you'll have to solve the halting problem to predict stack > usage. And I hope to see variable length arrays in sdcc one day, which > makes all this even more complicated. But when you try to determine maximum stack usage you still need to account for both a and d, or you must find this dependency in the call tree as well. Maarten |
From: Philipp K. K. <pk...@sp...> - 2012-04-24 13:32:57
|
On 24.04.2012 11:42, Maarten Brock wrote: > If the target cannot efficiently > handle stack access like the mcs51/ds390 and I guess also pic then it > should not use it by default. How about targets that can access neither the stack nor the memory efficiently? Philipp |
From: Philipp K. K. <pk...@sp...> - 2012-04-24 17:42:02
|
On 24.04.2012 11:42, Maarten Brock wrote: > Hi again, > >>>> Using recursion in the 8-bit world is grossly inappropriate in all but >>>> a very small and limited number of cases, >> >>>> The standard is big-computer-centric ever since (and it was and in >>>> certain details it still is PDP-11-centric). This is easy to >>>> understand: the standard-makers have little to zero experience with >>>> the 8-bitters. The constrained-resource 8-bitters are a marginal area >>>> with specifics, which are not and will never be captured by the >>>> standard. Thus, I believe, there's no reason to bow to the standard >>>> and, more importantly, to uneducated users, more than through concise >>>> documentation of the deviations and the related switches - which I >>>> believe is of much more importance than the discussion we conduct >>>> right now :-) >>> >>> I agree with Jan on this one. >> >> Where do you stop? In the end this is just inventing a new language by >> claiming that the standard is not suited for 8-bit platforms. You can >> argue the same way for making --short-is-8bits the default, or making >> int 8 bits by default (many people are using int for loop counters, this >> would give a significant benefit in code size and speed), replacing float >> by signed char or converting division by powers of two to a simple right shift >> even for signed operands (the latter AFAIK being allowed by C90, but not >> C99). > > I will not argue so, but stop right here. If the target cannot efficiently > handle stack access like the mcs51/ds390 and I guess also pic then it > should not use it by default. If the target can however I have no > objection to use stack by default (z80, hc08). > > The reason I claim that reentrancy is not often used in small embedded > applications is the need for determinism. That is much more important for > embedded applications than it is for applications running on an OS backed > by hardware memory protection. 
A big OS will stop the application with an > error message on stack overflow / out of memory. A small device will just > crash right through and set your dish-washer on fire when the moon happens > to be blue and the stack overflows. > >> If you really insist on having sdcc non-compliant by default even for >> platforms where it would be easy to do otherwise, at least make making >> sdcc compliant easy, e.g. just --std-c99 instead of --std-c99 >> --stack-auto --int-is-16-bits --double-is-48-bits --std-string-functions >> --compliant-keyword-namespace --std-division-rounding or whatever. > > I have no problem with making --std-c99 default to --stack-auto, but then > we probably also need the opposite switch and the __nonreentrant keyword. > This discussion has been going on for some time, and it might be unclear where it is about the hc08 port and where it is about sdcc in general. My posts were often written in a general way, even though they mostly addressed the hc08 port. I'll just summarize my personal opinion on what I would want sdcc to be in a few years with respect to C99 standard compliance: * Fully compliant with the C99 standard by default when targeting the z80, z180, r2k, hc08, gbz80 or some other ports that currently only exist in my mind (e.g. r3ka and s08). * Compliant with C99 to the extent possible (implementation limits could be a problem) when using suitable options (preferably just --std-c99) and targeting the mcs51, ds390 or ds400. I might be misguided about the first goal, but so far I still think it is reasonable. I am less sure about the second goal, as I don't really know enough about these platforms. Philipp P.S.: I might write another mail about compliance with other standards and other aspects of sdcc later. |
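The remark quoted above about signed division by powers of two can be made concrete with a small C sketch. C99 requires integer division to truncate toward zero, whereas an arithmetic right shift on a two's complement machine rounds toward negative infinity, so the two disagree for negative odd operands. The function names are made up for this illustration; the second function spells out the floor result portably instead of relying on `>>`, whose result for negative values is implementation-defined.

```c
/* Division by two as C99 defines it: the quotient truncates toward zero. */
int div_by_two(int x)
{
    return x / 2;
}

/* Floor division by two, i.e. the value an arithmetic right shift by one
   would produce on a two's complement machine. Written without >> because
   shifting a negative value right is implementation-defined in C.
   Relies on the C99 rule that x % 2 has the sign of x. */
int floor_div_by_two(int x)
{
    return x / 2 - (x % 2 < 0);
}
```

For non-negative operands both functions agree, which is why the shift is always a valid replacement for unsigned division. For -7 the first yields -3 and the second -4, so a compiler that wants to use a shift for signed operands under C99 has to add a correction for negative values.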
From: Philipp K. K. <pk...@sp...> - 2012-05-03 21:11:19
|
On 24.04.2012 11:42, Maarten Brock wrote: > > I have no problem with making --std-c99 default to --stack-auto, but then > we probably also need the opposite switch and the __nonreentrant keyword. Unless a use case comes up, I don't think __nonreentrant is necessary. An opposite to --stack-auto would be useful though (even in the current situation, where --stack-auto is not the default for some ports). Philipp |
From: Erik P. <epe...@iv...> - 2012-04-26 09:59:58
|
On Sun, 22 Apr 2012, Maarten Brock wrote: >>> For automatic reentrancy you need a large stack and >>> efficient access to it (not sure about the hc08 for >>> these). Otherwise it is not worth it IMO because it is >>> seldom necessary. >> >> The hc08 does have a large stack. Most instructions have variants for >> stack-pointer-relative operands, but they typically are one byte longer >> than the memory-access variants and take one cycle longer to execute. >> E.g. a dbnz instruction is 3 bytes long with operand in memory, and >> takes 5 cycles to execute. With operand on stack it is 4 bytes long and >> takes 6 cycles to execute. >> So there will be a cost associated with making --stack-auto the default. > > Only +1 byte and +1 cycle is efficient in my view. It > will always cost a little unless the core has no > efficient direct access at all. Compare this to mcs51 > which cannot access the stack without first copying SP > (or BP) to A, add the offset and copy again to Ri before > doing the actual access. And the stack is also limited > to less than 256 bytes. On the optralloc-hc08 branch I recently committed an optimization that addresses the +1 byte and +1 cycle penalty for the stack-pointer-relative operands. If there is a span of instructions in which the hx register pair is unused and stacked operands are accessed, it is possible to optimize such that there is only a +1 byte and +2 cycle penalty for the entire span rather than a penalty per access. In this case, there is also an additional 1 byte savings for accesses to the top of the stack. Currently I see the regression tests for hc08 (on the optralloc-hc08 branch) totalling 2405726 bytes with --stack-auto disabled by default and 2439400 bytes with --stack-auto enabled by default. This is an increase of 33674 bytes or 1.4%. However, I don't think the regression tests are necessarily representative of a "typical" program. 
I've also been testing the new branch with a project I wrote last year; here --stack-auto increases the size from 2943 bytes to 3033 bytes. This is an increase of only 90 bytes or 3%. (On trunk, the respective sizes are 3157 without --stack-auto, 3568 with --stack-auto, for a 13% increase. I'm really pleased with the new register allocator!) Erik |
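The percentages in the message above follow from the byte counts it quotes. A minimal C helper (the function name is made up for this illustration) reproduces the arithmetic:

```c
/* Relative size increase in percent when going from `before` to `after`
   bytes. Assumes before > 0 and after >= before. */
double percent_increase(unsigned long before, unsigned long after)
{
    return 100.0 * (double)(after - before) / (double)before;
}
```

Applied to the figures quoted above, `percent_increase(2405726, 2439400)` is about 1.4 for the regression tests, `percent_increase(2943, 3033)` is about 3.1 for the project on the branch, and `percent_increase(3157, 3568)` is about 13.0 for the same project on trunk.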
From: Philipp K. K. <pk...@sp...> - 2012-04-26 10:15:57
|
On 26.04.2012 11:59, Erik Petrich wrote:
>
> On the optralloc-hc08 branch I recently committed an optimization […]

There's an optimization I implemented for the z80-like ports some time ago for accessing the top of the stack, which uses pop/push to read the topmost values on the stack. I.e. instead of

    ld l, d1(ix)
    ld h, d2(ix)

(6 bytes total, requires ix frame pointer, which has a fixed setup overhead per function and is not available on gbz80)

or

    ld hl, #d
    add hl, sp
    ld a, (hl+)
    ld h, (hl)
    ld l, a

(7 bytes total on the gbz80, which has this hl+ thing)

we do

    pop hl
    push hl

(2 bytes total).

Obviously this helps most where there are only a few variables allocated to the stack or the topmost ones are accessed most. hc08 probably can benefit from a similar optimization.

    pulh
    pshh

would be a cheap way to put h into the topmost stack location.

Philipp |
From: Erik P. <epe...@iv...> - 2012-04-26 17:25:19
|
On Thu, 26 Apr 2012, Philipp Klaus Krause wrote:

> On 26.04.2012 11:59, Erik Petrich wrote:
>>
>> On the optralloc-hc08 branch I recently committed an optimization […]
>
> There's an optimization I implemented for the z80-like ports some time
> ago for accessing the top of the stack, that uses pop/push to read the
> topmost values on the stack. I.e. instead of
>
> ld l, d1(ix)
> ld h, d2(ix)
> (6 bytes total, requires ix frame pointer, which has a fixed setup
> overhead per function and is not available on gbz80)
>
> or
>
> ld hl, #d
> add hl, sp
> ld a, (hl+)
> ld h, (hl)
> ld l, a
> 7 bytes total on the gbz80 that has this hl+ thing.
>
> we do
>
> pop hl
> push hl
> 2 bytes total
>
> Obviously this helps most where there are only a few variables allocated
> to the stack or the topmost ones are accessed most.
> hc08 probably can benefit from a similar optimization.

For the a and x registers, this is something to consider, depending on the state of the --opt-code-size and --opt-code-speed flags:

    pula
    psha

is 2 bytes and 5 cycles whereas

    ldaa 1,s

is 3 bytes but 4 cycles. If the hx pair has not been disturbed since the last tsx instruction, we can do even better (since tsx loads hx with sp+1):

    ldaa ,x

is 1 byte and 3 cycles. This last case is one of the things I was referencing in my previous message.

> pulh
> pshh
>
> Would be cheap way to put h into the topmost stack location.

?? I hope you meant "get h from the topmost stack location". If so, this is a great idea, since I can't think of any other shorter or faster way to load h from a stack location. (2 bytes, 5 cycles)

To store h into the topmost stack location, we could use

    ais #1 (or pulx or pula, if x or a are unused)
    pshh

which is 3 bytes and 4 cycles (ais version) or 2 bytes and 5 cycles (pula/x version).

Erik |
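The size-versus-speed choice described above for loading a from the top of the stack can be sketched as a small selection routine. This is not sdcc's code generator; the struct, the table and the function name are hypothetical, and only the byte and cycle figures are taken from the message. The two flags stand in for the effect of --opt-code-size and for whether hx still holds sp+1 from an earlier tsx.

```c
/* One candidate instruction sequence and its cost as quoted above. */
struct seq_cost {
    const char *text;      /* label only, for illustration */
    unsigned bytes;
    unsigned cycles;
    int needs_valid_hx;    /* nonzero if hx must still hold sp+1 */
};

enum { CANDIDATE_COUNT = 3 };

static const struct seq_cost load_a_candidates[CANDIDATE_COUNT] = {
    { "pula / psha", 2, 5, 0 },
    { "ldaa 1,s",    3, 4, 0 },
    { "ldaa ,x",     1, 3, 1 },
};

/* Returns the index of the cheapest usable candidate, measured in bytes
   when optimizing for size and in cycles otherwise. */
int pick_load_a(int optimize_for_size, int hx_valid)
{
    int best = -1;
    for (int i = 0; i < CANDIDATE_COUNT; i++) {
        const struct seq_cost *c = &load_a_candidates[i];
        if (c->needs_valid_hx && !hx_valid)
            continue;               /* sequence not usable here */
        if (best < 0) {
            best = i;               /* first usable candidate */
            continue;
        }
        const struct seq_cost *b = &load_a_candidates[best];
        unsigned cost = optimize_for_size ? c->bytes : c->cycles;
        unsigned best_cost = optimize_for_size ? b->bytes : b->cycles;
        if (cost < best_cost)
            best = i;
    }
    return best;
}
```

With hx not valid, the routine picks the pula/psha pair when optimizing for size and the stack-pointer-relative load when optimizing for speed. Once hx is valid, the indexed load wins on both counts, which matches the "even better" case in the message.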