From: Philipp K. K. <pk...@sp...> - 2012-12-13 16:34:07
|
Over time, and especially recently, it seems people want an STM8 port of sdcc. Vaclav has started such a port, but it's incomplete and, IMO, has some serious design flaws¹ which are not worth fixing.

So, I propose² the following:

* Write a new STM8 sdcc port.
* Use existing assembler.
* Use existing linker.
* Use existing ucsim.

What do you think about this?

Philipp

¹ The flaws are mostly due to the port being based on the hc08 port of more than a year ago. Here are the two most important ones:

* Back then the hc08 port used the old register allocator, and so does the stm8 port. Making the hc08 port use the new register allocator required a rewrite of substantial parts of the code generation. Combined, a huge improvement was achieved: Just look at the drop in generated hc08 code size at http://sdcc.sourceforge.net/mediawiki/index.php/Hc08_code_size. The STM8 has 5 8-bit registers, which the new register allocator can easily handle. An STM8 port should be written for the new allocator, not the old one.

* The hc08 port places local variables and parameters in memory at fixed addresses by default. Some other ports, such as the z80-related ones, use the stack by default. Typically the first approach results in faster, but non-compliant (recursion doesn't work) code. There is some disagreement among sdcc developers as to which should be used by default due to the trade-off. However, let's have a look at the STM8 addressing modes, for the typical situation of at most 256 bytes of local variables per function but more than 256 bytes of local variables in all functions combined. The STM8 would use long direct addressing for local variables when doing things hc08-style, while it would use sp-indexed addressing mode when doing things z80-style. Now, sp-indexed mode is faster and results in smaller code. When instead assuming the case of at most 256 bytes of local variables in all functions combined, we would get short indirect mode for the hc08-style. This mode results in roughly the same code size and speed as the sp-indexed mode. This means that for the STM8, doing things hc08-style gives us slower, non-compliant code. The only exception would be functions which use more than 256 bytes of local variables without using aggregates or unions; such functions are very rare. An STM8 port therefore should always place all local variables on the stack.

² In case you're wondering if there is some agenda behind my proposal:

* I have written the new register allocator, and modified z80 code generation somewhat for it, and have seen a substantial improvement in the generated code. Together with Eric, I have made the hc08 port use the new register allocator, which included making big changes to code generation there. Combined, these resulted in a reduction of about 40% in generated hc08 code size. The improvement for hc08 was more spectacular than for Z80, partially because code generation was tailored more closely to the new allocator. Writing the code generator with the new allocator in mind from the start thus seems like the way to go.

* IMO, sdcc should be standard-compliant by default, and only deviate from compliance if there is good reason to do so and on request of the user. |
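The "recursion doesn't work" remark about fixed-address locals can be illustrated with a small sketch in plain C. This is not sdcc output and not code from either port: the function names are made up for illustration, and the hc08-style placement is only emulated here with a `static` local, on the assumption that a single fixed memory location shared by all invocations behaves the same way for this purpose.

```c
#include <stdint.h>

/* Stack-style (z80-style) locals: every invocation has its own n_local. */
uint16_t sum_to_stack(uint8_t n)
{
    uint8_t n_local = n;            /* automatic: lives in this call's frame */
    if (n_local == 0)
        return 0;
    uint16_t rest = sum_to_stack(n_local - 1);
    return rest + n_local;          /* n_local still holds this call's value */
}

/* Fixed-address (hc08-style) locals, emulated with `static` (an assumption
   made for this sketch): all invocations share one copy of n_local. */
uint16_t sum_to_fixed(uint8_t n)
{
    static uint8_t n_local;         /* one fixed memory location for all calls */
    n_local = n;
    if (n_local == 0)
        return 0;
    uint16_t rest = sum_to_fixed(n_local - 1);
    return rest + n_local;          /* n_local was overwritten by the inner call */
}
```

Calling `sum_to_stack(4)` adds 4 + 3 + 2 + 1, while `sum_to_fixed(4)` returns 0, because the innermost call leaves 0 in the shared location before any of the additions happen.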
From: Vaclav P. <vac...@se...> - 2012-12-14 09:30:24
|
Hi Philipp,

> Just look at the drop in generated hc08 code size at
> http://sdcc.sourceforge.net/mediawiki/index.php/Hc08_code_size. The STM8
> has 5 8-bit registers, which the new register allocator can easily
> handle. An STM8 port should be written for the new allocator, not the
> old one.

I just wonder which register allocator is used in the Cosmic compiler? Can you guess? The amount of RAM it uses is almost half of what sdcc uses with the new allocator...

Vaclav |
From: Philipp K. K. <pk...@sp...> - 2012-12-14 09:57:58
|
On 14.12.2012 10:20, Vaclav Peroutka wrote:
> Hi Philip,
>
>> Just look at the drop in generated hc08 code size at
>> http://sdcc.sourceforge.net/mediawiki/index.php/Hc08_code_size. The STM8
>> has 5 8-bit registers, which the new register allocator can easily
>> handle. An STM8 port should be written for the new allocator, not the
>> old one.
>
> I just wonder which register allocator is used in Cosmic compiler ? Can you
> guess ? The number of used RAM is almost one half to the new allocator in
> sdcc...
>
> Vaclav

Compilers these days typically use a graph-coloring approach (see Chaitin's work on this) or an ILP-based one. I guess Cosmic uses something that for practical purposes is somewhat worse than our new allocator. Our new allocator is provably optimal, and AFAIK the only such allocator currently existing.

However, there are other areas where sdcc is much worse than the current state of the art in compiler construction (and worse than the state of the art of 30 years ago). Most likely these are to blame for sdcc generating worse code than Cosmic C. Some that come to mind at once:

* sdcc has no generalized constant propagation. Currently, when a programmer uses an int for a loop counter that happens to always be in the range [0, 100], we need 16 bits for this counter (we only know it fits into an int). Generalized constant propagation would allow us to prove that the counter always stays in the range [0, 100], so we could use an 8-bit variable instead.

* sdcc has bad pointer alias analysis. This especially hurts redundancy elimination, which practically has to assume that any two pointers might alias.

* The peephole optimizer of the hc08 port lacks some functionality already implemented in other ports; see RFE #3528282 for the main point.

* The notVolatile() function in the peephole optimizer is overly pessimistic. Basically it assumes that every variable in memory might be volatile. See RFE #1495816. A simple but already quite effective improvement would be to keep the current behaviour if a volatile is accessed anywhere in the function, and otherwise always return true.

Philipp |
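The first two points in the list above can be made concrete with a small sketch in plain C. It is only an illustration under assumptions: the function names are invented, nothing here is taken from sdcc, and the "narrowed" version is written by hand to show what a compiler could derive, not what any compiler actually emits.

```c
#include <stdint.h>

/* Counter declared as int: without range information, a compiler for an
   8-bit target has to keep i in 16 bits. */
uint16_t sum_int_counter(const uint8_t *data)
{
    uint16_t sum = 0;
    for (int i = 0; i < 100; i++)
        sum += data[i];
    return sum;
}

/* Hand-narrowed equivalent: i only ever takes values in [0, 100],
   so an 8-bit counter is sufficient. */
uint16_t sum_u8_counter(const uint8_t *data)
{
    uint16_t sum = 0;
    for (uint8_t i = 0; i < 100; i++)
        sum += data[i];
    return sum;
}

/* Aliasing: the second read of *a cannot be replaced by `first`
   unless the compiler can prove that a and b never point to the
   same object, because the store through b may have changed *a. */
int read_store_read(int *a, int *b)
{
    int first = *a;
    *b = first + 1;
    return first + *a;
}
```

With distinct objects, `read_store_read` returns twice the initial value of `*a`; with `a == b` it does not, which is why redundancy elimination has to stay conservative when alias information is weak.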
From: Vaclav P. <vac...@se...> - 2012-12-18 21:39:53
|
= stm8 (and related ports) code size =

A history of code size in sdcc and a comparison to other compilers.

== STM8 code size comparison (bytes): ==

{| border=1 class="simple"
! File !! Cosmic C² stm8 !! sdcc #xxxx stm8
|-
| cvu_vinb.c || 7 || x
|-
| galois_lfsr.c || 21 || x
|-
| get_tile.c || 95 || x
|-
| huffman_iterative.c || 161 || x
|-
| huffman_recursive.c || 198 || x
|-
| init_loop.c || 33 || x
|-
| insertion_sort.c || 108 || x
|-
| memcpy_compression.c || 31 || x
|-
| memtovmemcpy.c || 30 || x
|-
| play_music.c || 393 || x
|-
| sdcc_divulong.c || 100 || x
|-
| sdcc_mullong.c || 132 || x
|-
| set_screen_mode.c || 49 || x
|-
| set_sprite_x.c || 66 || x
|-
| z88dk-mktime.c || 239 || x
|-
| total || x || x
|}

The benchmark files can be found at http://colecovision.eu/stuff/testbench.tar.gz

² C Compiler for STM8 (COSMIC Software); Generator V4.3.4 - 23 Mar 2010

This is a list of feature requests that, combined, will probably get code size down near to the level of the non-free compilers.

{| border=1 class="simple"
! '''Request ID''' !! '''Summary'''
|-
|
|} |
From: Vaclav P. <vac...@se...> - 2012-12-19 07:30:28
|
Hello,

I do not know what exactly happened with my emailer when I was sending the previous email, but I wrote the following:

I quickly did code size tests for STM8. I took the template from the SDCC wiki and it is attached. I would upload it to the wiki but I do not have rights for it.

Anyway, the attachment in the last email was sent properly.

Philipp, can you check that file please? I filled in just the code size of each separate file. No data regions.

Vaclav

PS. Does it make sense to do similar tests for the PIC14 and PIC16 ports? |
From: Borut R. <bor...@gm...> - 2012-12-19 16:35:10
|
Page added to wiki at http://sdcc.sourceforge.net/mediawiki/index.php/Stm8_code_size

Borut

On 19. 12. 2012 08:22, Vaclav Peroutka wrote:
> Hello,
>
> I do not know what exactly happened with emailer when I was sending
> previous email, but I wrote following:
>
> I quickly did code size tests for STM8. I took template from SDCC wiki
> and it is attached. I would upload it to Wiki but I do not have rights
> for it.
>
> Anyway, attachment in the last email was sent properly.
>
> Philipp, can you check that file please ? I filled in just code size
> of the separate file. No data regions.
>
> Vaclav
>
> PS. Does it make a sense to do similar tests for PIC14 and PIC16 ports ? |
From: Borut R. <bor...@gm...> - 2012-12-19 07:47:42
|
Self-registration to the wiki is disabled due to spam attacks. If anybody needs write access, please let me know: send me your preferred user name and I'll give you access.

Regards,
Borut

On 18. 12. 2012 22:33, Vaclav Peroutka wrote:
> I quickly did code size tests for STM8. I took template from SDCC wiki
> and it is attached. I would upload it to Wiki but I do not have rights
> for it.
>
> Vaclav |
From: Philipp K. K. <pk...@sp...> - 2013-04-14 10:21:40
|
On 18.12.2012 22:33, Vaclav Peroutka wrote:
> I quickly did code size tests for STM8. I took template from SDCC wiki
> and it is attached. I would upload it to Wiki but I do not have rights
> for it.

Can you also send me the generated .asm files, so I can see what Cosmic C might do better / worse than sdcc?

Philipp

P.S.: It would be nice to also have the results for the Raisonance compiler in the wiki. |
From: Valentin D. <val...@gm...> - 2013-04-15 03:51:10
|
Yesterday, we discovered the following while developing the STM8 port with Philipp (we're testing on real hardware):

1) addw SP, #XX takes a signed value and costs 1 cycle
2) sub SP, #XX takes an unsigned value and costs 1 cycle

I'm just wondering whether ucsim matches this behaviour.

P.S. chip used: stm8l152c6t6 |
From: Vaclav P. <vac...@se...> - 2013-04-15 05:43:24
|
Hi Valentin,

you can try how it works in ucsim. But I probably do not understand well how addw works in your example. If you use 2's complement and you add and wrap, it automatically does subtraction, doesn't it?

Vaclav

---------- Original message ----------
From: Valentin Dudouyt <val...@gm...>
Date: 15. 4. 2013
Subject: [Sdcc-user] ucsim: datasheet inaccuracies in STM8

> Yesterday, we discovered the following while developing the STM8 port with
> Philipp (we're testing on real hardware).
>
> 1) addw SP, #XX takes signed value and costs 1 cycle
> 2) sub SP, #XX takes unsigned value and costs 1 cycle
>
> I'm just wondering if it can be met with ucsim.
>
> P.S. chip used: stm8l152c6t6 |
From: Valentin D. <val...@gm...> - 2013-04-15 07:23:14
|
We're adding an 8-bit integer to a 16-bit integer, so it makes a difference whether it is signed or not. Ok, I'll try.

On 15.04.2013 12:38, Vaclav Peroutka wrote:
> Hi Valentin,
>
> you can try how it works in ucsim. But I probably do not understand
> well how addw does work in your example. If you use 2's complement and
> you add and wrap - it automatically does subtraction, doesn't it ?
>
> Vaclav |
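The point about signedness can be shown with plain integer arithmetic. The sketch below only models the two possible ways of widening an 8-bit immediate before a 16-bit add; the function names are invented for illustration, and it makes no claim about how the STM8 or ucsim actually implement these instructions beyond what is reported in this thread.

```c
#include <stdint.h>

/* Immediate treated as signed: sign-extend to 16 bits, then add with
   16-bit wrap-around. 0xFE is read as -2. */
uint16_t add_imm_signed(uint16_t sp, uint8_t imm)
{
    int16_t ext = (imm & 0x80) ? (int16_t)(imm - 256) : (int16_t)imm;
    return (uint16_t)(sp + (uint16_t)ext);
}

/* Immediate treated as unsigned: zero-extend to 16 bits, then add.
   0xFE is read as 254. */
uint16_t add_imm_unsigned(uint16_t sp, uint8_t imm)
{
    return (uint16_t)(sp + (uint16_t)imm);
}
```

For `sp = 0x03FF` and an immediate byte of `0xFE`, the signed reading yields `0x03FD` and the unsigned reading yields `0x04FD`. Wrap-around alone gives a subtraction only when both operands already have the same width; with an 8-bit immediate, the result depends on how the upper byte is filled in.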