From: Frieder F. <fri...@we...> - 2016-07-22 18:42:06
Hi Philipp,

On 16.07.2016 at 17:23, Philipp Klaus Krause wrote:
> That leaves memcpy(). memcpy() is hard to do well on STM8; the problem
> is the count overhead.

Loosely related: the cycle count of 2 that is emitted for incw may be a
little high:

sdcc/device/lib/stm8/memcpy.lst
[..]
      00004B                         98 loop_3:
      00004B F6               [ 1]   99     ld   a, (x)
      00004C 90 F7            [ 1]  100     ld   (y), a
      00004E 5C               [ 2]  101     incw x
      00004F 90 5C            [ 2]  102     incw y
      000051                        103 loop_2:
      000051 F6               [ 1]  104     ld   a, (x)
      000052 90 F7            [ 1]  105     ld   (y), a
      000054 5C               [ 2]  106     incw x
      000055 90 5C            [ 2]  107     incw y
      000057                        108 loop_1:
      000057 F6               [ 1]  109     ld   a, (x)
      000058 90 F7            [ 1]  110     ld   (y), a
                                    111
      00005A 0A 08            [ 1]  112     dec  (8, sp)
      00005C 26 CC            [ 1]  113     jrne loop
      00005E 0A 07            [ 1]  114     dec  (7, sp)
      000060 26 C8            [ 1]  115     jrne loop
[..]

According to "PM0044 Programming manual", Rev 3
http://www.st.com/resource/en/programming_manual/cd00161709.pdf
page 107, INCW (0x5c) takes just 1 cycle. On page 67, on the other hand,
it is listed as 2 cycles. The corresponding DECW takes 1 cycle as well,
so that is likely the correct value for INCW too.

(If 1 cycle turns out to be true, maybe we could have a Duff's device
for 32 bit instead of 64 bit?)

The memcpy with 8 unrolled copies now needs a substantial 101 bytes.
More than 30 bytes could be saved if memcpy used only 4 unrolled copies.

Hope that's not a spoilsport for Dhrystones and Coremarks!

Greetings,
Frieder
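
P.S. To make the "4 unrolled copies" idea concrete, here is a rough C
sketch of a Duff's device that copies 4 bytes per loop pass. This is
only an illustration of the shape, not the SDCC library routine (which
is hand-written assembly); the name memcpy_duff4 is made up for this
mail, and I have not measured the code size or cycle count it would
give on STM8.

```c
#include <stddef.h>

/* Hypothetical helper, not part of SDCC: byte-wise copy using a
 * Duff's device with 4 unrolled copies per loop pass. */
void *memcpy_duff4(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t passes;

    if (n == 0)
        return dst;

    /* Number of passes through the loop body, rounded up so that a
     * partial first pass handles the n % 4 leftover bytes. */
    passes = n / 4 + (n % 4 != 0);

    /* Jump into the middle of the unrolled body for the leftover
     * bytes, then run full 4-byte passes. */
    switch (n % 4) {
    case 0: do { *d++ = *s++;
    case 3:      *d++ = *s++;
    case 2:      *d++ = *s++;
    case 1:      *d++ = *s++;
            } while (--passes > 0);
    }
    return dst;
}
```

The loop counter is decremented once per 4 bytes here instead of once
per 8, so the count overhead per byte goes up; whether that is
acceptable depends on the real cycle count of incw.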