OnBoard C / Feature Requests / #63 OnBC: support for arrays larger than 32k

Lincoln Ramsay - 2009-02-21

Since I was out and didn't have access to the OnBoardC code today I spent some time going over the assembly it produces. I used Code68Dis (an apparently unreleased 68k disassembler I got from John Wilund) to see what OnBoardC had generated for each of the two statements. I've commented what is going on (I used http://68k.hax.com/ to fill in my understanding of 68k assembler as I went).

value = arrayP[8192];

Store the address of arrayP in a0
movea.l #-24(a6),a0
Store 8192 in d0
move.w $2000,d0
Left shift d0 by 2 (a quick way of calculating 8192 * sizeof(UInt32))
lsl.w #2,d0
Store offset d0 from a0 into value
move.l (a0,d0.w),#-36(a6)

The overflow does not actually occur until the last line (which interprets the word value 0x8000 as -1 instead of 32768). It is valid to say (a0,d0.l) so I would expect OnBoardC could generate code that treats d0 as a long instead of a word and the problem will be gone. That would be much better than the result of the other statement.

value = *(arrayP+8192);

Push 8192 on the stack
move.l $00002000,-(sp)
Push sizeof(UInt32) on the stack
move.l $00000004,-(sp)
Execute a function, most likely a special routine to multiple two DWORDs, storing the result in d0.l
jsr.l $0402(pc)
Increment the stack pointer by 8 (pop off the values we put on before)
addq.l #8,sp
Store the address of arrayP in a0
movea.l #-24(a6),a0
Add d0 to a0 and store in a0
adda.l d0,a0
Store a0 in value
move.l (a0),#-36(a6)

This uses longs everywhere but the use of a multiplication function means this is going to be much slower (unless that function is trapped by the 68k emulator and executed as native ARM - though you've still got the trap overhead...).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Lincoln Ramsay - 2009-02-21

Here is a patch that makes this work. It causes array offsets to be treated as long values and needs a hack to get the assembler to produce the correct move.l (a0,d0.l) statement.

diff --git a/OnBoardC/Src/codegen.c b/OnBoardC/Src/codegen.c
index 823f06f..e2b0347 100644
--- a/OnBoardC/Src/codegen.c
+++ b/OnBoardC/Src/codegen.c
@@ -337,6 +337,12 @@ void emitOperand(Node *operand)
writeRegToBuffer(operand->reg);
writeTokenToBuffer(Asm_Comma);
writeRegToBuffer(operand->reg2);
+ // force long (rather than word) for array offsets
+ // so that we can access >32k offset from the array start
+ writeTokenToBuffer(Asm_Period);
+ writeTokenToBuffer(Asm_Identifier);
+ writeTokenToBuffer(1);
+ writeTokenToBuffer('l');
writeTokenToBuffer(Asm_RightParen);
}
break;
diff --git a/OnBoardC/Src/compile.c b/OnBoardC/Src/compile.c
index 406e9b5..04e8b62 100644
--- a/OnBoardC/Src/compile.c
+++ b/OnBoardC/Src/compile.c
@@ -2234,7 +2234,7 @@ Boolean doOperator()
}
if ((operand2->typeIndex != LongTypeIndex)
&& (operand2->typeIndex != UnsignedLongTypeIndex)) {
- operand2 = addConvert(operand2, IntTypeIndex);
+ operand2 = addConvert(operand2, LongTypeIndex);
if (operand2 == NULL) return NULL;
}
if (getTypeSize(getBaseType(operand1->typeIndex)) > 1) {

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Lincoln Ramsay - 2009-02-21

For a binary with the patch applied, see:

http://ramsay.webhop.org/OnBoard/long_array_offsets.zip

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Lincoln Ramsay - 2009-02-21

I'm unhappy with the patch as a general solution.

While in the code I noticed that any types for array indexes other than "long" and "unsigned long" will be converted to "int". The patch changes this to convert to "long" instead which causes the multiplication to be done in a slower way.

eg.
push (long) 8192 on the stack
push (long) 4 on the stack
call the "multiply" routine
pop the stack
vs
load (word) 8192
left shift 2

It would be nice perhaps to make this opt-in by continuing to use "int" for any type other than "long" and let the user do something like this to signal to the compiler that a long offset is needed:

value = arrayP[(UInt32)8192];

The patch also forces .l onto the offset register. For some reason the assembler always uses .w here (even when a long value is loaded into the register!). A cleaner fix here would be to add the .l only when a long value has been loaded into the register. That should be possible to do.

I'm not sure if you could handle this in the assembler instead. That would be the cleanest way. I'm not sure though if the assembler is keeping track of what sized data it has loaded into the registers (it probably doesn't since it's just encoding instructions, not making decisions about the size of datatypes).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

OnBC: support for arrays larger than 32k

Group

Searches

Help

#63 OnBC: support for arrays larger than 32k

Discussion