bee / Tickets / #3 More compact instruction encoding

Anonymous - 2020-07-10

Originally posted by: apt1002

byte opcodes in other than the first byte could use the full 8 bits

I think what you mean is that you don't want to waste 2 bits between every instruction, just because you use 2 bits before the first instruction. I agree, obviously.

However, using a 6-bit encoding for the first instruction and an 8-bit encoding for the remainder sounds like a lot of work for Bee and for compilers targeting Bee, and will make it difficult to realise any benefits.

Surely the easy way to get a benefit is to abandon the idea that the boundaries between the instructions are byte-aligned? For example, you could divide the 32-bit word into 2+6+6+6+6+6, or 2+10+10+10, depending on how many opcodes you want to define.

Of course this extends our earlier conversation about concatenating code, and whether you need random access to individual instructions. To recap, I claim that you don't, and that "append" is a sufficient primitive (in addition to random access to words for patching forward references). I don't think there is any need for individual instructions to have addresses. You counter that byte access is a reality and you might as well use it.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-07-13

Ticket changed by: rrthomas

status: open --> closed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-07-13

Originally posted by: rrthomas

Closing this. Bee's 1-instruction-per-word is a very nice simplification, which makes assembly, disassembly and single-stepping much easier than for Mit.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-07-14

Ticket changed by: rrthomas

status: closed --> open
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-07-14

Originally posted by: rrthomas

Reopening, for two reasons:

If Bee were to be implemented in hardware, not packing opcodes would be really inefficient.

A bit-packed representation (as suggested by @apt1002) could be entirely forwards compatible, in that words with only one real instruction would be interpreted as being padded with NOPs.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2022-02-15

Originally posted by: apt1002

I have been giving this a bit more thought, and I'd like to propose a design. I think it is necessary to distinguish three kinds of instruction:

CALLI, JUMPI, JUMPZI, PUSHRELI are PC-relative, in the sense that they contain an immediate constant whose value depends on the location at which the instruction is assembled.

NOP (which will presumably become NEXT), PUSHI, TRAP, JUMP, CALL, RET, CATCH, THROW, BREAK are terminal, in the sense that they are always the last instruction in an opcode word.

Everything else is non-terminal.

PC-relative instructions need to occupy an entire opcode word, as they do now. Partly this is to allow them to be patched. Partly it is because it is necessary to know the value of PC in order to assemble these instructions, and attempting to pack them with other instructions adds significant complexity. The bottom bits of the word are the opcode, and the remainder of the bits are the immediate constant. An opcode BEE_OP_INSN is reserved to mean "not PC-relative".

In contrast, terminal and non-terminal instructions can be packed. In general, a packed opcode word can contain (after BEE_OP_INSN) zero or more non-terminal instructions, followed by exactly one terminal instruction, maybe followed by an immediate constant.

Packed encoding

Various options exist for packing the instructions:

Use 6-bit opcodes, as suggested earlier in this comment thread.

Use a Huffman code to make common instructions shorter.

Use something fancier.

Supplementary instructions

It will perhaps be desirable to add some non-terminal instructions such as PUSH0 and PUSH1 to avoid using the terminal PUSHI instruction for common small immediate constants.

RETI (returning a constant) might also be useful, to avoid following the terminal PUSHI instruction by a word containing only RET.

NEXTZ

It is slightly annoying that the number of PC-relative instructions is five (including BEE_OP_INSN) as this is slightly too many for a 2-bit opcode. To fix this, I suggest deleting JUMPZI (and also JUMPZ), and adding NEXTZ. The new NEXTZ instruction pops the top item of the data stack, and behaves like NEXT if it is zero, otherwise it increments PC.

JUMPZI and JUMPZ become pseudo-instructions:

JUMPZI can be implemented as NEXTZ if the following word contains JUMPI.

JUMPZ can be implemented as NEXTZ POP if the following word contains JUMP.

Note that JUMPZ and JUMPZI psuedo-instructions are non-terminal. Instructions following them on the not-taken path can be packed into the same word. This might often include the first few instructions of loop bodies and "then" blocks.

NEXTZ can be used to form other pseudo-instructions too:

JUMPNZ can be implemented as NEXTZ JUMP if the following word begins with POP.

RETZ can be implemented as NEXTZ if the following word contains RET.

RETZI can be implemented as NEXTZ if the following word contains RETI.

NOTZ can be implemented as NEXTZ NEXT if the following word contains NOT.

NEGATEZ can be implemented as NEXTZ NEXT if the following word contains NEGATE.

nSWAPZ can be implemented as NEXTZ NEXT if the following word contains PUSHn SWAP.

INCZ can be implemented as NEXTZ NEXT if the following word contains NOT NEGATE.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

More compact instruction encoding

A minimal-effort VM for running Forth

Milestone

Searches

Help

#3 More compact instruction encoding

Discussion

Packed encoding

Supplementary instructions

`NEXTZ`