From: Julian S. <js...@ac...> - 2002-12-09 00:17:47
|
Hi.

I merged

  61-special-d
  62-lazy-eflags
  67-dist
  65-fix-ldt
  55-ac-clientreq

Thanks as ever for them.

Have considered 01-partial-mul but am somewhat put off by the fact that it
doesn't cover all smul+umul cases and therefore only patchily achieves its
aim. How about modifying the UMUL/SMUL uinstrs so that they do a
NxN -> 2N multiply for N=8/16/32 bits, taking two TempRegs, which are
read as operands, and then have the double-length result written to them
both? This simplifies the code generation too since you can just generate
the NxN -> 2N x86 insn (IIRC; not sure if it is available for insns and
signedness)?

Or perhaps it's not worth the effort.

I've started to fix various end-user reported bugs, as part of
stabilisation efforts, as you'll see from the cvs mail.

Thx for your msg re meaning of new_emit, which just arrived.

J
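The proposal in the message above (two temps read as operands, then both overwritten with the double-length result) can be modelled in plain C for N=32. This is only a sketch of the semantics, not Valgrind code: the function names `umul_wide32` and `smul_wide32` are hypothetical, and which temp receives the low half and which the high half is an assumption, since the message does not say.

```c
#include <stdint.h>

/* Hypothetical model of a widening unsigned multiply at N=32.
   Both temps are read as operands. Afterwards *t_lo holds the low
   32 bits of the 64-bit product and *t_hi the high 32 bits.
   (The lo/hi assignment is an assumption made for illustration.) */
static void umul_wide32(uint32_t *t_lo, uint32_t *t_hi)
{
    uint64_t p = (uint64_t)*t_lo * (uint64_t)*t_hi;
    *t_lo = (uint32_t)p;
    *t_hi = (uint32_t)(p >> 32);
}

/* Signed counterpart: the operands are reinterpreted as two's
   complement values before the 64-bit multiply. */
static void smul_wide32(uint32_t *t_lo, uint32_t *t_hi)
{
    int64_t p = (int64_t)(int32_t)*t_lo * (int64_t)(int32_t)*t_hi;
    *t_lo = (uint32_t)(uint64_t)p;
    *t_hi = (uint32_t)((uint64_t)p >> 32);
}
```

The N=8 and N=16 cases would follow the same pattern with narrower types.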
|
From: Jeremy F. <je...@go...> - 2002-12-09 03:48:02
|
On Sun, 2002-12-08 at 16:25, Julian Seward wrote:
> Have considered 01-partial-mul but am somewhat put off by the fact that it
> doesn't cover all smul+umul cases and therefore only patchily achieves its
> aim. How about modifying the UMUL/SMUL uinstrs so that they do a
> NxN -> 2N multiply for N=8/16/32 bits, taking two TempRegs, which are
> read as operands, and then have the double-length result written to them
> both? This simplifies the code generation too since you can just generate
> the NxN -> 2N x86 insn (IIRC; not sure if it is available for insns and
> signedness)?
>
> Or perhaps it's not worth the effort.
Well, in terms of frequency, I didn't find any of the other multiply
forms being used in real code. gcc can be convinced to use the 8 bit
multiply, but partial results don't matter there (at least, I haven't
found any uses of multiply which expect partial results from partial
arguments at the bit level).
In particular, I didn't find any uses of unsigned multiply, so I'm
really unsure about whether it's worth adding a new UMUL UInstr just for
its sake. (I know I reserved an opcode for it, but there's no other
support for it.)
That said, it has been a long while since I looked at that patch in
detail, so maybe there's some simple improvements. In particular, I
think it leaves some dead code, so that should be cleaned up.
> I've started to fix various end-user reported bugs, as part of
> stabilisation efforts, as you'll see from the cvs mail.
Yes. As you can see I've started making the attempt at packaging
everything up. I think we should push out another dev snapshot soon so
that we can get more eager testers.
J
|
|
From: Julian S. <js...@ac...> - 2002-12-09 19:25:04
|
[...]
> That said, it has been a long while since I looked at that patch in
> detail, so maybe there's some simple improvements. In particular, I
> think it leaves some dead code, so that should be cleaned up.

I'm just a bit disinclined to have two mechanisms for integer
multiplication (the helper fns _and_ direct ucode). If the direct route
covered all the bases, I'd take it. Not only does it allow scope for better
instrumentation, the generated code is surely better too.

> > I've started to fix various end-user reported bugs, as part of
> > stabilisation efforts, as you'll see from the cvs mail.
>
> Yes. As you can see I've started making the attempt at packaging
> everything up. I think we should push out another dev snapshot soon so
> that we can get more eager testers.

I'll try building the current head on various distros, and if that looks
promising, I'll try and emit a 1.9.1 snapshot this evening.

J
|
From: Jeremy F. <je...@go...> - 2002-12-09 21:51:40
|
On Mon, 2002-12-09 at 11:32, Julian Seward wrote:
> [...]
> > That said, it has been a long while since I looked at that patch in
> > detail, so maybe there's some simple improvements. In particular, I
> > think it leaves some dead code, so that should be cleaned up.
>
> I'm just a bit disinclined to have two mechanisms for integer
> multiplication (the helper fns _and_ direct ucode). If the direct route
> covered all the bases, I'd take it. Not only does it allow scope for better
> instrumentation, the generated code is surely better too.
Well, there are really two kinds of multiply: the NxN->2N set, and the
NxN->N set. The latter has a UInstr, but the former are done with
helpers. Since the 2N forms are slower instructions which stomp
specific registers, they're not really desirable to generate all the
time; it seems to me that to support inline code generation for the 2N
forms pretty much requires separate opcodes, which leads to 4 being
used for multiply (though perhaps a flag can be used to distinguish
either N from 2N or signed from unsigned, though all the unsigned
multiplies are 2N).
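The two kinds of multiply can be illustrated in plain C for N=32. This is a sketch of the arithmetic only, with hypothetical function names, not a model of the UInstrs or helpers themselves. It shows that the NxN->N result is the same whether the operands are treated as signed or unsigned, while the high half of the 2N result depends on signedness.

```c
#include <stdint.h>

/* NxN -> N: only the low 32 bits of the product are kept. These bits
   do not depend on whether the operands are signed or unsigned. */
static uint32_t mul_n32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * (uint64_t)b);
}

/* High half of the unsigned NxN -> 2N product. */
static uint32_t umul_hi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* High half of the signed NxN -> 2N product; the operands are
   reinterpreted as two's complement values first. */
static uint32_t smul_hi32(uint32_t a, uint32_t b)
{
    int64_t p = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}
```

For example, with operands 0xFFFFFFFF and 2 the low half is 0xFFFFFFFE either way, but the unsigned high half is 1 and the signed high half is 0xFFFFFFFF.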
I don't think the quality of the generated code is all that important
since the helpers aren't that expensive to call (push and pop are
cheap), and 2N forms are hardly ever used in code I've tested. Also,
making sure that everything is in the right register would kill a lot of
the potential efficiency gains (unless the regalloc can be changed to
make sure that specific temps end up in specific registers so that the
rearrangement happens at compile time rather than runtime - but that
sounds even more complex).
J