From: Ian R. <ian...@ma...> - 2006-05-27 16:17:38
|
Hi, I'm investigating bug #1488798 and I think liveness analysis for SSA form is wrong, but I'd appreciate advice. This e-mail is long, but I want my explanation to be verbose so people can pull me up on points within it. I also think it might be informative to those not familiar with liveness analysis and the IR format. When we're in SSA form and want to return to non-SSA form, we must generate moves to replace phi instructions. Eg: if (x > 0) y1 = 0 else y2 = 1 y3 = phi (y1, y2) becomes: if (x > 0) { y1 = 0 y3 = y1 } else { y2 = 1 y3 = y2 } there is a special situation when the result of the phi instruction is live at the exit of the block where the move is being inserted. Eg: x1 = 0 loop: x2 = phi (x1, x3) ... x3 = x2 + ... if ... goto loop y = x2 z = x3 if that became: x1 = 0 x1 = x1 loop: ... x3 = x2 + ... x2 = x3 if ... goto loop y = x2 z = x3 then clearly y and z would have a different meaning after leaving SSA form. So we generate instead: x1 = 0 t1 = x1 loop: x2 = t1 ... x3 = x2 + ... t1 = x3 if ... goto loop y = x2 z = x3 A different way, to avoid generating a temporary, is to break the edge and place the copy on the edge. Sounds complex but in the example this just looks like: x1 = 0 x2 = x1 loop: ... x3 = x2 + ... if !... goto next x2 = x3 goto loop next: y = x2 z = x3 So leaving SSA form seems not too tricky. We either: 1) generate a copy 2) or if that's unsafe either generate a copy on the back edge or generate a copy involving a temporary. I'm pretty certain that bug #1488798 is because we've handled something that's a case 2 as a case 1. So how do we detect when to generate a case 2? We generate a set that is the union of all the in liveness for all the successor blocks. What is the in liveness? It is a list of all the registers that must be live at the entry to the basic block. We generate the in liveness by traversing the CFG backwards. Each basic block has operands that it uses, these need generating so are known as gens. A basic block also defines operands, knowns as the kills. For the factored CFG (FCFG) of the Jikes RVM, potentially exceptioning instructions (PEIs) mean that we may leave a basic block early, so we also generate a set of kills created before the first PEI. Bug #1488798 relates to code with no PEIs - so no need to worry about these. We don't want the ins to be large, as knowing a register is dead allows for optimisation. To compute the in set the following is performed currentSet = union of all successor block in sets In = currentSet - BBKillSet U (exceptionBlockSummary - firstPEIKillSet) U Gen which in bug #1488798 becomes: currentSet = union of all successor block in sets In = currentSet - BBKillSet U Gen This all looks good, but the problem comes in that we handle phi instructions differently in the liveness analysis and I strongly believe this is broken. I think the fix requires reworking the SSA optimisations though. For phi instructions we generate the kill set in the same way as for a regular instruction. The gen set is generated differently though. We generate the gen set by visiting the successor of a basic block and adding to the gen set the uses of the successor blocks phi instructions (but only for phi operand for the edge from the basic block to the successor). Here's the comment from the code: The rvals of phi nodes are logically uses in the phi's predecessor blocks, so here we collect phi rvals from the current block's successors into the gen set for this block, being careful to collect only the appropriate rval This logic differs from a normal instruction. With a normal instruction a use in a basic block means that operand is added to the gen set. For phi instructions, a use in the basic block means its added to the predecessors gen set. I think this logic is causing bug #1488798 as for the loop in question this means that the basic block which uses the operand doesn't show this in its in set. Therefore we generate a case 1 copy. The working is like this: block0: // kill set: x1 // gen set: empty // in set: empty x1 = 0 block1: // kill set: x2 // gen set: empty // in set: x3 -- the in set from block3 with the kill set removed x2 = phi (x1, x3) goto block3 block3: // kill set: x3 // gen set: x2, x3 // in set: x2, x3 ... x3 = x2 + ... if ... goto block1 block4: // kill set: y, z // gen set: empty // in set: empty y = phi x2, ... z = phi x3, ... when considering basic block3 and inserting a copy for "x2 = phi (x1, x3)" of "x2 = x3" then the union of the in sets for block4 and block1 is "x3". As block3 doesn't believe x2 is used after block3 we don't generate a case 2 copy. This leads to: block0: x1 = 0 x2 = x1 block1: goto block3 block3: ... x3 = x2 + ... x2 = x3 y = x2 z = x3 if ... goto block1 block4: In other words y and z are equivalent, where as previously "y = x2" and "z = x2 + ..." and this is the bug. I think the treating of phi instructions specially is the problem here. I think a use in a phi instruction is a gen in the basic block it appears in. Making this change is relatively trivial but it breaks SSA formation. I believe the SSA live analysis needs fixing and then the SSA formation code rewriting. This change is going effect all SSA phases that use liveness information. What I hope is that someone can explain if my logic is wrong? Or if someone can point out pitfalls I'm not considering. I've looked at the FCFG PASTE'99 paper, but this doesn't describe liveness analysis for SSA form. Maybe these pitfalls have been discussed in one of the other papers about the IR format or in the reviewers feedback. Obviously the liveness analysis for the FCFG is something the Jalapeno and Jikes RVM authors have thought a lot about, so saying the algorithm for constructing it for SSA form is wrong is something quite bold. I hope this e-mail will convince people other than myself that the changes I'm going to need to make to the SSA form are necessary. Thanks, Ian Rogers - http://www.cs.man.ac.uk/~irogers |
From: Avery M. <av...@av...> - 2006-05-27 20:41:34
|
Hi All, I am exploring RVM scheduling algorithms, and found sysVirtualProcessorBind(int) implemented for AIX (via bindprocessor) but not for Linux. Am I missing where this functionality is implemented, or is processor affinity support presently lacking on Linux? If so, is there any reason sysVirtualProcessorBind in sys.C cannot be extended to support Linux (i.e. add sched_setaffinity() support), say using PLPA (to avoid kernel signature issues)? Thanks in advance! -a |
From: David P G. <gr...@us...> - 2006-05-31 13:28:51
|
> I am exploring RVM scheduling algorithms, and found > sysVirtualProcessorBind(int) implemented for AIX (via bindprocessor) but not > for Linux. Am I missing where this functionality is implemented, or is > processor affinity support presently lacking on Linux? > > If so, is there any reason sysVirtualProcessorBind in sys.C cannot be > extended to support Linux (i.e. add sched_setaffinity() support), say using > PLPA (to avoid kernel signature issues)? I think this would be ok. Most likely it isn't there because by the time we ported to linux from AIX we weren't using sysVirtualProcessorBind as much. --dave |
From: Ian R. <ian...@ma...> - 2006-05-30 13:28:39
|
Hi, ok, I'm wrong, it was inevitable :-) The liveness analysis says that the in set doesn't include uses from phi instructions as that would mean that in the case: if (x > 0) y1 = 0 // block 1 else y2 = 1 // block 2 y3 = phi (y1, y2) y1 and y2 were both live on both the edge from block 1 _and_ block 2. Looking for a use/def of y2 in block 1 would be fruitless and gives rather expectedly breaks things. So it's the out set computation in leave SSA that's the problem. The out set is the upwardly exposed uses of an operand that are live at the exit of a basic block (as opposed to the in set that are live at entry). The out set computation needs to include any appropriate uses from phi instructions in successor basic blocks. The current algorithm does do this as it merely unions all the in sets of successor basic blocks together - thereby missing any uses from phis. So liveness analysis requires a proper out set computation performing and this value should be used in leave SSA. I'd still welcome feedback. I propose adding out sets to the liveness information and modifying leave SSA to use them. Thanks to Jeremy Singer for his help. Regards, Ian |
From: Michael H. <hi...@us...> - 2006-05-30 15:11:27
|
Hi Ian, > Hi, > > ok, I'm wrong, it was inevitable :-) > > The liveness analysis says that the in set doesn't include uses from phi > instructions as that would mean that in the case: > > if (x > 0) > y1 = 0 // block 1 > else > y2 = 1 // block 2 > y3 = phi (y1, y2) > > > y1 and y2 were both live on both the edge from block 1 _and_ block 2. > Looking for a use/def of y2 in block 1 would be fruitless and gives > rather expectedly breaks things. > > So it's the out set computation in leave SSA that's the problem. The out > set is the upwardly exposed uses of an operand that are live at the exit > of a basic block (as opposed to the in set that are live at entry). The > out set computation needs to include any appropriate uses from phi > instructions in successor basic blocks. The current algorithm does do > this as it merely unions all the in sets of successor basic blocks > together - thereby missing any uses from phis. So liveness analysis > requires a proper out set computation performing and this value should > be used in leave SSA. > > I'd still welcome feedback. I propose adding out sets to the liveness > information and modifying leave SSA to use them. If I understand things correctly, it seems to me that the enhancement needed is to compute the liveness out sets with knowledge about phis. If so, shouldn't this be done in the Leave SSA code? The liveness code is used by a few other clients, such as register allocation and gc maps, so I'm wondering if it makes sense to keep client-specific requirements in the client code (leave SSA in this case), where ever possible? I haven't looked at the liveness code in sometime now, nor am sure I fully understand the current situation, so correct me if I have things wrong. Thanks, Mike |
From: Ian R. <ian...@ma...> - 2006-05-30 16:05:26
|
Michael Hind wrote: > If I understand things correctly, it seems to me that the enhancement > needed is to compute the liveness out sets with knowledge about phis. If > so, shouldn't this be done in the Leave SSA code? Yes. Having looked at the use for the out set, it's only used in leave SSA. What I'm proposing to change (still working on it), is to have a helper method in OPT_LiveAnalysis that gets/generates the out set. Apparently the theory is that a use in a phi instruction is logically a use on the edge from the basic block containing the phi instruction. So the live analysis is correct to place the phi uses in the gen sets of predecessor basic blocks, if we were creating out sets it would also be correct to place the phi uses in the out set of the predecessor basic block. Currently this isn't done by leave SSA, but given we already have a lot of phi handling code in the liveness analysis I think it makes sense to put it there (in fact the same method that iterates over the instructions adding them to the gen set can be reused to add them to the out set). Thanks, Ian |
From: Ian R. <ian...@ma...> - 2006-06-02 18:19:53
|
I thought I should write this to see if it stimulates any thoughts from other people. Another SSA complication is that we allow (or explicitly test for and allow in leave SSA, whether it can happen is another matter): block0: x1 = ... block1: x2 = phi x1/block0, ... x3 = phi x2/block0, ... ie the definition of x3 depends on a definition by a former phi node in the same block. This makes life hard for leave SSA as we may need to split the edge or rename the copy for x2, but x3 may appear fine meaning we could mistakenly insert a copy of "x3 = x2" before the copy that defines x2. I believe this also makes a strange case in the liveness analysis as: block0: // kill = x1 // gen/in = x2 x1 = ... block1: // kill = x2, x3 // gen/in = empty x2 = phi x1/block0, ... x3 = phi x2/block0, ... so we think there's an upwardly exposed use of x2, whereas I think the use of x2 by x3 is logically killed on the edge between block0 and block1 and so we should have liveness information that looks like: block0: // kill = x1 // in = empty x1 = ... block1: // kill = x2, x3 // gen/in = empty x2 = phi x1/block0, ... x3 = phi x2/block0, ... I think that given the problems this situation allows in the IR we should outlaw it with a test in the IR verification. The example above would be equivalent to: block0: x1 = ... block1: x2 = phi x1/block0, ... x3 = phi x1/block0, ... and when the verification fails extra code can be adding to make sure we adhere to this form. Comments appreciated. Many thanks, Ian |
From: Ian R. <ian...@ma...> - 2006-06-02 19:11:13
|
Ian Rogers wrote: > block0: > x1 = ... > block1: > x2 = phi x1/block0, ... > x3 = phi x1/block0, ... thinking about it more I'd still be scuppered by: block0: x1 = ... block1: x2 = phi x1/block0, ... x3 = phi x2/block1, ... I guess it's a special case worth worrying about.. Ian |
From: Ian R. <ian...@ma...> - 2006-05-31 14:23:16
|
Hi, Having fixed the problem with the liveness analysis of out sets in SSA form, leave SSA is still broken. I switched from using the "insert temporary and copy" method to "breaking the edge" (ie the other case 2 choice from my first e-mail) and this fixes this problem. I think some more thought is needed for the copying of temporaries in case 2, but this leads me to question why we have 2 approaches to handling the case 2 situation? Splitting the edge creates a new basic block, but generates fewer instructions. It also doesn't require as much processing of the CFG fixing up register values. It seems to me that always splitting the edge may make a sensible default and given the temporary copying approach is broken, we could just remove this. It may possibly make leaving SSA cheaper as well. Feedback appreciated. Thanks, Ian Ian Rogers wrote: > Hi, > > ok, I'm wrong, it was inevitable :-) > > The liveness analysis says that the in set doesn't include uses from phi > instructions as that would mean that in the case: > > if (x > 0) > y1 = 0 // block 1 > else > y2 = 1 // block 2 > y3 = phi (y1, y2) > > > y1 and y2 were both live on both the edge from block 1 _and_ block 2. > Looking for a use/def of y2 in block 1 would be fruitless and gives > rather expectedly breaks things. > > So it's the out set computation in leave SSA that's the problem. The out > set is the upwardly exposed uses of an operand that are live at the exit > of a basic block (as opposed to the in set that are live at entry). The > out set computation needs to include any appropriate uses from phi > instructions in successor basic blocks. The current algorithm does do > this as it merely unions all the in sets of successor basic blocks > together - thereby missing any uses from phis. So liveness analysis > requires a proper out set computation performing and this value should > be used in leave SSA. > > I'd still welcome feedback. I propose adding out sets to the liveness > information and modifying leave SSA to use them. Thanks to Jeremy Singer > for his help. > > Regards, > > Ian |