On Fri, 2008-10-03 at 17:46 -0700, Andrew Pinski wrote:
On Fri, Oct 3, 2008 at 5:40 PM, Jeff Law <email@example.com> wrote:
I'd be hard pressed to see how this patch could cause any kind of
performance regression since it's merely propagating information to the
backend that was getting lost in regrename and I don't see that the ppc uses
REG_POINTER in its backend.
PPC does not use it but the way lwx is used is [base+index] which
should really be done that way, otherwise it will have some stalls in
some cases (on Power 6).
After further investigation, we can shed some light on this problem.
There appears to be a bad interaction between REG_POINTER and RTL alias
analysis, as we see below.
The epilogues of these two functions from vortex (Mem_GetAddr,
Mem_GetWord) are the following, for both revisions:
Mem_GetAddr (revision 140615):
* lis 8,Test@ha
* stw 0,Test@l(8)
Mem_GetAddr (revision 140616):
* lis 8,Test@ha
* stw 0,Test@l(8)
Mem_GetWord (revisions 140615 and 140616):
* lis 11,Test@ha
* stw 0,Test@l(11)
Clearly, the code generation for Mem_GetWord is unaffected by the
changes from revision 140616.
The problem happens with Mem_GetAddr/revision 140616. We can see that
"lis 8,Test@ha" and "stw 0,Test@l(8)" are very close to each other,
leading to possible stalls in the pipeline, which doesn't happen with
the code generated by revision 140615.
The positioning of "lis" and "stw" we see in revision 140616 is due to a
number of dependencies created wrt the following lwz instructions, so
the "stw" needs to execute before them, which rules out moving the
"stw" further down.
So we're left with the question of why these dependencies are being
created.
Digging further, revision 140616 lets some registers carry a flag
meaning that their contents are, in fact, a pointer, and we access this
flag through REG_POINTER.
Looking at this specific code inside alias.c:find_base_term:
    if (REG_P (tmp1) && REG_POINTER (tmp1))
      return find_base_term (tmp1);
    if (REG_P (tmp2) && REG_POINTER (tmp2))
      return find_base_term (tmp2);
Since up until revision 140615 we didn't store the specific information
mentioned above, we just flew by these conditions and returned either
tmp1 or tmp2, not going into the recursive call.
With revision 140616, we actually have the pointer-in-reg information,
and thus we go into the recursive call to either "find_base_term (tmp1)"
or "find_base_term (tmp2)".
This should work as expected, as it really does for Mem_GetWord, but for
Mem_GetAddr, the recursive call to "find_base_term (tmp1)" returns NULL.
This leads us back to "init_alias_analysis", where we actually fill up
the reg_base_value vector that is used, inside find_base_term, to return
the base term we want.
Keeping it short, the 8th entry in that vector, which should have
contained the correct base term, is blank because the 8th register is
defined and set before the "stw" instruction in the epilogue. This
specific situation led to all the others, including the performance
degradation.
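To make the failure mode concrete, here is a rough toy model of the path described above. It is not GCC code: every toy_* name is made up for illustration, the table only stands in for reg_base_value, and the fallback branch simply hands back the first operand, as the mail describes for revision 140615.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, NOT GCC's real data structures: every toy_* name below is
   hypothetical and only mirrors the shape of the logic in this mail.  */

#define TOY_N_HARD_REGS 32

struct toy_reg
{
  unsigned regno;    /* hard register number */
  const char *name;  /* label used when the operand itself is returned */
  int is_pointer;    /* stands in for the REG_POINTER flag */
};

/* Stands in for reg_base_value: one base-term slot per register.
   NULL means that no base term was recorded for that register.  */
static const char *toy_reg_base_value[TOY_N_HARD_REGS];

/* Base term of a lone register: whatever the table holds, possibly NULL.  */
static const char *
toy_find_base_term_reg (const struct toy_reg *r)
{
  assert (r->regno < TOY_N_HARD_REGS);
  return toy_reg_base_value[r->regno];
}

/* Base term of a sum of two registers, following the two tests quoted
   from find_base_term.  When neither operand is flagged as a pointer we
   skip the table lookup and hand back the first operand.  */
static const char *
toy_find_base_term_plus (const struct toy_reg *tmp1,
                         const struct toy_reg *tmp2)
{
  if (tmp1->is_pointer)
    return toy_find_base_term_reg (tmp1);
  if (tmp2->is_pointer)
    return toy_find_base_term_reg (tmp2);
  return tmp1->name;
}
```

With the flag clear on register 8 the function returns the operand itself; once the flag is set and slot 8 of the table is still empty, it returns NULL, which is the Mem_GetAddr situation.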
This also relates to the -fno-strict-aliasing flag, which sets all alias
sets to 0, so we need to do extra work to sort out the conflicting
pieces. Without -fno-strict-aliasing, the code is generated correctly.
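As a rough illustration of why the flag matters here, the sketch below models alias sets as plain integers. The toy_* names are hypothetical, and the rule that set 0 conflicts with everything is my understanding of how GCC treats it, simplified to leave out subset relations.

```c
#include <assert.h>

/* Toy model of alias-set disambiguation; the toy_* names are
   hypothetical and this is not GCC's implementation.  */

typedef int toy_alias_set;

/* Assumption: alias set 0 means "may alias anything", so two references
   conflict when either set is 0 or when both sets are equal.  */
static int
toy_alias_sets_conflict (toy_alias_set a, toy_alias_set b)
{
  assert (a >= 0 && b >= 0);
  return a == 0 || b == 0 || a == b;
}

/* Alias set given to a reference: the set of its type under strict
   aliasing, and 0 when strict aliasing is turned off.  */
static toy_alias_set
toy_get_alias_set (toy_alias_set type_set, int strict_aliasing)
{
  return strict_aliasing ? type_set : 0;
}
```

With strict aliasing on, references in sets 1 and 2 can be told apart without looking at base terms at all; with it off, both collapse to set 0 and the answer has to come from the base-term comparison, which is where the blank reg_base_value entry hurts.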
So, it seems the RTL alias analysis framework in GCC is not designed to
handle REG_POINTER on hard registers. I would like to suggest that we
return to Peter Bergner's proposed solution.