This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/51017] [4.8/4.9/5 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 18 Feb 2015 11:09:34 +0000
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017
--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
We already inhibit creating some kinds of loop-carried dependencies, but only
when vectorization is enabled (because they can inhibit vectorization). But we
still PRE invariant loads:
  Replaced MEM[(vtype * {ref-all})&DES_bs_all + 20528B] with prephitmp_2898 in all uses of _1195 = MEM[(vtype * {ref-all})&DES_bs_all + 20528B]
because we know it's {0, 0} on entry. Note that store motion doesn't apply
here because those stores are said to alias with the
MEM[(vtype * {ref-all})k_2 + 848B] accesses (which iterate over
DES_bs_all.KS.v). Unfortunately, field-sensitive points-to analysis doesn't
help here, as the points-to result itself isn't field-sensitive.
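To illustrate what a loop-carried dependency created by PRE looks like, here is a source-level sketch in C. The function names are hypothetical, and this is only a C rendering of the idea, not GCC's actual GIMPLE output: a load whose value was already loaded in the previous iteration is replaced by a scalar carried around the back edge. The restrict qualifiers are an added assumption; without them the rewrite would not be valid, because a store to c[i] could change a[i + 1].

```c
#include <stddef.h>

/* Original form: each iteration loads both a[i] and a[i + 1].
   Iterations are independent, so the loop is easy to vectorize.
   'a' must have at least n + 1 elements. */
void sum_pairs(int *restrict c, const int *restrict a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    c[i] = a[i] + a[i + 1];
}

/* Hypothetical PRE-like rewrite: the a[i] needed in iteration i is
   the a[i + 1] loaded in iteration i - 1, so that load is replaced
   by 'prev', which is carried from one iteration to the next.
   This saves one load per iteration but adds a loop-carried
   dependency through 'prev', the kind that can inhibit vectorization. */
void sum_pairs_pre(int *restrict c, const int *restrict a, size_t n)
{
  if (n == 0)
    return;               /* original loop touches no memory here */
  int prev = a[0];        /* value "inserted" on loop entry */
  for (size_t i = 0; i < n; i++)
    {
      int next = a[i + 1];
      c[i] = prev + next;
      prev = next;        /* becomes the next iteration's a[i] */
    }
}
```

Both functions compute the same result. The second one simply keeps one value live across iterations instead of reloading it.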
Of course, without store motion applying, this kind of PRE is not really
useful. And if store motion did apply, it would create the same kind of
problem (in this case up to 0x300(?) live registers).
One possible solution is to simply avoid this kind of "partial" store motion,
that is, converting
  for (;;)
    {
      reg = MEM;
      MEM = fn(reg);
    }
to
  reg = MEM;
  for (;;)
    {
      reg = fn(reg);
      MEM = reg;
    }
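The transform above can be sketched as compilable C, with a pointer standing in for MEM and a hypothetical fn() standing in for whatever the loop body computes. It shows that the two forms are equivalent and where the register cost comes from: after the transform, 'reg' is live across the whole loop. In the bug report's loop the store must stay inside the loop because other accesses may alias it; in this tiny example, full store motion would also be legal.

```c
#include <stddef.h>

/* Hypothetical stand-in for the loop body's computation
   (unsigned arithmetic, so overflow simply wraps). */
static unsigned fn(unsigned x)
{
  return x * 2654435761u + 1u;
}

/* Before: every iteration loads *mem and stores the new value back. */
void update_before(unsigned *mem, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned reg = *mem;  /* load  */
      *mem = fn(reg);       /* store */
    }
}

/* After "partial" store motion: the load is hoisted out of the loop
   and 'reg' carries the value around the back edge, occupying a
   register for the whole loop. The store remains in the loop. */
void update_after(unsigned *mem, size_t n)
{
  unsigned reg = *mem;
  for (size_t i = 0; i < n; i++)
    {
      reg = fn(reg);
      *mem = reg;
    }
}
```

With one such slot this is a clear win. With hundreds of them in the same loop, each hoisted value needs its own register, which is where the pressure comes from.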
But this is of course also a profitable transform. So the solution might
instead be to limit register pressure somehow, by assessing the costs of
individual transforms. Leaving it to the register allocator seems too
difficult: re-materializing 'reg' from MEM would require sophisticated
analysis to prove that MEM still holds the value of 'reg', basically undoing
the PRE transform.
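One way to read "assessing costs of individual transforms" is a per-loop register budget. The sketch below is entirely hypothetical: the struct, the greedy policy and the budget are illustrative assumptions, not anything GCC implements. Each candidate transform records how many loads it saves per iteration and how many extra values it keeps live across the loop, and candidates are accepted only while the estimated pressure fits the budget.

```c
#include <stddef.h>

/* Hypothetical description of one candidate PRE / store-motion transform. */
struct candidate {
  int loads_saved;  /* benefit: memory accesses saved per iteration */
  int extra_live;   /* cost: extra values live across the back edge */
  int accepted;     /* output: set by select_transforms() */
};

/* Greedy sketch: walk the candidates in the given order (assumed to be
   sorted by decreasing benefit) and accept each one only if the
   estimated register pressure stays within 'budget'.
   Returns the final pressure estimate. */
int select_transforms(struct candidate *c, size_t n,
                      int base_pressure, int budget)
{
  int pressure = base_pressure;
  for (size_t i = 0; i < n; i++)
    {
      c[i].accepted = 0;
      if (c[i].loads_saved <= 0)
        continue;  /* no benefit, not worth a register */
      if (pressure + c[i].extra_live > budget)
        continue;  /* would likely force spilling in the loop */
      c[i].accepted = 1;
      pressure += c[i].extra_live;
    }
  return pressure;
}
```

Choosing the best subset is a knapsack-style problem. A greedy pass is just the simplest approximation, and a real heuristic would also need an estimate of the loop's existing pressure (base_pressure here) and of the target's register count (budget here).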