This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Don't lower MIN_EXPR and MAX_EXPR [was Re: GCC Benchmarks(coybench), AMD64 and i686, 14 August 2004]
- From: Paolo Bonzini <paolo dot bonzini at polimi dot it>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: Paolo Bonzini <bonzini at gnu dot org>, Daniel Berlin <dberlin at dberlin dot org>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Jeff Law <law at redhat dot com>
- Date: Thu, 19 Aug 2004 16:11:45 +0200
- Subject: Re: [PATCH] Don't lower MIN_EXPR and MAX_EXPR [was Re: GCC Benchmarks(coybench), AMD64 and i686, 14 August 2004]
- References: <Pine.LNX.4.44.0408190633270.4605-100000@www.eyesopen.com>
I've no problem with the lowering or not of MIN_EXPR, MAX_EXPR and
ABS_EXPR, but it seems to me that the more fundamental problem is
that we're failing to CSE the phi nodes themselves.
Yes. I know this is only a work around.
For example, it would be nice to optimize
x = a ? b : c;
y = a ? b : c;
into the equivalent
x = a ? b : c;
y = x;
Not only nice -- I think I would have expected it even before tree-SSA...
Alas I'm not a tree-ssa expert but the above transformations would
appear doable with GCC's current infrastructure.
I don't know. I'd expect that PRE/FRE do this, but these things are not
trivial with SSA because the basic blocks in the first phi node do not
dominate those in the second phi node. As far as I could understand,
SSA makes many optimizations easier, but they are still going to require
solving data flow equations, etc., unless only follow
dominator->dominated edges.
For example, in this case
if (a1 < 0)
z1 = a1 + b1;
else
z2 = b1;
z3 = PHI (z1, z2)
if (a1 < 0)
w1 = a1 + b1;
else
w2 = a1;
w3 = PHI (w1, w2)
You cannot simply transform "w1 = z1" in the assignment for w1 because
this does not respect SSA's dominance property (all defs must dominate
their uses).
mole's testcase is simpler but, in concrete terms, it looks like SSA
excels at handling complex nests of if's and while's more than at
optimizing adjacent ones. Again, I'm not sure if this is right, but
given this particular optimizer deficiency, it seems plausible.
Paolo