Richard Guenther wrote:
On Wed, Oct 14, 2009 at 6:27 PM, Vladimir Makarov <vmakarov@redhat.com>
wrote:
Zdenek Dvorak wrote:
Hi,
+ if (i < ira_reg_class_cover_size)
+ size_cost = comp_cost + 10;
+ else
+ size_cost = 0;
Including comp_cost in size_cost makes no sense (this would prevent us
from
moving even very costly invariants out of the loop if we run out of
registers).
That is exactly what I intended. As I wrote above, I tried a lot of
heuristics with different parameters which decided to move loop
invariant
depending on spill cost and loop invariant cost. But they don't work
well
at least for x86/x86_64 and power6. I have some speculation for this.
x86/x86_64 is OOO processors these days. And costly invariant will
be
hidden because usually the invariant has a lot of freedom to be
executed
out-of-order. For power6, long latency is hidden by insn scheduling.
It
is hard to me find a processor where it will be important. Another
reason
for this, it is very hard to evaluate accurately spill cost at this
stage.
So I decided not to use combination of register pressure and
invariant
cost in my approach.
could you please add this reasoning to the comment? Another reason why
preventing the invariant motion does not hurt might be that all
expensive
invariants were already moved out of the loop by PRE and gimple
invariant
motion pass.
+ for (i = 0; i < ira_reg_class_cover_size; i++)
+ {
+ cover_class = ira_reg_class_cover[i];
+ if ((int) new_regs[cover_class]
+ + (int) regs_needed[cover_class]
+ + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
+ + IRA_LOOP_RESERVED_REGS
+ - ira_available_class_regs[cover_class] > 0)
+ break;
+ }
It might be clearer to write this as ... >
ira_available_class_regs[cover_class] instead
of ... - ira_available_class_regs[cover_class] > 0. Otherwise, the
patch
is OK.
Zdenek, thanks for the additional comments. I incorporated them into the
patch just before committing. Here is the affected patch part:
I think this consistently regressed both compile-time and runtime for
Polyhedron on x86_64. For Itanium the story isn't clear, but effects
are seen there as well (it's also the only one I see off-noise effects
on SPEC 2000 - significant ups and downs).
Yes, it is expensive optimization (at least 3 additional passes
through RTL insns one for calculating register pressure and two very
expensive passes for finding register classes for pseudos). It is
clearly seen from SPEC compilation time graphs on
http://vmakarov.fedorapeople.org/spec
for 2 last benchmarking. Therefore I proposed it only for -O3.
Overall SPEC2000 scores are practically the same on x86/x86_64.
As for Polyhedron benchmarks, here is my results on Core I7:
first: -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
second: -ffast-math -funroll-loops -O3 -fira-loop-pressure
x86:
Geometric Mean Execution Time = 12.84 seconds
Geometric Mean Execution Time = 12.82 seconds
x86_64:
Geometric Mean Execution Time = 9.89 seconds
Geometric Mean Execution Time = 9.91 seconds
On power6:
first: -mtune=power6 -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
second: -mtune=power6 -ffast-math -funroll-loops -O3 -fira-loop-pressure
Geometric Mean Execution Time = 19.22 seconds
Geometric Mean Execution Time = 19.04 seconds
As I wrote earlier the winner of the optimization usage will be
loops with pressure lower (but not too lower) than #registers. For
x86/x86_64, practically all loops have pressure more than #registers.
For such loops, evaluation of invariant cost vs spill cost would be
important. But at this stage, spill cost is impossible to evaluate
accurately. So usage of old and new loop invariant motion criteria on
processors similar x86/x86_64 will give different results for particular
tests (some tests better, some worse) but overall score will be
practically the same.
Probably, there is no sense to use IRA-based register pressure calculation
for all targets (including x86/x86_64) but for power it is a clear win as it
is seen from polyhedron and as I reported for SPEC2000.
So we could switch it off by default for -O3. What do you think about this
solution, Richard?