Bug 31360 - [4.2 Regression] RTL loop invariant is not aggressive enough
Summary: [4.2 Regression] RTL loop invariant is not aggressive enough
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.3.0
: P1 major
Target Milestone: 4.3.0
Assignee: Zdenek Dvorak
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: missed-optimization
Depends on: 31676 31848 31849
Blocks:
  Show dependency treegraph
 
Reported: 2007-03-26 06:43 UTC by Andrew Pinski
Modified: 2008-03-03 14:30 UTC (History)
9 users (show)

See Also:
Host:
Target: powerpc-*-*
Build:
Known to work: 4.3.0
Known to fail:
Last reconfirmed: 2007-04-16 17:53:36


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2007-03-26 06:43:10 UTC
Testcase:
void f(int *a)
{
  int i;
  for (i = 0;i<100;i++)
    a[i] = 0;
}


-----------cut-------------
we get:
_f:
        li r0,0
        li r2,4
        stw r0,0(r3)
        li r0,99
        mtctr r0
L2:
        li r0,0
        stwx r0,r2,r3
        addi r2,r2,4
        bdnz L2
        blr


the "li r0,0" should be moved out of the loop but is not.
Comment 1 Andrew Pinski 2007-03-26 06:48:35 UTC
I get:
Set in insn 47 is invariant (0), cost 4, depends on

Where insn 47 is the load constant:
(insn 47 46 48 3 (set (reg:SI 156)
        (const_int 0 [0x0])) 328 {*movsi_internal1} (nil)
    (nil))

If I change the constant to DEADBEEF, it removes it from the loop.  I should check if this is true on the 4.2 branch too because I think it is.
Comment 2 Richard Biener 2007-03-26 10:14:57 UTC
I guess the cost of loading zero is cheaper then?
Comment 3 Andrew Pinski 2007-03-26 16:51:05 UTC
> I guess the cost of loading zero is cheaper then?

Cheaper than loading 0xDEADBEEF but not cheap enough not to pull out of the loop.  All the "easy" (one instruction) constants are not pulled out.

Really I think rtl loop invariant should pull everything it can pull out of the loop and then let the RA "fix" up the loop when it runs out of registers.
Comment 4 Zdenek Dvorak 2007-03-26 18:00:49 UTC
Subject: Re:  [4.3 Regression] rtl loop invariant is broken

> > I guess the cost of loading zero is cheaper then?
> 
> Cheaper than loading 0xDEADBEEF but not cheap enough not to pull out of the
> loop.  All the "easy" (one instruction) constants are not pulled out.
> 
> Really I think rtl loop invariant should pull everything it can pull out of the
> loop and then let the RA "fix" up the loop when it runs out of registers.

I would agree, if we had RA capable of that (which I am not quite sure
whether we do or not, although this seems simple enough), or better,
RA doing better job under high register pressure.
Comment 5 Andrew Pinski 2007-03-26 18:09:16 UTC
> I would agree, if we had RA capable of that (which I am not quite sure
> whether we do or not, although this seems simple enough), or better,
> RA doing better job under high register pressure.

Then how do you explain why loop.c pulled this out of the loop but the new rtl loop invariant does not?  I don't see why we should care about register pressure except inside the RA.  Every other compiler does it that way, plus they work better than GCC.
Comment 6 Zdenek Dvorak 2007-03-26 18:18:17 UTC
Subject: Re:  [4.3 Regression] rtl loop invariant is broken

> > I would agree, if we had RA capable of that (which I am not quite sure
> > whether we do or not, although this seems simple enough), or better,
> > RA doing better job under high register pressure.
> 
> Then how do you explain why loop.c pulled this out of the loop but the new rtl
> loop invariant does not?  I don't see why we should care about register
> pressure except inside the RA.  Every other compiler does it that way, plus
> they work better than GCC.

some citation (on both claims)?  Of course, it is straightforward to
remove the register pressure checks from the optimizers, so if you want,
you can experiment with that.
Comment 7 Andrew Pinski 2007-03-26 22:11:48 UTC
Plus I just tested the 4.2 branch and it is broken there also, GRRRRRRRRRRRR.  Sorry for finding this late in the release cycle but I guess nobody really looked at the code quality after loop.c was turned off and rtl loop invariant motion was turned on.
Comment 8 Steven Bosscher 2007-03-26 22:13:52 UTC
Re. comment #5

Andrew, stop spreading nonsense.  Other compilers do take register pressure into account.  In fact, they do so much better than GCC usually does.  (In GCC you can't do it well because instructions are selected too late, so you don't know per-class register pressure).  So your stupid and annoying crusade to litter every bug with an "let RA fix it" remark shoud stop, thank you very much.
Comment 9 Andrew Pinski 2007-03-26 22:17:31 UTC
This is still a 4.2 regression.

4.2 produces:
.L2:
        li 0,100
        stwx 0,9,3
        addi 9,9,4
        bdnz .L2
        blr

While 4.1 produces:
.L2:
        stwx 0,9,3
        addi 9,9,4
        bdnz .L2
Comment 10 Steven Bosscher 2007-03-26 22:33:39 UTC
Just a small heuristics problem.  This could be fixed by giving insns to move with comp_cost - size_cost == 0 in gain_for_invariant a small gain anyway.  Or maybe comp_cost should overrule size_cost.

Just as a demonstration of how trivial this is, consider this hack:

Index: loop-invariant.c
===================================================================
--- loop-invariant.c    (revision 123222)
+++ loop-invariant.c    (working copy)
@@ -1101,8 +1101,10 @@ find_invariants_to_move (void)
     }

   new_regs = 0;
+
+  /* Find instructions to move.  Nothing gained is nothing lost.  */
   while (best_gain_for_invariant (&inv, &regs_needed,
-                                 new_regs, regs_used, n_inv_uses) > 0)
+                                 new_regs, regs_used, n_inv_uses) >= 0)
     {
       set_move_mark (inv->invno);
       new_regs += regs_needed;


Looks to me like a small heuristics tweak and some benchmarking would get this one out of the way with only a small effort.
Comment 11 Steven Bosscher 2007-03-26 22:36:02 UTC
And of course the hack doesn't actually work :-)  But it's still just a heuristics tweak what we need here.
Comment 12 Steven Bosscher 2007-03-26 22:43:28 UTC
Zdenek, what is the size heuristic for anyway?  For powerpc, the current heuristic tells the compiler not to move any simple moves:

1) An invariant with a gain of 0 will not be moved.
2) gain = comp_cost - size_cost
3) comp_cost of a simple move is 4
4) all powerpc instructions have a size_cost of 4

(At least, the insns that map to a single machine insns have a size_cost of 4.)

So we tell the compiler never to move simple moves for powerpc :-(

Comment 13 Zdenek Dvorak 2007-03-26 23:05:16 UTC
Subject: Re:  [4.2/4.3 Regression] rtl loop invariant is broken

> Zdenek, what is the size heuristic for anyway?  For powerpc, the current
> heuristic tells the compiler not to move any simple moves:
> 
> 1) An invariant with a gain of 0 will not be moved.
> 2) gain = comp_cost - size_cost
> 3) comp_cost of a simple move is 4
> 4) all powerpc instructions have a size_cost of 4
> 
> (At least, the insns that map to a single machine insns have a size_cost of 4.)

size_cost and global_cost_for_size are somewhat missnamed.  The purpose
of global_cost_for_size is to estimate the effect of register pressure,
and it currently works this way:

1) if the number of used registers is significantly smaller than the
   number of available registers, then the cost for each newly used
   register is target_small_cost (which probably turns out to be 4 on
   ppc)
2) if the number of used registers approaches the number of available
   registers, the cost for a newly used register increases more
   significantly (by target_pres_cost and later by target_spill_cost).

The purpose of 1) should have been to prevent optimizations that do not
gain anything and increase register pressure, but apparently this does
not work quite well.  Two possible solutions are decreasing
target_small_cost (maybe setting it fixed to one), or even removing the
penalty for increasing number of used registers in the case 1)
completely.
Comment 14 Mark Mitchell 2007-04-15 23:10:48 UTC
I think we can all agree that this is an important bug, even if we're not all agreed about exactly how to fix it.  This seems likely be to a frequent problem.

Therefore, despite the fact this is "just" a missed optimization, I'm marking it P1.
Comment 15 Zdenek Dvorak 2007-04-17 17:42:40 UTC
Subject: Bug 31360

Author: rakdver
Date: Tue Apr 17 17:42:29 2007
New Revision: 123919

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123919
Log:
	PR rtl-optimization/31360
	* cfgloopanal.c (target_small_cost, target_pres_cost): Removed.
	(target_reg_cost): New.
	(init_set_costs): Initialize target_reg_cost.  Add comments
	regarding the rationale of the costs.
	(global_cost_for_size): Renamed to...
	(estimate_reg_pressure_cost): ... and simplify.  Decrease importance
	of register pressure.
	* tree-ssa-loop-ivopts.c (ivopts_global_cost_for_size): Use
	estimate_reg_pressure_cost.  Add number of ivs.
	(determine_set_costs): Dump target_reg_cost.
	* loop-invariant.c (gain_for_invariant):  Use
	estimate_reg_pressure_cost.  Removed n_inv_uses argument.
	(best_gain_for_invariant, find_invariants_to_move): Remove
	n_inv_uses.
	* cfgloop.h (target_small_cost, target_pres_cost): Removed.
	(target_reg_cost): Declare.
	(global_cost_for_size): Declaration removed.
	(estimate_reg_pressure_cost): Declare.

	* gcc.dg/loop-7.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/loop-7.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgloop.h
    trunk/gcc/cfgloopanal.c
    trunk/gcc/loop-invariant.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-ivopts.c

Comment 16 Steven Bosscher 2007-04-17 22:13:01 UTC
Fixeth on ze trunk.
Comment 17 Mark Mitchell 2007-04-24 22:42:16 UTC
Zdenek, do you think this is appropriate for a backport to 4.2.0?
Comment 18 Andrew Pinski 2007-04-24 22:44:12 UTC
(In reply to comment #17)
> Zdenek, do you think this is appropriate for a backport to 4.2.0?
Well right now it causes a regression so I don't think it is appropriate until that regression is fixed :). See PR 31676 for the regression.
Comment 19 Richard Earnshaw 2007-05-06 00:27:55 UTC
This patch is responsible for the code size regressions on CSiBE at -Os.  It also causes a bootstrap failure on arm-netbsdelf2: genautomata is being miscompiled.
Comment 20 Steven Bosscher 2007-05-06 11:10:04 UTC
Re. comment #19: What "code size regression"?  Your comment is too unspecific.
Comment 21 Richard Earnshaw 2007-05-06 15:34:55 UTC
(In reply to comment #20)
> Re. comment #19: What "code size regression"?  Your comment is too unspecific.
> 

The code size regression that shows up on CSiBE between r123918 and r123919 on ARM, mips and PPC (at least).  Just build CSiBE at -Os with and without this patch.  On ARM this is approx 0.3%, which is a significant move (given that a good improvement is normally about 0.01%).
Comment 22 Steven Bosscher 2007-05-06 18:10:18 UTC
Again, please be specific: Please identify the benchmark, the file, maybe even the function, and extra bonus for figuring out what causes the size to increase.

I don't understand why you expect anyone to go figure out what is wrong, when you've obviously already built CSiBE and can do this yourself in a matter of minutes.
Comment 23 Andrew Pinski 2007-05-06 18:14:13 UTC
Also file a new bug for the code size problem and the wrong code issue and link them to this bug.
Comment 24 Mark Mitchell 2007-05-14 22:25:23 UTC
Will not be fixed in 4.2.0; retargeting at 4.2.1.
Comment 25 Eric Botcazou 2007-06-23 10:16:35 UTC
Any hope of having this fixed on the 4.2 branch too or should it be closed?
Comment 26 Richard Biener 2007-06-23 10:23:40 UTC
I guess its hard to find a fix that does not regress elsewhere, so I'm inclined to WONTFIX this on the 4.2 branch.  As it's still P1 this needs RM approval though.
Mark?
Comment 27 Mark Mitchell 2007-06-23 20:45:09 UTC
Richard G., by:

"I guess its hard to find a fix that does not regress elsewhere, so I'm inclined
to WONTFIX this on the 4.2 branch."

are you referring to functional problems or to Richard E's comments regarding code size?  The followup PR 31676 patch looks very small; couldn't we backport this patch plus that one to 4.2, if we wanted to do so?

As for the code size regression, Richard E., have you had a chance to identify a specific CSiBE file that expands, so that Zdenek can look at that?

Meanwhile, I would prefer to leave this regression open.  I really do feel that it's a severe optimization regression.  After all, this is basically a variant on bzero.

Thanks,

-- Mark
Comment 28 Richard Earnshaw 2007-06-25 08:06:31 UTC
(In reply to comment #27)

> As for the code size regression, Richard E., have you had a chance to identify
> a specific CSiBE file that expands, so that Zdenek can look at that?
> 

There were a number of files that regressed.  PR 31849 is one example.

Note, if this patch is applied to the branch, then the fix for 31848 will probably also be required.
Comment 29 Mark Mitchell 2007-06-25 11:38:40 UTC
Richard --

Thank you for refreshing my memory as to the code-size issue and putting the PR numbers in the audit trail here.

-- Mark
Comment 30 Mark Mitchell 2007-10-09 19:20:18 UTC
Change target milestone to 4.2.3, as 4.2.2 has been released.
Comment 31 Eric Botcazou 2007-11-02 08:17:58 UTC
Any objection to closing this as WONTFIX now?
Comment 32 Andrew Pinski 2007-11-02 08:22:46 UTC
(In reply to comment #31)
> Any objection to closing this as WONTFIX now?

Fine with me since we (sony) is not going to use 4.2.
Comment 33 Eric Botcazou 2007-11-03 07:24:51 UTC
By silent consensus.
Comment 34 Eric Botcazou 2007-11-03 07:27:37 UTC
Reopening...
Comment 35 Eric Botcazou 2007-11-03 07:28:10 UTC
Fixed in the upcoming 4.3 series.
Comment 36 Alexandre Pereira Nunes 2008-02-12 13:13:34 UTC
I think it's worth to note here the implications of the fix to PR31849.
Comment 37 Alexandre Pereira Nunes 2008-03-03 14:30:07 UTC
If PR31849 is right, shouldn't this be reopened or marked as something other than fixed ?