Testcase: void f(int *a) { int i; for (i = 0;i<100;i++) a[i] = 0; } -----------cut------------- we get: _f: li r0,0 li r2,4 stw r0,0(r3) li r0,99 mtctr r0 L2: li r0,0 stwx r0,r2,r3 addi r2,r2,4 bdnz L2 blr the "li r0,0" should be moved out of the loop but is not.
I get: Set in insn 47 is invariant (0), cost 4, depends on Where insn 47 is the load constant: (insn 47 46 48 3 (set (reg:SI 156) (const_int 0 [0x0])) 328 {*movsi_internal1} (nil) (nil)) If I change the constant to DEADBEEF, it removes it from the loop. I should check if this is true on the 4.2 branch too because I think it is.
I guess the cost of loading zero is cheaper then?
> I guess the cost of loading zero is cheaper then? Cheaper than loading 0xDEADBEEF but not cheap enough not to pull out of the loop. All the "easy" (one instruction) constants are not pulled out. Really I think rtl loop invariant should pull everything it can pull out of the loop and then let the RA "fix" up the loop when it runs out of registers.
Subject: Re: [4.3 Regression] rtl loop invariant is broken > > I guess the cost of loading zero is cheaper then? > > Cheaper than loading 0xDEADBEEF but not cheap enough not to pull out of the > loop. All the "easy" (one instruction) constants are not pulled out. > > Really I think rtl loop invariant should pull everything it can pull out of the > loop and then let the RA "fix" up the loop when it runs out of registers. I would agree, if we had RA capable of that (which I am not quite sure whether we do or not, although this seems simple enough), or better, RA doing better job under high register pressure.
> I would agree, if we had RA capable of that (which I am not quite sure > whether we do or not, although this seems simple enough), or better, > RA doing better job under high register pressure. Then how do you explain why loop.c pulled this out of the loop but the new rtl loop invariant does not? I don't see why we should care about register pressure except inside the RA. Every other compiler does it that way, plus they work better than GCC.
Subject: Re: [4.3 Regression] rtl loop invariant is broken > > I would agree, if we had RA capable of that (which I am not quite sure > > whether we do or not, although this seems simple enough), or better, > > RA doing better job under high register pressure. > > Then how do you explain why loop.c pulled this out of the loop but the new rtl > loop invariant does not? I don't see why we should care about register > pressure except inside the RA. Every other compiler does it that way, plus > they work better than GCC. some citation (on both claims)? Of course, it is straightforward to remove the register pressure checks from the optimizers, so if you want, you can experiment with that.
Plus I just tested the 4.2 branch and it is broken there also, GRRRRRRRRRRRR. Sorry for finding this late in the release cycle but I guess nobody really looked at the code quality after loop.c was turned off and rtl loop invariant motion was turned on.
Re. comment #5 Andrew, stop spreading nonsense. Other compilers do take register pressure into account. In fact, they do so much better than GCC usually does. (In GCC you can't do it well because instructions are selected too late, so you don't know per-class register pressure). So your stupid and annoying crusade to litter every bug with an "let RA fix it" remark shoud stop, thank you very much.
This is still a 4.2 regression. 4.2 produces: .L2: li 0,100 stwx 0,9,3 addi 9,9,4 bdnz .L2 blr While 4.1 produces: .L2: stwx 0,9,3 addi 9,9,4 bdnz .L2
Just a small heuristics problem. This could be fixed by giving insns to move with comp_cost - size_cost == 0 in gain_for_invariant a small gain anyway. Or maybe comp_cost should overrule size_cost. Just as a demonstration of how trivial this is, consider this hack: Index: loop-invariant.c =================================================================== --- loop-invariant.c (revision 123222) +++ loop-invariant.c (working copy) @@ -1101,8 +1101,10 @@ find_invariants_to_move (void) } new_regs = 0; + + /* Find instructions to move. Nothing gained is nothing lost. */ while (best_gain_for_invariant (&inv, ®s_needed, - new_regs, regs_used, n_inv_uses) > 0) + new_regs, regs_used, n_inv_uses) >= 0) { set_move_mark (inv->invno); new_regs += regs_needed; Looks to me like a small heuristics tweak and some benchmarking would get this one out of the way with only a small effort.
And of course the hack doesn't actually work :-) But it's still just a heuristics tweak what we need here.
Zdenek, what is the size heuristic for anyway? For powerpc, the current heuristic tells the compiler not to move any simple moves: 1) An invariant with a gain of 0 will not be moved. 2) gain = comp_cost - size_cost 3) comp_cost of a simple move is 4 4) all powerpc instructions have a size_cost of 4 (At least, the insns that map to a single machine insns have a size_cost of 4.) So we tell the compiler never to move simple moves for powerpc :-(
Subject: Re: [4.2/4.3 Regression] rtl loop invariant is broken > Zdenek, what is the size heuristic for anyway? For powerpc, the current > heuristic tells the compiler not to move any simple moves: > > 1) An invariant with a gain of 0 will not be moved. > 2) gain = comp_cost - size_cost > 3) comp_cost of a simple move is 4 > 4) all powerpc instructions have a size_cost of 4 > > (At least, the insns that map to a single machine insns have a size_cost of 4.) size_cost and global_cost_for_size are somewhat missnamed. The purpose of global_cost_for_size is to estimate the effect of register pressure, and it currently works this way: 1) if the number of used registers is significantly smaller than the number of available registers, then the cost for each newly used register is target_small_cost (which probably turns out to be 4 on ppc) 2) if the number of used registers approaches the number of available registers, the cost for a newly used register increases more significantly (by target_pres_cost and later by target_spill_cost). The purpose of 1) should have been to prevent optimizations that do not gain anything and increase register pressure, but apparently this does not work quite well. Two possible solutions are decreasing target_small_cost (maybe setting it fixed to one), or even removing the penalty for increasing number of used registers in the case 1) completely.
I think we can all agree that this is an important bug, even if we're not all agreed about exactly how to fix it. This seems likely be to a frequent problem. Therefore, despite the fact this is "just" a missed optimization, I'm marking it P1.
Subject: Bug 31360 Author: rakdver Date: Tue Apr 17 17:42:29 2007 New Revision: 123919 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123919 Log: PR rtl-optimization/31360 * cfgloopanal.c (target_small_cost, target_pres_cost): Removed. (target_reg_cost): New. (init_set_costs): Initialize target_reg_cost. Add comments regarding the rationale of the costs. (global_cost_for_size): Renamed to... (estimate_reg_pressure_cost): ... and simplify. Decrease importance of register pressure. * tree-ssa-loop-ivopts.c (ivopts_global_cost_for_size): Use estimate_reg_pressure_cost. Add number of ivs. (determine_set_costs): Dump target_reg_cost. * loop-invariant.c (gain_for_invariant): Use estimate_reg_pressure_cost. Removed n_inv_uses argument. (best_gain_for_invariant, find_invariants_to_move): Remove n_inv_uses. * cfgloop.h (target_small_cost, target_pres_cost): Removed. (target_reg_cost): Declare. (global_cost_for_size): Declaration removed. (estimate_reg_pressure_cost): Declare. * gcc.dg/loop-7.c: New test. Added: trunk/gcc/testsuite/gcc.dg/loop-7.c Modified: trunk/gcc/ChangeLog trunk/gcc/cfgloop.h trunk/gcc/cfgloopanal.c trunk/gcc/loop-invariant.c trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-loop-ivopts.c
Fixeth on ze trunk.
Zdenek, do you think this is appropriate for a backport to 4.2.0?
(In reply to comment #17) > Zdenek, do you think this is appropriate for a backport to 4.2.0? Well right now it causes a regression so I don't think it is appropriate until that regression is fixed :). See PR 31676 for the regression.
This patch is responsible for the code size regressions on CSiBE at -Os. It also causes a bootstrap failure on arm-netbsdelf2: genautomata is being miscompiled.
Re. comment #19: What "code size regression"? Your comment is too unspecific.
(In reply to comment #20) > Re. comment #19: What "code size regression"? Your comment is too unspecific. > The code size regression that shows up on CSiBE between r123918 and r123919 on ARM, mips and PPC (at least). Just build CSiBE at -Os with and without this patch. On ARM this is approx 0.3%, which is a significant move (given that a good improvement is normally about 0.01%).
Again, please be specific: Please identify the benchmark, the file, maybe even the function, and extra bonus for figuring out what causes the size to increase. I don't understand why you expect anyone to go figure out what is wrong, when you've obviously already built CSiBE and can do this yourself in a matter of minutes.
Also file a new bug for the code size problem and the wrong code issue and link them to this bug.
Will not be fixed in 4.2.0; retargeting at 4.2.1.
Any hope of having this fixed on the 4.2 branch too or should it be closed?
I guess its hard to find a fix that does not regress elsewhere, so I'm inclined to WONTFIX this on the 4.2 branch. As it's still P1 this needs RM approval though. Mark?
Richard G., by: "I guess its hard to find a fix that does not regress elsewhere, so I'm inclined to WONTFIX this on the 4.2 branch." are you referring to functional problems or to Richard E's comments regarding code size? The followup PR 31676 patch looks very small; couldn't we backport this patch plus that one to 4.2, if we wanted to do so? As for the code size regression, Richard E., have you had a chance to identify a specific CSiBE file that expands, so that Zdenek can look at that? Meanwhile, I would prefer to leave this regression open. I really do feel that it's a severe optimization regression. After all, this is basically a variant on bzero. Thanks, -- Mark
(In reply to comment #27) > As for the code size regression, Richard E., have you had a chance to identify > a specific CSiBE file that expands, so that Zdenek can look at that? > There were a number of files that regressed. PR 31849 is one example. Note, if this patch is applied to the branch, then the fix for 31848 will probably also be required.
Richard -- Thank you for refreshing my memory as to the code-size issue and putting the PR numbers in the audit trail here. -- Mark
Change target milestone to 4.2.3, as 4.2.2 has been released.
Any objection to closing this as WONTFIX now?
(In reply to comment #31) > Any objection to closing this as WONTFIX now? Fine with me since we (sony) is not going to use 4.2.
By silent consensus.
Reopening...
Fixed in the upcoming 4.3 series.
I think it's worth to note here the implications of the fix to PR31849.
If PR31849 is right, shouldn't this be reopened or marked as something other than fixed ?