Bug 47614 - [4.6 Regression] cpu2000 benchmark 301.apsi fails with revision 169782
[4.6 Regression] cpu2000 benchmark 301.apsi fails with revision 169782
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: rtl-optimization
4.6.0
: P1 normal
: 4.6.0
Assigned To: Not yet assigned to anyone
: wrong-code
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-02-04 22:34 UTC by Pat Haugen
Modified: 2011-02-11 22:01 UTC (History)
7 users (show)

See Also:
Host: powerpc64-linux
Target: powerpc64-linux
Build: powerpc64-linux
Known to work:
Known to fail:
Last reconfirmed:


Attachments
testcase (427 bytes, text/plain)
2011-02-04 22:34 UTC, Pat Haugen
Details
smaller testcase (317 bytes, text/plain)
2011-02-07 21:18 UTC, Pat Haugen
Details
gcc46-pr47614.patch (1.25 KB, patch)
2011-02-09 18:30 UTC, Jakub Jelinek
Details | Diff
gcc46-pr47614-2.patch (417 bytes, patch)
2011-02-09 18:31 UTC, Jakub Jelinek
Details | Diff
gcc46-pr47614-2.patch (696 bytes, patch)
2011-02-09 18:48 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Pat Haugen 2011-02-04 22:34:25 UTC
Created attachment 23248 [details]
testcase
Comment 1 Pat Haugen 2011-02-04 22:47:57 UTC
Premature bug creation before...

301.apsi started failing on PowerPC with r169782. Attatched testcase was compiled with 'gfortran -S -m64 -mcpu=power7 -O3 -funroll-loops -ffast-math'. Following is good/bad snippet of the assembler.

r169781:

        xsmaxdp 0,0,11
        ble 7,.L2
        mr 10,5
        addi 0,7,-2
        lfdu 5,8(10)
        rlwinm 8,0,0,29,31
        li 9,1
        xscmpudp 0,0,5
        ble 0,.L2


r169782:
        xsmaxdp 0,0,11
        ble 7,.L2
        xscmpudp 0,0,11
        addi 0,7,-2
        mr 10,5
        rlwinm 8,0,0,29,31
        li 9,1
        ble 0,.L2

Note in the new version that the lfdu instruction is gone. This doesn't affect the compare since the memory value in f11 is still valid, the problem is that elminating the lfdu instruction elminates the increment of r10 which causes problems later on when it is used in subsequent loads (and is pointing at the wrong location).
Comment 2 Richard Biener 2011-02-07 11:54:13 UTC
Btw, not really a patch that was appropriate for this stage ...
Comment 3 Eric Botcazou 2011-02-07 12:35:06 UTC
> Btw, not really a patch that was appropriate for this stage ...

Are you the same Richard Guenther who wrote comment #29 under PR 43494? :-)
That being said, feel free to overrule and declare this 4.7 material.
Comment 4 Richard Biener 2011-02-07 12:49:56 UTC
(In reply to comment #3)
> > Btw, not really a patch that was appropriate for this stage ...
> 
> Are you the same Richard Guenther who wrote comment #29 under PR 43494? :-)
> That being said, feel free to overrule and declare this 4.7 material.

"I suppose pinging it on the ML is better than pinging it here."

is a general comment.  I didn't even look at the patch when saying that ;)

A brief look suggests the patch may only change var-tracking behavior, but
it seems that is not the case.
Comment 5 Eric Botcazou 2011-02-07 18:21:23 UTC
> A brief look suggests the patch may only change var-tracking behavior, but
> it seems that is not the case.

Quite very brief then. :-)  PR 43494 was a wrong code regression and Alexandre's patch fixed it, that's why I looked at it in the first place after seeing the double ping.
Comment 6 Jakub Jelinek 2011-02-07 19:13:28 UTC
It doesn't affect var-tracking at all BTW, because var-tracking deals with pre/post_inc/dec/modify manually by changing the insns passed to cselib temporarily.  It was originally written with the intent to be used in var-tracking though.
Comment 7 Pat Haugen 2011-02-07 21:18:35 UTC
Created attachment 23270 [details]
smaller testcase

A slightly smaller testcase.

Looking at the dumps, postreload is deleting the following insn in r169782.

(insn 95 392 89 3 (set (reg:DF 45 13 [orig:191 MEM[(real(kind=8)[0:] *)D.1405_52] ] [191])
        (mem:DF (pre_inc:DI (reg:DI 10 10 [orig:216 ivtmp.29 ] [216])) [2 MEM[(real(kind=8)[0:] *)D.1405_52]+0 S8 A64])) fail.f:9 392 {*movdf_hardfloat64}
     (expr_list:REG_INC (reg:DI 10 10 [orig:216 ivtmp.29 ] [216])
        (nil)))
Comment 8 Jakub Jelinek 2011-02-08 14:52:25 UTC
I can't reproduce this with a cross compiler unfortunately, but perhaps using check_for_inc_dec (currently static in dse.c) when deleting insn where needed in postreload.c might fix that.
Comment 9 Pat Haugen 2011-02-08 21:08:28 UTC
Thanks Jakub, check_for_inc_dec() does indeed fix the issue as desired (still elminates the redundant load, but keeps the increment). I'll fire off bootstrap/regtest.
Comment 10 Jakub Jelinek 2011-02-09 18:30:29 UTC
Created attachment 23287 [details]
gcc46-pr47614.patch

Actually, I can reproduce it, I've just been looking for pre_modify instead of
pre_inc that is actually removed.

So, either we use check_for_inc_dec there...
Comment 11 Jakub Jelinek 2011-02-09 18:31:51 UTC
Created attachment 23288 [details]
gcc46-pr47614-2.patch

Or simply reject side effects, similarly how we reject them elsewhere in postreload.
Comment 12 Eric Botcazou 2011-02-09 18:42:01 UTC
> Or simply reject side effects, similarly how we reject them elsewhere in
> postreload.

Yes, I think that's good enough for now.
Comment 13 Jakub Jelinek 2011-02-09 18:48:46 UTC
Created attachment 23289 [details]
gcc46-pr47614-2.patch

Actually, the second patch is not enough, we don't delete the insn in that case, but still reload_cse_simplify_operands optimizes the MEM into a register, ignoring its side-effects.
I think this should work.  Alternatively, if we go the first patch route, I think side_effects_p guard in reload_cse_simplify_operands is still desirable.
Comment 14 Pat Haugen 2011-02-09 21:03:31 UTC
(In reply to comment #13)
> Alternatively, if we go the first patch route, I
> think side_effects_p guard in reload_cse_simplify_operands is still desirable.

My bootstrap/regtest (of essentially the first patch above) went fine, but I agree that the addition of side_effects_p guard in reload_cse_simplify_operands is probably the right thing to do too.  I like the first patch over the second since we are able to replace the redundant load with a simple increment.

I'll bootstrap/regtest the additional change.
Comment 15 Jakub Jelinek 2011-02-09 21:10:23 UTC
Please move the side_effects_p check in reload_cse_simplify_operands right before the cselib_lookup call, there is no point calling cselib_lookup if it has side effects.
BTW, on powerpc64-linux Alex's incdec patch apparently broke profiledbootstrap,
cc1 built with -fprofile-use keeps segfaulting on most of libgcc compilations.

A patch similar to #c13 (just with if (side_effects_p (op)) continue; moved
in front of v = cselib_lookup (...);) fixed the profiledbootstrap apparently
(but guess a patch using check_for_inc_dec + the *_operands change will do the same).  I agree that in this case it can generate better code to delete the redundant insn and just keep the side effects.
Comment 16 Pat Haugen 2011-02-11 20:52:58 UTC
Author: pthaugen
Date: Fri Feb 11 20:52:55 2011
New Revision: 170059

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170059
Log:
	PR rtl-optimization/47614
	* rtl.h (check_for_inc_dec): Declare.
	* dse.c (check_for_inc_dec): Externalize...
	* postreload.c (reload_cse_simplify): ...use it before deleting stmt.
	(reload_cse_simplify_operands): Don't simplify opnds with side effects.

	* testsuite/gfortran.dg/pr47614.f: New.


Added:
    trunk/gcc/testsuite/gfortran.dg/pr47614.f
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/dse.c
    trunk/gcc/postreload.c
    trunk/gcc/rtl.h
    trunk/gcc/testsuite/ChangeLog
Comment 17 Pat Haugen 2011-02-11 22:01:24 UTC
Fixed.