Bug 25248

Summary: [4.1 Regression] 2.6.15-rc4 arch/powerpc/mm/hash_utils_64.c miscompiled
Product: gcc Reporter: olh
Component: middle-endAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: bergner, gcc-bugs, pinskia, rakdver, rguenth, schwab
Priority: P3 Keywords: patch, wrong-code
Version: 4.1.0   
Target Milestone: 4.1.0   
URL: http://gcc.gnu.org/ml/gcc-patches/2005-12/msg00454.html
Host: powerpc64-linux Target: powerpc64-linux
Build: powerpc64-linux Known to work:
Known to fail: Last reconfirmed: 2005-12-06 13:48:29
Attachments: PR25248.tar.bz2
PR25248-2.tar.bz2
PR25248-3.tar.bz2
Minimal test case.
pr25248.patch

Description olh 2005-12-04 01:32:12 UTC
current kernels do not boot, they hang after 'returning from prom_init'
gcc4.1 and mainline miscompile arch/powerpc/mm/hash_utils_64.c.
Taking the object file from a gcc4.0 compiled tree fixes booting.

gcc-mainline r108000 binutils-mainline -> fails
gcc-4_0-branch r107977 binutils-mainline -> works
gcc-4_1-branch r106530 (+patch r106693 from bug #24644) binutils-mainline 2005-11-05 -> fails
Comment 1 olh 2005-12-04 01:36:20 UTC
Created attachment 10400 [details]
PR25248.tar.bz2

buildscripts and preprocessed files.
Comment 2 Andrew Pinski 2005-12-04 02:19:51 UTC
As far as I can see, the tree level is fine.
Comment 3 olh 2005-12-04 16:43:54 UTC
an object file compiled with r102096 doesnt work either. 
Will try older ones.
Comment 4 olh 2005-12-04 18:17:54 UTC
Created attachment 10402 [details]
PR25248-2.tar.bz2

this change breaks it. I replaced only arch/powerpc/mm/hash_utils_64.o in my tests.

r99558 | dnovillo | 2005-05-11 04:24:44 +0200 (Wed, 11 May 2005) | 16 lines
* tree-optimize.c (init_tree_optimization_passes): Re-organize
optimization passes to do an initial batch of scalar cleanups.
Comment 5 Andrew Pinski 2005-12-04 19:24:35 UTC
(In reply to comment #4)
> Created an attachment (id=10402) [edit]
> PR25248-2.tar.bz2
> this change breaks it. I replaced only arch/powerpc/mm/hash_utils_64.o in my
> tests.

That would mean there is a latent bug somewhere.
Comment 6 olh 2005-12-04 20:58:31 UTC
Created attachment 10403 [details]
PR25248-3.tar.bz2

If someone can spot the bug, I cant.

Unified all asm labels to reduce diff noise.
The object file prevents booting also in a gcc 4.0 compiled kernel.
Comment 7 Giovanni Bajo 2005-12-04 23:17:04 UTC
Further bonus points if you can spot which function is miscompiled.
Comment 8 Peter Bergner 2005-12-06 01:57:16 UTC
We're miscompiling htab_bolt_mapping().  What do I win? ;-)
I have a userland test case which I'll attach once I've shunk it down a little.
Comment 9 Peter Bergner 2005-12-06 06:26:26 UTC
Created attachment 10414 [details]
Minimal test case.

Here's a minimal test case.  This works with older gcc's and without optimization on newer gcc's (.

bergner@vervain:~/PR25248> gcc3.3.3 -m64 -O1 hash_utils_64.c
bergner@vervain:~/PR25248> ./a.out
addr = 0x0000000000001000
bergner@vervain:~/PR25248> gcc4.1 -m64 -O0 hash_utils_64.c
bergner@vervain:~/PR25248> ./a.out
addr = 0x0000000000001000
bergner@vervain:~/PR25248> gcc41 -m64 -O1 hash_utils_64.c
bergner@vervain:~/PR25248> ./a.out
addr = 0x4000000000001000
Comment 10 olh 2005-12-06 08:38:09 UTC
Odd, -O1 doesnt fix it for me:

+CFLAGS_hash_utils_64.o      += -O1

gcc version 4.2.0 20051206 (experimental)
GNU ld version 2.16.91 20051206
Comment 11 olh 2005-12-06 08:44:17 UTC
same is true for a r99558 compiled object file, in a gcc40 kernel.
-O1 doesnt help there.
Comment 12 Richard Biener 2005-12-06 13:37:01 UTC
Note that in the reduced testcase we have

pr25248.c: In function ?htab_bolt_mapping?:
pr25248.c:15: warning: integer constant is too large for ?unsigned long? type
pr25248.c: In function ?main?:
pr25248.c:24: warning: integer constant is too large for ?unsigned long? type
pr25248.c:24: warning: large integer implicitly truncated to unsigned type

because on ppc64 long is 4 bytes.  Using long long fixes the problem.  Can someone confirm that this is also the problem with the original testcase (and close the bug as INVALID)?

Thanks.
Comment 13 Richard Biener 2005-12-06 13:39:12 UTC
hmhm, maybe I'm in a ppc32 chroot.
Comment 14 Peter Bergner 2005-12-06 13:40:03 UTC
sizeof(unsigned long) and sizeof(unsigned long long) are both 8 bytes on ppc64.

Olaf, -O1 isn't a workaround, it's the minimum optimization level that still shows the bug.  Trying some -fno-* options, using -fno-tree-loop-optimize makes the problem go away at least on the test case:

bergner@vervain:~/PR25248> gcc4.1 -m64 -O2 -fno-tree-loop-optimize hash_utils_64.c
bergner@vervain:~/PR25248> ./a.out
addr = 0x0000000000001000
Comment 15 Richard Biener 2005-12-06 13:43:42 UTC
So, CCing zdenek.  Btw., checking -floop-optimize2 may also be worth trying (setting up a rpoper chroot now).
Comment 16 Richard Biener 2005-12-06 13:48:29 UTC
Confirmed.  -fno-ivopts fixes it.
Comment 17 Richard Biener 2005-12-06 13:59:26 UTC
IVOPTs truncates pstartD.1866_10 down to int, which is wrong:

htab_bolt_mapping (pstartD.1866, cntD.1867)
{
  unsigned intD.3 D.1959;
  long unsigned intD.4 D.1960;
  long unsigned intD.4 ivtmp.50D.1954;
  unsigned intD.3 stepD.1871;
  long unsigned intD.4 iD.1870;
  intD.0 D.1881;
  long unsigned intD.4 D.1880;
  long unsigned intD.4 D.1879;
  unsigned intD.3 cond.12D.1878;
  intD.0 D.1877;
  intD.0 shift.11D.1876;
  unsigned intD.3 shift.10D.1875;

<bb 0>:
  shift.10D.1875_5 = shiftD.1865;
  shift.11D.1876_6 = (intD.0) shift.10D.1875_5;
  D.1877_7 = 1 << shift.11D.1876_6;
  stepD.1871_8 = (unsigned intD.3) D.1877_7;
  if (cntD.1867_11 != 0) goto <L9>; else goto <L4>;

<L9>:;
  ivtmp.50D.1954_21 = (long unsigned intD.4) stepD.1871_8;
  D.1959_23 = (unsigned intD.3) pstartD.1866_10;
  D.1960_26 = (long unsigned intD.4) D.1959_23;
  ivtmp.50D.1954_3 = D.1960_26 - 000000000;

  # ivtmp.50D.1954_1 = PHI <ivtmp.50D.1954_2(5), ivtmp.50D.1954_3(1)>;
  # iD.1870_28 = PHI <iD.1870_18(5), 0(1)>;
<L0>:;
  cond.12D.1878_15 = condD.1864;
  if (cond.12D.1878_15 != 0) goto <L1>; else goto <L2>;


before we have (cunroll):

<bb 0>:
  shift.10D.1875_5 = shiftD.1865;
  shift.11D.1876_6 = (intD.0) shift.10D.1875_5;
  D.1877_7 = 1 << shift.11D.1876_6;
  stepD.1871_8 = (unsigned intD.3) D.1877_7;
  if (cntD.1867_11 != 0) goto <L9>; else goto <L4>;

<L9>:;

  # iD.1870_28 = PHI <iD.1870_18(5), 0(1)>;
  # pstartD.1866_27 = PHI <pstartD.1866_17(5), pstartD.1866_10(1)>;
<L0>:;
  cond.12D.1878_15 = condD.1864;
  if (cond.12D.1878_15 != 0) goto <L1>; else goto <L2>;


Zdenek surely knows where this happens and how to best fix it.
Comment 18 olh 2005-12-06 14:26:56 UTC
Adding -fno-ivopts to CFLAGS for hash_utils_64.o fixes it for me.
Tested with gcc-4_1-branch r108104 and current binutils.
Comment 19 Zdenek Dvorak 2005-12-06 15:28:01 UTC
Seems like some problem in iv analysis:

ssa name pstart_17
  type long unsigned int
  base (long unsigned int) (unsigned int) D.1301_7 + (long unsigned int) (unsigned int) pstart_10
  step (long unsigned int) step_8
  is a biv
Comment 20 Zdenek Dvorak 2005-12-06 15:43:42 UTC
This patch should fix the problem:

Index: tree-scalar-evolution.c
===================================================================
*** tree-scalar-evolution.c     (revision 108078)
--- tree-scalar-evolution.c     (working copy)
*************** follow_ssa_edge_in_rhs (struct loop *loo
*** 1042,1047 ****
--- 1042,1048 ----
    t_bool res = t_false;
    tree rhs0, rhs1;
    tree type_rhs = TREE_TYPE (rhs);
+   tree evol;

    /* The RHS is one of the following cases:
       - an SSA_NAME,
*************** follow_ssa_edge_in_rhs (struct loop *loo
*** 1084,1097 ****
            {
              /* Match an assignment under the form:
                 "a = b + c".  */
              res = follow_ssa_edge
                (loop, SSA_NAME_DEF_STMT (rhs0), halting_phi,
!                evolution_of_loop, limit);

              if (res == t_true)
                *evolution_of_loop = add_to_evolution
                  (loop->num,
!                  chrec_convert (type_rhs, *evolution_of_loop, at_stmt),
                   PLUS_EXPR, rhs1);

              else if (res == t_false)
--- 1085,1099 ----
            {
              /* Match an assignment under the form:
                 "a = b + c".  */
+             evol = *evolution_of_loop;
              res = follow_ssa_edge
                (loop, SSA_NAME_DEF_STMT (rhs0), halting_phi,
!                &evol, limit);

              if (res == t_true)
                *evolution_of_loop = add_to_evolution
                  (loop->num,
!                  chrec_convert (type_rhs, evol, at_stmt),
                   PLUS_EXPR, rhs1);

              else if (res == t_false)
Comment 21 olh 2005-12-06 18:11:40 UTC
Created attachment 10420 [details]
pr25248.patch

This patch fixes it for me.
Let me attach it in a readable form.
Comment 22 Peter Bergner 2005-12-06 18:47:45 UTC
...and I can verify that the patch fixes the reduced test case.  Thanks!
Comment 23 Zdenek Dvorak 2005-12-06 20:33:08 UTC
Patch: http://gcc.gnu.org/ml/gcc-patches/2005-12/msg00454.html
Comment 24 Zdenek Dvorak 2005-12-08 09:34:30 UTC
Subject: Bug 25248

Author: rakdver
Date: Thu Dec  8 09:34:26 2005
New Revision: 108225

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108225
Log:
	PR tree-optimization/25248
	* tree-scalar-evolution.c (follow_ssa_edge_in_rhs): Do not use
	evolution_of_loop from the failed attempt.  Remove handling
	of MULT_EXPR.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-scalar-evolution.c

Comment 25 Richard Biener 2005-12-08 15:11:13 UTC
Zdenek, can you please apply to the 4.1 branch, too?
Comment 26 Zdenek Dvorak 2005-12-08 15:44:24 UTC
Subject: Bug 25248

Author: rakdver
Date: Thu Dec  8 15:44:22 2005
New Revision: 108236

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108236
Log:
	PR tree-optimization/25248
	* tree-scalar-evolution.c (follow_ssa_edge_in_rhs): Do not use
	evolution_of_loop from the failed attempt.  Remove handling of
	MULT_EXPR.


Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/tree-scalar-evolution.c

Comment 27 Andrew Pinski 2005-12-08 15:54:51 UTC
Fixed.