Bug 33992 - [4.3 Regression] Miscompiles function with inlining, breaks profiledbootstrap
Summary: [4.3 Regression] Miscompiles function with inlining, breaks profiledbootstrap
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.3.0
: P1 critical
Target Milestone: 4.3.0
Assignee: Richard Biener
URL:
Keywords: build, wrong-code
: 34720 (view as bug list)
Depends on:
Blocks:
 
Reported: 2007-11-04 01:37 UTC by devurandom
Modified: 2008-02-15 13:39 UTC (History)
10 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.2.2
Known to fail:
Last reconfirmed: 2008-02-10 22:08:45


Attachments
build.log (121.43 KB, text/plain)
2007-11-04 02:02 UTC, devurandom
Details
build.log ICE (77.78 KB, text/plain)
2007-11-04 11:47 UTC, devurandom
Details
Executable test that shows the failure. (2.08 KB, text/plain)
2008-02-08 21:17 UTC, Uroš Bizjak
Details
Shorter and more evil testcase (1.13 KB, text/plain)
2008-02-09 21:04 UTC, Uroš Bizjak
Details
patch (786 bytes, patch)
2008-02-10 22:43 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description devurandom 2007-11-04 01:37:31 UTC
Hello!

I am trying to build gcc-4.3 svn since a few days, using dirtyepic's Gentoo ebuild. I am always getting this error:

        /var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/build/./gcc/xgcc -shared-libgcc -B/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/build/./gcc -nostdinc++ -L/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/build/x86_64-pc-linux-gnu/libstdc++-v3/src -L/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/build/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/usr/x86_64-pc-linux-gnu/bin/ -B/usr/x86_64-pc-linux-gnu/lib/ -isystem /usr/x86_64-pc-linux-gnu/include -isystem /usr/x86_64-pc-linux-gnu/sys-include -Winvalid-pch -x c++-header -pipe -O2 -march=athlon64   -D_GNU_SOURCE -I/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include -I/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/gcc-4.3.0-20071028/libstdc++-v3/libsupc++ -O0 -g /var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/gcc-4.3.0-20071028/libstdc++-v3/include/precompiled/stdc++.h -o x86_64-pc-linux-gnu/bits/stdc++.h.gch/O0g.gch
In file included from /var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/gcc-4.3.0-20071028/libstdc++-v3/include/precompiled/stdc++.h:68:
/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/limits:1047: error: stray ‘\275’ in program
/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/limits:1079: error: stray ‘\230’ in program
/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/limits:1104: error: stray ‘\232’ in program
make[4]: *** [x86_64-pc-linux-gnu/bits/stdc++.h.gch/O0g.gch] Error 1
make[4]: Leaving directory `/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071028/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include'

emerge --info, striped from probably useless Gentoo-stuff:
Portage 2.1.3.17 (default-linux/amd64/2007.0/desktop, gcc-4.2.2, glibc-2.6.1-r0, 2.6.23-gentoo x86_64)
=================================================================
System uname: 2.6.23-gentoo x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 5000+
Timestamp of tree: Sun, 04 Nov 2007 00:50:01 +0000
app-shells/bash:     3.2_p17-r1
dev-java/java-config: 1.3.7, 2.1.2-r1
dev-lang/python:     2.5.1-r3
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.10-r5
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.24
virtual/os-headers:  2.6.23
ACCEPT_KEYWORDS="amd64 ~amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-pipe -O2 -march=athlon64"
CHOST="x86_64-pc-linux-gnu"
CXXFLAGS="-pipe -O2 -march=athlon64 -fvisibility-inlines-hidden"
LANG="en_GB.UTF-8"
LC_ALL="en_GB.UTF-8"
LDFLAGS="-Wl,--hash-style=gnu"
LINGUAS="de"
MAKEOPTS="-j3"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

What could have caused the stray characters to appear in the file? Or why doesn't anyone else seem to nice them as an error?
Comment 1 devurandom 2007-11-04 01:38:49 UTC
(Make summary less cryptic)
Comment 2 devurandom 2007-11-04 02:02:30 UTC
Created attachment 14481 [details]
build.log
Comment 3 Andrew Pinski 2007-11-04 06:31:27 UTC
first don't use  -fvisibility-inlines-hidden, it changes the ABI.
Second  what happens if unset CFLAGS?
Comment 4 devurandom 2007-11-04 11:47:43 UTC
Created attachment 14482 [details]
build.log ICE

Building without setting CFLAGS or CXXFLAGS results in an ICE:
/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071103/work/gcc-4.3.0-20071103/libiberty/hashtab.c: In function ‘htab_expand’:
/var/tmp/portage/sys-devel/gcc-4.3.0_pre20071103/work/gcc-4.3.0-20071103/libiberty/hashtab.c:554: internal compiler error: Segmentation fault
Comment 5 devurandom 2007-11-09 09:50:10 UTC
The problem (stray characters in include/limits) persists after I rebuilt the whole system to wipe out any leftovers from previous attempts at vectorisation and hidden inlines. Setting LANG and LC_ALL also didn't help. It seems as if the file itself is wrong...
Comment 6 Ryan Hill 2007-11-10 06:42:28 UTC
this error doesn't happen on x86 but i did reproduce it w/ make profiledbootstrap using today's trunk on x86_64.  if no one else is hitting this it points to a problem on our end, but i'll look further into it soon.
Comment 7 Ryan Hill 2007-11-10 15:56:53 UTC
looks like this only occurs with profiledbootstrap.  i'll test a vanilla build to see if this is Gentoo specific.
Comment 8 Ryan Hill 2007-11-11 02:42:14 UTC
reproduced with gcc-4.3-20071109 snapshot.

dirtyepic@kali ~/tmp/gccbuild $ ../gcc-4.3-20071109/configure --disable-nls --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --enable-multilib --enable-languages=c,c++

dirtyepic@kali ~/tmp/gccbuild $ make profiledbootstrap -j1
Comment 9 Hanno Meyer-Thurow 2007-12-04 17:43:07 UTC
I wonder what goes wrong with building cpp at stage 3(?). :)

ana build # for f in $(find ./ -type f -name cpp); do ls -l $f; PATH=$(dirname $f) $f -dM /dev/null | grep __DBL_MAX__; done
-rwxr-xr-x 1 root root 279196  4. Dez 18:20 ./gcc/cpp
#define __DBL_MAX__ ½.3200111555710001e+310
-rwxr-xr-x 1 root root 258066  4. Dez 17:47 ./stage1-gcc/cpp
#define __DBL_MAX__ 1.7976931348623157e+308
-rwxr-xr-x 1 root root 389702  4. Dez 18:09 ./prev-gcc/cpp
#define __DBL_MAX__ 1.7976931348623157e+308

LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -Wl,--sort-common"
STAGE1_CFLAGS="-O2"
BOOT_CFLAGS=" -O2 -mtune=nocona -march=nocona -pipe"


(Hint for bad defines from: http://forums.gentoo.org/viewtopic-p-4569014.html#4569014)
Comment 10 Hanno Meyer-Thurow 2007-12-05 13:01:21 UTC
Replacing -O2 with -O1 in BOOT_CFLAGS hides the error here. At least one can install gcc 4.3 again. :)

Gentoo users would edit toolchain.eclass:

--- /usr/portage/eclass/toolchain.eclass.orig	2007-12-05 13:58:52.000000000 +0100
+++ /usr/portage/eclass/toolchain.eclass	2007-12-05 13:58:59.000000000 +0100
@@ -1467,7 +1467,7 @@
 
 	# In general gcc does not like optimization, and add -O2 where
 	# it is safe.  This is especially true for gcc 3.3 + 3.4
-	replace-flags -O? -O2
+	replace-flags -O? -O1
 
 	# ... sure, why not?
 	strip-unsupported-flags
Comment 11 Stefaan De Roeck 2008-01-21 20:18:03 UTC
I can reproduce with gcc 4.3 snapshot 20080118
Comment 12 Ryan Hill 2008-02-04 20:30:38 UTC
no one cares that make profiledbootstrap on 64bit targets miscompiles the preprocessor when BOOT_CFLAGS="-O2" ?
Comment 13 Wolfgang Bangerth 2008-02-06 01:19:36 UTC
Re-confirmed here:
  http://gcc.gnu.org/ml/gcc/2008-02/msg00066.html
Comment 14 Richard Biener 2008-02-06 09:28:03 UTC
Err well.  Adjusted the summary to something more descriptive, marked as
a regression (4.2 did profiledbootstrap ok).
Comment 15 Richard Biener 2008-02-06 09:59:42 UTC
profiledbootstrap on x86_64 fails for me with

checking for x86_64-unknown-linux-gnu-gcc... /space/rguenther/obj/./gcc/xgcc -B/space/rguenther/obj/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/x86_64-unknown-linux-gnu/include -isystem /usr/local/x86_64-unknown-linux-gnu/sys-include
checking for suffix of object files... configure: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details.
make[2]: *** [configure-stagefeedback-target-libgcc] Error 1
make[2]: Leaving directory `/space/rguenther/obj'
make[1]: *** [stagefeedback-bubble] Error 2
make[1]: Leaving directory `/space/rguenther/obj'
make: *** [profiledbootstrap] Error 2
obj> cat stage_current 
stagefeedback
obj> tail x86_64-unknown-linux-gnu/libgcc/config.log
configure:2588: /space/rguenther/obj/./gcc/xgcc -B/space/rguenther/obj/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/x86_64-unknown-linux-gnu/include -isystem /usr/local/x86_64-unknown-linux-gnu/sys-include -c -g -O2 -fprofile-use  conftest.c >&5
<built-in>:0: internal compiler error: in real_to_decimal, at real.c:1656
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
configure:2591: $? = 1

where this ICE also occurs for empty conftest.c and if only preprocessing
using the stagefeedback cc1.

backtrace for just ./cc1 -quiet -E /dev/null

#0  fancy_abort (file=0xb530b2 "../../trunk/gcc/real.c", line=1656, 
    function=0xb53b20 "real_to_decimal") at ../../trunk/gcc/diagnostic.c:659
#1  0x000000000065d352 in real_to_decimal (str=0x7fff736bd500 "&#65533;&#65533;s&#65533;\177", 
    r_orig=<value optimized out>, buf_size=<value optimized out>, digits=17, 
    crop_trailing_zeros=0) at ../../trunk/gcc/real.c:1656
#2  0x0000000000471275 in builtin_define_with_hex_fp_value (
    macro=0x7fff736bd620 "__DBL_MAX__", type=<value optimized out>, digits=1, 
    hex_str=<value optimized out>, fp_suffix=0xb0b55b "", 
    fp_cast=0xb29315 "%s") at ../../trunk/gcc/c-cppbuiltin.c:859
#3  0x00000000004714c1 in builtin_define_float_constants (
    name_prefix=0xaff950 "DBL", fp_suffix=0xb0b55b "", fp_cast=0xb29315 "%s", 
    type=0x2b19379cf780) at ../../trunk/gcc/c-cppbuiltin.c:205
#4  0x0000000000471ab3 in c_cpp_builtins (pfile=0xecbf70)
    at ../../trunk/gcc/c-cppbuiltin.c:516
#5  0x0000000000467c9a in finish_options () at ../../trunk/gcc/c-opts.c:1508
#6  0x0000000000468069 in c_common_init () at ../../trunk/gcc/c-opts.c:1243
#7  0x000000000047341e in c_objc_common_init ()
    at ../../trunk/gcc/c-objc-common.c:72
#8  0x00000000006d0410 in toplev_main (argc=0, argv=<value optimized out>)
    at ../../trunk/gcc/toplev.c:2130
#9  0x00002b19377a4154 in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000403f49 in _start ()

in frame 3, the hex float looks good:

(gdb) print buf
$3 = "0x0.", 'f' <repeats 13 times>, "8p1024\000

so it must be indeed the real_to_decimal monster that is miscompiled with
-O2 -fprofile-use (given the 'correct' profile).
Comment 16 Richard Biener 2008-02-06 11:19:55 UTC
Just re-building real.o with removing the real.gc* files before fixes the bug,
so to trigger this bug you indeed need the profile-feedback.

-O2 -fbranch-probabilities -finline-functions is enough to trigger the failure,
-fno-tree-vrp -fno-tree-dominator-opts (that is, disabling jump threading) fixes
it.

It looks like rtd_divmod goes wrong, as it returns 141 for

(gdb) print *num
$90 = {cl = 1, decimal = 0, sign = 0, signalling = 0, canonical = 0, 
  uexp = 1024, sig = {0, 0, 18446744073709549568}}
(gdb) print *den
$91 = {cl = 1, decimal = 0, sign = 0, signalling = 0, canonical = 0, 
  uexp = 1017, sig = {3455452434845676643, 2207960133917416517, 
    16708042751414900190}}

called from

  /* At this point, PTEN should contain the nearest power of 10 smaller
     than R, such that this division produces the first digit.

     Using a divide-step primitive that returns the complete integral
     remainder avoids the rounding error that would be produced if
     we were to use do_divide here and then simply multiply by 10 for
     each subsequent digit.  */

  digit = rtd_divmod (&r, &pten);

though pten also looks funny here.
Comment 17 Richard Biener 2008-02-06 20:51:07 UTC
P2 - this should not block the release (it's not that profiledbootstrap was never
broken in released compilers).  It's also hard to analyze (no, I'm not on it,
volunteers welcome).
Comment 18 Uroš Bizjak 2008-02-07 20:47:57 UTC
(In reply to comment #17)
> P2 - this should not block the release (it's not that profiledbootstrap was
> never
> broken in released compilers).  It's also hard to analyze (no, I'm not on it,
> volunteers welcome).

It looks that ten_to_ptwo() returns wrong result for argumens >= 7.

Try this debug session:

< non-profiled (correct) cc1>:

(gdb) set args -E -quiet /dev/null
(gdb) break real_to_decimal
Breakpoint 3 at 0x599580: file ../../gcc-svn/trunk/gcc/real.c, line 1453.
(gdb) run
Breakpoint ...

(gdb) p *ten_to_ptwo (7)
$1 = {cl = 1, decimal = 0, sign = 0, signalling = 0, canonical = 0, 
  uexp = 426, sig = {159020156881263929, 14298703881791668535, 
    10644899600020376799}}
---

<profiled (wrong) cc1>:

<repeat all of the above commands>

(gdb) p *ten_to_ptwo (7)
$1 = {cl = 1, decimal = 0, sign = 0, signalling = 0, canonical = 0, 
  uexp = 423, sig = {5021769561911101971, 862341187440057448, 
    11372220470965069550}}

You can see that the results are different.

Suprisingly, result when passing 6 as an argument differs only for one LSB.

(correct):
(gdb) p *ten_to_ptwo (6)
$2 = {cl = 1, decimal = 0, sign = 0, signalling = 0, canonical = 0, 
  uexp = 213, sig = {5834422113351499776, 4377335499248575995, 
    14012984643248170709}}

(wrong):
(gdb) p *ten_to_ptwo (6)
$2 = {cl = 1, decimal = 0, sign = 0, signalling = 0, canonical = 0, 
  uexp = 213, sig = {5834422113351499776, 4377335499248575995, 
    14012984643248170708}}

Luckily, ten_to_ptwo() from real.c has only ~20 lines.
Comment 19 Uroš Bizjak 2008-02-08 14:21:50 UTC
The core of the problem is, that for profiled bootstrap, the call to normalize() from do_multiply() is simply gone. Gone in the sense, that normalize() is neither inlined at the call site, neither is called in do_multiply(). This leads to various observed strange effects when real numbers are processed.

	  for (k = j; k < SIGSZ * 2; k += 2)
	    {
	      unsigned long bi = b->sig[k / 2];
	      if (k & 1)
		bi >>= HOST_BITS_PER_LONG / 2;
	      else
		bi &= ((unsigned long)1 << (HOST_BITS_PER_LONG / 2)) - 1;

	      u.sig[k / 2] = ai * bi;
	    }

>>>>>	  normalize (&u);
	  inexact |= do_add (rr, rr, &u, 0);
	}
    }

Adding __attirbute__((noinline)) to protptype of normalize() is enough to fix profiled bootstrap:

Index: real.c
===================================================================
--- real.c      (revision 132182)
+++ real.c      (working copy)
@@ -98,7 +98,7 @@
 static void clear_significand_below (REAL_VALUE_TYPE *, unsigned int);
 static bool div_significands (REAL_VALUE_TYPE *, const REAL_VALUE_TYPE *,
                              const REAL_VALUE_TYPE *);
-static void normalize (REAL_VALUE_TYPE *);
+static void normalize (REAL_VALUE_TYPE *) __attribute__((noinline));
 
 static bool do_add (REAL_VALUE_TYPE *, const REAL_VALUE_TYPE *,
                    const REAL_VALUE_TYPE *, int);


So, what upsets gcc's inliner/profiler/whatever to drop the call?
Comment 20 Uroš Bizjak 2008-02-08 20:15:58 UTC
(In reply to comment #19)

> So, what upsets gcc's inliner/profiler/whatever to drop the call?

Correction, normalize() gets inlined together with lshift_significand(), but there is something wrong with inlined version. Adding attribute ((noinline)) only to lshift_significand() doesn't fix the failure.
Comment 21 Uroš Bizjak 2008-02-08 20:25:12 UTC
Following patch that forces inlining of normalize breaks normal bootstrap in exactly the same way, so it looks that there is something wrong with inliner.

Index: real.c
===================================================================
--- real.c      (revision 132182)
+++ real.c      (working copy)
@@ -98,7 +98,7 @@
 static void clear_significand_below (REAL_VALUE_TYPE *, unsigned int);
 static bool div_significands (REAL_VALUE_TYPE *, const REAL_VALUE_TYPE *,
                              const REAL_VALUE_TYPE *);
-static void normalize (REAL_VALUE_TYPE *);
+static inline void normalize (REAL_VALUE_TYPE *) __attribute__((always_inline));
 
 static bool do_add (REAL_VALUE_TYPE *, const REAL_VALUE_TYPE *,
                    const REAL_VALUE_TYPE *, int);


<built-in>:0: internal compiler error: in real_to_decimal, at real.c:1656<built-in>:0: internal compiler error: in real_to_decimal, at real.c:1656
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
Comment 22 Uroš Bizjak 2008-02-08 21:17:33 UTC
Created attachment 15119 [details]
Executable test that shows the failure.

This is the test that shows the inliner failure:

[uros@localhost ~]$ gcc -O2 test.c
[uros@localhost ~]$ ./a.out
uexp=213
sig[0]=5834422113351499776
sig[1]=4377335499248575995
sig[2]=14012984643248170709

[uros@localhost ~]$ gcc -O2 -DINLINE test.c
[uros@localhost ~]$ ./a.out
uexp=212
sig[0]=11668844226702999552
sig[1]=1837141970856070134
sig[2]=9579225209565564328

The first result is correct, the second is not.
Comment 23 Richard Biener 2008-02-08 21:29:06 UTC
Now that we have a testcase.
Comment 24 Uroš Bizjak 2008-02-09 21:04:54 UTC
Created attachment 15125 [details]
Shorter and more evil testcase

This is much shorter testcase that basically consists of two functions, do_test() and do_add() in addition to a couple of realnum handling functions. 

The test however exposes more problems.

We still have:
[uros@localhost test]$ gcc -O2 t1.c
[uros@localhost test]$ ./a.out
[uros@localhost test]$ gcc -O2 -DINLINE t1.c
[uros@localhost test]$ ./a.out
Aborted

There is new problem exposed:

As can be seen in the testcase, there is a call to normalize() in do_add() function. Since we call do_add() with arg *a of rvc_zero and arg *b of rvc_normal class of numbers, this call is effectively a dead code.

However, when the call to normalize() is removed, the test always fail, no matter if normalize is forced to be inlined or not.
Comment 25 Uroš Bizjak 2008-02-09 21:06:04 UTC
Critical, IMO.
Comment 26 Uroš Bizjak 2008-02-10 08:23:01 UTC
Following is significantly minimized testcase:

--cut here--
#include <stdlib.h>
#include <stdio.h>

struct real_value
{
  unsigned long sig[3];
};


#ifdef INLINE
static inline void __attribute__((always_inline))
#else
static void __attribute__((noinline))
#endif
normalize (struct real_value *r)

{
  int j;

  for (j = 0; ; j++)
    if (r->sig[2] & ((unsigned long)1 << (63 - j)))
      break;

  printf ("%i\n", j);

  if (j)
    abort ();
}

static void __attribute__((noinline))
  do_test (struct real_value *r)
{
  int i;

  for (i = 0; i < 2; ++i)
    normalize (r);
}

int main()
{
  struct real_value r;

  r.sig[0] = 0ull;
  r.sig[1] = 0ull;
  r.sig[2] = 0x8000000000000001ull;
  
  do_test (&r);

  return 0;
}
--cut here--

[uros@localhost test]$ gcc -O2 -DINLINE t1.c
[uros@localhost test]$ ./a.out
63
Aborted
[uros@localhost test]$ gcc -O2  t1.c
[uros@localhost test]$ ./a.out
0
0

Note, that it looks like MSB is not detected in normalize().
Comment 27 Uroš Bizjak 2008-02-10 10:44:39 UTC
Minimum testcase, fails also for 32bits:

--cut here--
extern void abort ();

void __attribute__((noinline))
bar (unsigned long long i)
{
  if (i)
    abort ();
}

void __attribute__((always_inline))
foo (unsigned long long *r)

{
  int i;

  for (i = 0; ; i++)
    if (*r & ((unsigned long long)1 << (63 - i)))
      break;

  bar (i);
}

void __attribute__((noinline))
do_test (unsigned long long *r)
{
  int i;

  for (i = 0; i < 2; ++i)
    foo (r);
}

int main()
{
  unsigned long long r = 0x8000000000000001ull;

  do_test (&r);
  return 0;
}
--cut here--

[uros@localhost test]$ gcc -O1 t1.c && ./a.out
[uros@localhost test]$ gcc -O2 t1.c && ./a.out
Aborted
Comment 28 Uroš Bizjak 2008-02-10 11:10:43 UTC
This testcase can be used to analyze the failure:

--cut here--
extern void abort ();

void __attribute__((noinline))
bar (unsigned long long i)
{
  if (i)
    abort ();
}

#ifdef INLINE
void __attribute__((always_inline))
#else
void __attribute__((noinline))
#endif
foo (unsigned long long *r)
{
  int i;

  for (i = 0; ; i++)
    if (*r & ((unsigned long long)1 << (63 - i)))
      break;

  bar (i);
}

void __attribute__((noinline))
do_test (unsigned long long *r)
{
  int i;

  for (i = 0; i < 2; ++i)
    foo (r);
}

int main()
{
  unsigned long long r = 0x8000000000000001ull;

  do_test (&r);
  return 0;
}
--cut here--

[uros@localhost test]$ gcc -O2 t1.c && ./a.out
[uros@localhost test]$ gcc -O2 -DINLINE t1.c && ./a.out
Aborted
Comment 29 Uroš Bizjak 2008-02-10 12:27:51 UTC
The problem is in forwprop4 pass:

When testcase from #28 is compiled with "-O2 -DINLINE", forwprop4 pass transforms:

--cut here--
;; Function do_test (do_test)

do_test (r)
{
  long long unsigned int shifttmp.49;
  long long unsigned int D.1576;
  ...

<bb 6>:
  # i_19 = PHI <0(2), i_4(5)>
  D.1576_13 = *r_3(D);
  shifttmp.49_16 = D.1576_13 & 0x08000000000000000;
  if (shifttmp.49_16 != 0)
    goto <bb 5>;
  else
    goto <bb 3>;

...
}
--cut here--

to 

--cut here--
;; Function do_test (do_test)

  Replaced 'shifttmp.49_16 != 0' with '0'                (-> !!!!!!!)
do_test (r)
{
  long long unsigned int D.1576;
  ...

<bb 6>:
  # i_19 = PHI <0(2), i_4(5)>
  D.1576_13 = *r_3(D);
  goto <bb 3>;

  ...
}
--cut here--

The replacement is wrong.
Comment 30 Uroš Bizjak 2008-02-10 12:31:02 UTC
tree-optimization
Comment 31 Uroš Bizjak 2008-02-10 13:45:55 UTC
Patch in testing:

Index: tree-ssa-forwprop.c
===================================================================
--- tree-ssa-forwprop.c (revision 132202)
+++ tree-ssa-forwprop.c (working copy)
@@ -376,7 +376,7 @@ forward_propagate_into_cond (tree cond_e
            tree op1 = TREE_OPERAND (cond, 1);
            rhs0 = GIMPLE_STMT_OPERAND (def_stmt, 1);
            tmp = combine_cond_expr_cond (TREE_CODE (cond), boolean_type_node,
-                                         fold_convert (TREE_TYPE (op1), rhs0),
+                                         fold_convert (TREE_TYPE (rhs0), rhs0),
                                          op1, !single_use0_p);
          }
        /* If that wasn't successful, try the second operand.  */
@@ -393,7 +393,7 @@ forward_propagate_into_cond (tree cond_e
            rhs1 = GIMPLE_STMT_OPERAND (def_stmt, 1);
            tmp = combine_cond_expr_cond (TREE_CODE (cond), boolean_type_node,
                                          op0,
-                                         fold_convert (TREE_TYPE (op0), rhs1),
+                                         fold_convert (TREE_TYPE (rhs1), rhs1),
                                          !single_use1_p);
          }
        /* If that wasn't successful either, try both operands.  */
@@ -402,7 +402,7 @@ forward_propagate_into_cond (tree cond_e
            && rhs1 != NULL_TREE)
          tmp = combine_cond_expr_cond (TREE_CODE (cond), boolean_type_node,
                                        rhs0,
-                                       fold_convert (TREE_TYPE (rhs0), rhs1),
+                                       fold_convert (TREE_TYPE (rhs1), rhs1),
                                        !(single_use0_p && single_use1_p));
       }
     else if (TREE_CODE (cond) == SSA_NAME)
Comment 32 Andrew Pinski 2008-02-10 13:49:07 UTC
-                                         fold_convert (TREE_TYPE (op1), rhs0),
+                                         fold_convert (TREE_TYPE (rhs0),
rhs0),
                                          op1, !single_use0_p);

Wait, you don't need the fold_convert here then.  Since rhs0 is already be TREE_TYPE (rhs0) :).

Seriously this needs more thinking that patch.  Since after your patch, we have a type mismatch.
Comment 33 Uroš Bizjak 2008-02-10 14:31:33 UTC
(In reply to comment #32)

> Seriously this needs more thinking that patch.  Since after your patch, we have
> a type mismatch.

I will leave this problem to tree expert then.

(BTW: There is another fold_convert in forward_propagate_comparison(), tree-ssa-forwprop.c, line 788, that IMO has type mismatch.
Comment 34 Uroš Bizjak 2008-02-10 14:32:54 UTC
BTW: Although patch from Comment #31 is probably not correct, it fixes profiledbootstrap.
Comment 35 Richard Biener 2008-02-10 22:08:45 UTC
The source is very likely a type mismatch in the IL (the foldings are correct,
but also strictly unneccessary).  In principle it goes as follows.  We have
a comparison

  op0 OP op1

where op0 and op1 are required to be trivially convertible to each other.  Now
we look up the definition of any of op0 or op1, where their RHS are required
to be trivially convertible to their LHS and so the RHSes are trivially
convertible to each other.  Thus we don't need the conversion - for the
final IL that is - but since fold likes to have more precise types we
convert either operand to the type of the other before calling fold.

So far goes the theory - of course it all breaks down if one of the above
assumption does not hold.

I will have a look.
Comment 36 Richard Biener 2008-02-10 22:36:28 UTC
Actually the wrong types are caused by loop IM which replaces the bittest

  D.1210_17 = D.1208_13 >> 63;
  D.1211_2 = (int) D.1210_17;
  if (D.1211_1 != 0)

by

  shifttmp.48_12 = 0x8000000000000000;
  shifttmp.48_16 = shifttmp.48_12 & D.1208_13;
  if (shifttmp.48_16 != 0)

but forgets to update the type of the zero.  Testing the patch.
Comment 37 Richard Biener 2008-02-10 22:43:26 UTC
Created attachment 15129 [details]
patch

I'm bootstrapping and regtesting it (it fixes the testcase).  Maybe someone
can do a profiledbootstrap with it as well (otherwise I will do tomorrow).
Comment 38 Ryan Hill 2008-02-11 01:34:40 UTC
yep, profiledbootstrap w/ BOOT_CFLAGS="-O2" on x86_64 is working now.  i'll try to get it tested on ppc64 to see if it fixes PR34720 too.
Comment 39 Richard Biener 2008-02-11 08:27:45 UTC
Subject: Bug 33992

Author: rguenth
Date: Mon Feb 11 08:27:00 2008
New Revision: 132234

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=132234
Log:
2008-02-11  Uros Bizjak  <ubizjak@gmail.com>
	Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/33992
	* tree-ssa-loop-im.c (rewrite_bittest): Fixup the type of
	the zero we compare against.

	* gcc.c-torture/execute/pr33992.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr33992.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-im.c

Comment 40 Uroš Bizjak 2008-02-11 08:59:29 UTC
Richi, thanks for the fix.

Fixed.
Comment 41 Uroš Bizjak 2008-02-15 13:39:44 UTC
*** Bug 34720 has been marked as a duplicate of this bug. ***