[regehr@gamow tmp436]$ current-gcc -O1 small.c -o small [regehr@gamow tmp436]$ ./small 0 [regehr@gamow tmp436]$ current-gcc -O2 small.c -o small [regehr@gamow tmp436]$ ./small 1 [regehr@gamow tmp436]$ current-gcc -v Using built-in specs. COLLECT_GCC=current-gcc COLLECT_LTO_WRAPPER=/uusoc/exports/scratch/regehr/z/compiler-install/gcc-r168380-install/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../configure --with-libelf=/usr/local --enable-lto --prefix=/home/regehr/z/compiler-install/gcc-r168380-install --program-prefix=r168380- --enable-languages=c,c++ Thread model: posix gcc version 4.6.0 20101231 (experimental) (GCC) [regehr@gamow tmp436]$ cat small.c int printf(const char *format, ...); static unsigned div(unsigned p1, unsigned p2) { return p1 / p2; } static unsigned rshift (unsigned p1, unsigned p2) { if (p2 >= 32) return p1; else return p1 >> p2; } static unsigned g_26 = 1; static unsigned g_78 = 1; static void func_28 (unsigned char p_30, unsigned p_31) { if (p_31) { unsigned l1 = div (0x7000, p_30-2) ^ g_26; unsigned l2 = rshift (g_26, g_26); unsigned l4 = g_26 - l2; g_78 &= (l4 && (g_26-1)) + l1; } } static void func_2 (void) { func_28 (1, 1); } int main(void) { func_2(); func_28 (-1, 1); printf("%d\n", g_78); return 0; }
/* PR tree-optimization/47148 */ static inline unsigned bar (unsigned x, unsigned y) { if (y >= 32) return x; else return x >> y; } static unsigned a = 1, b = 1; static inline void foo (unsigned char x, unsigned y) { if (!y) return; unsigned c = (0x7000U / (x - 2)) ^ a; unsigned d = bar (a, a); b &= ((a - d) && (a - 1)) + c; } int main (void) { foo (1, 1); foo (-1, 1); if (b && ((unsigned char) -1) == 255) __builtin_abort (); return 0; } Indeed, seems to be caused by partial inlining. With -fno-partial-inlining it works fine.
It is caused by revision 161433: http://gcc.gnu.org/ml/gcc-cvs/2010-06/msg01351.html
It seems this is very much related to the PR46942 ABI screw up. Apparently sometimes on x86_64 we actually rely on the sign/zero extensions done by the caller, not to DImode, but just to SImode, not by setting SUBREG_PROMOTED_P bit in the subregs, but at least in combine.c's setup_incoming_promotions and thus the zero extension is optimized away. Normally, e.g. when compiling static unsigned a = 1, b = 1; static __attribute__((noinline)) void foo (unsigned char x) { unsigned c = (0x7000U / (x - 2)) ^ 1; b &= c; } int main (void) { foo (1); foo (-1); if (b && ((unsigned char) -1) == 255) __builtin_abort (); return 0; } the caller indeed does the needed promotions, as CALL_EXPR's argument has int type rather than unsigned char. But when calling the artificial foo.part.0, the argument passed to it is unsigned char 255 rather than int 255 and it sets a QImode %rdi register to -1 (i.e. 255) instead of setting SImode %rdi register to 255, which means it is incorrectly sign extended instead of zero extended.
(In reply to comment #3) > the caller indeed does the needed promotions, as CALL_EXPR's argument has int > type rather than unsigned char. But when calling the artificial foo.part.0, > the > argument passed to it is unsigned char 255 rather than int 255 and it sets a > QImode %rdi register to -1 (i.e. 255) instead of setting SImode %rdi register > to 255, which means it is incorrectly sign extended instead of zero extended. I proposed to update x86-64 psABI to --- When a value of type _Bool is returned in a register, bit 0 contains the truth value and bits 1 to 7 shall be zero. When an argument of type _Bool is passed in a register or on the stack, bit 0 contains the truth value and bits 1 to 31 shall be zero. When a value of type signed/unsigned char or short is returned in a register, bits 0 to 7 for char and bits 0 to 15 for short contain the value and other bits are left unspecified. When an argument of signed/unsigned type char or short is passed in a register or on the stack, it shall be sign/zero extended to signed/unsigned int. --- http://gcc.gnu.org/ml/gcc/2010-12/msg00525.html
Maybe it is fnsplit fault, it might want to look at DECL_ARG_TYPE of the PARM_DECL in question (partial inlining means it always has a fndecl, so can look at its DECL_ARGUMENTS to find that type) and use that type instead of the type in TYPE_ARG_TYPES.
Created attachment 22879 [details] gcc46-pr47148.patch Untested fix on the fnsplit side.
Author: jakub Date: Mon Jan 3 21:10:31 2011 New Revision: 168441 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168441 Log: PR tree-optimization/47148 * ipa-split.c (split_function): Convert arguments to DECL_ARG_TYPE if possible. * gcc.c-torture/execute/pr47148.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr47148.c Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-split.c trunk/gcc/testsuite/ChangeLog
Fixed.