I _think_ I understand C strict aliasing - it's based on the type of expressions. In the testcase below, the type of expressions is the same, which is why I think it's a compiler bug. The bug is reproducible with gcc 4.1.0 on multiple platforms. I've tried i386 (i686) and alpha myself and am able to reproduce the problem on both. The original reporter of the problem with my program (the real one, not the testcase) was able to reproduce the problem on other architectures. The problem does not occur with gcc 2.95.3 and 3.4.5. Basically, in the program below, gcc 4.1.0 would pre-load BF_current.P[0] and BF_current.P[17] into registers or stack locations - however the loop modifies P[0] via ptr. #include <stdio.h> #include <string.h> #define BF_N 16 typedef unsigned int BF_word; typedef BF_word BF_key[BF_N + 2]; static struct { BF_word S[4][0x100]; BF_key P; } BF_current; #define BF_ROUND(L, R, N) \ tmp1 = L & 0xFF; \ tmp2 = L >> 8; \ tmp2 &= 0xFF; \ tmp3 = L >> 16; \ tmp3 &= 0xFF; \ tmp4 = L >> 24; \ tmp1 = BF_current.S[3][tmp1]; \ tmp2 = BF_current.S[2][tmp2]; \ tmp3 = BF_current.S[1][tmp3]; \ tmp3 += BF_current.S[0][tmp4]; \ tmp3 ^= tmp2; \ R ^= BF_current.P[N + 1]; \ tmp3 += tmp1; \ R ^= tmp3; #define BF_ENCRYPT \ L ^= BF_current.P[0]; \ { \ int i; \ for (i = 0; i < BF_N; i += 2) { \ BF_ROUND(L, R, i); \ BF_ROUND(R, L, i+1); \ } \ } \ tmp4 = R; \ R = L; \ L = tmp4 ^ BF_current.P[BF_N + 1]; int main(void) { BF_word L, R; BF_word tmp1, tmp2, tmp3, tmp4, *ptr; BF_word i, j; for (i = 0; i < 4; i++) for (j = 0; j < 0x100; j++) BF_current.S[i][j] = (i + 0x12345678) * j; for (i = 0; i < BF_N + 2; i++) BF_current.P[i] = i * 0x98765432; L = R = 0; ptr = BF_current.P; do { #ifndef WORKAROUND ptr += 2; BF_ENCRYPT; *(ptr - 2) = L; *(ptr - 1) = R; #else BF_ENCRYPT; *ptr = L; *(ptr + 1) = R; ptr += 2; #endif } while (ptr < &BF_current.P[BF_N + 2]); printf("%08x %08x\n", L, R); return 0; } host!user:~$ gcc gcc-4.1.0-aliasing-bug.c -o gcc-4.1.0-aliasing-bug -Wall -O2 -fno-strict-aliasing host!user:~$ ./gcc-4.1.0-aliasing-bug 8261e2e6 f1f1bc33 host!user:~$ gcc gcc-4.1.0-aliasing-bug.c -o gcc-4.1.0-aliasing-bug -Wall -O2 host!user:~$ ./gcc-4.1.0-aliasing-bug 46be060b df072f90 host!user:~$ gcc gcc-4.1.0-aliasing-bug.c -o gcc-4.1.0-aliasing-bug -Wall -O2 -DWORKAROUND host!user:~$ ./gcc-4.1.0-aliasing-bug 8261e2e6 f1f1bc33 host!user:~$ uname -mrs Linux 2.4.32-ow1 alpha host!user:~$ gcc -v Using built-in specs. Target: alphaev56-unknown-linux-gnu Configured with: ./configure --prefix=/home/user/gcc-4.1.0 Thread model: posix gcc version 4.1.0 (i386 gives the same results)
Created attachment 10979 [details] testcase
Hmm -O2 -fno-ivopts allows for it to work.
This is interesting because the mainline works with or without -fno-strict-aliasing. Confirmed a regression, hopefully someone will reduce the testcase further.
Reduced testcase: extern int printf(const char *format, ...); typedef unsigned int BF_word; typedef BF_word BF_key[16 + 2]; static struct { BF_key P; } BF_current; int main(void) { BF_word L, R; BF_word tmp1, tmp2, tmp3, tmp4, *ptr; L = R = 0; ptr = BF_current.P; do { int i; ptr += 2; L ^= BF_current.P[0]; for (i = 0; i < 16; i += 2) { tmp1 = L & 0xFF; tmp3 += tmp1; L ^= tmp3;; } *(ptr - 2) = L; } while (ptr < &BF_current.P[16 + 2]); printf("%08x %08x\n", L, R); return 0; } -O2 -fno-strict-aliasing: 0001e6d4 00000000 -O2: 0001c100 00000000 -fno-ivopts does not fix it.
Umm, that has uninitialized vars. The following is ok wrt -Wall -Werror: extern int printf(const char *, ...); typedef unsigned int BF_word; typedef BF_word BF_key[16 + 2]; static struct { BF_key P; } BF_current; int main(void) { BF_word L; BF_word tmp4, *ptr; BF_word i; for (i = 0; i < 16 + 2; i++) BF_current.P[i] = i * 0x98765432; L = 0; ptr = BF_current.P; do { ptr += 2; L ^= BF_current.P[0]; tmp4 = L >> 24; L = tmp4 ^ BF_current.P[16 + 1]; *(ptr - 2) = L; } while (ptr < &BF_current.P[16 + 2]); printf("%08x\n", L); return 0; } and prints 1fdb9752 vs. 1fdb974d.
With -fno-ivopts -fno-tree-vrp tree-optimizers produce the same for both 4.1.0 and 4.2.0 until tree loop-invariant-motion, which does the wrong transformation because of bogus alias information: <L11>:; # TMT.6_23 = PHI <TMT.6_24(3), TMT.6_25(5)>; # ptr_3 = PHI <&BF_current.P[0](3), ptr_9(5)>; # L_2 = PHI <0(3), L_14(5)>; <L7>:; ptr_9 = ptr_3 + 8B; # VUSE <BF_current_21>; D.1297_10 = BF_current.P[0]; L_11 = L_2 ^ D.1297_10; tmp4_12 = L_11 >> 24; # VUSE <BF_current_21>; D.1298_13 = BF_current.P[17]; L_14 = tmp4_12 ^ D.1298_13; D.1299_15 = ptr_9 - 8B; # TMT.6_25 = V_MAY_DEF <TMT.6_23>; *D.1299_15 = L_14; if (ptr_9 < &BF_current.P[18]) goto <L12>; else goto <L4>; the store to *D.1299_15 kills BF_current_21, which we don't notice.
The problem is latent on the mainline with -fno-tree-salias.
With -O2 -fno-ivopts -fno-tree-vrp -fno-tree-salias -fno-strict-aliasing we correctly get <L11>:; # BF_current_23 = PHI <BF_current_21(5), BF_current_24(7)>; # ptr_3 = PHI <&BF_current.P[0](5), ptr_9(7)>; # L_2 = PHI <0(5), L_14(7)>; <L7>:; ptr_9 = ptr_3 + 8B; # VUSE <BF_current_23>; D.1541_10 = BF_current.P[0]; L_11 = L_2 ^ D.1541_10; tmp4_12 = L_11 >> 24; # VUSE <BF_current_23>; D.1542_13 = BF_current.P[17]; L_14 = tmp4_12 ^ D.1542_13; D.1543_15 = ptr_9 - 8B; # BF_current_24 = V_MAY_DEF <BF_current_23>; *D.1543_15 = L_14; if (ptr_9 < &BF_current.P[18]) goto <L12>; else goto <L4>;
Pointed-to sets for pointers in main ptr_8, points-to vars: { BF_current } ptr_3, points-to vars: { BF_current } ptr_9, points-to vars: { BF_current } D.1543_15, is dereferenced, points-to anything D.1544_16, points-to vars: { BF_current } is wrong because of: handle_ptr_arith (struct constraint_expr lhs, tree expr) { tree op0, op1; struct constraint_expr base, offset; if (TREE_CODE (expr) != PLUS_EXPR) return false; and (gdb) call debug_tree(expr) <minus_expr 0x40194360 type <pointer_type 0x402260b8 type <integer_type 0x4021df74 BF_word sizes-gimplified public unsigned SI size <integer_cst 0x4018d3f0 constant invariant 32> unit size <integer_cst 0x4018d180 constant invariant 4> align 32 symtab 0 alias set -1 precision 32 min <integer_cst 0x4018d468 0> max <integer_cst 0x4018d450 4294967295> pointer_to_this <pointer_type 0x402260b8>> sizes-gimplified unsigned SI size <integer_cst 0x4018d3f0 32> unit size <integer_cst 0x4018d180 4> align 32 symtab 0 alias set -1> arg 0 <ssa_name 0x40228e38 type <pointer_type 0x402260b8> visited var <var_decl 0x40199210 ptr> def_stmt <modify_expr 0x40194264> version 9 ptr-info 0x4021f740> arg 1 <integer_cst 0x402271c8 type <pointer_type 0x402260b8> constant invariant 8>> I have a fix.
Subject: Bug number PR tree-optimization/26587 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00357.html
Patches posted.
Subject: Bug 26587 Author: rguenth Date: Tue Mar 7 16:23:38 2006 New Revision: 111808 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=111808 Log: 2006-03-07 Richard Guenther <rguenther@suse.de> PR tree-optimization/26587 * tree-ssa-structalias.c (handle_ptr_arith): Handle MINUS_EXPR. * gcc.dg/torture/pr26587.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/torture/pr26587.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-structalias.c
Subject: Bug 26587 Author: rguenth Date: Tue Mar 7 16:26:14 2006 New Revision: 111809 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=111809 Log: 2006-03-07 Richard Guenther <rguenther@suse.de> PR tree-optimization/26587 * tree-ssa-structalias.c (handle_ptr_arith): Handle MINUS_EXPR. * gcc.dg/torture/pr26587.c: New testcase. Added: branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/torture/pr26587.c Modified: branches/gcc-4_1-branch/gcc/ChangeLog branches/gcc-4_1-branch/gcc/testsuite/ChangeLog branches/gcc-4_1-branch/gcc/tree-ssa-structalias.c
Fixed.