char a[10], b[10]; int f1(int i) { return a[i+1] + b[i+1]; } With -O1 and higher, gcc performs CSE on i+1: addl $1, %edi movslq %edi,%rdi movsbl a(%rdi),%eax movsbl b(%rdi),%edx addl %edx, %eax ret This doesn't happen with the equivalent return (&a[0])[i+1] + (&b[0])[i+1]; or return *(a + i + 1) + *(b + i + 1); which both compile to movslq %edi,%rdi movsbl a+1(%rdi),%eax movsbl b+1(%rdi),%edx addl %edx, %eax ret
Well actually this is just address section which goes funny. We use CSE at the rtl level do to some address mode selection (which I feel is wrong). Part of the reasons why we go funny is that we do the add in 32bit. What is happening with the other ones is the others are being converted into " return (&a[1])[i] + (&b[1])[i];" which are not converted back into a[i+1] and b[i+1] because I don't know but they should be. This is a regression because we do PRE/FRE at the tree level and let the address selection done at the rtl level (which is correct) but the rtl level address selectiong is not working to the level we need it to be for the regression not to show up.
It's not really CSE's fault, though I agree that doing addressing mode selection there is wrong. fwprop does not fix the bug for example, though teaching it about this case could be easier than teaching CSE. Unlike CSE, fwprop fixes this case: char a[10], b[10]; int f1(log i) { return a[i+1] + b[i+1]; } which generates incq %rdi movsbl a(%rdi),%eax movsbl b(%rdi),%edx addl %edx, %eax with current GCC. So one way to fix this bug (assuming fwprop is merged in 4.3) is to teach something (expand?) that (set (reg:DI 58 [ D.1872 ]) (sign_extend:DI (reg:SI 60 [ D.1872 ]))) (parallel [ (set (reg:DI 62 [ D.1872 ]) (plus:DI (reg/v:DI 58 [ i ]) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ]) at least when we can disregard overflow, might be better than (parallel [ (set (reg:SI 58 [ D.1872 ]) (plus:SI (reg/v:SI 60 [ i ]) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ]) (set (reg:DI 62 [ D.1872 ]) (sign_extend:DI (reg:SI 58 [ D.1872 ])))
BTW, what did 4.0 and 3.4 produce?
For x86, 3.4.0 produces: movl 4(%esp), %edx movsbl a+1(%edx),%eax movsbl b+1(%edx),%edx addl %edx, %eax ret While 4.0.0 produced: movl 4(%esp), %edx incl %edx movsbl a(%edx),%eax movsbl b(%edx),%edx addl %edx, %eax ret (Note this is also not really a target issue as PPC can be helped by this also by having a+1 and b+1 and no need for an extra increment of 1 to the argument).
fwprop fixes the bug on i386-pc-linux-gnu
I plan to fix it via the fwprop merge in 4.3, but not in 4.2. Should I still assign this to me?
We're not folding return (int) *((char *) (long unsigned int) i + &a + 1B) + (int) *((char *) (long unsigned int) i + &b + 1B); one reason is that the C frontend decomposes &a[i], one is the array-to-pointer decay code of the frontend. Compare that to the C++ frontend where we manage to fold to return <retval> = (int) a[(<unnamed type>) (long unsigned int) i + 1] + (int) b[(<unnamed type>) (long unsigned int) i + 1]; and so do the optimization. In particular, fold-const.c:try_move_mult_to_index does not recognize <addr_expr 0x2b0bb95fd0c0 type <pointer_type 0x2b0bb9615420 type <integer_type 0x2b0bb96022c0 char public string-flag QI size <integer_cst 0x2b0bb95f27e0 constant invariant 8> unit size <integer_cst 0x2b0bb95f2810 constant invariant 1> align 8 symtab 0 alias set -1 precision 8 min <integer_cst 0x2b0bb95f28a0 -128> max <integer_cst 0x2b0bb95f2960 127> pointer_to_this <pointer_type 0x2b0bb9615420>> public unsigned DI size <integer_cst 0x2b0bb95f2d80 constant invariant 64> unit size <integer_cst 0x2b0bb95f2db0 constant invariant 8> align 64 symtab 0 alias set -1> constant invariant arg 0 <var_decl 0x2b0bb97bcbb0 a type <array_type 0x2b0bb97bcb00 type <integer_type 0x2b0bb96022c0 char> BLK size <integer_cst 0x2b0bb97ce3f0 constant invariant 80> unit size <integer_cst 0x2b0bb97ce390 constant invariant 10> align 8 symtab 0 alias set -1 domain <integer_type 0x2b0bb97bca50>> addressable used public static common BLK defer-output file t.i line 1 size <integer_cst 0x2b0bb97ce3f0 80> unit size <integer_cst 0x2b0bb97ce390 10> align 8>> as &a[0].
One minimal fix for this is the following (patches for this I sent many times long time ago, but poking in the C frontend is tedious): Index: c-typeck.c =================================================================== *** c-typeck.c (revision 117629) --- c-typeck.c (working copy) *************** build_unary_op (enum tree_code code, tre *** 3032,3046 **** } /* For &x[y], return x+y */ ! if (TREE_CODE (arg) == ARRAY_REF) { tree op0 = TREE_OPERAND (arg, 0); if (!c_mark_addressable (op0)) return error_mark_node; ! return build_binary_op (PLUS_EXPR, ! (TREE_CODE (TREE_TYPE (op0)) == ARRAY_TYPE ! ? array_to_pointer_conversion (op0) ! : op0), TREE_OPERAND (arg, 1), 1); } --- 3032,3044 ---- } /* For &x[y], return x+y */ ! if (TREE_CODE (arg) == ARRAY_REF ! && TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 0))) != ARRAY_TYPE) { tree op0 = TREE_OPERAND (arg, 0); if (!c_mark_addressable (op0)) return error_mark_node; ! return build_binary_op (PLUS_EXPR, op0, TREE_OPERAND (arg, 1), 1); }
For this, on i386-pc-linux-gnu, C and C++ give the exact (pessimized) same code: char a[10], b[10]; int f1(int i) { return a[i+1] + b[i+1]; } That RTL address selection sucks is just a fact. :-)
I get (-O -m32) for C++: _Z2f1i: .LFB2: pushl %ebp .LCFI0: movl %esp, %ebp .LCFI1: movl 8(%ebp), %edx addl $1, %edx movsbl b(%edx),%eax movsbl a(%edx),%edx addl %edx, %eax popl %ebp ret which I thought was the good behavior.
Subject: Re: [4.0/4.1/4.2 Regression] address selection does not work correctly > movl 8(%ebp), %edx > addl $1, %edx > movsbl b(%edx),%eax > movsbl a(%edx),%edx > No, the good behavior has "b+1(%edx)" and "a+1(%edx)" (for non-PIC code). Paolo
On the trunk I get: movsbl b+1(%edx),%eax movsbl a+1(%edx),%edx so this has been fixed there.
Can we mark it as WONTFIX for 4.0 to 4.2?
> Can we mark it as WONTFIX for 4.0 to 4.2? Seems reasonable to me, but it would be nice to detect a future regression.
Subject: Re: [4.0/4.1/4.2 Regression] address selection does not work correctly Yes, we should add a testcase.
Subject: Bug 28940 Author: ebotcazou Date: Sat Nov 3 07:53:01 2007 New Revision: 129868 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129868 Log: PR rtl-optimization/28940 * gcc.target/i386/addr-sel-1.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/addr-sel-1.c Modified: trunk/gcc/testsuite/ChangeLog
Fixed in the upcoming 4.3 series.