28940 – [4.0/4.1/4.2 Regression] address selection does not work correctly

Bug 28940 - [4.0/4.1/4.2 Regression] address selection does not work correctly

Summary: [4.0/4.1/4.2 Regression] address selection does not work correctly

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	4.1.1

Importance:	P2 normal
Target Milestone:	4.1.3
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2006-09-02 21:14 UTC by Lev Makhlis
Modified:	2007-11-03 07:54 UTC (History)
CC List:	7 users (show)

See Also:
Host:
Target:
Build:
Known to work:	3.4.0
Known to fail:	4.0.0 4.1.0 4.2.0
Last reconfirmed:	2006-09-02 21:29:36

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Lev Makhlis 2006-09-02 21:14:17 UTC

char a[10], b[10];

int f1(int i)
{
        return a[i+1] + b[i+1];
}

With -O1 and higher, gcc performs CSE on i+1:

        addl    $1, %edi
        movslq  %edi,%rdi
        movsbl  a(%rdi),%eax
        movsbl  b(%rdi),%edx
        addl    %edx, %eax
        ret

This doesn't happen with the equivalent
        return (&a[0])[i+1] + (&b[0])[i+1];
or
        return *(a + i + 1) + *(b + i + 1);
which both compile to
        movslq  %edi,%rdi
        movsbl  a+1(%rdi),%eax
        movsbl  b+1(%rdi),%edx
        addl    %edx, %eax
        ret

Comment 1 Andrew Pinski 2006-09-02 21:29:36 UTC

Well actually this is just address section which goes funny.  We use CSE at the rtl level do to some address mode selection (which I feel is wrong).  Part of the reasons why we go funny is that we do the add in 32bit.


What is happening with the other ones
is the others are being converted into " return (&a[1])[i] + (&b[1])[i];" which
are not converted back into a[i+1] and b[i+1] because I don't know but they should be. 

This is a regression because we do PRE/FRE at the tree level and let the address selection done at the rtl level (which is correct) but the rtl level address selectiong is not working to the level we need it to be for the regression not to show up.

Comment 2 Paolo Bonzini 2006-09-07 08:11:45 UTC

It's not really CSE's fault, though I agree that doing addressing mode selection there is wrong.  fwprop does not fix the bug for example, though teaching it about this case could be easier than teaching CSE.

Unlike CSE, fwprop fixes this case:

char a[10], b[10];

int f1(log i)
{
        return a[i+1] + b[i+1];
}

which generates

        incq    %rdi
        movsbl  a(%rdi),%eax
        movsbl  b(%rdi),%edx
        addl    %edx, %eax

with current GCC.  So one way to fix this bug (assuming fwprop is merged in 4.3) is to teach something (expand?) that


(set (reg:DI 58 [ D.1872 ])
        (sign_extend:DI (reg:SI 60 [ D.1872 ])))

(parallel [
            (set (reg:DI 62 [ D.1872 ])
                (plus:DI (reg/v:DI 58 [ i ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ])

at least when we can disregard overflow, might be better than

(parallel [
            (set (reg:SI 58 [ D.1872 ])
                (plus:SI (reg/v:SI 60 [ i ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ])

(set (reg:DI 62 [ D.1872 ])
        (sign_extend:DI (reg:SI 58 [ D.1872 ])))

Comment 3 Paolo Bonzini 2006-09-07 08:15:18 UTC

BTW, what did 4.0 and 3.4 produce?

Comment 4 Andrew Pinski 2006-09-09 05:48:00 UTC

For x86, 3.4.0 produces:
        movl    4(%esp), %edx
        movsbl  a+1(%edx),%eax
        movsbl  b+1(%edx),%edx
        addl    %edx, %eax
        ret

While 4.0.0 produced:
        movl    4(%esp), %edx
        incl    %edx
        movsbl  a(%edx),%eax
        movsbl  b(%edx),%edx
        addl    %edx, %eax
        ret

(Note this is also not really a target issue as PPC can be helped by this also by having a+1 and b+1 and no need for an extra increment of 1 to the argument).

Comment 5 Paolo Bonzini 2006-09-09 13:02:11 UTC

fwprop fixes the bug on i386-pc-linux-gnu

Comment 6 Paolo Bonzini 2006-09-09 13:03:19 UTC

I plan to fix it via the fwprop merge in 4.3, but not in 4.2.  Should I still assign this to me?

Comment 7 Richard Biener 2006-10-11 12:23:33 UTC

We're not folding

  return (int) *((char *) (long unsigned int) i + &a + 1B) + (int) *((char *) (long unsigned int) i + &b + 1B);

one reason is that the C frontend decomposes &a[i], one is the array-to-pointer decay code of the frontend.  Compare that to the C++ frontend where we manage to fold to

return <retval> = (int) a[(<unnamed type>) (long unsigned int) i + 1] + (int) b[(<unnamed type>) (long unsigned int) i + 1];

and so do the optimization.

In particular, fold-const.c:try_move_mult_to_index does not recognize
 <addr_expr 0x2b0bb95fd0c0
    type <pointer_type 0x2b0bb9615420
        type <integer_type 0x2b0bb96022c0 char public string-flag QI
            size <integer_cst 0x2b0bb95f27e0 constant invariant 8>
            unit size <integer_cst 0x2b0bb95f2810 constant invariant 1>
            align 8 symtab 0 alias set -1 precision 8 min <integer_cst 0x2b0bb95f28a0 -128> max <integer_cst 0x2b0bb95f2960 127>
            pointer_to_this <pointer_type 0x2b0bb9615420>>
        public unsigned DI
        size <integer_cst 0x2b0bb95f2d80 constant invariant 64>
        unit size <integer_cst 0x2b0bb95f2db0 constant invariant 8>
        align 64 symtab 0 alias set -1>
    constant invariant
    arg 0 <var_decl 0x2b0bb97bcbb0 a
        type <array_type 0x2b0bb97bcb00 type <integer_type 0x2b0bb96022c0 char>
            BLK
            size <integer_cst 0x2b0bb97ce3f0 constant invariant 80>
            unit size <integer_cst 0x2b0bb97ce390 constant invariant 10>
            align 8 symtab 0 alias set -1 domain <integer_type 0x2b0bb97bca50>>
        addressable used public static common BLK defer-output file t.i line 1 size <integer_cst 0x2b0bb97ce3f0 80> unit size <integer_cst 0x2b0bb97ce390 10>
        align 8>>
as &a[0].

Comment 8 Richard Biener 2006-10-11 12:30:44 UTC

One minimal fix for this is the following (patches for this I sent many times long time ago, but poking in the C frontend is tedious):

Index: c-typeck.c
===================================================================
*** c-typeck.c  (revision 117629)
--- c-typeck.c  (working copy)
*************** build_unary_op (enum tree_code code, tre
*** 3032,3046 ****
        }
  
        /* For &x[y], return x+y */
!       if (TREE_CODE (arg) == ARRAY_REF)
        {
          tree op0 = TREE_OPERAND (arg, 0);
          if (!c_mark_addressable (op0))
            return error_mark_node;
!         return build_binary_op (PLUS_EXPR,
!                                 (TREE_CODE (TREE_TYPE (op0)) == ARRAY_TYPE
!                                  ? array_to_pointer_conversion (op0)
!                                  : op0),
                                  TREE_OPERAND (arg, 1), 1);
        }
  
--- 3032,3044 ----
        }
  
        /* For &x[y], return x+y */
!       if (TREE_CODE (arg) == ARRAY_REF
!         && TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 0))) != ARRAY_TYPE)
        {
          tree op0 = TREE_OPERAND (arg, 0);
          if (!c_mark_addressable (op0))
            return error_mark_node;
!         return build_binary_op (PLUS_EXPR, op0,
                                  TREE_OPERAND (arg, 1), 1);
        }

Comment 9 Paolo Bonzini 2006-10-11 12:47:16 UTC

For this, on i386-pc-linux-gnu, C and C++ give the exact (pessimized) same code:

char a[10], b[10];

int f1(int i)
{
        return a[i+1] + b[i+1];
}

That RTL address selection sucks is just a fact. :-)

Comment 10 Richard Biener 2006-10-11 12:53:18 UTC

I get (-O -m32) for C++:

_Z2f1i:
.LFB2:
        pushl   %ebp
.LCFI0:
        movl    %esp, %ebp
.LCFI1:
        movl    8(%ebp), %edx
        addl    $1, %edx
        movsbl  b(%edx),%eax
        movsbl  a(%edx),%edx
        addl    %edx, %eax
        popl    %ebp
        ret

which I thought was the good behavior.

Comment 11 paolo.bonzini@lu.unisi.ch 2006-10-11 13:05:01 UTC

Subject: Re:  [4.0/4.1/4.2 Regression] address selection does
 not work correctly


>         movl    8(%ebp), %edx
>         addl    $1, %edx
>         movsbl  b(%edx),%eax
>         movsbl  a(%edx),%edx
>   
No, the good behavior has "b+1(%edx)" and "a+1(%edx)" (for non-PIC code).

Paolo

Comment 12 Andrew Pinski 2007-01-15 20:20:44 UTC

On the trunk I get:
        movsbl  b+1(%edx),%eax
        movsbl  a+1(%edx),%edx

so this has been fixed there.

Comment 13 Paolo Bonzini 2007-07-05 10:36:56 UTC

Can we mark it as WONTFIX for 4.0 to 4.2?

Comment 14 Eric Botcazou 2007-07-05 10:43:05 UTC

> Can we mark it as WONTFIX for 4.0 to 4.2?

Seems reasonable to me, but it would be nice to detect a future regression.

Comment 15 paolo.bonzini@lu.unisi.ch 2007-07-05 10:46:00 UTC

Subject: Re:  [4.0/4.1/4.2 Regression] address
 selection does not work correctly

Yes, we should add a testcase.

Comment 16 Eric Botcazou 2007-11-03 07:53:11 UTC

Subject: Bug 28940

Author: ebotcazou
Date: Sat Nov  3 07:53:01 2007
New Revision: 129868

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129868
Log:
	PR rtl-optimization/28940
	* gcc.target/i386/addr-sel-1.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/addr-sel-1.c
Modified:
    trunk/gcc/testsuite/ChangeLog

Comment 17 Eric Botcazou 2007-11-03 07:54:11 UTC

Fixed in the upcoming 4.3 series.