Bug 48316 - missed CSE / reassoc with array offsets
Summary: missed CSE / reassoc with array offsets
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.7.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2011-03-28 13:56 UTC by Richard Biener
Modified: 2021-12-08 08:10 UTC
CC: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-05 00:00:00


Description Richard Biener 2011-03-28 13:56:24 UTC
int
foo (int *p, int i)
{
  int i1 = i + 1;
  int i2 = i + 2;
  return p[i1] + p[i2];
}

int
bar (int *p, unsigned long i)
{
  unsigned long i1 = i + 1;
  unsigned long i2 = i + 2;
  return p[i1] + p[i2];
}

For both testcases (the latter being the more "optimal" input due to
pointer-plus-expr constraints) we fail to CSE the multiplication
of i by 4, which leaves the memory references not trivially independent
(based on the same pointer, offset by different constants).  Such
a case causes alias checks to be inserted for vectorization of
gfortran.dg/reassoc_4.f with --param max-completely-peeled-insns=4000.

IL on x86_64 is

<bb 2>:
  i1_2 = i_1(D) + 1;
  i2_3 = i_1(D) + 2;
  D.2702_4 = (long unsigned int) i1_2;
  D.2703_5 = D.2702_4 * 4;
  D.2704_7 = p_6(D) + D.2703_5;
  D.2705_8 = MEM[(int *)D.2704_7];
  D.2706_9 = (long unsigned int) i2_3;
  D.2707_10 = D.2706_9 * 4;
  D.2708_11 = p_6(D) + D.2707_10;
  D.2709_12 = MEM[(int *)D.2708_11];
  D.2701_13 = D.2705_8 + D.2709_12;
  return D.2701_13;

vs.

<bb 2>:
  i1_2 = i_1(D) + 1;
  i2_3 = i_1(D) + 2;
  D.2694_4 = i1_2 * 4;
  D.2695_6 = p_5(D) + D.2694_4;
  D.2696_7 = MEM[(int *)D.2695_6];
  D.2697_8 = i2_3 * 4;
  D.2698_9 = p_5(D) + D.2697_8;
  D.2699_10 = MEM[(int *)D.2698_9];
  D.2693_11 = D.2696_7 + D.2699_10;
  return D.2693_11;

For the reassoc_4.f testcase the question is whether either SCEV or
data-dependence analysis can be enhanced to handle such cases (the
multiplications are in BB2, outside of any loop).
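
As a source-level sketch only (foo_cse is a made-up name, not part of the
testcase, and this glosses over signed-overflow subtleties for the int
index), the form we would like CSE/reassociation to reach is:

int
foo_cse (int *p, int i)
{
  int *q = p + i;       /* the scaling of i by 4 happens exactly once */
  return q[1] + q[2];   /* p + i*4 + 4 and p + i*4 + 8, trivially independent */
}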
Comment 1 davidxl 2014-03-12 23:53:52 UTC
int a[8][8];


int foo(int i)
{
  return a[i][3] + a[i+1][3] + a[i+2][3];
}

ICC generates:

        movslq    %edi, %rdi                                    #5.1
        shlq      $5, %rdi                                      #6.10
        movl      12+a(%rdi), %eax                              #6.10
        addl      44+a(%rdi), %eax                              #6.20
        addl      76+a(%rdi), %eax            

GCC cannot CSE the common address arithmetic:

        leal    1(%rdi), %eax
        movslq  %edi, %rdx
        addl    $2, %edi
        cltq
        movslq  %edi, %rdi
        salq    $5, %rdx
        salq    $5, %rax
        salq    $5, %rdi
        movl    a+12(%rax), %eax
        addl    a+12(%rdx), %eax
        addl    a+12(%rdi), %eax
        ret
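
For reference, a hand-CSE'd source-level equivalent of what ICC emits above
(foo_cse is an invented name, purely to illustrate the shared address
computation):

int foo_cse (int i)
{
  const int *row = &a[i][3];         /* one sign-extend plus one shift by 5 */
  return row[0] + row[8] + row[16];  /* rows of a[8][8] are 8 ints apart */
}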
Comment 2 rguenther@suse.de 2014-03-13 08:56:26 UTC
This particular case is an RTL optimization deficiency (or a
missed SLSR case).  The address operations are only lowered
during RTL expansion.
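
Roughly what expansion produces for each a[i + k][3] access, written out as
C purely for illustration (foo_lowered and the casts are invented here): the
multiplication by the row size only becomes explicit at this point, so
tree-level CSE/SLSR never get a chance to share it.

int foo_lowered (int i)
{
  char *base = (char *) a;
  long off0 = (long) i * 32;          /* row stride of a[8][8] in bytes */
  long off1 = (long) (i + 1) * 32;
  long off2 = (long) (i + 2) * 32;
  return *(int *) (base + off0 + 12)  /* column 3 -> byte offset 12 */
       + *(int *) (base + off1 + 12)
       + *(int *) (base + off2 + 12);
}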