Bug 48316 - missed CSE / reassoc with array offsets
Summary: missed CSE / reassoc with array offsets
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.7.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2011-03-28 13:56 UTC by Richard Biener
Modified: 2021-12-08 08:10 UTC
CC: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-05 00:00:00


Description Richard Biener 2011-03-28 13:56:24 UTC
int
foo (int *p, int i)
{
  int i1 = i + 1;
  int i2 = i + 2;
  return p[i1] + p[i2];
}

int
bar (int *p, unsigned long i)
{
  unsigned long i1 = i + 1;
  unsigned long i2 = i + 2;
  return p[i1] + p[i2];
}

For both testcases (the latter being the more "optimal" input due to
pointer-plus-expr constraints) we fail to CSE the multiplication
of i by 4, which leaves the memory references not trivially independent
(based on the same pointer, offset by different constants).  Such
a case causes alias checks to be inserted for vectorization of
gfortran.dg/reassoc_4.f with --param max-completely-peeled-insns=4000.

IL on x86_64 is

<bb 2>:
  i1_2 = i_1(D) + 1;
  i2_3 = i_1(D) + 2;
  D.2702_4 = (long unsigned int) i1_2;
  D.2703_5 = D.2702_4 * 4;
  D.2704_7 = p_6(D) + D.2703_5;
  D.2705_8 = MEM[(int *)D.2704_7];
  D.2706_9 = (long unsigned int) i2_3;
  D.2707_10 = D.2706_9 * 4;
  D.2708_11 = p_6(D) + D.2707_10;
  D.2709_12 = MEM[(int *)D.2708_11];
  D.2701_13 = D.2705_8 + D.2709_12;
  return D.2701_13;

vs.

<bb 2>:
  i1_2 = i_1(D) + 1;
  i2_3 = i_1(D) + 2;
  D.2694_4 = i1_2 * 4;
  D.2695_6 = p_5(D) + D.2694_4;
  D.2696_7 = MEM[(int *)D.2695_6];
  D.2697_8 = i2_3 * 4;
  D.2698_9 = p_5(D) + D.2697_8;
  D.2699_10 = MEM[(int *)D.2698_9];
  D.2693_11 = D.2696_7 + D.2699_10;
  return D.2693_11;

For the reassoc_4.f testcase the question is whether either SCEV or
data-dependence analysis can be enhanced to handle such cases (the
multiplications are in BB2, outside of any loop).
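
As a source-level sketch only (foo_cse is a made-up name, not part of the
testcase, and this glosses over signed-overflow subtleties for the int
index), the form we would like CSE/reassociation to reach is:

int
foo_cse (int *p, int i)
{
  int *q = p + i;       /* the scaling of i by 4 happens exactly once */
  return q[1] + q[2];   /* p + i*4 + 4 and p + i*4 + 8, trivially independent */
}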
Comment 1 davidxl 2014-03-12 23:53:52 UTC
int a[8][8];


int foo(int i)
{
  return a[i][3] + a[i+1][3] + a[i+2][3];
}

ICC generates:

        movslq    %edi, %rdi                                    #5.1
        shlq      $5, %rdi                                      #6.10
        movl      12+a(%rdi), %eax                              #6.10
        addl      44+a(%rdi), %eax                              #6.20
        addl      76+a(%rdi), %eax            

GCC cannot CSE the common address arithmetic:

        leal    1(%rdi), %eax
        movslq  %edi, %rdx
        addl    $2, %edi
        cltq
        movslq  %edi, %rdi
        salq    $5, %rdx
        salq    $5, %rax
        salq    $5, %rdi
        movl    a+12(%rax), %eax
        addl    a+12(%rdx), %eax
        addl    a+12(%rdi), %eax
        ret
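
For reference, a hand-CSE'd source-level equivalent of what ICC emits above
(foo_cse is an invented name, purely to illustrate the shared address
computation):

int foo_cse (int i)
{
  const int *row = &a[i][3];         /* one sign-extend plus one shift by 5 */
  return row[0] + row[8] + row[16];  /* rows of a[8][8] are 8 ints apart */
}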
Comment 2 rguenther@suse.de 2014-03-13 08:56:26 UTC
This particular case is an RTL optimization deficiency (or a
missed SLSR case).  The address operations are only lowered
during RTL expansion.
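
Roughly what expansion produces for each a[i + k][3] access, written out as
C purely for illustration (foo_lowered and the casts are invented here): the
multiplication by the row size only becomes explicit at this point, so
tree-level CSE/SLSR never get a chance to share it.

int foo_lowered (int i)
{
  char *base = (char *) a;
  long off0 = (long) i * 32;          /* row stride of a[8][8] in bytes */
  long off1 = (long) (i + 1) * 32;
  long off2 = (long) (i + 2) * 32;
  return *(int *) (base + off0 + 12)  /* column 3 -> byte offset 12 */
       + *(int *) (base + off1 + 12)
       + *(int *) (base + off2 + 12);
}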