Bug 19581 - Missed store motion on the tree level
Summary: Missed store motion on the tree level
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.0.0
: P2 enhancement
Target Milestone: 4.2.0
Assignee: Richard Biener
URL:
Keywords: alias, missed-optimization, TREE
Depends on:
Blocks: 19580 22366
  Show dependency treegraph
 
Reported: 2005-01-23 01:43 UTC by Andrew Pinski
Modified: 2006-02-03 12:32 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2006-01-15 21:30:49


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2005-01-23 01:43:22 UTC
This is like PR 19580 but a request to optimize it on the tree level instead of just on the RTL level.  
Mainly because we don't catch it for PPC (with 3.3.3 or with -fno-gcse).
int r[6];

void f (int n)
{
  while (-- n)
    {
      r [0] += r [5];
      r [1] += r [0];
      r [2] += r [1];
      r [3] += r [2];
      r [4] += r [3];
      r [5] += r [4];
    }
}

We should be able to pull out the load and stores for r[0] ... r[5].
If we do it on the tree level, we also just fix the problem for 4.0 also (but I think this might need more 
aliasing analysis or I could be wrong).
Comment 1 Daniel Berlin 2005-01-23 01:47:51 UTC
This needs more alias analysis
Structure aliasing could do this with a little work, so i'll take this for now
Comment 2 Steven Bosscher 2005-01-23 19:27:08 UTC
Right now it is the *old* loop optimizer that promotes the array elements 
to registers.  How about that. 
 
Is there an "old loop optimizer" meta bug?  We'll probably want to do this 
elsewhere before the old loop optimizer can die. 
 
Comment 3 Steven Bosscher 2005-01-23 19:29:58 UTC
Has nothing to do with PPC 
Comment 4 Zdenek Dvorak 2005-01-23 19:44:29 UTC
Gcse store motion could catch this as well (if I understand PR 19580 well, it
does not, which is a bug), so this pr alone would not be a reason to block
removal of old loop optimizer.  However hopefully we should have a usable alias
analysis
on tree level in 4.1, of course.
Comment 5 Steven Bosscher 2005-01-23 19:53:35 UTC
GCSE store motion does catch this.  But it is disabled for GCC 4.0 because 
it is buggy and does not really work well in most cases. 
 
Comment 6 Andrew Pinski 2005-01-23 19:59:59 UTC
(In reply to comment #5)
> GCSE store motion does catch this.  But it is disabled for GCC 4.0 because 
> it is buggy and does not really work well in most cases. 
Yes but we don't catch in 3.4.0 so that means GCSE store motion did not catch it since GCSE store 
motion was only disabled for 4.0.0 but was enabled after the rewrite for 3.4.0.
Comment 7 Zdenek Dvorak 2005-01-23 20:01:49 UTC
The same holds for the old loop optimizer :-)
Comment 8 Daniel Berlin 2005-07-26 13:18:41 UTC
Richard guenther is working on aliasing for arrays, i'll leave this one for him
Comment 9 Richard Biener 2005-07-26 13:29:05 UTC
For a reduced array with only 4 elements (I know - this should be a --param) we
now get in .vars (with the array aliasing patch):

f (n)
{
  int n.39;
  unsigned int ivtmp.33;
  int lsm_tmp.32;
  int lsm_tmp.31;
  int lsm_tmp.30;
  int lsm_tmp.29;

<bb 0>:
  n.39 = n - 1;
  if (n.39 != 0) goto <L6>; else goto <L2>;

<L6>:;
  lsm_tmp.29 = r[2];
  lsm_tmp.30 = r[1];
  lsm_tmp.31 = r[3];
  lsm_tmp.32 = r[0];
  ivtmp.33 = 0;

<L0>:;
  lsm_tmp.32 = lsm_tmp.32 + lsm_tmp.31;
  lsm_tmp.30 = lsm_tmp.32 + lsm_tmp.30;
  lsm_tmp.29 = lsm_tmp.30 + lsm_tmp.29;
  lsm_tmp.31 = lsm_tmp.31 + lsm_tmp.29;
  ivtmp.33 = ivtmp.33 + 1;
  if (ivtmp.33 != (unsigned int) n.39) goto <L0>; else goto <L10>;

<L10>:;
  r[2] = lsm_tmp.29;
  r[1] = lsm_tmp.30;
  r[3] = lsm_tmp.31;
  r[0] = lsm_tmp.32;

<L2>:;
  return;

}

and the asm loop looks like

.L4:
        addl    %edx, %eax
        incl    %esi
        addl    %eax, %ecx
        addl    %ecx, %ebx
        addl    %ebx, %edx
        cmpl    %edi, %esi
        jne     .L4

as in exactly what you want.  Don't hold your breath for 4.1, though.
Comment 10 Richard Biener 2006-02-03 12:32:26 UTC
On trunk with --param salias-max-array-elements=6 we now get

.L4:
        addl    %edx, %eax
        addl    %eax, %ecx
        addl    %ecx, %ebx
        addl    %ebx, %esi
        addl    %esi, %edi
        addl    %edi, %edx
        subl    $1, -16(%ebp)
        jne     .L4

so this is fixed.  LSM could also be teached to use the infrastrucure I invented for the copyprop enhancement.