Bug 34160 - Useful loop invariant motion missing
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.3.0
Importance: P3 enhancement
Target Milestone: 4.4.0
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, missed-optimization
Depends on:
Blocks:
 
Reported: 2007-11-20 12:05 UTC by Alexander Monakov
Modified: 2009-04-03 12:27 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.3.3
Last reconfirmed:


Attachments
Preprocessed source (12.97 KB, text/plain)
2007-11-20 12:06 UTC, Alexander Monakov

Description Alexander Monakov 2007-11-20 12:05:40 UTC
int main()
{
  static int i, n;
  static double a[200], b[200];
  ... (more variables and control flow)
  for (i = 0; i < n; i++)
    a[i] = b[i];
  ...
}

The tree-level optimisations do not move the loads of i and n, or the store to i, out of the loop. As a result, GCC generates five memory accesses per iteration on ia64 (4.3.0 20071112):

.L9:
        .mii
        nop 0
        sxt4 r14 = r16
        nop 0
        .mmi
        ld4 r15 = [r32]
        ld4 r58 = [r33]
        nop 0
        ;;
        .mii
        shladd r14 = r14, 3, r0
        adds r16 = 1, r15
        ;;
        add r15 = r35, r14
        .mmi
        add r14 = r34, r14
        st4 [r32] = r16
        cmp4.lt p6, p7 = r16, r58
        ;;
        .mmi
        nop 0
        ldfd f6 = [r15]
        nop 0
        ;;
        .mib
        stfd [r14] = f6
        nop 0
        (p6) br.cond.dptk .L9

On x86_64 the situation is better (4.3.0 20070930), but still not good:
.L13:
        movslq  %eax,%rdx
        movq    b.3894(,%rdx,8), %rax
        movq    %rax, x.3895(,%rdx,8)
        leal    1(%rcx), %eax
        cmpl    %eax, %edi
        movl    %eax, %ecx
        movl    %eax, i.3912(%rip)
        jg      .L13

However, that optimization happened at the RTL level, as the final_cleanup dump shows:

<bb 13>:
  # MPT.140_429 = VDEF <MPT.140_645>
  x[i.265] = b[i.265];
  # VUSE <MPT.140_429>
  i.23 = i;
  i.265 = i.23 + 1;
  # MPT.140_430 = VDEF <MPT.140_429>
  i = i.265;
  # VUSE <MPT.140_430>
  n.274 = n;
  if (n.274 > i.265)
    goto <bb 13>;
  else
    goto <bb 14>;
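
For illustration, the requested transformation can be applied by hand in C. This is only a sketch and is not taken from the attached testcase: the function names copy_as_written, copy_hoisted and variants_agree are hypothetical, and i, n, a, b are file-scope statics here rather than function-local statics as in the report.

```c
#include <assert.h>
#include <string.h>

#define LEN 200

/* Stand-ins for the statics in the report (file scope here: an assumption). */
static int i, n;
static double a[LEN], b[LEN];

/* The loop as written in the report: without invariant motion, i and n
   are loaded and i is stored on every iteration. */
void copy_as_written(void)
{
    for (i = 0; i < n; i++)
        a[i] = b[i];
}

/* Hand-applied version of the missing optimization. */
void copy_hoisted(void)
{
    int bound = n;  /* load of n moved in front of the loop */
    int k;          /* induction variable kept in a local */

    for (k = 0; k < bound; k++)
        a[k] = b[k];
    i = k;          /* single store to i, moved after the loop */
}

/* Runs both variants for `count` elements and returns 1 when the array
   contents and the final value of i are identical. */
int variants_agree(int count)
{
    double expected[LEN];
    int k, final_i;

    assert(count >= 0 && count <= LEN);
    for (k = 0; k < LEN; k++)
        b[k] = k * 0.5;

    n = count;
    memset(a, 0, sizeof a);
    copy_as_written();
    memcpy(expected, a, sizeof a);
    final_i = i;

    memset(a, 0, sizeof a);
    i = -1;
    copy_hoisted();
    return i == final_i && memcmp(expected, a, sizeof a) == 0;
}
```

The rewrite is only valid because the stores to a[] cannot modify i or n. The hoisted loop body then needs just the load from b and the store to a.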
Comment 1 Alexander Monakov 2007-11-20 12:06:46 UTC
Created attachment 14584
Preprocessed source

Testcase attached.
Comment 2 Richard Biener 2007-11-20 15:43:45 UTC
Alias analysis got in the way here.
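
As a generic illustration of why aliasing matters for this kind of motion (a sketch, not GCC internals): when a store in the loop may overwrite the loop bound, hoisting the load of the bound changes behaviour, so the compiler must prove the accesses are independent first. The names fill_reloading, fill_hoisted and aliasing_changes_result below are hypothetical. In the 4.3 dump from the description, the store to x and the loads of i and n all refer to the same MPT.140 virtual operand, which appears consistent with the optimizers treating them as possibly overlapping.

```c
#include <assert.h>

/* Reloads *bound on every iteration, which is what must happen when
   out[] might overlap the object that bound points to. */
int fill_reloading(int *out, const int *bound)
{
    int k;

    for (k = 0; k < *bound; k++)
        out[k] = 0;
    return k;
}

/* Loads *bound once; only equivalent when out[] cannot alias *bound. */
int fill_hoisted(int *out, const int *bound)
{
    int limit = *bound;
    int k;

    for (k = 0; k < limit; k++)
        out[k] = 0;
    return k;
}

/* Returns 1 when the two variants disagree for an overlapping bound. */
int aliasing_changes_result(void)
{
    int limit = 4;
    int separate_a[4] = {7, 7, 7, 7};
    int separate_b[4] = {7, 7, 7, 7};
    int overlap_a[4] = {7, 4, 7, 7};
    int overlap_b[4] = {7, 4, 7, 7};

    /* Distinct objects: both variants write all four elements. */
    assert(fill_reloading(separate_a, &limit) == 4);
    assert(fill_hoisted(separate_b, &limit) == 4);

    /* The bound lives in element 1 of the array being written, so the
       reloading variant stops once that element has been zeroed. */
    return fill_reloading(overlap_a, &overlap_a[1])
        != fill_hoisted(overlap_b, &overlap_b[1]);
}
```

In the reported loop the stores go to a static double array while i and n are separate static ints, so no such overlap is possible and the motion is safe.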
Comment 3 Richard Biener 2009-04-03 12:27:39 UTC
<bb 10>:
  # i.21_754 = PHI <i.21_1090(10), 0(9)>
  # i.21_484 = PHI <i.21_1090(10), 0(9)>
  D.4090_71 = b[i.21_754];
  x[i.21_754] = D.4090_71;
  D.4345_1088 = (unsigned int) i.21_484;
  D.4346_1089 = D.4345_1088 + 1;
  i.21_1090 = (int) D.4346_1089;
  if (n.17_753 > i.21_1090)
    goto <bb 10>;
  else
    goto <bb 11>;

Seems to work with 4.4.