Bug 37239 - peeling last iteration of a <= loop
Summary: peeling last iteration of a <= loop
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.3.2
: P3 enhancement
Target Milestone: ---
Assignee: Michael Matz
URL: https://gcc.gnu.org/ml/gcc-patches/20...
Keywords: missed-optimization, patch
Depends on:
Blocks: spec 30521 37240 37242
  Show dependency treegraph
 
Reported: 2008-08-26 11:34 UTC by Paolo Bonzini
Modified: 2023-12-11 22:24 UTC (History)
9 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.2.3, 4.4.0
Last reconfirmed: 2018-02-16 00:00:00


Attachments
benchmark (3.94 KB, text/plain)
2008-08-26 11:36 UTC, Paolo Bonzini
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Paolo Bonzini 2008-08-26 11:34:06 UTC
If the condition of the loop is tested within the loop with == or !=, it may be beneficial to peel off the final iteration of the loop by changing the condition to <.

This happens in the attached benchmark's heapsort function where

    while ((maxIdx += maxIdx) <= last) { 
        if (maxIdx != last && numbers[maxIdx] < numbers[maxIdx + 1]) maxIdx++;
        if (tmp >= numbers[maxIdx]) break;
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }

can become

    while ((maxIdx += maxIdx) <= last) { 
        if (numbers[maxIdx] < numbers[maxIdx + 1]) maxIdx++;
        if (tmp >= numbers[maxIdx]) break;
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }
    if (maxIdx == last && tmp < numbers[maxIdx]) {
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }

enabling in turn if-conversion of the first branch.

Performance of the benchmark is (-O3)

    basic               2.990
    peeling only        2.730
    if-conversion only  2.290
    peel+if-convert     2.010   (faster than quicksort!!)

ICC does this optimization.
Comment 1 Paolo Bonzini 2008-08-26 11:36:02 UTC
Created attachment 16148 [details]
benchmark
Comment 2 Richard Biener 2008-08-27 11:58:09 UTC
Hm, I don't know if the loop optimization code can peel the last iteration.
Comment 3 Paolo Bonzini 2008-08-27 12:08:14 UTC
no, it does not and I think it should not except for this particular case (or other similar ones).
Comment 4 Andrew Pinski 2015-07-02 16:18:00 UTC
There is another case where this optimization can come into handy and shows up in SPEC 2006.:

for (i = 1; i <= N; i++)
{
  ...
  if (i < N)
  {
    ...
  }
}

This is inside hammer's inner most loop.
Comment 6 Eric Gallager 2018-02-16 05:47:49 UTC
(In reply to Andrew Pinski from comment #5)
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00253.html

This was approved with a minor nit fixed
Comment 7 Richard Biener 2018-02-16 08:12:54 UTC
While hmmer is now split the testcase in this bug isn't.  We do find some split points but appearantly never split the loop.
Comment 8 Michael Matz 2018-02-16 15:05:48 UTC
(In reply to Richard Biener from comment #7)
> While hmmer is now split the testcase in this bug isn't.  We do find some
> split points but appearantly never split the loop.

This is not a case for normal loop splitting.  The loop in question has multiple exits.  The potential split point that is found is in the loop in question,
but relative to the outer loop:

    for (  qty--; i >  0; i--) siftDown  (numbers, i, qty);

which inline-expanded looks somewhat like:

  for (  qty--; i >  0; i--)
    top = i, maxIdx = i, last = qty;
    while (last >= (maxIdx += maxIdx)) {

        /* This is where the comparison occurrs and where a sufficiently
           good compiler can use a computed conditional result rather
           than using control logic. */
        if (maxIdx != last && numbers[maxIdx] < numbers[maxIdx + 1]) maxIdx++;

        if (tmp >= numbers[maxIdx]) break;
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }

The potential split point is the pre-header check if the inner loop is to
be entered at all (i.e. "last >= 2*maxIdx").  The outer loop isn't split
because i and 2*maxIdx (aka 2*i) don't have the same steps.

Of course even if we'd split the outer loop we wouldn't gain anything here.

Also note that the wanted transformation isn't exactly peeling the last iteration.  The last iteration must only be entered if the non-normal exit
isn't taken.  Dealing with all that is currently not implemented.
Comment 9 Eric Gallager 2019-05-16 04:21:48 UTC
(In reply to Eric Gallager from comment #6)
> (In reply to Andrew Pinski from comment #5)
> > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00253.html
> 
> This was approved with a minor nit fixed

cc-ing Jeff from that thread
Comment 10 Andrew Pinski 2023-12-11 22:24:54 UTC
lsplit reports:
Found potential split point: if (maxIdx_171 <= qty_7)
 { i_6 * 2 + I*-2 } le_expr qty_7

But nothing else ...