Bug 37239

Summary: peeling last iteration of a <= loop
Product: gcc Reporter: Paolo Bonzini <bonzini>
Component: tree-optimizationAssignee: Michael Matz <matz>
Status: ASSIGNED ---    
Severity: enhancement CC: egallager, esigra, gcc-bugs, jeffreyalaw, kugan, matz, pinskia, rakdver, rguenth
Priority: P3 Keywords: missed-optimization, patch
Version: 4.3.2   
Target Milestone: ---   
URL: https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00253.html
Host: Target:
Build: Known to work:
Known to fail: 4.2.3, 4.4.0 Last reconfirmed: 2018-02-16 00:00:00
Bug Depends on:    
Bug Blocks: 26163, 30521, 37240, 37242    
Attachments: benchmark

Description Paolo Bonzini 2008-08-26 11:34:06 UTC
If the condition of the loop is tested within the loop with == or !=, it may be beneficial to peel off the final iteration of the loop by changing the condition to <.

This happens in the attached benchmark's heapsort function where

    while ((maxIdx += maxIdx) <= last) { 
        if (maxIdx != last && numbers[maxIdx] < numbers[maxIdx + 1]) maxIdx++;
        if (tmp >= numbers[maxIdx]) break;
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }

can become

    while ((maxIdx += maxIdx) <= last) { 
        if (numbers[maxIdx] < numbers[maxIdx + 1]) maxIdx++;
        if (tmp >= numbers[maxIdx]) break;
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }
    if (maxIdx == last && tmp < numbers[maxIdx]) {
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }

enabling in turn if-conversion of the first branch.

Performance of the benchmark is (-O3)

    basic               2.990
    peeling only        2.730
    if-conversion only  2.290
    peel+if-convert     2.010   (faster than quicksort!!)

ICC does this optimization.
Comment 1 Paolo Bonzini 2008-08-26 11:36:02 UTC
Created attachment 16148 [details]
benchmark
Comment 2 Richard Biener 2008-08-27 11:58:09 UTC
Hm, I don't know if the loop optimization code can peel the last iteration.
Comment 3 Paolo Bonzini 2008-08-27 12:08:14 UTC
no, it does not and I think it should not except for this particular case (or other similar ones).
Comment 4 Andrew Pinski 2015-07-02 16:18:00 UTC
There is another case where this optimization can come into handy and shows up in SPEC 2006.:

for (i = 1; i <= N; i++)
{
  ...
  if (i < N)
  {
    ...
  }
}

This is inside hammer's inner most loop.
Comment 6 Eric Gallager 2018-02-16 05:47:49 UTC
(In reply to Andrew Pinski from comment #5)
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00253.html

This was approved with a minor nit fixed
Comment 7 Richard Biener 2018-02-16 08:12:54 UTC
While hmmer is now split the testcase in this bug isn't.  We do find some split points but appearantly never split the loop.
Comment 8 Michael Matz 2018-02-16 15:05:48 UTC
(In reply to Richard Biener from comment #7)
> While hmmer is now split the testcase in this bug isn't.  We do find some
> split points but appearantly never split the loop.

This is not a case for normal loop splitting.  The loop in question has multiple exits.  The potential split point that is found is in the loop in question,
but relative to the outer loop:

    for (  qty--; i >  0; i--) siftDown  (numbers, i, qty);

which inline-expanded looks somewhat like:

  for (  qty--; i >  0; i--)
    top = i, maxIdx = i, last = qty;
    while (last >= (maxIdx += maxIdx)) {

        /* This is where the comparison occurrs and where a sufficiently
           good compiler can use a computed conditional result rather
           than using control logic. */
        if (maxIdx != last && numbers[maxIdx] < numbers[maxIdx + 1]) maxIdx++;

        if (tmp >= numbers[maxIdx]) break;
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }

The potential split point is the pre-header check if the inner loop is to
be entered at all (i.e. "last >= 2*maxIdx").  The outer loop isn't split
because i and 2*maxIdx (aka 2*i) don't have the same steps.

Of course even if we'd split the outer loop we wouldn't gain anything here.

Also note that the wanted transformation isn't exactly peeling the last iteration.  The last iteration must only be entered if the non-normal exit
isn't taken.  Dealing with all that is currently not implemented.
Comment 9 Eric Gallager 2019-05-16 04:21:48 UTC
(In reply to Eric Gallager from comment #6)
> (In reply to Andrew Pinski from comment #5)
> > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00253.html
> 
> This was approved with a minor nit fixed

cc-ing Jeff from that thread
Comment 10 Andrew Pinski 2023-12-11 22:24:54 UTC
lsplit reports:
Found potential split point: if (maxIdx_171 <= qty_7)
 { i_6 * 2 + I*-2 } le_expr qty_7

But nothing else ...