37239 – peeling last iteration of a <= loop

Bug 37239 - peeling last iteration of a <= loop

Summary: peeling last iteration of a <= loop

Status:	ASSIGNED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	4.3.2

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Michael Matz

URL:	https://gcc.gnu.org/ml/gcc-patches/20...
Keywords:	missed-optimization, patch

Depends on:
Blocks:	spec 30521 37240 37242
	Show dependency tree / graph

Reported:	2008-08-26 11:34 UTC by Paolo Bonzini
Modified:	2023-12-11 22:24 UTC (History)
CC List:	9 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:	4.2.3, 4.4.0
Last reconfirmed:	2018-02-16 00:00:00

Attachments
benchmark (3.94 KB, text/plain) 2008-08-26 11:36 UTC, Paolo Bonzini	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Paolo Bonzini 2008-08-26 11:34:06 UTC

If the condition of the loop is tested within the loop with == or !=, it may be beneficial to peel off the final iteration of the loop by changing the condition to <.

This happens in the attached benchmark's heapsort function where

    while ((maxIdx += maxIdx) <= last) { 
        if (maxIdx != last && numbers[maxIdx] < numbers[maxIdx + 1]) maxIdx++;
        if (tmp >= numbers[maxIdx]) break;
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }

can become

    while ((maxIdx += maxIdx) <= last) { 
        if (numbers[maxIdx] < numbers[maxIdx + 1]) maxIdx++;
        if (tmp >= numbers[maxIdx]) break;
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }
    if (maxIdx == last && tmp < numbers[maxIdx]) {
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }

enabling in turn if-conversion of the first branch.

Performance of the benchmark is (-O3)

    basic               2.990
    peeling only        2.730
    if-conversion only  2.290
    peel+if-convert     2.010   (faster than quicksort!!)

ICC does this optimization.

Comment 1 Paolo Bonzini 2008-08-26 11:36:02 UTC

Created attachment 16148 [details]
benchmark

Comment 2 Richard Biener 2008-08-27 11:58:09 UTC

Hm, I don't know if the loop optimization code can peel the last iteration.

Comment 3 Paolo Bonzini 2008-08-27 12:08:14 UTC

no, it does not and I think it should not except for this particular case (or other similar ones).

Comment 4 Andrew Pinski 2015-07-02 16:18:00 UTC

There is another case where this optimization can come into handy and shows up in SPEC 2006.:

for (i = 1; i <= N; i++)
{
  ...
  if (i < N)
  {
    ...
  }
}

This is inside hammer's inner most loop.

Comment 5 Andrew Pinski 2016-07-25 07:40:31 UTC

https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00253.html

Comment 6 Eric Gallager 2018-02-16 05:47:49 UTC

(In reply to Andrew Pinski from comment #5)
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00253.html

This was approved with a minor nit fixed

Comment 7 Richard Biener 2018-02-16 08:12:54 UTC

While hmmer is now split the testcase in this bug isn't.  We do find some split points but appearantly never split the loop.

Comment 8 Michael Matz 2018-02-16 15:05:48 UTC

(In reply to Richard Biener from comment #7)
> While hmmer is now split the testcase in this bug isn't.  We do find some
> split points but appearantly never split the loop.

This is not a case for normal loop splitting.  The loop in question has multiple exits.  The potential split point that is found is in the loop in question,
but relative to the outer loop:

    for (  qty--; i >  0; i--) siftDown  (numbers, i, qty);

which inline-expanded looks somewhat like:

  for (  qty--; i >  0; i--)
    top = i, maxIdx = i, last = qty;
    while (last >= (maxIdx += maxIdx)) {

        /* This is where the comparison occurrs and where a sufficiently
           good compiler can use a computed conditional result rather
           than using control logic. */
        if (maxIdx != last && numbers[maxIdx] < numbers[maxIdx + 1]) maxIdx++;

        if (tmp >= numbers[maxIdx]) break;
        numbers[top] = numbers[maxIdx];
        top = maxIdx;
    }

The potential split point is the pre-header check if the inner loop is to
be entered at all (i.e. "last >= 2*maxIdx").  The outer loop isn't split
because i and 2*maxIdx (aka 2*i) don't have the same steps.

Of course even if we'd split the outer loop we wouldn't gain anything here.

Also note that the wanted transformation isn't exactly peeling the last iteration.  The last iteration must only be entered if the non-normal exit
isn't taken.  Dealing with all that is currently not implemented.

Comment 9 Eric Gallager 2019-05-16 04:21:48 UTC

(In reply to Eric Gallager from comment #6)
> (In reply to Andrew Pinski from comment #5)
> > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00253.html
> 
> This was approved with a minor nit fixed

cc-ing Jeff from that thread

Comment 10 Andrew Pinski 2023-12-11 22:24:54 UTC

lsplit reports:
Found potential split point: if (maxIdx_171 <= qty_7)
 { i_6 * 2 + I*-2 } le_expr qty_7

But nothing else ...