58122 – loops are not evaluated at compile time if loop count > 17

Status:	RESOLVED DUPLICATE of bug 57511

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	4.9.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

Keywords:	missed-optimization

Reported:	2013-08-11 11:13 UTC by Oleg Endo
Modified:	2021-12-17 07:58 UTC

Known to work:	4.7.4, 4.9.0, 7.1.0
Known to fail:	4.1.2, 4.8.1

Description Oleg Endo 2013-08-11 11:13:37 UTC

This one was originally reported here:
http://gcc.gnu.org/ml/gcc-help/2013-08/msg00124.html

The original example was:
#include <stdio.h>
       
template <typename T>
inline T const& max (T const& a, T const& b)
{
  return a < b ? b : a;
}
       
int main()
{
  long long unsigned sum = 0;

  for (int x = 1; x <= 100000000; x++)
    sum += max (x, x + 1);

  printf("%llu\n", sum);
}

It seems that GCC 4.7 was able to evaluate the loop at compile time and reduce it to a constant value, but GCC 4.8 fails to do so.
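
For reference, the constant the loop should fold to can be worked out by hand. Here max (x, x + 1) is always x + 1 (no int overflow occurs for x <= 100000000), so the sum over x = 1..n is n * (n + 3) / 2. A minimal C sketch of that check follows; the helper names sum_max_loop and sum_max_closed are made up for illustration and are not part of the original report:

```c
#include <stdio.h>

/* Hypothetical helper: the loop from the report, parameterized by n.
   n must be smaller than INT_MAX so that x + 1 does not overflow.  */
unsigned long long sum_max_loop (int n)
{
  unsigned long long sum = 0;
  int x;

  for (x = 1; x <= n; x++)
    sum += (unsigned long long) (x < x + 1 ? x + 1 : x);

  return sum;
}

/* Hypothetical helper: closed form of the same sum,
   i.e. the sum of (x + 1) for x = 1..n.  */
unsigned long long sum_max_closed (unsigned long long n)
{
  return n * (n + 3) / 2;
}
```

With n = 100000000 the closed form gives 5000000150000000, which is the value a fully folded build would be expected to print.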

I've also briefly checked with trunk rev 201282 and the problem still seems to be there.  Here is a reduced test case:

int test (void)
{
  int sum = 0;
  for (int x = 0; x < 100; x++)
    sum += x;

  return sum;
}

I've checked this with an SH cross compiler setup, but I don't think it matters.
The loops do get eliminated if the number of loop iterations is at most 17, for both the reduced example and the originally reported case.

Comment 1 Oleg Endo 2013-08-11 11:17:27 UTC

(In reply to Oleg Endo from comment #0)
> 
> I've checked this with an SH cross compiler setup, but I don't think it
> matters.
> The loops do get eliminated if the number of loop iterations is max. 17, for
> both the reduced example and the originally reported case.

Forgot to mention:
The loops get eliminated when compiling with -O3

Comment 2 Anthony Foiani 2013-08-11 16:51:24 UTC

Using this build of 4.8.1:

  $ /usr/local/gcc-4.8.1/bin/gcc -v
  Using built-in specs.
  COLLECT_GCC=/usr/local/gcc-4.8.1/bin/gcc
  COLLECT_LTO_WRAPPER=/usr/local/gcc-4.8.1/libexec/gcc/x86_64-unknown-linux-gnu/4.8.1/lto-wrapper
  Target: x86_64-unknown-linux-gnu
  Configured with: ../gcc-4.8.1/configure --prefix=/usr/local/gcc-4.8.1 --with-local-prefix=/usr/local/gcc-4.8.1/local --enable-languages=c,c++ --enable-threads --disable-multilib
  Thread model: posix
  gcc version 4.8.1 (GCC) 

With this test case (thanks for the nudge, Oleg!)

  #include <stdint.h>

  uint64_t test()
  {
      uint64_t rv = 0;
      uint32_t i;
      for ( i = 0; i < ITERS; ++i )
          rv += i;
      return rv;
  }

I ran a quick shell loop to find where it switches from precomputing to looping.  Interestingly, the switch happens between 71 iterations (precomputed) and 72 iterations (loop):

  $ for i in {1..100} ;
  do
    /usr/local/gcc-4.8.1/bin/gcc -O3 -S test.c -DITERS=$i ;
    if grep -q jne test.s ;
    then 
      echo "$i: loop" ;
    else 
      echo "$i: precomputed" ;
    fi ;
  done
  1: precomputed
  2: precomputed
  3: precomputed
  4: precomputed
  5: precomputed
  ...
  70: precomputed
  71: precomputed
  72: loop
  73: loop

I'm curious whether the 17/71 difference from Oleg's SH environment is a typo/transposition, a difference in native word size, or something else.

For whatever it's worth, this bug is not something that is affecting me; it was just a question brought up on the IRC channel.  So I don't have any priority to assign to it, other than that it seems a regression in quality-of-implementation since 4.7.  (Although, as I asked in my original e-mail, it might be that 4.8 is just more cautious about looping, so the correct flags could restore the desired performance.)

Thanks,
Tony

Comment 3 Oleg Endo 2013-08-11 18:00:46 UTC

Looking at the tree dump (-fdump-tree-all -fdump-tree-all-details), it seems that this is related to loop unrolling (dump file *t.cunroll) and induction variable optimization.

If the number of iterations is small enough, the loop gets unrolled and then the calculations on 'sum' are folded away.

For this one:

int test2 (void)
{
  int sum = 0;
  for (int x = 0; x < 5; x++)
    sum += x;

  return sum;
}


t.cunroll shows:

;; Function int test2() (_Z5test2v, funcdef_no=0, decl_uid=1593, symbol_order=0)

int test2() ()
{
  int x;
  int sum;

  <bb 2>:
  return 10;
}

However, with this one:

int test2 (void)
{
  int sum = 0;
  for (int x = 0; x < 6; x++)
    sum += x;

  return sum;
}

The unrolled result is:

int test2() ()
{
  int x;
  int sum;
  unsigned int ivtmp_2;
  unsigned int ivtmp_14;
  unsigned int ivtmp_20;
  unsigned int ivtmp_26;
  unsigned int ivtmp_32;
  unsigned int ivtmp_38;

  <bb 2>:
  sum_12 = 0;
  x_13 = 1;
  ivtmp_14 = 5;
  sum_18 = sum_12 + x_13;
  x_19 = x_13 + 1;
  ivtmp_20 = ivtmp_14 + 4294967295;
  sum_24 = sum_18 + x_19;
  x_25 = x_19 + 1;
  ivtmp_26 = ivtmp_20 + 4294967295;
  sum_30 = sum_24 + x_25;
  x_31 = x_25 + 1;
  ivtmp_32 = ivtmp_26 + 4294967295;
  sum_36 = sum_30 + x_31;
  x_37 = x_31 + 1;
  ivtmp_38 = ivtmp_32 + 4294967295;
  sum_3 = sum_36 + x_37;
  x_4 = x_37 + 1;
  ivtmp_2 = ivtmp_38 + 4294967295;
  return sum_3;

}
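
Reading the dump above: the first iteration (x = 0) has already been folded into the initial values, and each "ivtmp_N + 4294967295" is an unsigned 32-bit decrement, since 4294967295 is 2^32 - 1 and the addition wraps around. The following is a hand-written C transcription of that dump, meant only as a sketch. The function name test2_unrolled and the final_ivtmp out parameter are made up for illustration; the variable names mirror the SSA names in the dump:

```c
#include <stdint.h>

/* Hypothetical transcription of the t.cunroll dump for the
   6-iteration loop.  Stores the last counter value in *final_ivtmp.  */
int test2_unrolled (uint32_t *final_ivtmp)
{
  int sum_12 = 0;
  int x_13 = 1;
  uint32_t ivtmp_14 = 5;
  int sum_18 = sum_12 + x_13;
  int x_19 = x_13 + 1;
  uint32_t ivtmp_20 = ivtmp_14 + 4294967295u;  /* wraps: same as - 1 */
  int sum_24 = sum_18 + x_19;
  int x_25 = x_19 + 1;
  uint32_t ivtmp_26 = ivtmp_20 + 4294967295u;
  int sum_30 = sum_24 + x_25;
  int x_31 = x_25 + 1;
  uint32_t ivtmp_32 = ivtmp_26 + 4294967295u;
  int sum_36 = sum_30 + x_31;
  int x_37 = x_31 + 1;
  uint32_t ivtmp_38 = ivtmp_32 + 4294967295u;
  int sum_3 = sum_36 + x_37;
  int x_4 = x_37 + 1;
  uint32_t ivtmp_2 = ivtmp_38 + 4294967295u;

  (void) x_4;               /* dead value, kept to mirror the dump */
  *final_ivtmp = ivtmp_2;
  return sum_3;
}
```

Evaluating it by hand gives 0 + 1 + 2 + 3 + 4 + 5 = 15 with the counter ending at 0, so everything in the straight-line code is a compile-time constant; the cunroll dump just does not show it folded yet.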

Comment 4 Oleg Endo 2014-07-31 18:09:04 UTC

(In reply to Oleg Endo from comment #0)
> 
> I've also briefly checked with trunk rev 201282 and the problem seems to be
> still there.  Here is a reduced test case:
> 
> int test (void)
> {
>   int sum = 0;
>   for (int x = 0; x < 100; x++)
>     sum += x;
> 
>   return sum;
> }
> 

As of r213381 the reduced test case seems to work OK with at least loop counts up to 40000.

Comment 5 Oleg Endo 2016-09-25 10:33:49 UTC

This seems to be working just fine now.  Not sure what kind of test case to add for this though... just scanning the final assembler code for some expected hex or dec constant?
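
One possible shape for such a test, sketched here as an assumption and not as the testcase that was eventually added: instead of scanning the assembler output, scan the final tree dump for the folded constant using the GCC testsuite's DejaGnu directives. The option set and the exact pattern are guesses and would need checking against the compiler under test:

```c
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-tree-optimized" } */

/* Reduced test case from comment #0; 0 + 1 + ... + 99 = 4950.  */
int test (void)
{
  int sum = 0;
  int x;

  for (x = 0; x < 100; x++)
    sum += x;

  return sum;
}

/* Hypothetical check: the loop should be gone and the folded
   constant returned directly.  */
/* { dg-final { scan-tree-dump "return 4950;" "optimized" } } */
```

Scanning the tree dump keeps the test independent of target-specific assembler syntax, which is the main drawback of looking for a hex or decimal constant in the final assembly.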

Comment 6 Andrew Pinski 2021-12-17 07:58:46 UTC

Dup of bug 57511.  There is now a testcase for this issue, so it won't regress again.

*** This bug has been marked as a duplicate of bug 57511 ***