This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/69042] New: [6 regression] Missed optimization in ivopts
- From: "ienkovich at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 24 Dec 2015 14:00:24 +0000
- Subject: [Bug tree-optimization/69042] New: [6 regression] Missed optimization in ivopts
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042
Bug ID: 69042
Summary: [6 regression] Missed optimization in ivopts
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ienkovich at gcc dot gnu.org
Target Milestone: ---
Created attachment 37127
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37127&action=edit
Reproducer
Here is a reduced loop:
  for (i = 1; i < 64; i++) {
    if (data[indexes[i]]) {
      j++;
    } else {
      if (p (j))
        return 0;
      j = 0;
    }
  }
'i' is used only for the indexes access. Therefore it may be optimized out and
replaced with a pointer induction variable over (indexes + i). That's what
GCC 5.3 actually does. Here is the GIMPLE for GCC 5.3 after ivopts:
  <bb 3>:
  # j_23 = PHI <j_2(7), 0(2)>
  # ivtmp.14_21 = PHI <ivtmp.14_20(7), ivtmp.14_19(2)>
  _6 = (void *) ivtmp.14_21;
  _9 = MEM[base: _6, offset: 4B];
  _10 = (unsigned int) _9;
  _11 = _10 * 2;
  _13 = data_12(D) + _11;
  _14 = *_13;
  if (_14 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  j_15 = j_23 + 1;
  goto <bb 6>;

  <bb 5>:
  _18 = p (j_23);
  if (_18 != 0)
    goto <bb 8>;
  else
    goto <bb 6>;

  <bb 6>:
  # j_2 = PHI <j_15(4), 0(5)>
  ivtmp.14_20 = ivtmp.14_21 + 4;
  if (ivtmp.14_20 != _1)
    goto <bb 7>;
  else
    goto <bb 8>;

  <bb 7>:
  goto <bb 3>;
But starting from r230647, GCC 6 no longer does this, resulting in an
additional address computation and increased register pressure. Here is the
GIMPLE for GCC 6 after ivopts:
  <bb 3>:
  # i_23 = PHI <i_16(7), 1(2)>
  # j_24 = PHI <j_2(7), 0(2)>
  _21 = (sizetype) i_23;
  _20 = _21 * 4;
  _9 = MEM[symbol: indexes, index: _20, offset: 0B];
  _10 = (unsigned int) _9;
  _11 = _10 * 2;
  _13 = data_12(D) + _11;
  _14 = *_13;
  if (_14 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  j_15 = j_24 + 1;
  goto <bb 6>;

  <bb 5>:
  _18 = p (j_24);
  if (_18 != 0)
    goto <bb 8>;
  else
    goto <bb 6>;

  <bb 6>:
  # j_2 = PHI <j_15(4), 0(5)>
  i_16 = i_23 + 1;
  if (i_16 != 64)
    goto <bb 7>;
  else
    goto <bb 8>;

  <bb 7>:
  goto <bb 3>;
The testcase was made on the basis of the EEMBC cjpegv2 benchmark, whose loop
shows a ~15% performance loss on Silvermont due to this issue.
In the attached testcase I put an __asm__ to emulate register pressure and
demonstrate the resulting loads in the regressed loop version for i386.
Here is how I build the test:
gcc -S -m32 -fPIE -pie -O2 test.c
Used compiler:
Target: x86_64-pc-linux-gnu
Configured with: /export/users/gnutester/stability/svn/trunk/configure
--with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib
--enable-shared --with-demangler-in-ld --enable-cloog-backend=isl
--with-fpmath=sse --with-pkgversion=Revision=231837
--prefix=/export/users/gnutester/stability/work/trunk/64/install
--enable-languages=c,c++,fortran,java,lto
Thread model: posix
gcc version 6.0.0 20151218 (experimental) (Revision=231837)