Bug 78543 - [6 Regression] ICE in push_reload, at reload.c:1349 on powerpc64le-linux-gnu
Summary: [6 Regression] ICE in push_reload, at reload.c:1349 on powerpc64le-linux-gnu
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 6.2.1
: P3 normal
Target Milestone: 6.4
Assignee: Michael Meissner
URL: https://gcc.gnu.org/ml/gcc-patches/20...
Keywords: ice-on-valid-code
Depends on:
Blocks:
 
Reported: 2016-11-26 14:17 UTC by Matthias Klose
Modified: 2017-03-27 19:43 UTC (History)
7 users (show)

See Also:
Host:
Target: powerpc64le-linux-gnu
Build:
Known to work: 5.4.1, 7.0
Known to fail: 6.2.1
Last reconfirmed: 2017-03-06 00:00:00


Attachments
gcc dump with pre processed file (59.76 KB, text/plain)
2016-11-28 20:10 UTC, Breno Leitao
Details
preprocessed source (fileinfo.i) (164.45 KB, application/x-xz)
2017-03-05 10:24 UTC, Matthias Klose
Details
Proposed patch to fix the problem (2.06 KB, patch)
2017-03-23 14:28 UTC, Michael Meissner
Details | Diff
Proposed patch to fix the problem (rework) (3.08 KB, patch)
2017-03-24 22:58 UTC, Michael Meissner
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Matthias Klose 2016-11-26 14:17:02 UTC
seen with the gcc-6-branch r242827, works with 5 and the trunk, works with -O2 instead of -O3:

$ cat message_print_format_dig.i 
char b[] = {};
short c, d;
int e, f;
short *g;
void h();
int i();
int t() {
  short j, s;
  unsigned k;
  int l;
  char m[65535];
  for (; s;)
    if (i())
      return e;
  for (int o = 1; o < 4; o++)
    for (unsigned n = d; n; n--) {
      if (i(f, m, sizeof(m)) < 0)
        return e;
      char p = 0;
      c = *g;
      short q = 0, r = ({
                     j = *g;
                     (short)(j >> 8 & 255u | j << 8);
                   });
      unsigned a = k + 4;
      if (b[o] && d) {
        h(l, "%", &p);
        h(l, "%7", &q);
        while (k < a)
          h(c, r);
        h("");
      }
    }
}
$ gcc -O3 -c message_print_format_dig.i 
message_print_format_dig.i: In function 't':
message_print_format_dig.i:34:1: internal compiler error: in push_reload, at reload.c:1349
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
Comment 1 Arseny Solokha 2016-11-28 06:34:27 UTC
*** Bug 77346 has been marked as a duplicate of this bug. ***
Comment 2 Peter Bergner 2016-11-28 14:26:08 UTC
I'll have a look.
Comment 3 Peter Bergner 2016-11-28 17:06:58 UTC
I cannot recreate this using r242827 of the FSF 6 branch.  What configure options are you using?   ...and in case it makes a difference (it can), what binutils version?
Comment 4 Matthias Klose 2016-11-28 19:06:09 UTC
that was binutils 2.27.51.20161124, and the Debian GCC version (unstable), configured with
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 6.2.1-5ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=powerpc64le-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --enable-objc-gc --enable-secureplt --with-cpu=power8 --enable-targets=powerpcle-linux --disable-multilib --enable-multiarch --disable-werror --with-long-double-128 --enable-checking=release --build=powerpc64le-linux-gnu --host=powerpc64le-linux-gnu --target=powerpc64le-linux-gnu
Comment 5 Breno Leitao 2016-11-28 19:42:18 UTC
I can reproduce this (or a very similar version of this) bug on my system. 

I am seeing this failure when compiling package yadifa-2.2.2 on Debian testing with (Debian 6.2.1-4) 6.2.1 20161119.

If I use -O2 instead of -O3, I see no failure.
Comment 6 Peter Bergner 2016-11-28 19:55:12 UTC
(In reply to Breno Leitao from comment #5)
> I can reproduce this (or a very similar version of this) bug on my system. 
> 
> I am seeing this failure when compiling package yadifa-2.2.2 on Debian
> testing with (Debian 6.2.1-4) 6.2.1 20161119.
> 
> If I use -O2 instead of -O3, I see no failure.

Can you attach the preprocessed source of the file that is failing?
Comment 7 Breno Leitao 2016-11-28 20:10:51 UTC
Created attachment 40182 [details]
gcc dump with pre processed file
Comment 8 Peter Bergner 2016-11-28 21:53:48 UTC
(In reply to Breno Leitao from comment #7)
> Created attachment 40182 [details]
> gcc dump with pre processed file

Ok, this on I have recreated.  I'll dig into it.
Comment 9 Jakub Jelinek 2016-12-21 10:54:55 UTC
GCC 6.3 is being released, adjusting target milestone.
Comment 10 Matthias Klose 2017-03-05 10:23:40 UTC
here is one more build failure seen with 20170221, with -O1 (works with -O0), seen in the gtk-gnutella package:

$ gcc-6 -c -O1 -msecure-plt -mcpu=power8 fileinfo.i
fileinfo.c: In function 'file_info_retrieve_binary':
fileinfo.c:2113:1: internal compiler error: in push_reload, at reload.c:1349
Please submit a full bug report,
with preprocessed source if appropriate.

$ cat fileinfo.i
typedef long a;
enum c { e, f, g, h, i, ab } j();
int l, n, o, p;
a q, r;
void *memcpy();
void b();
static int k(int *s) {
  int m;
  if (j(&m))
    *s = m;
  return !0;
}
void d(char s) {
  int af[4];
  int ag;
  enum c ah;
  char ai[24 << 11];
  unsigned aj;
  if (!k(&aj))
    goto ak;
  for (;;) {
    if (!k(&ag))
      goto ak;
    switch (ah) {
    case e:
      b("");
      b("bad length %d for GUID in fileinfo v%u for \"%s\"");
    case i:
      b("bad length %d for TTH in fileinfo v%u for \"%s\"", aj);
    case ab:
      if (ag % 24)
        b("for \"%s\"", s);
    case f:
      if (20 == ag)
      case h:
        if (20 == ag)
          o = 0;
      break;
    case g:
      memcpy(af, ai, sizeof af);
      b();
      if (p) {
        a al, am;
        r = al << 2 | am;
        n = af[2];
        al = n;
        l = __builtin_bswap32(af[3]);
        am = q = n | l;
      }
    default:
      b("%s0 unhandled field ID %u 0", __func__);
    }
  }
ak:;
}
Comment 11 Matthias Klose 2017-03-05 10:24:54 UTC
Created attachment 40883 [details]
preprocessed source (fileinfo.i)
Comment 12 Matthias Klose 2017-03-06 14:02:15 UTC
still reproducible with r245899 on the gcc-6-branch
Comment 13 Peter Bergner 2017-03-06 15:53:29 UTC
(In reply to Matthias Klose from comment #10)
> here is one more build failure seen with 20170221, with -O1 (works with
> -O0), seen in the gtk-gnutella package:

I have a patch that fixes this and the original test case.
Comment 14 Peter Bergner 2017-03-06 18:17:59 UTC
Patch submitted.
Comment 15 Peter Bergner 2017-03-10 18:27:20 UTC
(In reply to Peter Bergner from comment #14)
> Patch submitted.

Uli commented that the patch needs reworking, so I'm back to digging into this.
Comment 16 Breno Leitao 2017-03-21 21:00:37 UTC
If it helps, the problem is reproducible on some other packages as well. Here is another example:

https://nopaste.linux-dev.org/?1122124
Comment 17 Michael Meissner 2017-03-22 23:08:10 UTC
While Peter is away, I looked into this.

The bug happens when the RELOAD register allocator tries to do a bswap operation.  I believe in the original test case, the compiler identifies the case as something that could be optimized to __builtin_bswap16.  In the second case, the code has an explicit __builtin_bswap32 call.

This bug shows up on little endian systems and not on big endian systems.  This is due to the bswap instruction being called with a SUBREG.  Reload gets confused with this, and LRA decides to do a store, so that it can then do a LWBRX instruction.

If we copy the SUBREG to a new register, it works for the second case provided.  I can't get the first case to fail.
Comment 18 Michael Meissner 2017-03-23 14:28:51 UTC
Created attachment 41035 [details]
Proposed patch to fix the problem

This patch does not allow SUBREG's in bswap functions, which confuses the register allocators.  It has passed without regression on a little endian power8 system, and I'm testing it on a big endian power8 system and big endian power7 system.

I also will be testing the back port to gcc 6 (and later gcc 5).
Comment 19 Michael Meissner 2017-03-24 18:24:18 UTC
On Tue, Mar 21, 2017 at 09:00:37PM +0000, brenohl at br dot ibm.com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78543
> 
> --- Comment #16 from Breno Leitao <brenohl at br dot ibm.com> ---
> If it helps, the problem is reproducible on some other packages as well. Here
> is another example:
> 
> https://nopaste.linux-dev.org/?1122124

Note, the source is not complete.  Please attach the file as a text file to
this bug report.
Comment 20 Michael Meissner 2017-03-24 22:58:51 UTC
Created attachment 41050 [details]
Proposed patch to fix the problem (rework)

This patch reworks the original patch I submitted, to try and make it less hacky.  It separates the bswap insns where there is hardware support into separate read, write, and register swap instructions. This is because the register allocators will try to push the bswap value in a register to the stack and do the load based swap with reverse bytes.  Reload fumbles in certain conditions.  LRA generates working code, but the store and the load with byte reverse from the same location, can slow things down compared to the operation on registers.

I only did this optimization where we had the hardware support (i.e. bswap for HImode all of the time, bswap for SImode all of the time, and bswap for DImode if we are executing 64-bit instructions and the machine has LDBRX/STDBRX -- power7 and newer/cell ppc).

I have done bootstrap builds on a little endian power8 system, on a big endian power8 system, and a big endian power7 system (both 32/64-bit support on this last system).  There were no regressions.

I am building the patches applied to gcc 6 right now.  The patches apply cleanly to gcc 6.  I suspect it will also build on gcc 5.

I built spec 2006 benchmarks with the compiler.  There are 12 benchmarks that generate one or more load/store with byte swap instructions (perlbench, gcc, gamess, milc, zeusmp, calculix, h264ref, tonto, omnetpp, wrf, sphinx3, xalancbmk).

I compared the instructions generated.  10 of the benchmarks generated the same instructions.

Milc generated 1 less load with byte swap instruction and 1 more store with byte swap instruction.

Sphinx3 generated 6 less load with byte swap instructions and 6 more store with byte swap instructions.

So I count this as the same level of byte swapping is being generated.
Comment 21 Michael Meissner 2017-03-24 23:07:05 UTC
The backport to GCC 6 also succeeded with no regressions on a little endian power8 system.
Comment 22 Michael Meissner 2017-03-27 19:19:31 UTC
Author: meissner
Date: Mon Mar 27 19:19:00 2017
New Revision: 246508

URL: https://gcc.gnu.org/viewcvs?rev=246508&root=gcc&view=rev
Log:
[gcc]
2017-03-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/78543
	* config/rs6000/rs6000.md (bswaphi2_extenddi): Combine bswap
	HImode and SImode with zero extend to DImode to one insn.
	(bswap<mode>2_extenddi): Likewise.
	(bswapsi2_extenddi): Likewise.
	(bswaphi2_extendsi): Likewise.
	(bswaphi2): Combine bswap HImode and SImode into one insn.
	Separate memory insns from swapping register.
	(bswapsi2): Likewise.
	(bswap<mode>2): Likewise.
	(bswaphi2_internal): Delete, no longer used.
	(bswapsi2_internal): Likewise.
	(bswap<mode>2_load): Split bswap HImode/SImode into separate load,
	store, and gpr<-gpr swap insns.
	(bswap<mode>2_store): Likewise.
	(bswaphi2_reg): Register only splitter, combine with the splitter.
	(bswaphi2 splitter): Likewise.
	(bswapsi2_reg): Likewise.
	(bswapsi2 splitter): Likewise.
	(bswapdi2): If we have the LDBRX and STDBRX instructions, split
	the insns into load, store, and register/register insns.
	(bswapdi2_ldbrx): Likewise.
	(bswapdi2_load): Likewise.
	(bswapdi2_store): Likewise.
	(bswapdi2_reg): Likewise.

[gcc/testsuite]
2017-03-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/78543
	* gcc.target/powerpc/pr78543.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/powerpc/pr78543.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.md
    trunk/gcc/testsuite/ChangeLog
Comment 23 Michael Meissner 2017-03-27 19:36:08 UTC
Author: meissner
Date: Mon Mar 27 19:35:35 2017
New Revision: 246509

URL: https://gcc.gnu.org/viewcvs?rev=246509&root=gcc&view=rev
Log:
[gcc]
2017-03-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	Back port from trunk
	2017-03-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/78543
	* config/rs6000/rs6000.md (bswaphi2_extenddi): Combine bswap
	HImode and SImode with zero extend to DImode to one insn.
	(bswap<mode>2_extenddi): Likewise.
	(bswapsi2_extenddi): Likewise.
	(bswaphi2_extendsi): Likewise.
	(bswaphi2): Combine bswap HImode and SImode into one insn.
	Separate memory insns from swapping register.
	(bswapsi2): Likewise.
	(bswap<mode>2): Likewise.
	(bswaphi2_internal): Delete, no longer used.
	(bswapsi2_internal): Likewise.
	(bswap<mode>2_load): Split bswap HImode/SImode into separate load,
	store, and gpr<-gpr swap insns.
	(bswap<mode>2_store): Likewise.
	(bswaphi2_reg): Register only splitter, combine with the splitter.
	(bswaphi2 splitter): Likewise.
	(bswapsi2_reg): Likewise.
	(bswapsi2 splitter): Likewise.
	(bswapdi2): If we have the LDBRX and STDBRX instructions, split
	the insns into load, store, and register/register insns.
	(bswapdi2_ldbrx): Likewise.
	(bswapdi2_load): Likewise.
	(bswapdi2_store): Likewise.
	(bswapdi2_reg): Likewise.

[gcc/testsuite]
2017-03-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	Back port from trunk
	2017-03-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/78543
	* gcc.target/powerpc/pr78543.c: New test.


Added:
    branches/gcc-6-branch/gcc/testsuite/gcc.target/powerpc/pr78543.c
      - copied unchanged from r246508, trunk/gcc/testsuite/gcc.target/powerpc/pr78543.c
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/config/rs6000/rs6000.md
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
Comment 24 Michael Meissner 2017-03-27 19:43:09 UTC
Fixed in the trunk subversion id 246508, and on the GCC 6.x branch in subversion id 246509.