Bug 71294 - [6 Regression] ICE in gen_add2_insn, at optabs.c:4442 on powerpc64le-linux
Summary: [6 Regression] ICE in gen_add2_insn, at optabs.c:4442 on powerpc64le-linux
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 7.0
Importance: P2 normal
Target Milestone: 6.4
Assignee: Michael Meissner
URL:
Keywords: ice-on-valid-code
Depends on:
Blocks:
 
Reported: 2016-05-26 15:22 UTC by Martin Liška
Modified: 2017-04-03 19:55 UTC
CC List: 8 users

See Also:
Host:
Target: powerpc64le-linux
Build:
Known to work: 5.4.0, 7.0
Known to fail: 6.3.0
Last reconfirmed: 2016-05-26 00:00:00


Attachments
Proposed patch to fix the problem (1.64 KB, patch)
2016-05-26 23:42 UTC, Michael Meissner
Proposed patch to fix the problem (1.46 KB, patch)
2017-03-15 20:08 UTC, Michael Meissner

Description Martin Liška 2016-05-26 15:22:39 UTC
Hello.

The following code snippet fails with:
./xgcc -B. -O3 -fstack-protector ice2.ii
ice2.ii: In function ‘void fn1()’:
ice2.ii:51:1: internal compiler error: in gen_add2_insn, at optabs.c:4442
 }
 ^
0xc00edb gen_add2_insn(rtx_def*, rtx_def*)
	../../gcc/optabs.c:4442
0xc768b4 gen_reload
	../../gcc/reload1.c:8710
0xc7a35e emit_input_reload_insns
	../../gcc/reload1.c:7664
0xc7b3bc do_input_reload
	../../gcc/reload1.c:7950
0xc7b3bc emit_reload_insns
	../../gcc/reload1.c:8142
0xc7dd69 reload_as_needed
	../../gcc/reload1.c:4661
0xc81a7b reload(rtx_insn*, int)
	../../gcc/reload1.c:1062
0xb39da2 do_reload
	../../gcc/ira.c:5393
0xb39da2 execute
	../../gcc/ira.c:5565

$ cat ice2.ii
class A;
template <typename _Tp, int m, int n> class B {
public:
  _Tp val[m * n];
};
class C {
public:
  C(A);
};
struct D {
  D();
  unsigned long &operator[](int);
  unsigned long *p;
};
class A {
public:
  template <typename _Tp, int m, int n> A(const B<_Tp, m, n> &, bool);
  int rows, cols;
  unsigned char *data;
  unsigned char *datastart;
  unsigned char *dataend;
  unsigned char *datalimit;
  D step;
};
template <typename _Tp, int m, int n>
A::A(const B<_Tp, m, n> &p1, bool)
    : rows(m), cols(n) {
  step[0] = cols * sizeof(_Tp);
  datastart = data = (unsigned char *)p1.val;
  datalimit = dataend = datastart + rows * step[0];
}
class F {
public:
  static void compute(C);
  template <typename _Tp, int m, int n, int nm>
  static void compute(const B<_Tp, m, n> &, B<_Tp, nm, 1> &, B<_Tp, m, nm> &,
                      B<_Tp, n, nm> &);
};
D::D() {}
unsigned long &D::operator[](int p1) { return p[p1]; }
template <typename _Tp, int m, int n, int nm>
void F::compute(const B<_Tp, m, n> &, B<_Tp, nm, 1> &, B<_Tp, m, nm> &,
                B<_Tp, n, nm> &p4) {
  A a(p4, false);
  compute(a);
}
void fn1() {
  B<double, 4, 4> b, c, e;
  B<double, 4, 1> d;
  F::compute(b, d, c, e);
}

gcc version 5.2.1 works fine.

Martin
Comment 1 Martin Liška 2016-05-26 16:06:30 UTC
This started with r230091, which enabled a vectorization, so the bug looks latent:


Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Nov 10 09:43:54 2015 +0000

    2015-11-10  Richard Biener  <rguenther@suse.de>
    
        PR tree-optimization/56118
        * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
        cost favor vectorized version.
    
        * gcc.target/i386/pr56118.c: New testcase.
Comment 2 Martin Liška 2016-05-26 16:15:02 UTC
tree-dump of the problematic function:

;; Function void fn1() (_Z3fn1v, funcdef_no=6, decl_uid=2937, cgraph_uid=4, symbol_order=4)

void fn1() ()
{
  struct A a;
  struct C D.3115;
  struct C D.3114;
  struct B e;
  long unsigned int _4;
  vector(2) long unsigned int vect_cst__5;
  long unsigned int _6;
  vector(2) long unsigned int vect_cst__20;

  <bb 2>:
  _6 = (long unsigned int) &e.val;
  vect_cst__20 = {_6, _6};
  _4 = (long unsigned int) &MEM[(void *)&e + 128B];
  vect_cst__5 = {_4, _4};
  MEM[(struct  &)&a] ={v} {CLOBBER};
  a.rows = 4;
  a.cols = 4;
  MEM[(struct  &)&a + 40] ={v} {CLOBBER};
  MEM[(unsigned char * *)&a + 8B] = vect_cst__20;
  MEM[(unsigned char * *)&a + 24B] = vect_cst__5;
  C::C (&D.3114, a);
  F::compute (D.3115);
  D.3114 ={v} {CLOBBER};
  a ={v} {CLOBBER};
  e ={v} {CLOBBER};
  return;

}

236r.ira:

(insn 13 29 33 2 (set (mem/c:V2DI (plus:DI (reg/f:DI 113 sfp)
                (reg:DI 166)) [5 MEM[(unsigned char * *)&a + 8B]+0 S16 A64])
        (vec_select:V2DI (reg:V2DI 163)
            (parallel:V2DI [
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                ]))) /tmp/ice2.ii:29 841 {*vsx_stxvd2x2_le_v2di}
     (expr_list:REG_DEAD (reg:DI 166)
        (expr_list:REG_DEAD (reg:V2DI 163)
            (nil))))

237r.reload:

Reloads for insn # 13
Reload 0: reload_in (DI) = (plus:DI (reg/f:DI 1 1)
                                                    (const_int 240 [0xf0]))
	BASE_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0)
	reload_in_reg: (plus:DI (reg/f:DI 1 1)
                                                    (const_int 240 [0xf0]))
	reload_reg_rtx: (reg:DI 10 10)
Reload 1: BASE_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0), optional, can't combine, secondary_reload_p
Reload 2: GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0), optional, can't combine, secondary_reload_p
	secondary_out_reload = 1

	secondary_out_icode = reload_v2di_di_store
Reload 3: reload_out (V2DI) = (mem/c:V2DI (plus:DI (plus:DI (reg/f:DI 1 1)
                                                            (const_int 240 [0xf0]))
                                                        (reg:DI 9 9 [166])) [5 MEM[(unsigned char * *)&a + 8B]+0 S16 A64])
	NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
	reload_out_reg: (mem/c:V2DI (plus:DI (plus:DI (reg/f:DI 1 1)
                                                            (const_int 240 [0xf0]))
                                                        (reg:DI 9 9 [166])) [5 MEM[(unsigned char * *)&a + 8B]
(crashes here)
Comment 3 Martin Liška 2016-05-26 16:16:47 UTC
The issue can be debugged with a cross-compiler:
../configure --enable-languages=c,c++ --disable-bootstrap --target=ppc64le-suse-linux
Comment 4 acsawdey 2016-05-26 21:18:28 UTC
I tested this with trunk 236751 and I see the ICE.

It does not fail if you add -mlra.

It also does not fail with -mcpu=power7.
Comment 5 Segher Boessenkool 2016-05-26 21:21:31 UTC
It also fails on BE.  It needs -O3 -fstack-protector -mcpu=power8 to fail.
Comment 6 Michael Meissner 2016-05-26 23:25:28 UTC
It is failing because GCC wants to vectorize an address off of the soft frame pointer.  If you don't do automatic vectorization, or if you don't ask for stack protection, it works.  It also works with LRA.
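
As a rough illustration of the pattern (a hypothetical reduction, not the committed g++.dg/pr71294.C test): after inlining, the constructor in the test case stores the same stack address into two adjacent pointer fields, and SLP vectorization can turn each pair into one V2DI splat store; compare vect_cst__20 = {_6, _6} in the tree dump in comment 2.  The names Buf, Ptrs, fill and span_of_local_buffer are made up for this sketch, and whether a given compiler actually vectorizes it is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for B<double, 4, 4> from the test case.
struct Buf {
  double val[16];
};

// Hypothetical stand-in for the four adjacent pointer members of class A.
struct Ptrs {
  unsigned char *data;
  unsigned char *datastart;
  unsigned char *dataend;
  unsigned char *datalimit;
};

// Each pair of adjacent fields receives one and the same address.  When buf
// lives on the stack and this is inlined, that address is a frame-relative
// value, and the two equal stores are candidates for one 16-byte splat store.
inline void fill(Ptrs &p, Buf &buf) {
  p.datastart = p.data = reinterpret_cast<unsigned char *>(buf.val);
  p.datalimit = p.dataend = p.datastart + sizeof(buf.val);
}

// Returns the byte distance between the two duplicated addresses for a local
// buffer; the tree dump shows the same distance as the 128B offset from &e.
inline std::ptrdiff_t span_of_local_buffer() {
  Buf e;
  Ptrs a;
  fill(a, e);
  assert(a.data == a.datastart && a.dataend == a.datalimit);
  return a.dataend - a.data;
}
```

Calling span_of_local_buffer() from a function built with -O3 -fstack-protector -mcpu=power8 should give roughly the same shape of stores as fn1, though it is not guaranteed to reproduce the ICE.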
Comment 7 Michael Meissner 2016-05-26 23:42:50 UTC
Created attachment 38580 [details]
Proposed patch to fix the problem

This patch fixes the problem by copying frame related registers into a temporary if they are used as part of a SPLAT operation.
Comment 8 Bill Schmidt 2016-06-29 17:30:36 UTC
Just recording that this patch was rejected on the list at https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02136.html, so we still need a fix for this.
Comment 9 Martin Liška 2016-08-05 08:41:51 UTC
Fixed on trunk (r239105), as rs6000 switched to LRA.
Comment 10 Richard Biener 2016-08-22 08:21:08 UTC
GCC 6.2 is being released, adjusting target milestone.
Comment 11 Richard Biener 2016-08-22 08:22:11 UTC
GCC 6.2 is being released, adjusting target milestone.
Comment 12 Jakub Jelinek 2016-12-21 10:58:16 UTC
GCC 6.3 is being released, adjusting target milestone.
Comment 13 Michael Meissner 2017-03-15 18:16:13 UTC
FWIW, it does not fail for -mcpu=power7 or -mcpu=power9.  If you use -mcpu=power7, there is no direct move.  If you use -mcpu=power9, the MTVSRDD instruction is generated which bypasses the part that is failing under reload.
Comment 14 Bernd Schmidt 2017-03-15 19:06:03 UTC
I can't reproduce this, but that's probably because of
  cc1plus: warning: will not generate power8 instructions because assembler lacks power8 support

Binutils was just configured for ppc64le-linux - do I need any special options on the configure command line there?
Comment 15 Segher Boessenkool 2017-03-15 19:55:56 UTC
You need to build GCC with a new enough binutils, 2.24 I believe.
Comment 16 Michael Meissner 2017-03-15 19:59:57 UTC
You need power8 support for the bug to show itself.  In order to have power8 (ISA 2.07) support, you need a binutils that supports at least the power8 instructions.
Comment 17 Michael Meissner 2017-03-15 20:08:16 UTC
Created attachment 40980 [details]
Proposed patch to fix the problem
Comment 18 Michael Meissner 2017-03-16 20:09:53 UTC
Author: meissner
Date: Thu Mar 16 20:09:21 2017
New Revision: 246209

URL: https://gcc.gnu.org/viewcvs?rev=246209&root=gcc&view=rev
Log:
[gcc]
2017-03-16  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71294
	* config/rs6000/vsx.md (vsx_splat_<mode>, VSX_D iterator): Allow a
	SPLAT operation on ISA 2.07 64-bit systems that have direct move,
	but no MTVSRDD support, by doing MTVSRD and XXPERMDI.

[gcc/testsuite]
2017-03-16  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71294
	* g++.dg/pr71294.C: New test.


Added:
    trunk/gcc/testsuite/g++.dg/pr71294.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/vsx.md
    trunk/gcc/testsuite/ChangeLog
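
For readers unfamiliar with the two instructions named in the ChangeLog entry above, here is a simplified C++ model of a splat done as MTVSRD followed by XXPERMDI.  It is only a sketch of the data movement, assuming the usual ISA doubleword numbering (doubleword 0 is the high half); it is not the vsx.md pattern itself, and V2DI, mtvsrd_model, xxpermdi_model and splat_model are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a 128-bit VSX register as two 64-bit doublewords.
struct V2DI {
  std::uint64_t dw[2];
};

// Model of MTVSRD: the GPR value lands in doubleword 0.  The hardware leaves
// doubleword 1 undefined; this model arbitrarily zeroes it.
inline V2DI mtvsrd_model(std::uint64_t gpr) { return V2DI{{gpr, 0}}; }

// Model of XXPERMDI: the high bit of dm picks the doubleword taken from a,
// the low bit of dm picks the doubleword taken from b.
inline V2DI xxpermdi_model(const V2DI &a, const V2DI &b, unsigned dm) {
  return V2DI{{a.dw[(dm >> 1) & 1], b.dw[dm & 1]}};
}

// Splat without MTVSRDD: one direct move, then duplicate doubleword 0.
inline V2DI splat_model(std::uint64_t gpr) {
  V2DI t = mtvsrd_model(gpr);
  V2DI r = xxpermdi_model(t, t, 0);
  assert(r.dw[0] == r.dw[1]);
  return r;
}
```

On power9 the single MTVSRDD instruction covers both steps, which matches the observation in comment 13 that -mcpu=power9 does not hit the failing path.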
Comment 19 Michael Meissner 2017-03-29 23:16:23 UTC
Author: meissner
Date: Wed Mar 29 23:15:51 2017
New Revision: 246577

URL: https://gcc.gnu.org/viewcvs?rev=246577&root=gcc&view=rev
Log:
[gcc]
2017-03-29  Michael Meissner  <meissner@linux.vnet.ibm.com>

	Back port from trunk
	2017-03-21  Aaron Sawdey  <acsawdey@linux.vnet.ibm.com>

	PR target/80123
	* doc/md.texi (Constraints): Document wA constraint.
	* config/rs6000/constraints.md (wA): New.
	* config/rs6000/rs6000.c (rs6000_debug_reg_global): Add wA reg_class.
	(rs6000_init_hard_regno_mode_ok): Init wA constraint.
	* config/rs6000/rs6000.h (RS6000_CONSTRAINT_wA): New.
	* config/rs6000/vsx.md (vsx_splat_<mode>): Use wA constraint.

	2017-03-16  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71294
	* config/rs6000/vsx.md (vsx_splat_<mode>, VSX_D iterator): Allow a
	SPLAT operation on ISA 2.07 64-bit systems that have direct move,
	but no MTVSRDD support, by doing MTVSRD and XXPERMDI.

[gcc/testsuite]
2017-03-29  Michael Meissner  <meissner@linux.vnet.ibm.com>

	Back port from trunk
	2017-03-16  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71294
	* g++.dg/pr71294.C: New test.


Added:
    branches/gcc-6-branch/gcc/testsuite/g++.dg/pr71294.C
      - copied unchanged from r246332, trunk/gcc/testsuite/g++.dg/pr71294.C
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/config/rs6000/constraints.md
    branches/gcc-6-branch/gcc/config/rs6000/rs6000.c
    branches/gcc-6-branch/gcc/config/rs6000/rs6000.h
    branches/gcc-6-branch/gcc/config/rs6000/vsx.md
    branches/gcc-6-branch/gcc/doc/md.texi
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
Comment 20 Michael Meissner 2017-04-03 19:55:43 UTC
Trunk fixed on March 16th, gcc 6 branch on March 29th.