Bug 69677 - [6 Regression] bootstrap failed with --with-arch=corei7 --with-cpu=corei7
Summary: [6 Regression] bootstrap failed with --with-arch=corei7 --with-cpu=corei7
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: bootstrap (show other bugs)
Version: 6.0
: P1 normal
Target Milestone: 6.0
Assignee: Jakub Jelinek
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-04 19:45 UTC by H.J. Lu
Modified: 2016-02-05 16:24 UTC (History)
2 users (show)

See Also:
Host:
Target: i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-02-04 00:00:00


Attachments
gcc6-pr69677.patch (805 bytes, patch)
2016-02-04 21:16 UTC, Jakub Jelinek
Details | Diff
A different patch (1.63 KB, patch)
2016-02-05 03:38 UTC, H.J. Lu
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2016-02-04 19:45:25 UTC
On ia32, r233128 failed to bootstrap when configured with

--with-arch=corei7 --with-cpu=corei7 --prefix=/usr/6.0.0 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-libmpx i686-linux --with-fpmath=sse --enable-languages=c,c++,fortran,java,lto,objc

https://gcc.gnu.org/ml/gcc-regression/2016-02/msg00068.html

[hjl@gnu-skl-1 gcc]$ cat /tmp/x.i
typedef int DItype __attribute__ ((mode (DI)));
typedef int SItype __attribute__ ((mode (SI)));
typedef unsigned int USItype __attribute__ ((mode (SI)));
struct DWstruct {SItype low, high;};
typedef union
{
  struct DWstruct s;
  DItype ll;
} DWunion;
DItype
__negdi2 (DItype u)
{
  const DWunion uu = {.ll = u};
  const DWunion w = { {.low = -uu.s.low,
         .high = -uu.s.high - ((USItype) -uu.s.low > 0) } };

  return w.ll;
}
[hjl@gnu-skl-1 gcc]$ ./xgcc -B./ -S -O2 /tmp/x.i
/tmp/x.i: In function ‘__negdi2’:
/tmp/x.i:18:1: internal compiler error: Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
[hjl@gnu-skl-1 gcc]$ 

(gdb) bt
#0  0x086ffe63 in val_signbit_p(machine_mode, unsigned long long) ()
#1  0x08d73438 in combine_simplify_rtx(rtx_def*, machine_mode, int, int) ()
#2  0x08d750ad in subst(rtx_def*, rtx_def*, rtx_def*, int, int, int) ()
#3  0x08d74e1f in subst(rtx_def*, rtx_def*, rtx_def*, int, int, int) ()
#4  0x08d76327 in try_combine(rtx_insn*, rtx_insn*, rtx_insn*, rtx_insn*, int*, rtx_insn*) ()
#5  0x08d7c4b1 in (anonymous namespace)::pass_combine::execute(function*) ()
#6  0x08662051 in execute_one_pass(opt_pass*) ()
#7  0x08662669 in execute_pass_list_1(opt_pass*) ()
#8  0x0866267c in execute_pass_list_1(opt_pass*) ()
#9  0x086626c9 in execute_pass_list(function*, opt_pass*) ()
#10 0x08358b33 in cgraph_node::expand() ()
#11 0x0835a3e4 in symbol_table::compile() [clone .part.51] ()
#12 0x0835c73f in symbol_table::finalize_compilation_unit() ()
#13 0x0872b897 in compile_file() ()
#14 0x081d6871 in toplev::main(int, char**) ()
#15 0x081d8911 in main ()
(gdb) disass
Dump of assembler code for function _Z13val_signbit_p12machine_modey:
   0x086ffe10 <+0>:	push   %ebx
   0x086ffe11 <+1>:	xor    %eax,%eax
   0x086ffe13 <+3>:	sub    $0x10,%esp
   0x086ffe16 <+6>:	mov    0x18(%esp),%edx
   0x086ffe1a <+10>:	mov    0x1c(%esp),%ecx
   0x086ffe1e <+14>:	mov    0x20(%esp),%ebx
   0x086ffe22 <+18>:	cmpb   $0x2,0x8f99440(%edx)
   0x086ffe29 <+25>:	mov    %ecx,(%esp)
   0x086ffe2c <+28>:	mov    %ebx,0x4(%esp)
   0x086ffe30 <+32>:	je     0x86ffe40 <_Z13val_signbit_p12machine_modey+48>
   0x086ffe32 <+34>:	add    $0x10,%esp
   0x086ffe35 <+37>:	pop    %ebx
   0x086ffe36 <+38>:	ret    
   0x086ffe37 <+39>:	mov    %esi,%esi
   0x086ffe39 <+41>:	lea    0x0(%edi,%eiz,1),%edi
   0x086ffe40 <+48>:	movzwl 0x8f99360(%edx,%edx,1),%ecx
   0x086ffe48 <+56>:	sub    $0x1,%ecx
   0x086ffe4b <+59>:	cmp    $0x3f,%ecx
   0x086ffe4e <+62>:	ja     0x86ffe32 <_Z13val_signbit_p12machine_modey+34>
   0x086ffe50 <+64>:	movq   0x8f98ea0(,%edx,8),%xmm0
   0x086ffe59 <+73>:	xor    %eax,%eax
   0x086ffe5b <+75>:	xor    %edx,%edx
---Type <return> to continue, or q <return> to quit---
   0x086ffe5d <+77>:	test   $0x20,%cl
   0x086ffe60 <+80>:	sete   %al
=> 0x086ffe63 <+83>:	movdqa (%esp),%xmm2
   0x086ffe68 <+88>:	setne  %dl
   0x086ffe6b <+91>:	shl    %cl,%eax
   0x086ffe6d <+93>:	shl    %cl,%edx
   0x086ffe6f <+95>:	pand   %xmm0,%xmm2
   0x086ffe73 <+99>:	movd   %eax,%xmm0
   0x086ffe77 <+103>:	pinsrd $0x1,%edx,%xmm0
   0x086ffe7d <+109>:	pxor   %xmm2,%xmm0
   0x086ffe81 <+113>:	punpcklqdq %xmm0,%xmm0
   0x086ffe85 <+117>:	ptest  %xmm0,%xmm0
   0x086ffe8a <+122>:	sete   %al
   0x086ffe8d <+125>:	add    $0x10,%esp
   0x086ffe90 <+128>:	pop    %ebx
   0x086ffe91 <+129>:	ret    
End of assembler dump.
(gdb) p $esp
$1 = (void *) 0xffffcb68
(gdb)
Comment 1 Uroš Bizjak 2016-02-04 19:58:44 UTC
(In reply to H.J. Lu from comment #0)

...
>    0x086ffe29 <+25>:	mov    %ecx,(%esp)
>    0x086ffe2c <+28>:	mov    %ebx,0x4(%esp)
...
> => 0x086ffe63 <+83>:	movdqa (%esp),%xmm2
...

> (gdb) p $esp
> $1 = (void *) 0xffffcb68

This looks like a problem in STV pass itself. Here we load DImode value, so movq should be used, not movdqa.
Comment 2 Uroš Bizjak 2016-02-04 20:01:32 UTC
In addition, movdqa loads uninitialized stack location to high part of the SSE reg.

Adding CC.
Comment 3 Jakub Jelinek 2016-02-04 20:12:24 UTC
x86_64 gcc doesn't ICE on this with -m32 -O2 -march=corei7 -mtune=corei7 -mfpmath=sse, nor i686-linux one.
Comment 4 H.J. Lu 2016-02-04 20:18:00 UTC
(In reply to Jakub Jelinek from comment #3)
> x86_64 gcc doesn't ICE on this with -m32 -O2 -march=corei7 -mtune=corei7
> -mfpmath=sse, nor i686-linux one.

x86_64 gcc doesn't use STV.
Comment 5 Jakub Jelinek 2016-02-04 20:51:53 UTC
Ah, I see, simplify-rtx.c is miscompiled.
So, it could be just the
convert_scalars_to_vector
hunk of the patch, because STV is clearly enabled in there.
If we have guaranteed that both preferred and incoming stack boundaries are at least 64, can we readd that hunk back?  Though, as it uses 128 instead of 64, perhaps we should disable TARGET_STV whenever the preferred/incoming stack boundaries are smaller than 128 bits and readd the hunk in convert_scalars_to_vector.
Comment 6 H.J. Lu 2016-02-04 21:12:51 UTC
STV turns:

insn 21 19 23 4 (parallel [
            (set (reg:DI 102 [ val ])
                (and:DI (reg/v:DI 97 [ val ])
                    (mem/u:DI (plus:SI (mult:SI (reg/v:SI 96 [ mode ])
                                (const_int 8 [0x8]))
                            (symbol_ref:SI ("mode_mask_array") [flags 0x40]  <var_decl 0xf517e160 mode_mask_array>)) [13 mode_mask_array S8 A64])))
            (clobber (reg:CC 17 flags))
        ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134 332 {*anddi3_doubleword}
     (expr_list:REG_DEAD (reg/v:DI 97 [ val ])
        (expr_list:REG_DEAD (reg/v:SI 96 [ mode ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
(insn 23 21 24 4 (parallel [
            (set (reg:DI 103)
                (ashift:DI (const_int 1 [0x1])
                    (subreg:QI (reg:SI 91 [ _8 ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:135 449 {*ashldi3_doubleword}
     (expr_list:REG_DEAD (reg:SI 91 [ _8 ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

into
(insn 47 19 21 4 (set (reg:DI 111)
        (mem/u:DI (plus:SI (mult:SI (reg/v:SI 96 [ mode ])
                    (const_int 8 [0x8]))
                (symbol_ref:SI ("mode_mask_array") [flags 0x40]  <var_decl 0xf517e160 mode_mask_array>)) [13 mode_mask_array S8 A64])) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134 -1
     (nil))
(insn 21 47 23 4 (set (subreg:V2DI (reg:DI 102 [ val ]) 0)
        (and:V2DI (subreg:V2DI (reg/v:DI 97 [ val ]) 0)
            (subreg:V2DI (reg:DI 111) 0))) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134 3253 {*andv2di3}
     (expr_list:REG_DEAD (reg/v:DI 97 [ val ])
        (expr_list:REG_DEAD (reg/v:SI 96 [ mode ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
(insn 23 21 45 4 (parallel [
            (set (reg:DI 103)
                (ashift:DI (const_int 1 [0x1])
                    (subreg:QI (reg:SI 91 [ _8 ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:135 449 {*ashldi3_doubleword}
     (expr_list:REG_DEAD (reg:SI 91 [ _8 ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

So STV requires 16 byte stack alignment and uses 16 byte stack alignment.
Comment 7 Jakub Jelinek 2016-02-04 21:16:20 UTC
Created attachment 37584 [details]
gcc6-pr69677.patch

I'm currently bootstrapping/regtesting --with-arch=corei7 --with-cpu=corei7 --with-fpmath=sse i686-linux compiler with this patch, and got already well past the spot where it previously ICEd.
The patch should restore STV behavior to the previous one, except for the non-default lower preferred/incoming stack boundaries.
Comment 8 H.J. Lu 2016-02-04 21:25:00 UTC
(In reply to Jakub Jelinek from comment #7)
> Created attachment 37584 [details]
> gcc6-pr69677.patch
> 
> I'm currently bootstrapping/regtesting --with-arch=corei7 --with-cpu=corei7
> --with-fpmath=sse i686-linux compiler with this patch, and got already well
> past the spot where it previously ICEd.
> The patch should restore STV behavior to the previous one, except for the
> non-default lower preferred/incoming stack boundaries.

This is wrong and I don't think it is needed. Assert

crtl->stack_alignment_needed >= 128

is OK. Change it isn't allowed after rtl->stack_realign_processed is set.

@@ -3588,6 +3588,16 @@ convert_scalars_to_vector ()
   bitmap_obstack_release (NULL);
   df_process_deferred_rescans ();
 
+  /* Conversion means we may have 128bit register spills/fills
+     which require aligned stack.  */
+  if (converted_insns)
+    {
+      if (crtl->stack_alignment_needed < 128)
+	crtl->stack_alignment_needed = 128;
+      if (crtl->stack_alignment_estimated < 128)
+	crtl->stack_alignment_estimated = 128;
+    }
+
   return 0;
 }
Comment 9 H.J. Lu 2016-02-05 03:23:04 UTC
Another problem.  STV is disabled even when stack is aligned:

[hjl@gnu-skl-1 pr69677]$ cat x.i
struct bar
{
  int i[16] __attribute__ ((aligned(16)));
};

extern void fn2 (void);
long long a, b;

struct bar
fn1 (struct bar x)
{
  long long c = a;
  a = b ^ a;
  fn2 ();
  a = c;
  return x;
}
[hjl@gnu-skl-1 pr69677]$ make x.s
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -m32 -O2 -msse2 -mno-accumulate-outgoing-args -mpreferred-stack-boundary=2 -S x.i
[hjl@gnu-skl-1 pr69677]$ cat x.s
	.file	"x.i"
	.text
	.p2align 4,,15
	.globl	fn1
	.type	fn1, @function
fn1:
.LFB0:
	.cfi_startproc
	pushl	%ebp
	.cfi_def_cfa_offset 8
	.cfi_offset 5, -8
	movl	%esp, %ebp
	.cfi_def_cfa_register 5
	pushl	%edi
	pushl	%esi
	pushl	%ebx
	pushl	%ecx
	.cfi_offset 7, -12
	.cfi_offset 6, -16
	.cfi_offset 3, -20
	.cfi_offset 1, -24
	leal	8(%ebp), %ecx
	subl	$4, %esp
	movl	a, %eax
	movl	a+4, %edi
	movl	(%ecx), %ebx
	movl	%ecx, %esi
	movl	%eax, %edx
	xorl	b, %edx
	movl	%eax, -20(%ebp)
	movl	%edx, a
	movl	b+4, %edx
	xorl	%edi, %edx
	movl	%edx, a+4
	call	fn2
	movl	-20(%ebp), %eax
	movl	%edi, a+4
	movl	%eax, a
	movl	4(%esi), %eax
	movl	%eax, (%ebx)
	movl	8(%esi), %eax
	movl	%eax, 4(%ebx)
	movl	12(%esi), %eax
	movl	%eax, 8(%ebx)
	movl	16(%esi), %eax
	movl	%eax, 12(%ebx)
	movl	20(%esi), %eax
	movl	%eax, 16(%ebx)
	movl	24(%esi), %eax
	movl	%eax, 20(%ebx)
	movl	28(%esi), %eax
	movl	%eax, 24(%ebx)
	movl	32(%esi), %eax
	movl	%eax, 28(%ebx)
	movl	36(%esi), %eax
	movl	%eax, 32(%ebx)
	movl	40(%esi), %eax
	movl	%eax, 36(%ebx)
	movl	44(%esi), %eax
	movl	%eax, 40(%ebx)
	movl	48(%esi), %eax
	movl	%eax, 44(%ebx)
	movl	52(%esi), %eax
	movl	%eax, 48(%ebx)
	movl	56(%esi), %eax
	movl	%eax, 52(%ebx)
	movl	60(%esi), %eax
	movl	%eax, 56(%ebx)
	movl	64(%esi), %eax
	movl	%eax, 60(%ebx)
	addl	$4, %esp
	movl	%ebx, %eax
	popl	%ecx
	.cfi_restore 1
	popl	%ebx
	.cfi_restore 3
	popl	%esi
	.cfi_restore 6
	popl	%edi
	.cfi_restore 7
	popl	%ebp
	.cfi_restore 5
	.cfi_def_cfa 4, 4
	ret	$4
	.cfi_endproc
.LFE0:
	.size	fn1, .-fn1
	.comm	b,8,8
	.comm	a,8,8
	.ident	"GCC: (GNU) 6.0.0 20160205 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-skl-1 pr69677]$
Comment 10 H.J. Lu 2016-02-05 03:38:01 UTC
Created attachment 37589 [details]
A different patch

I am testing this.
Comment 11 Jakub Jelinek 2016-02-05 08:52:31 UTC
(In reply to H.J. Lu from comment #9)
> Another problem.  STV is disabled even when stack is aligned:

See the other PR, that has been discussed as one of the possible approaches, but as TARGET_STV does not affect just the stv pass, but also expansion, it is not at all clear which of the approaches is better.  Therefore, it was agreed on to disable STV in these rare cases altogether for GCC 6, and for GCC 7 somebody who cares enough can perform appropriate testing/benchmarking and tweak it.
Comment 12 Jakub Jelinek 2016-02-05 09:23:35 UTC
Author: jakub
Date: Fri Feb  5 09:23:03 2016
New Revision: 233167

URL: https://gcc.gnu.org/viewcvs?rev=233167&root=gcc&view=rev
Log:
	PR bootstrap/69677
	* config/i386/i386.c (convert_scalars_to_vector): Readd stack
	alignment fixes.
	(ix86_option_override_internal): Disable TARGET_STV even for
	-m{incoming,preferred}-stack-boundary=3.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
Comment 13 Jakub Jelinek 2016-02-05 09:24:02 UTC
Fixed.
Comment 14 Ilya Enkovich 2016-02-05 10:42:20 UTC
(In reply to H.J. Lu from comment #6)
> STV turns:
> 
> insn 21 19 23 4 (parallel [
>             (set (reg:DI 102 [ val ])
>                 (and:DI (reg/v:DI 97 [ val ])
>                     (mem/u:DI (plus:SI (mult:SI (reg/v:SI 96 [ mode ])
>                                 (const_int 8 [0x8]))
>                             (symbol_ref:SI ("mode_mask_array") [flags 0x40] 
> <var_decl 0xf517e160 mode_mask_array>)) [13 mode_mask_array S8 A64])))
>             (clobber (reg:CC 17 flags))
>         ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134
> 332 {*anddi3_doubleword}
>      (expr_list:REG_DEAD (reg/v:DI 97 [ val ])
>         (expr_list:REG_DEAD (reg/v:SI 96 [ mode ])
>             (expr_list:REG_UNUSED (reg:CC 17 flags)
>                 (nil)))))
> (insn 23 21 24 4 (parallel [
>             (set (reg:DI 103)
>                 (ashift:DI (const_int 1 [0x1])
>                     (subreg:QI (reg:SI 91 [ _8 ]) 0)))
>             (clobber (reg:CC 17 flags))
>         ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:135
> 449 {*ashldi3_doubleword}
>      (expr_list:REG_DEAD (reg:SI 91 [ _8 ])
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
> 
> into
> (insn 47 19 21 4 (set (reg:DI 111)
>         (mem/u:DI (plus:SI (mult:SI (reg/v:SI 96 [ mode ])
>                     (const_int 8 [0x8]))
>                 (symbol_ref:SI ("mode_mask_array") [flags 0x40]  <var_decl
> 0xf517e160 mode_mask_array>)) [13 mode_mask_array S8 A64]))
> /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134 -1
>      (nil))
> (insn 21 47 23 4 (set (subreg:V2DI (reg:DI 102 [ val ]) 0)
>         (and:V2DI (subreg:V2DI (reg/v:DI 97 [ val ]) 0)
>             (subreg:V2DI (reg:DI 111) 0)))
> /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134 3253
> {*andv2di3}
>      (expr_list:REG_DEAD (reg/v:DI 97 [ val ])
>         (expr_list:REG_DEAD (reg/v:SI 96 [ mode ])
>             (expr_list:REG_UNUSED (reg:CC 17 flags)
>                 (nil)))))
> (insn 23 21 45 4 (parallel [
>             (set (reg:DI 103)
>                 (ashift:DI (const_int 1 [0x1])
>                     (subreg:QI (reg:SI 91 [ _8 ]) 0)))
>             (clobber (reg:CC 17 flags))
>         ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:135
> 449 {*ashldi3_doubleword}
>      (expr_list:REG_DEAD (reg:SI 91 [ _8 ])
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
> 
> So STV requires 16 byte stack alignment and uses 16 byte stack alignment.

I don't see how STV requires 16 byte stack alignment here.  All registers are DI here.  Emitted movdqa is not produced by STV, it seems like LRA bug which writes 8 bytes to stack but reads 16.
Comment 15 Ilya Enkovich 2016-02-05 10:50:29 UTC
(In reply to Jakub Jelinek from comment #13)
> Fixed.

I think you just hide LRA issue disabling STV and LRA still may generate incorrect register fill
Comment 16 Uroš Bizjak 2016-02-05 11:21:35 UTC
(In reply to Ilya Enkovich from comment #15)
> (In reply to Jakub Jelinek from comment #13)
> > Fixed.
> 
> I think you just hide LRA issue disabling STV and LRA still may generate
> incorrect register fill

Yes, but stage 4 is not the time to solve these kind of bugs. As said, the fix disables STV for rare and unusual cases. I expect that proper fix will be developed during gcc-7 time, and this includes various RA fixes.

Previous patch relaxed alignment requirements to 64 bits, and this triggered RA bug. So, unless we want to delay release for a couple of weeks (a month?), we will leave alignment to 128 bits.

That doesn't mean that incorrect register fill RA bug should't be fixed during stage-4. Do we have a small testcase that exposes it?
Comment 17 Ilya Enkovich 2016-02-05 13:59:56 UTC
(In reply to Uroš Bizjak from comment #16)
> That doesn't mean that incorrect register fill RA bug should't be fixed
> during stage-4. Do we have a small testcase that exposes it?

I submitted PR69693 for RA problem.
Comment 18 hjl@gcc.gnu.org 2016-02-05 16:24:38 UTC
Author: hjl
Date: Fri Feb  5 16:24:06 2016
New Revision: 233180

URL: https://gcc.gnu.org/viewcvs?rev=233180&root=gcc&view=rev
Log:
Add a testcase for PR target/69677

	PR target/69677
	* gcc.target/i386/pr69677.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr69677.c
Modified:
    trunk/gcc/testsuite/ChangeLog