On ia32, r233128 failed to bootstrap when configured with --with-arch=corei7 --with-cpu=corei7 --prefix=/usr/6.0.0 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-libmpx i686-linux --with-fpmath=sse --enable-languages=c,c++,fortran,java,lto,objc https://gcc.gnu.org/ml/gcc-regression/2016-02/msg00068.html [hjl@gnu-skl-1 gcc]$ cat /tmp/x.i typedef int DItype __attribute__ ((mode (DI))); typedef int SItype __attribute__ ((mode (SI))); typedef unsigned int USItype __attribute__ ((mode (SI))); struct DWstruct {SItype low, high;}; typedef union { struct DWstruct s; DItype ll; } DWunion; DItype __negdi2 (DItype u) { const DWunion uu = {.ll = u}; const DWunion w = { {.low = -uu.s.low, .high = -uu.s.high - ((USItype) -uu.s.low > 0) } }; return w.ll; } [hjl@gnu-skl-1 gcc]$ ./xgcc -B./ -S -O2 /tmp/x.i /tmp/x.i: In function ‘__negdi2’: /tmp/x.i:18:1: internal compiler error: Segmentation fault } ^ Please submit a full bug report, with preprocessed source if appropriate. See <http://gcc.gnu.org/bugs.html> for instructions. [hjl@gnu-skl-1 gcc]$ (gdb) bt #0 0x086ffe63 in val_signbit_p(machine_mode, unsigned long long) () #1 0x08d73438 in combine_simplify_rtx(rtx_def*, machine_mode, int, int) () #2 0x08d750ad in subst(rtx_def*, rtx_def*, rtx_def*, int, int, int) () #3 0x08d74e1f in subst(rtx_def*, rtx_def*, rtx_def*, int, int, int) () #4 0x08d76327 in try_combine(rtx_insn*, rtx_insn*, rtx_insn*, rtx_insn*, int*, rtx_insn*) () #5 0x08d7c4b1 in (anonymous namespace)::pass_combine::execute(function*) () #6 0x08662051 in execute_one_pass(opt_pass*) () #7 0x08662669 in execute_pass_list_1(opt_pass*) () #8 0x0866267c in execute_pass_list_1(opt_pass*) () #9 0x086626c9 in execute_pass_list(function*, opt_pass*) () #10 0x08358b33 in cgraph_node::expand() () #11 0x0835a3e4 in symbol_table::compile() [clone .part.51] () #12 0x0835c73f in symbol_table::finalize_compilation_unit() () #13 0x0872b897 in compile_file() () #14 0x081d6871 in toplev::main(int, char**) () #15 0x081d8911 in main () (gdb) disass Dump of assembler code for function _Z13val_signbit_p12machine_modey: 0x086ffe10 <+0>: push %ebx 0x086ffe11 <+1>: xor %eax,%eax 0x086ffe13 <+3>: sub $0x10,%esp 0x086ffe16 <+6>: mov 0x18(%esp),%edx 0x086ffe1a <+10>: mov 0x1c(%esp),%ecx 0x086ffe1e <+14>: mov 0x20(%esp),%ebx 0x086ffe22 <+18>: cmpb $0x2,0x8f99440(%edx) 0x086ffe29 <+25>: mov %ecx,(%esp) 0x086ffe2c <+28>: mov %ebx,0x4(%esp) 0x086ffe30 <+32>: je 0x86ffe40 <_Z13val_signbit_p12machine_modey+48> 0x086ffe32 <+34>: add $0x10,%esp 0x086ffe35 <+37>: pop %ebx 0x086ffe36 <+38>: ret 0x086ffe37 <+39>: mov %esi,%esi 0x086ffe39 <+41>: lea 0x0(%edi,%eiz,1),%edi 0x086ffe40 <+48>: movzwl 0x8f99360(%edx,%edx,1),%ecx 0x086ffe48 <+56>: sub $0x1,%ecx 0x086ffe4b <+59>: cmp $0x3f,%ecx 0x086ffe4e <+62>: ja 0x86ffe32 <_Z13val_signbit_p12machine_modey+34> 0x086ffe50 <+64>: movq 0x8f98ea0(,%edx,8),%xmm0 0x086ffe59 <+73>: xor %eax,%eax 0x086ffe5b <+75>: xor %edx,%edx ---Type <return> to continue, or q <return> to quit--- 0x086ffe5d <+77>: test $0x20,%cl 0x086ffe60 <+80>: sete %al => 0x086ffe63 <+83>: movdqa (%esp),%xmm2 0x086ffe68 <+88>: setne %dl 0x086ffe6b <+91>: shl %cl,%eax 0x086ffe6d <+93>: shl %cl,%edx 0x086ffe6f <+95>: pand %xmm0,%xmm2 0x086ffe73 <+99>: movd %eax,%xmm0 0x086ffe77 <+103>: pinsrd $0x1,%edx,%xmm0 0x086ffe7d <+109>: pxor %xmm2,%xmm0 0x086ffe81 <+113>: punpcklqdq %xmm0,%xmm0 0x086ffe85 <+117>: ptest %xmm0,%xmm0 0x086ffe8a <+122>: sete %al 0x086ffe8d <+125>: add $0x10,%esp 0x086ffe90 <+128>: pop %ebx 0x086ffe91 <+129>: ret End of assembler dump. (gdb) p $esp $1 = (void *) 0xffffcb68 (gdb)
(In reply to H.J. Lu from comment #0) ... > 0x086ffe29 <+25>: mov %ecx,(%esp) > 0x086ffe2c <+28>: mov %ebx,0x4(%esp) ... > => 0x086ffe63 <+83>: movdqa (%esp),%xmm2 ... > (gdb) p $esp > $1 = (void *) 0xffffcb68 This looks like a problem in STV pass itself. Here we load DImode value, so movq should be used, not movdqa.
In addition, movdqa loads uninitialized stack location to high part of the SSE reg. Adding CC.
x86_64 gcc doesn't ICE on this with -m32 -O2 -march=corei7 -mtune=corei7 -mfpmath=sse, nor i686-linux one.
(In reply to Jakub Jelinek from comment #3) > x86_64 gcc doesn't ICE on this with -m32 -O2 -march=corei7 -mtune=corei7 > -mfpmath=sse, nor i686-linux one. x86_64 gcc doesn't use STV.
Ah, I see, simplify-rtx.c is miscompiled. So, it could be just the convert_scalars_to_vector hunk of the patch, because STV is clearly enabled in there. If we have guaranteed that both preferred and incoming stack boundaries are at least 64, can we readd that hunk back? Though, as it uses 128 instead of 64, perhaps we should disable TARGET_STV whenever the preferred/incoming stack boundaries are smaller than 128 bits and readd the hunk in convert_scalars_to_vector.
STV turns: insn 21 19 23 4 (parallel [ (set (reg:DI 102 [ val ]) (and:DI (reg/v:DI 97 [ val ]) (mem/u:DI (plus:SI (mult:SI (reg/v:SI 96 [ mode ]) (const_int 8 [0x8])) (symbol_ref:SI ("mode_mask_array") [flags 0x40] <var_decl 0xf517e160 mode_mask_array>)) [13 mode_mask_array S8 A64]))) (clobber (reg:CC 17 flags)) ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134 332 {*anddi3_doubleword} (expr_list:REG_DEAD (reg/v:DI 97 [ val ]) (expr_list:REG_DEAD (reg/v:SI 96 [ mode ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))))) (insn 23 21 24 4 (parallel [ (set (reg:DI 103) (ashift:DI (const_int 1 [0x1]) (subreg:QI (reg:SI 91 [ _8 ]) 0))) (clobber (reg:CC 17 flags)) ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:135 449 {*ashldi3_doubleword} (expr_list:REG_DEAD (reg:SI 91 [ _8 ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) into (insn 47 19 21 4 (set (reg:DI 111) (mem/u:DI (plus:SI (mult:SI (reg/v:SI 96 [ mode ]) (const_int 8 [0x8])) (symbol_ref:SI ("mode_mask_array") [flags 0x40] <var_decl 0xf517e160 mode_mask_array>)) [13 mode_mask_array S8 A64])) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134 -1 (nil)) (insn 21 47 23 4 (set (subreg:V2DI (reg:DI 102 [ val ]) 0) (and:V2DI (subreg:V2DI (reg/v:DI 97 [ val ]) 0) (subreg:V2DI (reg:DI 111) 0))) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134 3253 {*andv2di3} (expr_list:REG_DEAD (reg/v:DI 97 [ val ]) (expr_list:REG_DEAD (reg/v:SI 96 [ mode ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))))) (insn 23 21 45 4 (parallel [ (set (reg:DI 103) (ashift:DI (const_int 1 [0x1]) (subreg:QI (reg:SI 91 [ _8 ]) 0))) (clobber (reg:CC 17 flags)) ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:135 449 {*ashldi3_doubleword} (expr_list:REG_DEAD (reg:SI 91 [ _8 ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) So STV requires 16 byte stack alignment and uses 16 byte stack alignment.
Created attachment 37584 [details] gcc6-pr69677.patch I'm currently bootstrapping/regtesting --with-arch=corei7 --with-cpu=corei7 --with-fpmath=sse i686-linux compiler with this patch, and got already well past the spot where it previously ICEd. The patch should restore STV behavior to the previous one, except for the non-default lower preferred/incoming stack boundaries.
(In reply to Jakub Jelinek from comment #7) > Created attachment 37584 [details] > gcc6-pr69677.patch > > I'm currently bootstrapping/regtesting --with-arch=corei7 --with-cpu=corei7 > --with-fpmath=sse i686-linux compiler with this patch, and got already well > past the spot where it previously ICEd. > The patch should restore STV behavior to the previous one, except for the > non-default lower preferred/incoming stack boundaries. This is wrong and I don't think it is needed. Assert crtl->stack_alignment_needed >= 128 is OK. Change it isn't allowed after rtl->stack_realign_processed is set. @@ -3588,6 +3588,16 @@ convert_scalars_to_vector () bitmap_obstack_release (NULL); df_process_deferred_rescans (); + /* Conversion means we may have 128bit register spills/fills + which require aligned stack. */ + if (converted_insns) + { + if (crtl->stack_alignment_needed < 128) + crtl->stack_alignment_needed = 128; + if (crtl->stack_alignment_estimated < 128) + crtl->stack_alignment_estimated = 128; + } + return 0; }
Another problem. STV is disabled even when stack is aligned: [hjl@gnu-skl-1 pr69677]$ cat x.i struct bar { int i[16] __attribute__ ((aligned(16))); }; extern void fn2 (void); long long a, b; struct bar fn1 (struct bar x) { long long c = a; a = b ^ a; fn2 (); a = c; return x; } [hjl@gnu-skl-1 pr69677]$ make x.s /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -m32 -O2 -msse2 -mno-accumulate-outgoing-args -mpreferred-stack-boundary=2 -S x.i [hjl@gnu-skl-1 pr69677]$ cat x.s .file "x.i" .text .p2align 4,,15 .globl fn1 .type fn1, @function fn1: .LFB0: .cfi_startproc pushl %ebp .cfi_def_cfa_offset 8 .cfi_offset 5, -8 movl %esp, %ebp .cfi_def_cfa_register 5 pushl %edi pushl %esi pushl %ebx pushl %ecx .cfi_offset 7, -12 .cfi_offset 6, -16 .cfi_offset 3, -20 .cfi_offset 1, -24 leal 8(%ebp), %ecx subl $4, %esp movl a, %eax movl a+4, %edi movl (%ecx), %ebx movl %ecx, %esi movl %eax, %edx xorl b, %edx movl %eax, -20(%ebp) movl %edx, a movl b+4, %edx xorl %edi, %edx movl %edx, a+4 call fn2 movl -20(%ebp), %eax movl %edi, a+4 movl %eax, a movl 4(%esi), %eax movl %eax, (%ebx) movl 8(%esi), %eax movl %eax, 4(%ebx) movl 12(%esi), %eax movl %eax, 8(%ebx) movl 16(%esi), %eax movl %eax, 12(%ebx) movl 20(%esi), %eax movl %eax, 16(%ebx) movl 24(%esi), %eax movl %eax, 20(%ebx) movl 28(%esi), %eax movl %eax, 24(%ebx) movl 32(%esi), %eax movl %eax, 28(%ebx) movl 36(%esi), %eax movl %eax, 32(%ebx) movl 40(%esi), %eax movl %eax, 36(%ebx) movl 44(%esi), %eax movl %eax, 40(%ebx) movl 48(%esi), %eax movl %eax, 44(%ebx) movl 52(%esi), %eax movl %eax, 48(%ebx) movl 56(%esi), %eax movl %eax, 52(%ebx) movl 60(%esi), %eax movl %eax, 56(%ebx) movl 64(%esi), %eax movl %eax, 60(%ebx) addl $4, %esp movl %ebx, %eax popl %ecx .cfi_restore 1 popl %ebx .cfi_restore 3 popl %esi .cfi_restore 6 popl %edi .cfi_restore 7 popl %ebp .cfi_restore 5 .cfi_def_cfa 4, 4 ret $4 .cfi_endproc .LFE0: .size fn1, .-fn1 .comm b,8,8 .comm a,8,8 .ident "GCC: (GNU) 6.0.0 20160205 (experimental)" .section .note.GNU-stack,"",@progbits [hjl@gnu-skl-1 pr69677]$
Created attachment 37589 [details] A different patch I am testing this.
(In reply to H.J. Lu from comment #9) > Another problem. STV is disabled even when stack is aligned: See the other PR, that has been discussed as one of the possible approaches, but as TARGET_STV does not affect just the stv pass, but also expansion, it is not at all clear which of the approaches is better. Therefore, it was agreed on to disable STV in these rare cases altogether for GCC 6, and for GCC 7 somebody who cares enough can perform appropriate testing/benchmarking and tweak it.
Author: jakub Date: Fri Feb 5 09:23:03 2016 New Revision: 233167 URL: https://gcc.gnu.org/viewcvs?rev=233167&root=gcc&view=rev Log: PR bootstrap/69677 * config/i386/i386.c (convert_scalars_to_vector): Readd stack alignment fixes. (ix86_option_override_internal): Disable TARGET_STV even for -m{incoming,preferred}-stack-boundary=3. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c
Fixed.
(In reply to H.J. Lu from comment #6) > STV turns: > > insn 21 19 23 4 (parallel [ > (set (reg:DI 102 [ val ]) > (and:DI (reg/v:DI 97 [ val ]) > (mem/u:DI (plus:SI (mult:SI (reg/v:SI 96 [ mode ]) > (const_int 8 [0x8])) > (symbol_ref:SI ("mode_mask_array") [flags 0x40] > <var_decl 0xf517e160 mode_mask_array>)) [13 mode_mask_array S8 A64]))) > (clobber (reg:CC 17 flags)) > ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134 > 332 {*anddi3_doubleword} > (expr_list:REG_DEAD (reg/v:DI 97 [ val ]) > (expr_list:REG_DEAD (reg/v:SI 96 [ mode ]) > (expr_list:REG_UNUSED (reg:CC 17 flags) > (nil))))) > (insn 23 21 24 4 (parallel [ > (set (reg:DI 103) > (ashift:DI (const_int 1 [0x1]) > (subreg:QI (reg:SI 91 [ _8 ]) 0))) > (clobber (reg:CC 17 flags)) > ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:135 > 449 {*ashldi3_doubleword} > (expr_list:REG_DEAD (reg:SI 91 [ _8 ]) > (expr_list:REG_UNUSED (reg:CC 17 flags) > (nil)))) > > into > (insn 47 19 21 4 (set (reg:DI 111) > (mem/u:DI (plus:SI (mult:SI (reg/v:SI 96 [ mode ]) > (const_int 8 [0x8])) > (symbol_ref:SI ("mode_mask_array") [flags 0x40] <var_decl > 0xf517e160 mode_mask_array>)) [13 mode_mask_array S8 A64])) > /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134 -1 > (nil)) > (insn 21 47 23 4 (set (subreg:V2DI (reg:DI 102 [ val ]) 0) > (and:V2DI (subreg:V2DI (reg/v:DI 97 [ val ]) 0) > (subreg:V2DI (reg:DI 111) 0))) > /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:134 3253 > {*andv2di3} > (expr_list:REG_DEAD (reg/v:DI 97 [ val ]) > (expr_list:REG_DEAD (reg/v:SI 96 [ mode ]) > (expr_list:REG_UNUSED (reg:CC 17 flags) > (nil))))) > (insn 23 21 45 4 (parallel [ > (set (reg:DI 103) > (ashift:DI (const_int 1 [0x1]) > (subreg:QI (reg:SI 91 [ _8 ]) 0))) > (clobber (reg:CC 17 flags)) > ]) /export/gnu/import/git/gcc-regression/gcc/gcc/simplify-rtx.c:135 > 449 {*ashldi3_doubleword} > (expr_list:REG_DEAD (reg:SI 91 [ _8 ]) > (expr_list:REG_UNUSED (reg:CC 17 flags) > (nil)))) > > So STV requires 16 byte stack alignment and uses 16 byte stack alignment. I don't see how STV requires 16 byte stack alignment here. All registers are DI here. Emitted movdqa is not produced by STV, it seems like LRA bug which writes 8 bytes to stack but reads 16.
(In reply to Jakub Jelinek from comment #13) > Fixed. I think you just hide LRA issue disabling STV and LRA still may generate incorrect register fill
(In reply to Ilya Enkovich from comment #15) > (In reply to Jakub Jelinek from comment #13) > > Fixed. > > I think you just hide LRA issue disabling STV and LRA still may generate > incorrect register fill Yes, but stage 4 is not the time to solve these kind of bugs. As said, the fix disables STV for rare and unusual cases. I expect that proper fix will be developed during gcc-7 time, and this includes various RA fixes. Previous patch relaxed alignment requirements to 64 bits, and this triggered RA bug. So, unless we want to delay release for a couple of weeks (a month?), we will leave alignment to 128 bits. That doesn't mean that incorrect register fill RA bug should't be fixed during stage-4. Do we have a small testcase that exposes it?
(In reply to Uroš Bizjak from comment #16) > That doesn't mean that incorrect register fill RA bug should't be fixed > during stage-4. Do we have a small testcase that exposes it? I submitted PR69693 for RA problem.
Author: hjl Date: Fri Feb 5 16:24:06 2016 New Revision: 233180 URL: https://gcc.gnu.org/viewcvs?rev=233180&root=gcc&view=rev Log: Add a testcase for PR target/69677 PR target/69677 * gcc.target/i386/pr69677.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr69677.c Modified: trunk/gcc/testsuite/ChangeLog