Bug 83330 - [7/8 Regression] generating unaligned store to stack for SSE register with -mno-push-args
Summary: [7/8 Regression] generating unaligned store to stack for SSE register with -m...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 8.0
: P2 normal
Target Milestone: 7.3
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2017-12-08 15:52 UTC by Zdenek Sojka
Modified: 2018-01-15 16:14 UTC (History)
2 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
Build:
Known to work:
Known to fail: 7.2.1, 8.0
Last reconfirmed: 2017-12-16 00:00:00


Attachments
reduced testcase (195 bytes, text/plain)
2017-12-08 15:52 UTC, Zdenek Sojka
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Zdenek Sojka 2017-12-08 15:52:01 UTC
Created attachment 42816 [details]
reduced testcase

Output:
$ x86_64-pc-linux-gnu-gcc -O2 -fno-tree-dce -mno-push-args testcase.c
$ ./a.out 
Segmentation fault


foo() disassembly:

foo:
	sub	rsp, 16
	pxor	xmm0, xmm0
	movaps	XMMWORD PTR [rsp], xmm0 <== crashes HERE
	pop	rax
	mov	rax, QWORD PTR g[rip]
	pop	rdx
	ret

According to the ABI, rsp is aligned to 16 bytes before the function call; thus, rsp % 16 == 0 at the point of crash.

Also, I do not understand why bar() is apparently inlined, when the function has noinline,noclone attributes (noipa helps; -fno-ipa-pure-const helps too); but "noipa" implies just noinline,noclone,no_icf according to https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
Comment 1 Jakub Jelinek 2017-12-15 20:22:52 UTC
(In reply to Zdenek Sojka from comment #0)
> Also, I do not understand why bar() is apparently inlined, when the function

bar isn't inlined.  Instead ipa-pure-const determines it is a const function and because nothing really needs its return value, it is just not called at all.

> has noinline,noclone attributes (noipa helps; -fno-ipa-pure-const helps
> too); but "noipa" implies just noinline,noclone,no_icf according to
> https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-
> Function-Attributes

No, it doesn't say anything like that.  It says it implies those attributes, but is not a mere union of those 3 attributes, but disables many other IPA optimizes between that function and the callers.
E.g. mere noinline, noclone, no_icf doesn't disable IPA VRP, or IPA bitwise CCP, etc.
Comment 2 H.J. Lu 2017-12-16 18:36:29 UTC
Add -mno-stv fixes the crash.
Comment 3 H.J. Lu 2017-12-16 18:51:17 UTC
RTL expand is correct:

;; _2 = bar (0, 0, 0, 0, 0); 

(insn 6 5 7 (parallel [
            (set (reg/f:DI 7 sp) 
                (plus:DI (reg/f:DI 7 sp) 
                    (const_int -16 [0xfffffffffffffff0])))
            (clobber (reg:CC 17 flags))
        ]) "x.i":17 -1
     (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
        (nil)))

(insn 7 6 8 (set (reg:DI 90) 
        (reg/f:DI 84 virtual-outgoing-args)) "x.i":17 -1
     (nil))

(insn 8 7 9 (set (reg:DI 91) 
        (const_int 0 [0])) "x.i":17 -1
     (nil))

(insn 9 8 10 (set (reg:DI 92) 
        (const_int 0 [0])) "x.i":17 -1
     (nil))

(insn 10 9 11 (set (mem:TI (reg:DI 90) [0  S16 A128])
        (const_int 0 [0])) "x.i":17 -1
     (nil))

But vregs generates:

(insn 5 2 6 2 (set (reg:DI 87 [ v.0_1 ])
        (mem/c:DI (symbol_ref:DI ("v") [flags 0x2]  <var_decl 0x7f520f0ddab0 v>) [1 v+0 S8 A64])) "x.i":17 85 {*movdi_internal}
     (nil))
(insn 6 5 7 2 (parallel [
            (set (reg/f:DI 7 sp)
                (plus:DI (reg/f:DI 7 sp)
                    (const_int -16 [0xfffffffffffffff0])))
            (clobber (reg:CC 17 flags))
        ]) "x.i":17 222 {*adddi_1}
     (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
        (nil)))
(insn 7 6 8 2 (set (reg:DI 90)
        (reg/f:DI 7 sp)) "x.i":17 85 {*movdi_internal}
     (nil))
(insn 8 7 9 2 (set (reg:DI 91)
        (const_int 0 [0])) "x.i":17 85 {*movdi_internal}
     (nil))
(insn 9 8 10 2 (set (reg:DI 92)
        (const_int 0 [0])) "x.i":17 85 {*movdi_internal}
     (nil))
(insn 10 9 11 2 (set (mem:TI (reg:DI 90) [0  S16 A128]) <<<< The alignment is wrong.
        (const_int 0 [0])) "x.i":17 84 {*movti_internal}
     (nil))
Comment 4 H.J. Lu 2017-12-16 19:39:58 UTC
call expand is wrong:

(insn 6 5 7 2 (parallel [
            (set (reg/f:DI 7 sp)
                (plus:DI (reg/f:DI 7 sp)
                    (const_int -16 [0xfffffffffffffff0])))
            (clobber (reg:CC 17 flags))
        ]) "x.i":17 222 {*adddi_1}
     (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
        (nil)))

doesn't count the alignment requirement later for

(insn 10 9 11 (set (mem:TI (reg:DI 90) [0  S16 A128])
        (const_int 0 [0])) "x.i":17 -1
     (nil))
Comment 5 H.J. Lu 2017-12-16 20:00:25 UTC
call expand is OK.  pro_and_epilogue doesn't adjust outgoing stack
to properly align stack for

(insn 27 10 17 2 (set (mem:V1TI (reg/f:DI 7 sp) [0  S16 A128])
        (reg:V1TI 21 xmm0 [95])) "x.i":17 1258 {movv1ti_internal}
     (nil))

since ud_dce removes the call to bar.
Comment 6 Zdenek Sojka 2017-12-18 11:56:28 UTC
(In reply to Jakub Jelinek from comment #1)
> (In reply to Zdenek Sojka from comment #0)
> > Also, I do not understand why bar() is apparently inlined, when the function
> 
> bar isn't inlined.  Instead ipa-pure-const determines it is a const function
> and because nothing really needs its return value, it is just not called at
> all.

Thank you for the explanation. That also explains why -fno-ipa-pure-const prevents removing the call.

> 
> > has noinline,noclone attributes (noipa helps; -fno-ipa-pure-const helps
> > too); but "noipa" implies just noinline,noclone,no_icf according to
> > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-
> > Function-Attributes
> 
> No, it doesn't say anything like that.  It says it implies those attributes,
> but is not a mere union of those 3 attributes, but disables many other IPA
> optimizes between that function and the callers.
> E.g. mere noinline, noclone, no_icf doesn't disable IPA VRP, or IPA bitwise
> CCP, etc.

I am sorry, I totally misread the description, so it had the very opposite meaning. Thank you for the comment.
Comment 7 hjl@gcc.gnu.org 2018-01-11 20:45:28 UTC
Author: hjl
Date: Thu Jan 11 20:44:46 2018
New Revision: 256555

URL: https://gcc.gnu.org/viewcvs?rev=256555&root=gcc&view=rev
Log:
i386: Align stack frame if argument is passed on stack

When a function call is removed, it may become a leaf function.  But if
argument may be passed on stack, we need to align the stack frame when
there is no tail call.

Tested on Linux/i686 and Linux/x86-64.

gcc/

	PR target/83330
	* config/i386/i386.c (ix86_compute_frame_layout): Align stack
	frame if argument is passed on stack.

gcc/testsuite/

	PR target/83330
	* gcc.target/i386/pr83330.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr83330.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog
Comment 8 hjl@gcc.gnu.org 2018-01-15 16:13:55 UTC
Author: hjl
Date: Mon Jan 15 16:13:23 2018
New Revision: 256703

URL: https://gcc.gnu.org/viewcvs?rev=256703&root=gcc&view=rev
Log:
i386: Align stack frame if argument is passed on stack

When a function call is removed, it may become a leaf function.  But if
argument may be passed on stack, we need to align the stack frame when
there is no tail call.

Tested on Linux/i686 and Linux/x86-64.

gcc/

	Backport from mainline
	PR target/83330
	* config/i386/i386.c (ix86_function_arg_advance): Set
	outgoing_args_on_stack to true if there are outgoing arguments
	on stack.
	(ix86_function_arg): Likewise.
	(ix86_compute_frame_layout): Align stack frame if argument is
	passed on stack.
	* config/i386/i386.h (machine_function): Add
	outgoing_args_on_stack.

gcc/testsuite/

	Backport from mainline
	PR target/83330
	* gcc.target/i386/pr83330.c: New test.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr83330.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/i386/i386.c
    branches/gcc-7-branch/gcc/config/i386/i386.h
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
Comment 9 H.J. Lu 2018-01-15 16:14:41 UTC
Fixed for GCC 8 and 7.3