Summary: | [4.7 regression] vzeroupper clobbers argument with AVX | ||
---|---|---|---|
Product: | gcc | Reporter: | Eric Botcazou <ebotcazou> |
Component: | target | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | hjl.tools |
Priority: | P2 | Keywords: | wrong-code |
Version: | 4.7.3 | ||
Target Milestone: | 4.7.3 | ||
Host: | Target: | x86_64-*-* | |
Build: | Known to work: | 4.5.3, 4.8.0 | |
Known to fail: | 4.6.3, 4.7.2 | Last reconfirmed: | 2013-03-07 00:00:00 |
Attachments: |
A patch
A patch |
Description
Eric Botcazou
2013-03-07 09:47:31 UTC
Confirmed, same code generated on the 4.6 branch. Works on the 4.5 branch where no vzeroupper is inserted. Likewise no vzeroupper on trunk. > Confirmed, same code generated on the 4.6 branch. Works on the 4.5 branch
> where no vzeroupper is inserted. Likewise no vzeroupper on trunk.
Thanks for confirming. The vzeroupper pass didn't exist on the 4.5 branch and has been rewritten to use the mode-switching machinery on mainline.
Adding author of 4,6/4.7 vzerouopper pass to CC. The caller info is lost by (gdb) bt #0 init_cumulative_args (cum=0x7fffffffc3f0, fntype=0x7ffff1472e70, libname=0x0, fndecl=0x0, caller=1) at /export/gnu/import/git/gcc-release/gcc/config/i386/i386.c:5562 #1 0x0000000000640011 in block_move_libcall_safe_for_call_parm () at /export/gnu/import/git/gcc-release/gcc/expr.c:1244 #2 0x000000000063fc07 in emit_block_move_hints (x=0x7ffff1470780, y=0x7ffff1470750, size=0x7ffff133a530, method=BLOCK_OP_CALL_PARM, expected_align=0, expected_size=-1) at /export/gnu/import/git/gcc-release/gcc/expr.c:1139 #3 0x000000000063ff3c in emit_block_move (x=0x7ffff1470780, y=0x7ffff1470750, size=0x7ffff133a530, method=BLOCK_OP_CALL_PARM) at /export/gnu/import/git/gcc-release/gcc/expr.c:1206 #4 0x000000000064693a in emit_push_insn (x=0x7ffff1470750, mode=BLKmode, type=0x7ffff1472690, size=0x7ffff133a530, align=64, partial=0, reg=0x0, extra=4, args_addr=0x7ffff1334560, args_so_far=0x7ffff133a470, reg_parm_stack_space=0, alignment_pad=0x7ffff133a470) at /export/gnu/import/git/gcc-release/gcc/expr.c:4116 #5 0x000000000056d1ad in store_one_arg (arg=0x7fffffffc760, argblock=0x7ffff1334560, flags=0, variable_size=0, reg_parm_stack_space=0) at /export/gnu/import/git/gcc-release/gcc/calls.c:4646 #6 0x0000000000568b51 in expand_call (exp=0x7ffff1333cb0, target=0x7ffff1461f00, ignore=0) at /export/gnu/import/git/gcc-release/gcc/calls.c:3023 when storing struct S on stack. This patch: diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index c1f6c88..8005207 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5562,7 +5562,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum, /* Argument info to initialize */ memset (cum, 0, sizeof (*cum)); /* Initialize for the current callee. */ - if (caller) + if (caller && fndecl) { cfun->machine->callee_pass_avx256_p = false; cfun->machine->callee_return_avx256_p = false; fixes it. > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index c1f6c88..8005207 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5562,7 +5562,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum, /* Argument
> info to initialize */
> memset (cum, 0, sizeof (*cum));
>
> /* Initialize for the current callee. */
> - if (caller)
> + if (caller && fndecl)
> {
> cfun->machine->callee_pass_avx256_p = false;
> cfun->machine->callee_return_avx256_p = false;
>
> fixes it.
I don't think it's correct, fndecl is NULL_TREE for indirect calls as well.
Created attachment 29645 [details]
A patch
This patch adds expand_args to track library calls to
expend arguments. We add vzeroupper when expand_args == 1,
which indicates we are expanding the actual function.
> This patch adds expand_args to track library calls to
> expend arguments. We add vzeroupper when expand_args == 1,
> which indicates we are expanding the actual function.
This looks complicated. A simpler approach could be to record the AVX state in the CUMULATIVE_ARGS structure and transfer it to cfun->machine only at the end of the argument processing. Comments and code in calls.c appear to guarantee that
targetm.calls.function_arg (&args_so_far, VOIDmode, void_type_node, true)
is invoked immediately before the call instruction is emitted.
Created attachment 29655 [details]
A patch
This patch adds callee_pass_avx256_p and callee_return_avx256_p
to ix86_args. ix86_function_arg copies them to cfun->machine
when ix86_function_arg is called with VOIDmode, which is called
just before emitting call. cfun->machine->callee_return_avx256_p
is set in init_cumulative_args for ix86_function_ok_for_sibcall.
> This patch adds callee_pass_avx256_p and callee_return_avx256_p
> to ix86_args. ix86_function_arg copies them to cfun->machine
> when ix86_function_arg is called with VOIDmode, which is called
> just before emitting call. cfun->machine->callee_return_avx256_p
> is set in init_cumulative_args for ix86_function_ok_for_sibcall.
This looks good to me, but I don't know the i386 back-end much (and of course cannot approve anything). Btw, you should add a comment before the new
if (cum->caller && mode == VOIDmode)
block explaining why you need to do this dance on the caller side.
Thanks for working on this.
GCC 4.6.4 has been released and the branch has been closed. Author: hjl Date: Fri Mar 22 16:36:22 2013 New Revision: 196976 URL: http://gcc.gnu.org/viewcvs?rev=196976&root=gcc&view=rev Log: Set callee_pass_avx256_p before emitting call instruction gcc/ PR target/56560 * config/i386/i386.c (init_cumulative_args): Also set cum->callee_return_avx256_p. (ix86_function_arg): Set cum->callee_pass_avx256_p. Set cfun->machine->callee_pass_avx256_p only when MODE == VOIDmode. * config/i386/i386.h (ix86_args): Add callee_pass_avx256_p and callee_return_avx256_p. gcc/ PR target/56560 * gcc.target/i386/pr56560.c: New file. Added: branches/gcc-4_7-branch/gcc/testsuite/gcc.target/i386/pr56560.c Modified: branches/gcc-4_7-branch/gcc/ChangeLog branches/gcc-4_7-branch/gcc/config/i386/i386.c branches/gcc-4_7-branch/gcc/config/i386/i386.h branches/gcc-4_7-branch/gcc/testsuite/ChangeLog |