Bug 56560

Summary: [4.7 regression] vzeroupper clobbers argument with AVX
Product: gcc Reporter: Eric Botcazou <ebotcazou>
Component: targetAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: hjl.tools
Priority: P2 Keywords: wrong-code
Version: 4.7.3   
Target Milestone: 4.7.3   
Host: Target: x86_64-*-*
Build: Known to work: 4.5.3, 4.8.0
Known to fail: 4.6.3, 4.7.2 Last reconfirmed: 2013-03-07 00:00:00
Attachments: A patch
A patch

Description Eric Botcazou 2013-03-07 09:47:31 UTC
The vzeroupper optimization pass can go awry on the 4.7 branch and clobbers arguments passed to functions:

eric@polaris:~> cat t.c
/* { dg-do compile } */
/* { dg-options "-O2 -mavx" } */

extern void abort (void);

typedef double vec_t __attribute__((vector_size(32)));

struct S { int i1; int i2; int i3; };

extern int bar (vec_t, int, int, int, int, int, struct S);

void foo (vec_t v, struct S s)
{
  int i = bar (v, 1, 2, 3, 4, 5, s);
  if (i == 0)
    abort ();
}

/* { dg-final { scan-assembler-not "vzeroupper" } } */
eric@polaris:~> ~/install/gcc-4_7-branch/bin/gcc -S -O2 -mavx t.c
eric@polaris:~> cat t.s
        .file   "t.c"
        .text
        .p2align 4,,15
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movl    $5, %r8d
        movl    $4, %ecx
        movl    $3, %edx
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-32, %rsp
        subq    $64, %rsp
        movq    %rdi, (%rsp)
        movl    %esi, 8(%rsp)
        movl    $1, %edi
        movl    $2, %esi
        vzeroupper
        call    bar
        testl   %eax, %eax
        je      .L5
        leave
        .cfi_remember_state
        .cfi_def_cfa 7, 8
        ret
.L5:
        .cfi_restore_state
        .p2align 4,,9
        call    abort
        .cfi_endproc
Comment 1 Richard Biener 2013-03-07 10:09:43 UTC
Confirmed, same code generated on the 4.6 branch.  Works on the 4.5 branch
where no vzeroupper is inserted.  Likewise no vzeroupper on trunk.
Comment 2 Eric Botcazou 2013-03-07 10:17:56 UTC
> Confirmed, same code generated on the 4.6 branch.  Works on the 4.5 branch
> where no vzeroupper is inserted.  Likewise no vzeroupper on trunk.

Thanks for confirming.  The vzeroupper pass didn't exist on the 4.5 branch and has been rewritten to use the mode-switching machinery on mainline.
Comment 3 Uroš Bizjak 2013-03-08 09:24:27 UTC
Adding author of 4,6/4.7 vzerouopper pass to CC.
Comment 4 H.J. Lu 2013-03-08 17:21:18 UTC
The caller info is lost by

(gdb) bt
#0  init_cumulative_args (cum=0x7fffffffc3f0, fntype=0x7ffff1472e70, 
    libname=0x0, fndecl=0x0, caller=1)
    at /export/gnu/import/git/gcc-release/gcc/config/i386/i386.c:5562
#1  0x0000000000640011 in block_move_libcall_safe_for_call_parm ()
    at /export/gnu/import/git/gcc-release/gcc/expr.c:1244
#2  0x000000000063fc07 in emit_block_move_hints (x=0x7ffff1470780, 
    y=0x7ffff1470750, size=0x7ffff133a530, method=BLOCK_OP_CALL_PARM, 
    expected_align=0, expected_size=-1)
    at /export/gnu/import/git/gcc-release/gcc/expr.c:1139
#3  0x000000000063ff3c in emit_block_move (x=0x7ffff1470780, y=0x7ffff1470750, 
    size=0x7ffff133a530, method=BLOCK_OP_CALL_PARM)
    at /export/gnu/import/git/gcc-release/gcc/expr.c:1206
#4  0x000000000064693a in emit_push_insn (x=0x7ffff1470750, mode=BLKmode, 
    type=0x7ffff1472690, size=0x7ffff133a530, align=64, partial=0, reg=0x0, 
    extra=4, args_addr=0x7ffff1334560, args_so_far=0x7ffff133a470, 
    reg_parm_stack_space=0, alignment_pad=0x7ffff133a470)
    at /export/gnu/import/git/gcc-release/gcc/expr.c:4116
#5  0x000000000056d1ad in store_one_arg (arg=0x7fffffffc760, 
    argblock=0x7ffff1334560, flags=0, variable_size=0, reg_parm_stack_space=0)
    at /export/gnu/import/git/gcc-release/gcc/calls.c:4646
#6  0x0000000000568b51 in expand_call (exp=0x7ffff1333cb0, 
    target=0x7ffff1461f00, ignore=0)
    at /export/gnu/import/git/gcc-release/gcc/calls.c:3023

when storing struct S on stack.  This patch:

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c1f6c88..8005207 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5562,7 +5562,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
   memset (cum, 0, sizeof (*cum));
 
   /* Initialize for the current callee.  */
-  if (caller)
+  if (caller && fndecl)
     {
       cfun->machine->callee_pass_avx256_p = false;
       cfun->machine->callee_return_avx256_p = false;

fixes it.
Comment 5 Eric Botcazou 2013-03-10 15:51:03 UTC
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index c1f6c88..8005207 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5562,7 +5562,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument
> info to initialize */
>    memset (cum, 0, sizeof (*cum));
> 
>    /* Initialize for the current callee.  */
> -  if (caller)
> +  if (caller && fndecl)
>      {
>        cfun->machine->callee_pass_avx256_p = false;
>        cfun->machine->callee_return_avx256_p = false;
> 
> fixes it.

I don't think it's correct, fndecl is NULL_TREE for indirect calls as well.
Comment 6 H.J. Lu 2013-03-11 19:34:29 UTC
Created attachment 29645 [details]
A patch

This patch adds expand_args to track library calls to
expend arguments.  We add vzeroupper when expand_args == 1,
which indicates we are expanding the actual function.
Comment 7 Eric Botcazou 2013-03-12 11:37:27 UTC
> This patch adds expand_args to track library calls to
> expend arguments.  We add vzeroupper when expand_args == 1,
> which indicates we are expanding the actual function.

This looks complicated.  A simpler approach could be to record the AVX state in the CUMULATIVE_ARGS structure and transfer it to cfun->machine only at the end of the argument processing.  Comments and code in calls.c appear to guarantee that

  targetm.calls.function_arg (&args_so_far, VOIDmode, void_type_node, true)

is invoked immediately before the call instruction is emitted.
Comment 8 H.J. Lu 2013-03-12 16:48:45 UTC
Created attachment 29655 [details]
A patch

This patch adds callee_pass_avx256_p and callee_return_avx256_p
to ix86_args.  ix86_function_arg copies them to cfun->machine
when ix86_function_arg is called with VOIDmode, which is called
just before emitting call.  cfun->machine->callee_return_avx256_p
is set in init_cumulative_args for ix86_function_ok_for_sibcall.
Comment 9 Eric Botcazou 2013-03-12 17:07:40 UTC
> This patch adds callee_pass_avx256_p and callee_return_avx256_p
> to ix86_args.  ix86_function_arg copies them to cfun->machine
> when ix86_function_arg is called with VOIDmode, which is called
> just before emitting call.  cfun->machine->callee_return_avx256_p
> is set in init_cumulative_args for ix86_function_ok_for_sibcall.

This looks good to me, but I don't know the i386 back-end much (and of course cannot approve anything).  Btw, you should add a comment before the new

  if (cum->caller && mode == VOIDmode)

block explaining why you need to do this dance on the caller side.

Thanks for working on this.
Comment 10 Jakub Jelinek 2013-04-12 15:16:54 UTC
GCC 4.6.4 has been released and the branch has been closed.
Comment 11 Eric Botcazou 2013-05-06 14:42:04 UTC
Author: hjl
Date: Fri Mar 22 16:36:22 2013
New Revision: 196976

URL: http://gcc.gnu.org/viewcvs?rev=196976&root=gcc&view=rev
Log:
Set callee_pass_avx256_p before emitting call instruction

gcc/

	PR target/56560
	* config/i386/i386.c (init_cumulative_args): Also set
	cum->callee_return_avx256_p.
	(ix86_function_arg): Set cum->callee_pass_avx256_p.  Set
	cfun->machine->callee_pass_avx256_p only when MODE == VOIDmode.

	* config/i386/i386.h (ix86_args): Add callee_pass_avx256_p and
	callee_return_avx256_p.

gcc/

	PR target/56560
	* gcc.target/i386/pr56560.c: New file.

Added:
    branches/gcc-4_7-branch/gcc/testsuite/gcc.target/i386/pr56560.c
Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/config/i386/i386.c
    branches/gcc-4_7-branch/gcc/config/i386/i386.h
    branches/gcc-4_7-branch/gcc/testsuite/ChangeLog