This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [04/32] [x86] Robustify vzeroupper handling across calls
- From: Richard Sandiford <richard dot sandiford at arm dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: hubicka at ucw dot cz, ubizjak at gmail dot com
- Date: Wed, 25 Sep 2019 16:48:19 +0100
- Subject: Re: [04/32] [x86] Robustify vzeroupper handling across calls
- References: <mptimpyzmf1.fsf@arm.com> <mpt1rwmzma7.fsf@arm.com>
Ping
Richard Sandiford <richard.sandiford@arm.com> writes:
> One of the effects of the function_abi series is to make -fipa-ra
> work for partially call-clobbered registers. E.g. if a call preserves
> only the low 32 bits of a register R, we handled the partial clobber
> separately from -fipa-ra, and so treated the upper bits of R as
> clobbered even if we knew that the target function doesn't touch R.
>
> "Fixing" this caused problems for the vzeroupper handling on x86.
> The pass that inserts the vzerouppers assumes that no 256-bit or 512-bit
> values are live across a call unless the call takes a 256-bit or 512-bit
> argument:
>
> /* Needed mode is set to AVX_U128_CLEAN if there are
> no 256bit or 512bit modes used in function arguments. */
>
> This implicitly relies on:
>
> /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. The only ABI that
> saves SSE registers across calls is Win64 (thus no need to check the
> current ABI here), and with AVX enabled Win64 only guarantees that
> the low 16 bytes are saved. */
>
> static bool
> ix86_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
> unsigned int regno, machine_mode mode)
> {
> return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
> }
>
> The comment suggests that this code is only needed for Win64 and that
> not testing for Win64 is just a simplification. But in practice it was
> needed for correctness on GNU/Linux and other targets too, since without
> it the RA would be able to keep 256-bit and 512-bit values in SSE
> registers across calls that are known not to clobber them.
>
> This patch conservatively treats calls as AVX_U128_ANY if the RA can see
> that some SSE registers are not touched by a call. There are then no
> regressions if the ix86_hard_regno_call_part_clobbered check is disabled
> for GNU/Linux (not something we should do, was just for testing).
>
> If in fact we want -fipa-ra to pretend that all functions clobber
> SSE registers above 128 bits, it'd certainly be possible to arrange
> that. But IMO that would be an optimisation decision, whereas what
> the patch is fixing is a correctness decision. So I think we should
> have this check even so.
2019-09-25 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/i386/i386.c: Include function-abi.h.
(ix86_avx_u128_mode_needed): Treat function calls as AVX_U128_ANY
if they preserve some 256-bit or 512-bit SSE registers.
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c 2019-09-25 16:47:48.000000000 +0100
+++ gcc/config/i386/i386.c 2019-09-25 16:47:49.089962608 +0100
@@ -95,6 +95,7 @@ #define IN_TARGET_CODE 1
#include "i386-builtins.h"
#include "i386-expand.h"
#include "i386-features.h"
+#include "function-abi.h"
/* This file should be included last. */
#include "target-def.h"
@@ -13511,6 +13512,15 @@ ix86_avx_u128_mode_needed (rtx_insn *ins
}
}
+ /* If the function is known to preserve some SSE registers,
+ RA and previous passes can legitimately rely on that for
+ modes wider than 256 bits. It's only safe to issue a
+ vzeroupper if all SSE registers are clobbered. */
+ const function_abi &abi = insn_callee_abi (insn);
+ if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
+ abi.mode_clobbers (V4DImode)))
+ return AVX_U128_ANY;
+
return AVX_U128_CLEAN;
}