This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [21/32] Remove global call sets: LRA
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Cc: Richard Sandiford <richard dot sandiford at arm dot com>, "H. J. Lu" <hjl dot tools at gmail dot com>
- Date: Sun, 6 Oct 2019 10:45:21 +0200
- Subject: Re: [21/32] Remove global call sets: LRA
>>> This caused:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
>
> Thanks for reducing & tracking down the underlying cause.
>
>> This change doesn't work with -mvzeroupper. When -mvzeroupper is used,
>> the upper bits of vector registers are clobbered when the callee returns
>> if any YMM/ZMM registers are used in the callee. Even if YMM7 isn't used,
>> the upper bits of YMM7 can still be clobbered by vzeroupper when YMM1 is used.
>
> The problem here really is that the pattern is just:
>
> (define_insn "avx_vzeroupper"
> [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
> "TARGET_AVX"
> "vzeroupper"
> ...)
>
> and so its effect on the registers isn't modelled at all in RTL.
> Maybe one option would be to add a parallel:
>
> (set (reg:V2DI N) (reg:V2DI N))
>
> for each register. Or we could do something like I did for the SVE
> tlsdesc calls, although here that would mean using a call pattern for
> something that isn't really a call. Or we could reinstate clobber_high
> and use that, but that's very much third out of three.
>
> I don't think we should add target hooks to get around this, since that's
> IMO papering over the issue.
>
> I'll try the parallel set thing first.
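To make the clobbering concrete, here is a toy Python model. It is an illustration under stated assumptions, not GCC code. YMM registers are represented as 256-bit integers. The model assumes 64-bit mode, so vzeroupper affects ymm0 to ymm15, and vzeroupper keeps only the low 128 bits of every register. `callee` and `vzeroupper` are hypothetical names.

```python
# Toy model, not GCC code: YMM registers as 256-bit Python ints.
# Assumption: 64-bit mode, so vzeroupper affects ymm0..ymm15.
NUM_YMM = 16
LOW_128 = (1 << 128) - 1  # bits 0..127, the xmm / V2DI part

def vzeroupper(regs):
    """Zero bits 128..255 of every register, used by the callee or not."""
    return [r & LOW_128 for r in regs]

def callee(regs):
    """Hypothetical callee that only writes ymm1, then runs vzeroupper
    before returning (as -mvzeroupper would arrange)."""
    regs = list(regs)
    regs[1] = 0x1234
    return vzeroupper(regs)

caller_regs = [0] * NUM_YMM
caller_regs[7] = (0xDEAD << 128) | 0xBEEF  # 256-bit value live in ymm7

after = callee(caller_regs)
print(hex(after[7]))  # 0xbeef: the upper half of ymm7 is gone
```

Only the low 128 bits of ymm7 survive the call, even though the callee never touched ymm7. As I read the proposal, a per-register (set (reg:V2DI N) (reg:V2DI N)) would say the same thing: the 128-bit V2DI part is unchanged, and nothing is promised about the rest of the register.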
Please note that the vzeroupper insertion pass runs after register
allocation, so in effect the vzeroupper pattern is hidden from the register
allocator.
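The ordering issue can be sketched with another toy Python model, again not GCC code. It assumes an analysis that only records the registers an insn visibly sets. The helpers `written_regs` and `insert_vzeroupper` are hypothetical, and "modelled" stands for the proposed per-register self-sets. The model shows two things. Anything computed before the insertion pass never sees vzeroupper. After insertion, the opaque unspec_volatile form still hides its register effects.

```python
# Toy model (hypothetical names): each insn records the hard registers it
# visibly sets; an analysis can only see those recorded effects.
NUM_SSE_REGS = 16  # assumption: ymm0..ymm15 in 64-bit mode

def vzeroupper_insn(modelled):
    """Opaque unspec_volatile sets nothing visible; the proposed form adds
    one (set (reg:V2DI N) (reg:V2DI N)) per SSE register."""
    sets = set(range(NUM_SSE_REGS)) if modelled else set()
    return {"name": "vzeroupper", "sets": sets}

def written_regs(body):
    """Registers an analysis would record as written by this body."""
    regs = set()
    for insn in body:
        regs |= insn["sets"]
    return regs

def insert_vzeroupper(body, modelled):
    """Stand-in for the post-RA insertion pass: add vzeroupper before return."""
    return body + [vzeroupper_insn(modelled)]

callee = [{"name": "use ymm1", "sets": {1}}]

seen_by_ra = written_regs(callee)  # register allocation runs before insertion
opaque = written_regs(insert_vzeroupper(callee, modelled=False))
with_sets = written_regs(insert_vzeroupper(callee, modelled=True))

print(sorted(seen_by_ra), sorted(opaque), 7 in with_sets)
```

The toy only tracks whether a register is written, not which bits. In the self-set form, the V2DI mode is what limits the "preserved" part to the low 128 bits.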
Uros.