This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [21/32] Remove global call sets: LRA


>>> This caused:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
>
> Thanks for reducing & tracking down the underlying cause.
>
>> This change doesn't work with -mvzeroupper.  When -mvzeroupper is used,
>> upper bits of vector registers are clobbered upon callee return if any
>> YMM/ZMM registers are used in callee.  Even if YMM7 isn't used, upper
>> bits of YMM7 can still be clobbered by vzeroupper when YMM1 is used.
>
> The problem here really is that the pattern is just:
>
> (define_insn "avx_vzeroupper"
>   [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
>   "TARGET_AVX"
>   "vzeroupper"
>   ...)
>
> and so its effect on the registers isn't modelled at all in rtl.
> Maybe one option would be to add a parallel:
>
>   (set (reg:V2DI N) (reg:V2DI N))
>
> for each register.  Or we could do something like I did for the SVE
> tlsdesc calls, although here that would mean using a call pattern for
> something that isn't really a call.  Or we could reinstate clobber_high
> and use that, but that's very much third out of three.
>
> I don't think we should add target hooks to get around this, since that's
> IMO papering over the issue.
>
> I'll try the parallel set thing first.

Please note that the vzeroupper insertion pass runs after register
allocation, so in effect the vzeroupper pattern is hidden from the
register allocator.

Uros.
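
For readers following the thread, the clobbering described in the quoted text can be illustrated with a toy model. This is only a sketch and not GCC code: the names `vzeroupper`, `callee`, `demo`, `NUM_REGS`, and `LOW_128_MASK` are hypothetical, each register is modelled as a plain 256-bit integer, and 16 registers (YMM0..YMM15, as in 64-bit mode) are assumed.

```python
# Toy model of the effect discussed in the thread; not GCC code.
# Assumption: each vector register is a 256-bit integer (YMM width).
NUM_REGS = 16                    # assumption: YMM0..YMM15 (64-bit mode)
LOW_128_MASK = (1 << 128) - 1    # keeps only bits 0..127


def vzeroupper(regs):
    """Return a copy of regs with bits 128 and above cleared in every register."""
    return [r & LOW_128_MASK for r in regs]


def callee(regs):
    """Hypothetical callee: writes only YMM1, then runs vzeroupper before returning."""
    out = list(regs)
    out[1] = (0xAB << 128) | 0xCD   # the callee touches YMM1 only
    return vzeroupper(out)


def demo():
    """Show that a value the caller holds in YMM7 loses its upper half."""
    regs = [0] * NUM_REGS
    regs[7] = (0x1234 << 128) | 0x5678   # caller keeps a 256-bit value in YMM7
    after = callee(regs)
    return regs[7], after[7]


before, after = demo()
print(hex(before), hex(after))
```

In this model the callee never writes YMM7, yet the value the caller sees afterwards has lost its upper 128 bits. That is why treating YMM7 as fully preserved, based only on which registers the callee uses, is unsafe.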

