[PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

Jeff Law law@redhat.com
Mon Nov 27 17:47:00 GMT 2017


On 11/23/2017 04:11 AM, Alan Hayward wrote:
> 
>> On 22 Nov 2017, at 17:33, Jeff Law <law@redhat.com> wrote:
>>
>> On 11/22/2017 04:31 AM, Alan Hayward wrote:
>>>
>>>> On 21 Nov 2017, at 03:13, Jeff Law <law@redhat.com> wrote:
>>>>>
>>>>>>
>>>>>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
>>>>>> totally forgotten about it.  And in fact it seems to come pretty close
>>>>>> to what you need…
>>>>>
>>>>> Yes, some of the code is similar to the way
>>>>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>>>>> CLOBBER expr code served as a starting point for writing the patch. The main difference
>>>>> here, is that _PART_CLOBBERED is around all calls and is not tied to a specific Instruction,
>>>>> it’s part of the calling abi. Whereas clobber_high is explicitly tied to an expression (tls_desc).
>>>>> It meant there wasn’t really any opportunity to resume any existing code.
>>>> Understood.  Though your first patch mentions that you're trying to
>>>> describe partial preservation "around TLS calls". Presumably those are
>>>> represented as normal insns, not call_insn.
>>>>
>>>> That brings me back to Richi's idea of exposing a set of the low subreg
>>>> to itself using whatever mode is wide enough to cover the neon part of
>>>> the register.
>>>>
>>>> That should tell the generic parts of the compiler that you're just
>>>> clobbering the upper part and at least in theory you can implement in
>>>> the aarch64 backend and the rest of the compiler should "just work"
>>>> because that's the existing semantics of a subreg store.
>>>>
>>>> The only worry would be if a pass tried to get overly smart and
>>>> considered that kind of set a nop -- but I think I'd argue that's simply
>>>> wrong given the semantics of a partial store.
>>>>
>>>
>>> So, the instead of using clobber_high(reg X), to use set(reg X, reg X).
>>> It’s something we considered, and then dismissed.
>>>
>>> The problem then is you are now using SET semantics on those registers, and it
>>> would make the register live around the function, which might not be the case.
>>> Whereas clobber semantics will just make the register dead - which is exactly
>>> what we want (but only conditionally).
>> ?!?  A set of the subreg is the *exact* semantics you want.  It says the
>> low part is preserved while the upper part is clobbered across the TLS
>> insns.
>>
>> jeff
> 
> Consider where the TLS call is inside a loop. The compiler would normally want
> to hoist that out of the loop. By adding a set(x,x) into the parallel of the tls_desc we
> are now making x live across the loop, x is dependant on the value from the previous
> iteration, and the tls_desc can no longer be hoisted.
Hmm.  I think I see the problem you're trying to point out.  Let me
restate it and see if you agree.

The low subreg set does clearly indicate the upper part of the SVE
register is clobbered.  The problem is from a liveness standpoint the
compiler is considering the low part live, even though it's a self-set.

In fact, if that is the case, then a single TLS call (independent of a
loop) would make the low part of the register globally live.  This
should be testable.  Include one of these low part self sets on the
existing TLS calls and compile a little test function and let's look at
the liveness data.


Now it could be the case that various local analysis could sub-optimally
handle things.  You mention LICM.  I know our original LICM did have a
problem in that if it saw a use of a hard reg in a loop without seeing a
set of that hard reg it considered the register varying within the loop.
 I have no idea if we carried that forward when the loop code was
rewritten (when I looked at this it was circa 1992).


> 
> Or consider a stream of code containing two tls_desc calls (ok, the compiler might
> optimise one of the tls calls away, but this approach should be reusable for other exprs).
> Between the two set(x,x)’s x is considered live so the register allocator can’t use that
> register.
> Given that we are applying this to all the neon registers, the register allocator now throws
> an ICE because it can’t find any free hard neon registers to use.
Given your statements it sounds like the liveness infrastructure is
making those neon regs globally live when it sees the low part subreg
self-set.  Let's confirm that one way or the other and see where it
takes us.

Jeff



More information about the Gcc-patches mailing list