This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression


On 11/16/2017 11:50 AM, Alan Hayward wrote:
> 
>> On 16 Nov 2017, at 18:24, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On November 16, 2017 7:05:30 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>>> On 11/16/2017 05:34 AM, Alan Hayward wrote:
>>>> This is a set of patches aimed at supporting aarch64 SVE register
>>>> preservation around TLS calls.
>>>>
>>>> Across a TLS call, AArch64 SVE does not explicitly preserve the
>>>> SVE vector registers. However, the Neon vector registers are preserved.
>>>> Because the registers overlap, this means the lower 128 bits of all
>>>> SVE vector registers will be preserved.
>>>>
>>>> The existing GCC code will currently incorrectly assume preservation
>>>> of all of the SVE registers.
>>>>
>>>> This patch introduces a CLOBBER_HIGH expression. This behaves a bit like
>>>> a CLOBBER expression. CLOBBER_HIGH can only refer to a single register.
>>>> The mode of the expression indicates the size of the lower bits which
>>>> will be preserved. If the register contains a value bigger than this
>>>> mode then the code will treat the register as clobbered.
>>>>
>>>> This means that, in order to evaluate whether a CLOBBER_HIGH is relevant,
>>>> we need to ensure the mode of the existing value in a register is tracked.
>>>>
>>>> The following patches in this series add support for the CLOBBER_HIGH,
>>>> with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
>>>> aarch64. The testing performed on these patches is also detailed in the
>>>> final patch.
>>>>
>>>> These patches are based on top of the linaro-dev/sve branch.
>>>>
>>>> A simpler alternative to this patch would be to assume all Neon and SVE
>>>> registers are clobbered across TLS calls. However, this would be a
>>>> performance regression against all AArch64 targets.
>>> So just a couple design questions.
>>>
>>> Presumably there's no reasonable way to set up GCC's view of the
>>> register file to avoid this problem?  ISTM that if the SVE register was
>>> split into two, one for the part that overlapped with the neon register
>>> and one that did not, then this could be handled via standard
>>> mechanisms?
>>>
> 
> Yes, that was an early alternative option for the patch.
> 
> With that, it would affect every operation that uses SVE registers. A simple
> add of two registers now has 4 inputs and two outputs. It would get in the
> way when debugging any SVE dumps and be generally annoying.
> It is possible that the code for that would all be in the aarch64 target
> (making everyone else happy!), but I suspect that there would still be
> strange dependency issues that’d need sorting in the common code.
> 
> Whereas with this patch, there are no new oddities in non-TLS compiles/dumps.
> Although the patch touches a lot of files, the changes are mostly restricted
> to places where standard clobbers were already being checked.
I'm not entirely sure that it would require doubling the number of
inputs/outputs.  It's not conceptually much different from how we
describe DImode operations on 32-bit targets.  The mode selects one or
more consecutive registers, so you don't actually need anything weird in
your patterns.  This is pretty standard stuff.
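
To make the "mode selects one or more consecutive registers" point
concrete, here is a minimal C sketch.  It is not GCC code: the helper
name regs_for_mode and the byte sizes passed to it are assumptions made
purely for illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not a GCC API: the number of consecutive hard
   registers needed to hold a value of MODE_BYTES bytes when each
   register holds REG_BYTES bytes.  The division rounds up.  */
static size_t
regs_for_mode (size_t mode_bytes, size_t reg_bytes)
{
  return (mode_bytes + reg_bytes - 1) / reg_bytes;
}
```

With these assumptions, an 8-byte DImode value on a target with 4-byte
registers occupies regs_for_mode (8, 4) == 2 consecutive registers, and
the instruction pattern still sees a single operand.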


It would be painful in that the Neon regs would have to interleave with
the upper part of the SVE regs in terms of register numbers.  It would
also mean that you couldn't tie together multiple Neon regs into
something wider.  I'm not sure if the latter would be an issue or not.

You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
totally forgotten about it.  And in fact it seems to come pretty close
to what you need...
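
As a rough illustration of the rule being discussed (the low 128 bits
survive the call, anything wider does not), a part-clobbered style check
could be modelled as below.  This is only a sketch: the function name and
the constant are hypothetical, and it is not the signature or the
implementation of the real target hook.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption taken from the thread: only the low 128 bits (16 bytes)
   of each vector register are preserved across the call.  */
#define PRESERVED_LOW_BYTES 16

/* Hypothetical predicate, not the real target hook: a value held in a
   vector register is lost across the call only if it is wider than
   the preserved low part.  */
static bool
value_part_clobbered (size_t value_bytes)
{
  return value_bytes > PRESERVED_LOW_BYTES;
}
```

Under that assumption a 16-byte Neon value is treated as preserved,
while a 32-byte SVE value is treated as clobbered.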

> 
> 
>>> Alternately would it be easier to clobber a subreg representing the high
>>> part of the register?  Hmm, probably not.
>>
>> I thought of a set of the preserved part to itself that leaves the upper part undefined. Not sure if we have such a thing or if it would work in all places that a clobber does.
> 
> I’ve not seen such a thing in the code. But it would need specific handling in
> all the existing clobber code.
It could probably be done with a set of the low part via a subreg or
somesuch (rather than trying to clobber the upper part which was my
initial flawed idea).

jeff

