[PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

Tue Jan 20 21:45:00 GMT 2015

Seems like the thread might have died down, so just wanted to ping it.
As Marcus says, this is holding up other patches so it'd be good to get
something in soon.  Would it be OK to commit the original patch or should
we wait?

Marcus Shawcroft <marcus.shawcroft@gmail.com> writes:
> On 14 January 2015 at 07:35, Jeff Law <law@redhat.com> wrote:
>> On 01/13/15 11:55, Eric Botcazou wrote:
>>>
>>>
>>>> (1) we have a non-paradoxical subreg;
>>>> (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full
>>>>      hard registers (no padding or unused upper bits);
>>>> (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number
>>>>      of bytes (X) in each constituent hard register;
>>>> (4) the offset is a multiple of X, i.e. the data we're accessing
>>>>      is aligned to a register boundary; and
>>>> (5) endianness is regular (no differences between words and bytes,
>>>>      or between registers and memory)
>>>
>>>
>>> OK, that's a nice translation of the new code. :-)
>>>
>>> It seems to me that the patch wants to extend the support of generic
>>> subregs to modes whose sizes are not multiples of each other, which is
>>> a requirement of the existing code, but does that in a very specific
>>> case for the sake of the ARM port without saying where all the above
>>> restrictions come from.
>>
>> Basically we're lifting the restriction that the sizes are multiples of
>> each other.  The requirements above are the set where we know it will work.
>> They are target independent, but happen to match what the ARM needs.
>>
>> They certainly do short circuit the meat of the function, that's the whole
>> point, there's this set of conditions under which we know this will work and
>> when they hold, we bypass.
>>
>> Now one could argue that instead of bypassing we should put the code to
>> handle this situation further down.  I'd be leery of doing that just from a
>> complexity standpoint.  But one could also argue that short circuiting like
>> the patch does adds complexity as well and may be a bit kludgy.

Yeah, I'm worried about the complexity too.  We allow subregs that
have padding and subregs where the number of bytes in the mode doesn't
divide equally between the number of registers.  We also have subregs where
a DImode value in R can take a different number of registers from a DFmode
value in R, despite the two modes having the same number of bits.  I've
no idea how we'd generalise the code so that those cases and the new one
just fall out as particular inputs to an overarching equation.  Or how
we make sure that the equation doesn't give nonsense results for cases
that would be better off triggering an abort (e.g. DFmode subregs of
CImode when DImode and DFmode occupy different numbers of registers).
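
For what it's worth, here is how I read the five quoted conditions, written
as a toy C predicate.  This is only a sketch: toy_mode_in_regs,
toy_simple_subreg_p and their fields are made-up names for illustration,
not anything in rtlanal.c, and the padding and endianness facts are passed
in as flags rather than derived from the target.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of how a mode sits in hard registers
   starting at xregno.  Not a GCC data structure.  */
struct toy_mode_in_regs
{
  unsigned size;      /* bytes in the mode */
  unsigned nregs;     /* hard registers the mode occupies */
  bool has_padding;   /* true if any register has unused bits */
};

/* Return true if a subreg of X in mode Y at byte OFFSET meets
   conditions (1) to (5) as quoted above.  */
static bool
toy_simple_subreg_p (struct toy_mode_in_regs x, struct toy_mode_in_regs y,
                     unsigned offset, bool regular_endianness)
{
  assert (x.size != 0 && y.size != 0);
  assert (x.nregs != 0 && y.nregs != 0);

  /* (1) Non-paradoxical: the outer mode is no bigger than the inner one.  */
  if (y.size > x.size)
    return false;

  /* (2) Both modes fill their hard registers completely.  */
  if (x.has_padding || y.has_padding
      || x.size % x.nregs != 0 || y.size % y.nregs != 0)
    return false;

  /* (3) Both modes store the same number of bytes per register.  */
  unsigned bytes_per_reg = x.size / x.nregs;
  if (bytes_per_reg != y.size / y.nregs)
    return false;

  /* (4) The offset lands on a register boundary.  */
  if (offset % bytes_per_reg != 0)
    return false;

  /* (5) No word/byte or register/memory endianness mismatch.  */
  return regular_endianness;
}
```

With a 48-byte inner mode in three registers and a 32-byte outer mode in
two (sizes that aren't multiples of each other), offsets 0 and 16 pass and
offset 8 fails on (4).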

I don't think we want to allow subregs in all cases where there is
padding.  We hit a similar case with 8-byte subregs of 24-byte values
stored in 16-byte registers (DImode, EImode and TImode respectively).
That doesn't do what we want because all three DImode pieces of the
EImode aren't independently addressable, so the abort actually helped.
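
The arithmetic for that case, as a rough sketch.  It assumes the registers
are treated as a flat image of memory, so that byte N of the value lives in
register N / 16.  The helper names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Sizes in bytes, taken from the example above.  */
enum { TOY_DI_SIZE = 8, TOY_EI_SIZE = 24, TOY_TI_SIZE = 16 };

/* Index of the register holding byte OFFSET of the value, assuming a
   flat layout across REG_SIZE-byte registers.  Hypothetical helper.  */
static unsigned
toy_reg_index (unsigned offset, unsigned reg_size)
{
  assert (reg_size != 0);
  return offset / reg_size;
}

/* True if a piece at byte OFFSET starts at the first byte of a register,
   i.e. could be named as a register in its own right.  */
static bool
toy_starts_register_p (unsigned offset, unsigned reg_size)
{
  assert (reg_size != 0);
  return offset % reg_size == 0;
}

/* Count the PIECE_SIZE-byte pieces of a VALUE_SIZE-byte value that
   start on a register boundary.  */
static unsigned
toy_count_addressable_pieces (unsigned value_size, unsigned piece_size,
                              unsigned reg_size)
{
  assert (piece_size != 0);
  unsigned count = 0;
  for (unsigned offset = 0; offset + piece_size <= value_size;
       offset += piece_size)
    if (toy_starts_register_p (offset, reg_size))
      count++;
  return count;
}
```

For the three DImode pieces at offsets 0, 8 and 16, the first and third
start registers 0 and 1, but the second sits in the upper half of
register 0, so only two of the three can be addressed as registers.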

TBH I find even the current code too hard to understand.  I can plug
specific inputs in and follow what happens, but I don't have a feel for
why that's the right way of handling all possible inputs.

In some ways I think we've made life hard for ourselves by trying to
implement all these rules in a target-independent way.  Subregs on
memory are easy (even though we should generally be avoiding them :-):
they start SUBREG_BYTE bytes into the MEM and occupy the number of bytes
in the outer mode.  At least AFAIK, we never have a situation where an N-bit
float can occupy a different number of memory bytes from an N-bit integer.
But treating REGs as an image of memory (which I think is effectively
what we're doing) has caused problems.
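
The memory rule really is that simple.  As a sketch (the struct and
function names here are invented, not GCC's):

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open byte range [start, end) within a MEM.  Hypothetical type.  */
struct toy_byte_range
{
  unsigned start;
  unsigned end;
};

/* Bytes covered by a subreg of a MEM: it starts SUBREG_BYTE bytes in
   and spans the size of the outer mode.  */
static struct toy_byte_range
toy_mem_subreg_range (unsigned subreg_byte, unsigned outer_size)
{
  struct toy_byte_range range = { subreg_byte, subreg_byte + outer_size };
  return range;
}

/* True if RANGE lies entirely within an INNER_SIZE-byte MEM.  */
static bool
toy_range_within_mem_p (struct toy_byte_range range, unsigned inner_size)
{
  return range.start <= range.end && range.end <= inner_size;
}
```

An 8-byte subreg at SUBREG_BYTE 8 of a 24-byte MEM covers bytes 8 to 15,
with no dependence on how many registers either mode would need.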

As well as being complicated, doing things this way is pretty restrictive.
One of the main uses of CANNOT_CHANGE_MODE_CLASS seems to be to work
around cases where the generic rules get it wrong.  Sometimes it seems
like it would be better to let the target define which subregs it can
form and on which registers.  It would be less complicated, more general,
and (in cases where it allows C_C_M_C to be removed) hopefully more optimal.

>> Maybe the way forward here is for someone to try and integrate this support
>> in the main part of the code and see how it looks.  Then we can pick one.

I still have a mental block on how to do that :-)

>> The downside is since this probably isn't a regression that work would need
>> to happen quickly to make it into gcc-5.
>>
>> Which leads to another option, get the release managers to sign off on the
>> kludge after gcc-5 branches and only install the kludge on the gcc-5 branch
>> and insisting the other solution go in for gcc-6 and beyond.  Not sure if
>> they'd do that, but it's a discussion that could happen.
>
> This issue is currently gating a number of patches that get big endian
> working on aarch64 (all of which are on the list), it would be good if
> we could get this addressed in some form in gcc-5 rather than put out
> a second release of gcc with borked BE aarch64 support.
>
> Cheers
> /Marcus

Thanks,
Richard