[PATCH, ARM, RFC] Fix vect.exp failures for NEON in big-endian mode
Mon Mar 4 15:29:00 GMT 2013
On Mon, 4 Mar 2013 13:08:57 +0000
Paul Brook <firstname.lastname@example.org> wrote:
> > > > I can't exactly remember why we didn't do that to start with. I
> > > > think the problem was ABI-related, or to do with transferring
> > > > NEON vectors to/from ARM registers when it was necessary to do
> > > > that... I'm planning to do some archaeology to try to see if I
> > > > can figure out a definitive answer.
> > >
> > > The ABI defined vector types (uint32x4_t etc) are defined to be in
> > > vldm/vstm order.
> > There's no conflict with the ABI-defined vector order -- the ABI
> > (looking at AAPCS, IHI 0042D) describes "containerized" vectors
> > which should be used to pass and return vector quantities at ABI
> > boundaries, but I couldn't find any further restrictions.
> > Internally to a function, we are still free to use vld1/vst1 vector
> > ordering. Using "containerized"/opaque transfers, the bit pattern
> > of a vector in one function (using vld1/vst1 ordering internally)
> > will of course remain unchanged if passed to another function and
> > using the same ordering there also.
> Ah, ok. If you make the ABI defined types distinct from the GCC
> generic vector types (as used by the vectorizer), then in principle
> that should work. I agree that current GCC probably does not have the
> infrastructure to do that, and some of the vector code plays a bit
> fast and loose with type conversions/subregs.
(Subregs use memory ordering for the byte offset, so I think those are
OK if we use array-order loads/stores pervasively. I'm not 100% sure
> Remember that it's not just function arguments, it's any interface
> shared between functions. i.e. including structures and global
> variables.
Ugh, I hadn't considered structures or global variables :-/. If we
decide they have to use the containerized format also, then we lose a
lot of the supposed advantage of using array-format vectors
"everywhere" (apart from at procedure call boundaries), for instance if
we want code with a global variable like:
to do the right thing (i.e., with elements of myvec corresponding
one-to-one to elements of myarr), then using the containerized format
for accesses to myvec would be a non-starter.
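The kind of code in question can be sketched roughly as below. This is an illustration only, not the snippet from the original message: the union wrapper, the `v4si` typedef, the `shared` global and the `elements_correspond` helper are all hypothetical names, and it assumes a GCC-compatible compiler with the generic vector extension (including vector subscripting).

```c
#include <stdint.h>

/* GCC generic vector of four 32-bit ints (GNU vector extension). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical global shared between functions: the same 16 bytes
   viewed as a vector and as a plain array. */
union {
    v4si    myvec;
    int32_t myarr[4];
} shared;

/* Returns 1 if element i of myvec is element i of myarr for all i,
   i.e. if the vector is laid out in "array" order in memory. */
int elements_correspond(void)
{
    for (int i = 0; i < 4; i++)
        shared.myarr[i] = 100 + i;

    for (int i = 0; i < 4; i++)
        if (shared.myvec[i] != shared.myarr[i])
            return 0;

    return 1;
}
```

With array-format vectors one would expect `elements_correspond()` to return 1. If accesses to `myvec` had to go through the containerized format instead, the lanes would no longer line up with `myarr` in big-endian mode, which is the problem described above.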
Skimming the AAPCS, I'm not sure it actually specifies anything about
the layout of global variables which may be shared between functions
(it'd make sense to do so -- maybe it's elsewhere in the EABI
documents). Aggregates passed by value could also be
marshalled/unmarshalled like vectors, though that starts to sound much
less tractable than dealing with vectors alone.
> > Actually making that work (especially efficiently) with GCC is a
> > slightly different matter. Let's call vldm/vstm-ordered vectors
> > "containerized" format, and vld1/vst1-ordered vectors "array"
> > format. We need to introduce the concept of marshalling vector
> > arguments from array format to containerized format when passing
> > them to a function, and unmarshalling those vector arguments back
> > the other way on function entry. AFAICT, GCC does not have suitable
> > infrastructure for implementing such functionality at present:
> > consider that e.g. vectors passed by value on the stack should use
> > containerized format, which means the called function cannot simply
> > dereference the stack pointer to read the vector:
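A toy model of the marshalling idea, in plain C rather than anything resembling GCC internals, might look like the following. It is not the example from the original message. It assumes, purely for illustration, that for 32-bit elements the two formats differ by swapping the two elements within each 64-bit half; the real permutation depends on element size and is not stated in this thread. The `vec4` type and all function names are hypothetical.

```c
#include <stdint.h>

/* Toy stand-in for a 128-bit vector of four 32-bit elements. */
typedef struct { uint32_t lane[4]; } vec4;

/* ASSUMPTION (illustration only): array format and containerized
   format differ by swapping the two 32-bit elements inside each
   64-bit half.  The swap is its own inverse. */
static vec4 swap_within_halves(vec4 v)
{
    vec4 r = { { v.lane[1], v.lane[0], v.lane[3], v.lane[2] } };
    return r;
}

/* Caller side: array format -> containerized format. */
static vec4 marshal_to_container(vec4 a)   { return swap_within_halves(a); }

/* Callee side: containerized format -> array format. */
static vec4 unmarshal_from_container(vec4 c) { return swap_within_halves(c); }

/* Callee that unmarshals its argument on entry before using it. */
static uint32_t callee_first_element(vec4 containerized)
{
    vec4 v = unmarshal_from_container(containerized);
    return v.lane[0];
}

/* Correct path: marshal at the call, unmarshal on entry. */
static uint32_t call_with_marshalling(vec4 array_fmt)
{
    return callee_first_element(marshal_to_container(array_fmt));
}

/* Broken path: the callee reads the containerized value directly,
   as if it were already in array format. */
static uint32_t call_without_unmarshalling(vec4 array_fmt)
{
    vec4 c = marshal_to_container(array_fmt);
    return c.lane[0];
}
```

Under that assumed permutation, passing `{10, 20, 30, 40}` through `call_with_marshalling` yields the first element as intended, while `call_without_unmarshalling` picks up its neighbour instead. That is the sense in which a callee cannot simply read a containerized stack argument in place.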
> IIRC I/we tried to do something very similar (possibly the other way
> around) by abusing the unaligned load mechanism. I don't remember
> why that failed.
That'd be this conversation:
we only tweaked the vectorizer to always use movmisalign, leaving
intrinsics & generic vectors using vldm/vstm order. Fixing-up the
resulting chaos using ad-hoc hacks didn't go down too well with
maintainers, so the patch fizzled out.