[PATCH, ARM] Misaligned access support for ARM Neon
Joseph S. Myers
joseph@codesourcery.com
Mon Nov 30 15:43:00 GMT 2009
On Mon, 30 Nov 2009, Paul Brook wrote:
> Are you saying you think the vectorizer should be using vldr, not vld1?
> Or that you don't like this particular way of distinguishing between array
> loads and vector copies?
> Or that we also need to fix whichever bits of the autovectorizer that know
> about vector layout and remove the BYTES_BIG_ENDIAN hack in
> arm_vector_always_misalign?
For misaligned support to work for big-endian, I believe the vectorizer
needs to know exactly what the effects of the misaligned loads are. The
support is presently disabled for big-endian (despite the always_misalign
code) because, for example, the vectorizer expected to be able to do an
operation one operand of which was the result of a misaligned load and the
other operand of which was a constant, without knowing that the constant
elements needed to be permuted in the same way they would have been by a
misaligned load from memory.
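The failure mode described above can be sketched with a toy lane model. Everything here is an assumption made for illustration: the helper names are hypothetical, and the permutation (a simple reversal of four 16-bit lanes) merely stands in for whatever reordering the misaligned load really applies.

```c
#include <stdint.h>

#define LANES 4

/* Hypothetical permutation applied by the misaligned load: array element i
   lands in lane perm(i).  Reversal is used purely for illustration. */
static int perm(int i) { return LANES - 1 - i; }

/* Model of a misaligned load: permutes elements on the way into the register. */
static void load_permuted(int16_t reg[LANES], const int16_t mem[LANES])
{
    for (int i = 0; i < LANES; i++)
        reg[perm(i)] = mem[i];
}

/* Model of the matching store: undoes the same permutation. */
static void store_permuted(int16_t mem[LANES], const int16_t reg[LANES])
{
    for (int i = 0; i < LANES; i++)
        mem[i] = reg[perm(i)];
}

/* Lane-wise add, as a vector add instruction would perform it. */
static void add_lanes(int16_t dst[LANES], const int16_t a[LANES],
                      const int16_t b[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
}

/* Intended result: out[i] = in[i] + c[i].  The constant is placed in the
   register either permuted to match the load, or in unpermuted order. */
void add_constant(int16_t out[LANES], const int16_t in[LANES],
                  const int16_t c[LANES], int permute_constant)
{
    int16_t r_in[LANES], r_c[LANES], r_sum[LANES];

    load_permuted(r_in, in);
    for (int i = 0; i < LANES; i++)
        r_c[permute_constant ? perm(i) : i] = c[i];
    add_lanes(r_sum, r_in, r_c);
    store_permuted(out, r_sum);
}
```

With in = {10, 20, 30, 40} and c = {1, 2, 3, 4}, passing permute_constant = 1 gives the intended {11, 22, 33, 44}, while permute_constant = 0 pairs each element with the wrong constant and gives {14, 23, 32, 41}. No extra instruction is needed in the correct case; the constant is simply laid out differently.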
GCC has a very strongly embedded assumption that the combination of
machine mode and register number or memory address defines exactly how a
value of that type is stored in registers or in memory. GCC defines
vector element numbering in GENERIC, GIMPLE and RTL in such a way that the
memory ordering for a vector mode is array ordering.
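The array-ordering rule can be observed from user code through the GNU C vector extension. This is a minimal sketch, assuming a GCC or Clang recent enough to support subscripting of vector types; the function name is hypothetical.

```c
#include <string.h>

/* GNU C vector extension: four 16-bit elements, corresponding to mode V4HI. */
typedef short v4hi __attribute__((vector_size(8)));

/* Copy an array into a vector through memory, then read back element i
   using the compiler's own element numbering. */
short element_via_memory(const short a[4], int i)
{
    v4hi v;
    memcpy(&v, a, sizeof v);
    return v[i];
}
```

For any array a and index i in 0..3, element_via_memory(a, i) should equal a[i] regardless of endianness, because element i of the vector is by definition the element at array position i in memory.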
The ARM backend can in turn define what ordering it likes for vector
values in core registers and in NEON registers, as long as all moves
between any combination of memory, core registers and NEON registers are
consistent with the definition (remembering that the machine-independent
compiler might sometimes try to synthesise moves out of moves of smaller
pieces); there are various target hooks or macros to control what SUBREG
expressions are allowed with what interpretation, if orderings are used
that would make some SUBREGs behave in unexpected ways. The backend uses
the definition that vldr/vldm order is used, which also works conveniently
for ldm/stm to/from core registers and transfers between core and NEON
registers.
Given that backend definition, when the vectorizer uses vld1/vst1 for
big-endian, it needs to understand that the resulting value in the
register is not the same value, interpreted in the normal way for mode
V4HI (say), as the value in the array in memory, but a permutation of that
value. If it knows what the permutation is, then it can also know when it
is correct to carry out a vector operation between that value and another
value: the other value must be permuted in the same way. Likewise, if
using vst1 the value being stored must be permuted appropriately. This
should not in general result in explicit permutations at runtime; it
should generally involve consistently using vld1/vst1 for all operands,
and permuting constants appropriately.
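Where the permutation comes from can be sketched with a byte-level model of a single 64-bit D register holding a V4HI value. This is a simplified model under stated assumptions, not a description checked against hardware: the vldr-style load is assumed to read the eight bytes as one big-endian 64-bit quantity, and the vld1.16-style load is assumed to read each 16-bit element big-endian and place array element e in lane e, with lane 0 in bits 15:0. All function names are hypothetical.

```c
#include <stdint.h>

/* Model of a vldr-style load of a D register on a big-endian target:
   the byte at the lowest address ends up most significant. */
uint64_t model_vldr_be(const uint8_t mem[8])
{
    uint64_t d = 0;
    for (int i = 0; i < 8; i++)
        d = (d << 8) | mem[i];
    return d;
}

/* Model of a vld1.16-style load: each element is read big-endian and
   array element e is placed in lane e (lane 0 = bits 15:0). */
uint64_t model_vld1_16_be(const uint8_t mem[8])
{
    uint64_t d = 0;
    for (int e = 0; e < 4; e++) {
        uint64_t elem = ((uint64_t)mem[2 * e] << 8) | mem[2 * e + 1];
        d |= elem << (16 * e);
    }
    return d;
}

/* Model of the matching vst1.16-style store. */
void model_vst1_16_be(uint8_t mem[8], uint64_t d)
{
    for (int e = 0; e < 4; e++) {
        uint16_t elem = (uint16_t)(d >> (16 * e));
        mem[2 * e] = (uint8_t)(elem >> 8);
        mem[2 * e + 1] = (uint8_t)elem;
    }
}

/* Extract lane e of a D register value (lane 0 = least significant). */
uint16_t lane(uint64_t d, int e)
{
    return (uint16_t)(d >> (16 * e));
}
```

In this model, loading the array {1, 2, 3, 4} with the vldr-style load leaves array element 0 in lane 3, while the vld1.16-style load leaves it in lane 0, so the two register values are lane reversals of each other. A vld1/vst1 round trip still reproduces the original memory contents, which is why consistently using vld1/vst1 for all operands avoids explicit permutations at runtime.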
Misaligned support for little-endian is of course much simpler to make
work, and still useful; I think most NEON hardware is little-endian.
--
Joseph S. Myers
joseph@codesourcery.com