[PATCH, ARM] Misaligned access support for ARM Neon

Mon Nov 30 15:43:00 GMT 2009

On Mon, 30 Nov 2009, Paul Brook wrote:

> Are you saying you think the vectorizer should be using vldr, not vld1?
> Or that you don't like this particular way of distinguishing between array 
> loads and vector copies?
> Or that we also need to fix whichever bits of the autovectorizer that know 
> about vector layout and remove the BYTES_BIG_ENDIAN hack in 
> arm_vector_always_misalign?

For misaligned support to work for big-endian, I believe the vectorizer 
needs to know exactly what the effects of the misaligned loads are.  The 
support is presently disabled for big-endian (despite the 
arm_vector_always_misalign code) because, for example, the vectorizer 
expected to be able to carry out an operation in which one operand was the 
result of a misaligned load and the other was a constant, without knowing 
that the elements of the constant needed to be permuted in the same way 
as they would have been by a misaligned load from memory.

GCC has a very strongly embedded assumption that the combination of 
machine mode and register number or memory address defines exactly how a 
value of that type is stored in registers or in memory.  GCC defines 
vector element numbering in GENERIC, GIMPLE and RTL in such a way that the 
memory ordering for a vector mode is array ordering.
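
As a rough illustration of "memory ordering is array ordering", here is a 
small Python model (not GCC code; the helper name store_v4hi is made up 
for this sketch).  Element i of a V4HI value sits at byte offset 2*i 
whatever the endianness; only the byte order inside each element changes.

```python
import struct

def store_v4hi(elements, big_endian):
    """Model a V4HI value stored to memory in array order.

    Hypothetical helper for illustration only: element i occupies
    bytes 2*i and 2*i+1; endianness affects only the byte order
    within each 16-bit element, not the order of the elements.
    """
    fmt = (">" if big_endian else "<") + "4H"
    return struct.pack(fmt, *elements)

v = [0x0001, 0x0002, 0x0003, 0x0004]
be = store_v4hi(v, big_endian=True)
le = store_v4hi(v, big_endian=False)

# Element 0 is at the lowest address in both cases.
assert be[0:2] == b"\x00\x01"
assert le[0:2] == b"\x01\x00"
```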

The ARM backend can in turn define what ordering it likes for vector 
values in core registers and in NEON registers, as long as all moves 
between any combination of memory, core registers and NEON registers are 
consistent with the definition (remembering that the machine-independent 
compiler might sometimes try to synthesise moves out of moves of smaller 
pieces); there are various target hooks or macros to control what SUBREG 
expressions are allowed with what interpretation, if orderings are used 
that would make some SUBREGs behave in unexpected ways.  The backend uses 
the definition that vldr/vldm order is used, which also works conveniently 
for ldm/stm to/from core registers and transfers between core and NEON 
registers.
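
The difference between the two load styles can be sketched with a 
simplified Python model of one 64-bit D register on a big-endian target.  
It assumes that vldr treats the eight bytes as a single 64-bit big-endian 
quantity, and that vld1.16 places the lowest-addressed element in lane 0 
(the least significant 16 bits).  The function names are invented for the 
sketch, and Q registers are not modelled.

```python
def vldr_d_be(mem):
    """Modelled big-endian vldr: 8 bytes read as one 64-bit doubleword."""
    return int.from_bytes(mem[:8], "big")

def vld1_16_be(mem):
    """Modelled big-endian vld1.16: array element i goes to lane i."""
    reg = 0
    for i in range(4):
        elem = int.from_bytes(mem[2 * i:2 * i + 2], "big")
        reg |= elem << (16 * i)
    return reg

def lanes(reg):
    """Return the four 16-bit lanes; lane 0 is the least significant."""
    return [(reg >> (16 * i)) & 0xFFFF for i in range(4)]

# The array {1, 2, 3, 4} as it appears in big-endian memory.
mem = bytes([0, 1, 0, 2, 0, 3, 0, 4])

# Under vldr order, array element 0 ends up in the most significant lane.
assert lanes(vldr_d_be(mem)) == [4, 3, 2, 1]
# Under vld1.16, array element 0 ends up in lane 0.
assert lanes(vld1_16_be(mem)) == [1, 2, 3, 4]
```

In this model the two register images hold the same elements in opposite 
lane order, which is the kind of permutation discussed below.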

Given that backend definition, when the vectorizer uses vld1/vst1 for 
big-endian it needs to understand that the resulting value in the 
register, interpreted in the normal way for mode V4HI (say), is not the 
same value as the one in the array in memory, but a permutation of that 
value.  If it knows what the permutation is, then it can also know when it 
is correct to carry out a vector operation between that value and another 
value: the other value must be permuted in the same way.  Likewise, if 
using vst1 the value being stored must be permuted appropriately.  This 
should not in general result in explicit permutations at runtime; it 
should generally involve consistently using vld1/vst1 for all operands, 
and permuting constants appropriately.
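
A minimal Python sketch of the constant problem, under the same 
simplifying assumptions (one big-endian D register holding V4HI, with 
vldr order placing element i in lane 3-i and vld1.16/vst1.16 placing 
element i in lane i).  All function names are hypothetical.  It adds the 
constant {1, 2, 3, 4} to an array loaded with vld1 and stores the result 
with vst1, once with the constant laid out in vldr order and once with it 
permuted to match.

```python
def vld1_16_be(mem):
    """Lanes after a modelled big-endian vld1.16: lane i = element i."""
    return [int.from_bytes(mem[2 * i:2 * i + 2], "big") for i in range(4)]

def vst1_16_be(lanes):
    """Modelled big-endian vst1.16: lane i is written to element i."""
    return b"".join(lane.to_bytes(2, "big") for lane in lanes)

def const_in_vldr_order(elements):
    """Lanes of a V4HI constant in the modelled vldr layout."""
    return list(reversed(elements))

def permute_for_vld1(lanes):
    """Map the modelled vldr layout to the vld1 layout: reverse lanes."""
    return list(reversed(lanes))

def add_lanes(a, b):
    """Lane-wise 16-bit addition, as a vector add would do."""
    return [(x + y) & 0xFFFF for x, y in zip(a, b)]

mem = b"".join(x.to_bytes(2, "big") for x in [10, 20, 30, 40])
k = [1, 2, 3, 4]

# Constant left in vldr order: lanes do not line up with the vld1 data.
wrong = vst1_16_be(add_lanes(vld1_16_be(mem), const_in_vldr_order(k)))
# Constant permuted the same way as the loaded data: lanes line up.
right = vst1_16_be(
    add_lanes(vld1_16_be(mem), permute_for_vld1(const_in_vldr_order(k))))

assert vld1_16_be(wrong) == [14, 23, 32, 41]
assert vld1_16_be(right) == [11, 22, 33, 44]
```

The permutation of the constant here is something a compiler could do 
when emitting the constant, so nothing extra would be executed at runtime.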

Misaligned support for little-endian is of course much simpler to make 
work, and still useful; I think most NEON hardware is little-endian.

-- 
Joseph S. Myers
joseph@codesourcery.com