[AArch64_be] Fix vec_select hi/lo mask confusions.

James Greenhalgh james.greenhalgh@arm.com
Wed Jul 30 10:46:00 GMT 2014


On Wed, Jul 30, 2014 at 11:21:40AM +0100, Richard Biener wrote:
> On Wed, Jul 30, 2014 at 12:10 PM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > A vec_select mask exists in GCC's world-view of lane ordering. The
> > "low-half" of the vector { a, b, c, d } is { a, b }, which on big-endian
> > will be in the high bits of the architectural register. On little-endian,
> > these lanes will be in the low bits of the architectural register.
> > We therefore need different masks depending on our target endian-ness.
> > The diagram below may help.
> >
> > We must draw the distinction when building masks which select one half of the
> > vector.  An instruction selecting architectural low-lanes for a big-endian
> > target must be described using a mask selecting GCC high-lanes.
> >
> >                  Big-Endian             Little-Endian
> >
> > GCC             0   1   2   3           3   2   1   0
> >               | x | x | x | x |       | x | x | x | x |
> > Architecture    3   2   1   0           3   2   1   0
> >
> > Low Mask:         { 2, 3 }                { 0, 1 }
> > High Mask:        { 0, 1 }                { 2, 3 }
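A minimal C sketch of the rule in the diagram. It models the mask as a plain int array rather than GCC's RTX PARALLEL, and the helper name `half_mask` is hypothetical; this is not the body of `aarch64_simd_vect_par_cnst_half`. It fills `mask` with the GCC lane numbers that make up the requested architectural half.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: write the GCC lane numbers that select one
   architectural half of an N_LANES vector into MASK (N_LANES / 2
   entries).  HIGH selects the architectural high half.  */
void half_mask(int *mask, int n_lanes, bool big_endian, bool high)
{
  assert(n_lanes > 0 && n_lanes % 2 == 0);
  int half = n_lanes / 2;
  /* On little-endian, GCC lane i is architectural lane i, so the
     architectural low half is GCC lanes 0..half-1.  On big-endian the
     numbering is reversed, so the same architectural half is GCC
     lanes half..n_lanes-1.  */
  int base = (big_endian != high) ? half : 0;
  for (int i = 0; i < half; i++)
    mask[i] = base + i;
}
```

For a four-lane vector, `half_mask(m, 4, true, false)` fills `m` with {2, 3}, the big-endian "Low Mask" row, while `half_mask(m, 4, false, false)` fills it with {0, 1}.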
> >
> > The way we implement this requires some "there is no spoon" thinking to avoid
> > pattern duplication. We define a vec_par_cnst_lo_half mask to always
> > refer to the low architectural lanes. I gave some thought to renaming this
> > vec_par_cnst_arch_lo_half, but it didn't add much meaning. I'm happy to
> > take bike-shedding towards a more self-documenting naming scheme.
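The predicate side can be modelled the same way. This is a hedged sketch in plain C, not the code in predicates.md or `aarch64_simd_check_vect_par_cnst_half`; the function `is_arch_half` is hypothetical. Because the expected starting lane already depends on endianness, one "architectural low half" check accepts {0, 1} on little-endian and {2, 3} on big-endian. That is what lets a single pattern using `vec_par_cnst_lo_half` serve both targets.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: does MASK (LEN entries) select exactly the
   architectural low half (HIGH == false) or high half (HIGH == true)
   of an N_LANES vector, given the target endianness?  */
bool is_arch_half(const int *mask, int len, int n_lanes,
                  bool big_endian, bool high)
{
  assert(n_lanes > 0 && n_lanes % 2 == 0);
  if (len != n_lanes / 2)
    return false;
  /* Big-endian reverses GCC's lane numbering relative to the
     architecture, so the expected starting lane flips.  */
  int base = (big_endian != high) ? n_lanes / 2 : 0;
  for (int i = 0; i < len; i++)
    if (mask[i] != base + i)
      return false;
  return true;
}
```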
> >
> > No regressions spotted on aarch64_be-none-elf or aarch64-none-elf.
> >
> > OK for trunk?
> 
> Please make sure the above is still correct if you rip out all
> if (BYTES_BIG_ENDIAN) cases from tree-vect*.c.

It will be, yes.

The RTL and Tree/Gimple level representations will have a consistent view
that still won't match up with the way the architecture thinks of lanes and
elements. On our big-endian systems, the lowest address in memory gets loaded
to the highest-numbered lane in the register. GCC thinks of the lowest address
in memory as the lowest element in its vectors, which makes sense, but causes
a mismatch. So when GCC wants bits 0-31 of a V4SI vector extracted, it really
means it wants what would be array element 0, so we need to map that to
extract bits 96-127 from the register.
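To make the arithmetic concrete, a small C sketch that maps a GCC element index to the register bits holding it. It assumes the lane-reversal rule described above (lowest address in the highest-numbered lane on big-endian). `arch_bit_range` is a hypothetical name, not a GCC function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: for GCC element ELT of a vector with N_LANES
   elements of ELT_BITS bits each, compute the architectural register
   bit range [*LO, *HI] holding that element.  */
void arch_bit_range(int elt, int n_lanes, int elt_bits, bool big_endian,
                    int *lo, int *hi)
{
  assert(elt >= 0 && elt < n_lanes);
  /* Big-endian puts the lowest address in the highest-numbered
     architectural lane, so GCC element ELT lives in lane
     n_lanes - 1 - elt.  */
  int lane = big_endian ? n_lanes - 1 - elt : elt;
  *lo = lane * elt_bits;
  *hi = *lo + elt_bits - 1;
}
```

For V4SI (four 32-bit elements), `arch_bit_range(0, 4, 32, true, &lo, &hi)` yields bits 96-127, and the same call with `big_endian` false yields bits 0-31.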

This is all just back-end magic to keep the mid-end endianness agnostic.

There will, of course, be some patterns we'll have to clean up after we fix
tree-vect*.c, but this fundamental mismatch of lane numbering won't change.
We'll just have the same pain as the other big-endian backends adjusting
patterns as needed.

Cheers,
James

> 
> Richard.
> 
> > Thanks,
> > James
> >
> > ---
> > gcc/
> >
> > 2014-07-30  James Greenhalgh  <james.greenhalgh@arm.com>
> >
> >         * config/aarch64/aarch64.c (aarch64_simd_vect_par_cnst_half): Vary
> >         the generated mask based on BYTES_BIG_ENDIAN.
> >         (aarch64_simd_check_vect_par_cnst_half): New.
> >         * config/aarch64/aarch64-protos.h
> >         (aarch64_simd_check_vect_par_cnst_half): New.
> >         * config/aarch64/predicates.md (vect_par_cnst_hi_half): Refactor
> >         the check out to aarch64_simd_check_vect_par_cnst_half.
> >         (vect_par_cnst_lo_half): Likewise.
> >         * config/aarch64/aarch64-simd.md
> >         (aarch64_simd_move_hi_quad_<mode>): Always use vec_par_cnst_lo_half.
> >         (move_hi_quad_<mode>): Always generate a low mask.
> 


