[PATCH] aarch64: Use unions for vector tables in vqtbl[234] intrinsics

Richard Sandiford richard.sandiford@arm.com
Fri Jul 9 11:40:15 GMT 2021


Jonathan Wright <Jonathan.Wright@arm.com> writes:
> Hi,
>
> As subject, this patch uses a union instead of constructing a new opaque
> vector structure for each of the vqtbl[234] Neon intrinsics in arm_neon.h.
> This simplifies the header file and also improves code generation -
> superfluous move instructions were emitted for every register
> extraction/set in this additional structure.
>
> This change is safe because the C-level vector structure types e.g.
> uint8x16x4_t already provide a tie for sequential register allocation
> - which is required by the TBL instructions.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?

Looks good, but I think we should have some tests to defend the
register allocation (RA) improvements.  E.g. have things like:

  #include <arm_neon.h>

  …

  uint8x8_t
  f2_u8 (uint8x16x2_t x, uint8x8_t y)
  {
    return vqtbl2_u8 (x, y);
  }

  …

and add a scan-assembler-not directive to check that no moves are emitted.

Union punning is undefined behavior (UB) in standard C++, but I think in
practice we're not going to be able to treat it as such for GCC.  This
would be far from the only thing to rely on union punning for correctness.
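
For illustration only, here is a minimal portable sketch of the kind of
union punning under discussion.  It is not the code from the patch:
pair_t, opaque_t, pun_to_opaque and copy_to_opaque are hypothetical names
standing in for a public tuple type such as uint8x16x2_t and for whatever
opaque type a builtin takes.  The memcpy variant is included as the form
that is well defined in both C and C++.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a public two-vector tuple type
   (for example uint8x16x2_t): two 16-byte "registers".  */
typedef struct { uint8_t val[2][16]; } pair_t;

/* Hypothetical stand-in for an opaque 32-byte type that a builtin
   might take as its table operand.  */
typedef struct { uint8_t bytes[32]; } opaque_t;

/* The punning below only makes sense if both views have the same size.  */
_Static_assert (sizeof (pair_t) == sizeof (opaque_t),
                "both views must have the same size");

/* Union punning: write one member, then read the other.  This is
   permitted in C, and GCC documents that it allows it when the access
   goes through the union type, but it is UB in standard C++.  */
static opaque_t
pun_to_opaque (pair_t p)
{
  union { pair_t pair; opaque_t opaque; } u;
  u.pair = p;
  return u.opaque;
}

/* Same conversion via memcpy, which is well defined in C and C++.  */
static opaque_t
copy_to_opaque (pair_t p)
{
  opaque_t o;
  memcpy (&o, &p, sizeof o);
  return o;
}
```

Both functions return the same 32 bytes for any input; the difference is
only in what the language standards guarantee about the conversion.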

Thanks,
Richard

