This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][AArch64] Improve dup pattern


On Tue, Jun 20, 2017 at 11:57:59AM +0100, Wilco Dijkstra wrote:
> Improve the dup pattern to prefer vector registers.  When doing a dup
> after a load, the register allocator thinks the costs are identical
> and chooses an integer load.  However, a dup from an integer register
> includes an int->fp transfer, which is not modelled.  Adding a '?' to
> the integer variant increases its cost slightly, so we prefer
> using a vector register.  This improves the following example:
> 
> #include <arm_neon.h>
> void f(unsigned *a, uint32x4_t *b)
> {
>   b[0] = vdupq_n_u32(a[1]);
>   b[1] = vdupq_n_u32(a[2]);
> }
> 
> Before:
> 	ldr	w2, [x0, 4]
> 	dup	v0.4s, w2
> 	str	q0, [x1]
> 	ldr	w0, [x0, 8]
> 	dup	v0.4s, w0
> 	str	q0, [x1, 16]
> 	ret
> 
> After:
> 	ldr	s0, [x0, 4]
> 	dup	v0.4s, v0.s[0]
> 	str	q0, [x1]
> 	ldr	s0, [x0, 8]
> 	dup	v0.4s, v0.s[0]
> 	str	q0, [x1, 16]
> 	ret
> 
> Passes regress & bootstrap, OK for commit?
> 
> ChangeLog:
> 2017-06-20  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	* config/aarch64/aarch64-simd.md (aarch64_simd_dup):
> 	Swap alternatives, make integer dup more expensive.

Have you tested this in cases where an integer dup is definitely the right
thing to do?

For example, in:

  #include <arm_neon.h>
  void f(unsigned a, unsigned b, uint32x4_t *c)
  {
    c[0] = vdupq_n_u32(a);
    c[1] = vdupq_n_u32(b);
  }

And similar cases? If these still look good, then the patch is OK - though
I'm still very nervous about the register allocator cost model!

Thanks,
James
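
A rough sketch of what a few such "similar cases" might look like,
alongside the load case from the original patch description.  The function
names (dup_from_args, dup_from_arith, dup_from_args_u16, dup_from_load,
lane_u32, lane_u16, check_dup_cases) are hypothetical and not part of the
patch or the testsuite, and the non-AArch64 fallback types are stand-ins
that only exist so the file builds on other hosts.  The interesting part
would be compiling with an AArch64 GCC (for example with -O2 -S) and
reading the dup instructions in the assembly; the asserts only check that
the values are duplicated correctly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#else
/* Hypothetical stand-ins so the sketch also builds off AArch64;
   they are not the real arm_neon.h definitions.  */
typedef struct { uint32_t lane[4]; } uint32x4_t;
typedef struct { uint16_t lane[8]; } uint16x8_t;

static uint32x4_t vdupq_n_u32 (uint32_t x)
{
  uint32x4_t r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = x;
  return r;
}

static uint16x8_t vdupq_n_u16 (uint16_t x)
{
  uint16x8_t r;
  for (int i = 0; i < 8; i++)
    r.lane[i] = x;
  return r;
}
#endif

/* Scalars arrive as integer arguments.  */
void dup_from_args (unsigned a, unsigned b, uint32x4_t *c)
{
  c[0] = vdupq_n_u32 (a);
  c[1] = vdupq_n_u32 (b);
}

/* Scalar is the result of integer arithmetic.  */
void dup_from_arith (unsigned a, unsigned b, uint32x4_t *c)
{
  c[0] = vdupq_n_u32 (a + b);
}

/* Narrower element type, scalar still passed as an argument.  */
void dup_from_args_u16 (uint16_t a, uint16x8_t *c)
{
  c[0] = vdupq_n_u16 (a);
}

/* Scalar comes straight from memory, as in the patch description.  */
void dup_from_load (const unsigned *a, uint32x4_t *b)
{
  b[0] = vdupq_n_u32 (a[1]);
  b[1] = vdupq_n_u32 (a[2]);
}

/* Read one lane through memcpy so the check works for both the real
   NEON type and the stand-in struct.  */
uint32_t lane_u32 (const uint32x4_t *v, int i)
{
  uint32_t tmp[4];
  memcpy (tmp, v, sizeof tmp);
  return tmp[i];
}

uint16_t lane_u16 (const uint16x8_t *v, int i)
{
  uint16_t tmp[8];
  memcpy (tmp, v, sizeof tmp);
  return tmp[i];
}

/* Functional sanity check only; it says nothing about which dup
   alternative the register allocator picked.  */
void check_dup_cases (void)
{
  uint32x4_t c[2];
  dup_from_args (7u, 9u, c);
  assert (lane_u32 (&c[0], 0) == 7u && lane_u32 (&c[1], 3) == 9u);
}
```

Calling check_dup_cases () from a small main () is enough to confirm the
functions behave; deciding whether the integer or vector dup is the better
choice for each function still needs a look at the generated code.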

