This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH GCC]New vectorization pattern turning cond_expr into max/min and plus/minus


Hi,
Given the test case below,
int foo (unsigned short a[], unsigned int x)
{
  unsigned int i;
  for (i = 0; i < 1000; i++)
    {
      x = a[i];
      a[i] = (unsigned short)(x >= 32768 ? x - 32768 : 0);
    }
  return x;
}

it can now be vectorized on AArch64, but the generated assembly is far from optimal:
.L4:
	ldr	q4, [x3, x1]
	add	w2, w2, 1
	cmp	w2, w0
	ushll	v1.4s, v4.4h, 0
	ushll2	v0.4s, v4.8h, 0
	add	v3.4s, v1.4s, v6.4s
	add	v2.4s, v0.4s, v6.4s
	cmhi	v1.4s, v1.4s, v5.4s
	cmhi	v0.4s, v0.4s, v5.4s
	and	v1.16b, v3.16b, v1.16b
	and	v0.16b, v2.16b, v0.16b
	xtn	v2.4h, v1.4s
	xtn2	v2.8h, v0.4s
	str	q2, [x3, x1]
	add	x1, x1, 16
	bcc	.L4

The vectorized loop has 16 instructions. It can be greatly simplified by turning the cond_expr into a max_expr followed by a plus_expr, as below:
.L4:
	ldr	q1, [x3, x1]
	add	w2, w2, 1
	cmp	w2, w0
	umax	v0.8h, v1.8h, v2.8h
	add	v0.8h, v0.8h, v2.8h
	str	q0, [x3, x1]
	add	x1, x1, 16
	bcc	.L4
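
To make the equivalence behind the shorter loop concrete, here is a minimal sketch in C. It assumes the pattern rewrites x >= 32768 ? x - 32768 : 0 as an unsigned max against 32768 followed by an add of the same constant in the 16-bit type, where adding 32768 equals subtracting it modulo 65536. The function names are hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Original form from the test case: conditional subtract, clamped at zero.  */
static uint16_t cond_form (uint16_t v)
{
  unsigned int x = v;
  return (uint16_t) (x >= 32768 ? x - 32768 : 0);
}

/* Rewritten form (assumed shape): unsigned max, then an add that wraps
   modulo 2^16 when truncated back to 16 bits.  */
static uint16_t max_form (uint16_t v)
{
  uint16_t m = v > 32768 ? v : 32768;   /* corresponds to umax */
  return (uint16_t) (m + 32768);        /* same as m - 32768 modulo 65536 */
}

/* Returns the number of 16-bit inputs on which the two forms disagree.  */
static unsigned int count_mismatches (void)
{
  unsigned int v, bad = 0;
  for (v = 0; v <= 0xFFFF; v++)
    if (cond_form ((uint16_t) v) != max_form ((uint16_t) v))
      bad++;
  return bad;
}
```

Calling count_mismatches () from a small main and checking that it returns 0 exercises all 65536 possible element values, so the check is exhaustive for unsigned short inputs.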

This patch addresses the issue by adding a new vectorization pattern.
Bootstrapped and tested on x86_64 and AArch64.  Is it OK?

Thanks,
bin

2016-10-11  Bin Cheng  <bin.cheng@arm.com>

	* tree-vect-patterns.c (vect_recog_min_max_modify_pattern): New.
	(vect_vect_recog_func_ptrs): New element for above pattern.
	* tree-vectorizer.h (NUM_PATTERNS): Increase by 1.

gcc/testsuite/ChangeLog
2016-10-11  Bin Cheng  <bin.cheng@arm.com>

	* gcc.dg/vect/vect-umax-modify-pattern.c: New test.
	* gcc.dg/vect/vect-umin-modify-pattern.c: New test.

Attachment: umin-max-modify-pattern-20160924.txt

