This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: More type narrowing in match.pd


On Thu, 30 Apr 2015, Jeff Law wrote:

On 04/30/2015 01:17 AM, Marc Glisse wrote:

+/* This is another case of narrowing, specifically when there's an outer
+   BIT_AND_EXPR which masks off bits outside the type of the innermost
+   operands.   Like the previous case we have to convert the operands
+   to unsigned types to avoid introducing undefined behaviour for the
+   arithmetic operation.  */
+(for op (minus plus)

No mult? or widen_mult with a different pattern? (maybe that's already
done elsewhere)
No mult. When I worked on the pattern for 47477, supporting mult clearly regressed the generated code -- presumably because we can often widen the operands for free.

It would help with the testcase below, but I am willing to accept that for mult the cases where it hurts are more common (and guessing whether it will help or hurt may be hard), while with +- the cases that help are more common.

void f(short*a) {
  a = __builtin_assume_aligned(a,128);
  for (int i = 0; i < (1<<22); ++i) {
#ifdef EASY
    a[i] *= a[i];
#else
    int x = a[i];
    x *= x;
    a[i] = x;
#endif
  }
}

With EASY, a nice little loop:
.L2:
	movdqa	(%rdi), %xmm0
	addq	$16, %rdi
	pmullw	%xmm0, %xmm0
	movaps	%xmm0, -16(%rdi)
	cmpq	%rdi, %rax
	jne	.L2

while without EASY, we get the uglier:
.L2:
	movdqa	(%rdi), %xmm0
	addq	$16, %rdi
	movdqa	%xmm0, %xmm2
	movdqa	%xmm0, %xmm1
	pmullw	%xmm0, %xmm2
	pmulhw	%xmm0, %xmm1
	movdqa	%xmm2, %xmm0
	punpckhwd	%xmm1, %xmm2
	punpcklwd	%xmm1, %xmm0
	movdqa	%xmm2, %xmm1
	movdqa	%xmm0, %xmm2
	punpcklwd	%xmm1, %xmm0
	punpckhwd	%xmm1, %xmm2
	movdqa	%xmm0, %xmm1
	punpcklwd	%xmm2, %xmm0
	punpckhwd	%xmm2, %xmm1
	punpcklwd	%xmm1, %xmm0
	movaps	%xmm0, -16(%rdi)
	cmpq	%rdi, %rax
	jne	.L2

A small pattern like
(simplify
 (vec_pack_trunc (widen_mult_lo @0 @1) (widen_mult_hi:c @0 @1))
 (mult @0 @1))

probably with some tweaks (convert to unsigned? only do it before vector lowering?), would fix this particular case, though not as well as narrowing before vectorization would.

--
Marc Glisse

