This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug tree-optimization/81673] Harmful SLP vectorization

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Thu, 03 Aug 2017 07:56:03 +0000
Subject: [Bug tree-optimization/81673] Harmful SLP vectorization
Auto-submitted: auto-generated
References: <bug-81673-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81673

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Martin Jambor from comment #3)
> (In reply to Andrew Pinski from comment #1)
> > What happens if you use -mtune=intel?
> 
> With -mtune=intel, the lower half of the vector is moved directly

That's what the change tries to account for -- the register allocator (RA)
is unlikely to be able to allocate an %xmm register for the lower half.

> whereas the upper one is still done through the stack:

That one is not accounted for, but it's still one insert.  So the patch
fixes the original cost model's assumption that the first "insert" isn't
needed because the value is already in an %xmm register.

> 	.cfi_startproc
> 	leaq	-56(%rsp), %rsp
> 	.cfi_def_cfa_offset 64
> 	movq	%rdx, %xmm0
> 	movq	%rcx, (%rsp)
> 	leaq	16(%rsp), %rdi
> 	movq	%r9, 8(%rsp)
> 	movhps	(%rsp), %xmm0

So with -mavx this can be a vpinsrq, which supports inserting from GPRs.

I wonder how fugly this insertion code gets for HImode inserts?  AFAIK
there are no HImode loads to %xmm.

Anyway, precise cost modeling is difficult without factoring out a
(pessimistic) costing routine from the vec_init expander.  After all, we
do not know where those constructor components come from -- they might
come from a load (in case of strided SLP or strided loads), in which case
the story is different.
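
To illustrate the distinction, here is a minimal sketch using the GCC/Clang
vector extension.  It is not the testcase from the bug report; the type and
function names are made up for illustration.  Both functions build the same
two-element vector, but in the first the components arrive in GPRs, while
in the second they come from (strided) memory, so the cost of the
constructor can differ.

```c
#include <stdint.h>

/* GCC/Clang vector extension: a 16-byte vector of two 64-bit integers.
   The type name v2di is chosen here for illustration only. */
typedef int64_t v2di __attribute__((vector_size(16)));

/* Both components arrive as scalar arguments, i.e. in general-purpose
   registers on x86-64.  Each one has to be moved or inserted into a
   vector register before the vector can be used. */
v2di build_from_gprs(int64_t lo, int64_t hi)
{
    v2di v = { lo, hi };
    return v;
}

/* Both components come from memory with a stride between them.  The
   target may be able to load each element into the vector register
   directly, without going through a general-purpose register. */
v2di build_from_memory(const int64_t *p, long stride)
{
    v2di v = { p[0], p[stride] };
    return v;
}
```

Comparing the assembly for the two functions (for example at -O2, with and
without -mavx) should show how differently the same constructor can expand.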

> 	movdqa	%xmm0, 32(%rsp)
> 	movq	%r8, %xmm0
> 	movhps	8(%rsp), %xmm0
> 	movdqa	%xmm0, 16(%rsp)
> 	call	bar
> 	leaq	56(%rsp), %rsp
> 	.cfi_def_cfa_offset 8
> 	ret
> 	.cfi_endproc
> 
> ...so I guess this would still incur some penalty on the benchmark,
> but I am not sure.

Adding 1 to the vector cost should turn the tide towards not SLP
vectorizing (in the 2 component vector integer case).
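
As a toy illustration of why a single extra unit can flip the decision:
the sketch below is not GCC's vectorizer code, and the function names and
all cost numbers in the usage note are hypothetical.  It only assumes that
vectorization is accepted when the vector cost is strictly lower than the
scalar cost.

```c
/* Hypothetical toy model, NOT GCC's actual cost code.

   Cost of building an nelts-element vector from scalars.  If the first
   element is assumed to already live in a vector register, only
   nelts - 1 inserts are charged; otherwise all nelts are charged. */
int ctor_cost(int nelts, int insert_cost, int first_is_free)
{
    int inserts = first_is_free ? nelts - 1 : nelts;
    return inserts * insert_cost;
}

/* Assumed rule: vectorize only if strictly cheaper than scalar code. */
int slp_profitable(int vector_cost, int scalar_cost)
{
    return vector_cost < scalar_cost;
}
```

With made-up numbers (insert cost 1, one vector store of cost 1, scalar
cost 3), the two-element case costs 1 + 1 = 2 when the first insert is
treated as free, which is below 3, so it is vectorized.  Charging the first
insert as well gives 2 + 1 = 3, which is no longer below 3, so it is not.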

References:
- [Bug tree-optimization/81673] New: Harmful SLP vectorization
  - From: jamborm at gcc dot gnu.org
