[PATCH][1/2] Fix PR68553

Alan Lawrence alan.lawrence@arm.com
Fri Dec 4 15:32:00 GMT 2015


On 27/11/15 08:30, Richard Biener wrote:
>
> This is part 1 of a fix for PR68553 which shows that some targets
> cannot can_vec_perm_p on an identity permutation.  I chose to fix
> this in the vectorizer by detecting the identity itself but with
> the current structure of vect_transform_slp_perm_load this is
> somewhat awkward.  Thus the following no-op patch simplifies it
> greatly (from the times it was restricted to do interleaving-kind
> of permutes).  It turned out to not be 100% no-op as we now can
> handle non-adjacent source operands so I split it out from the
> actual fix.
>
> The two adjusted testcases no longer fail to vectorize because
> of "need three vectors" but unadjusted would fail because there
> are simply not enough scalar iterations in the loop.  I adjusted
> that and now we vectorize it just fine (running into PR68559
> which I filed).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>
> Richard.
>
> 2015-11-27  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/68553
> 	* tree-vect-slp.c (vect_get_mask_element): Remove.
> 	(vect_transform_slp_perm_load): Implement in a simpler way.
>
> 	* gcc.dg/vect/pr45752.c: Adjust.
> 	* gcc.dg/vect/slp-perm-4.c: Likewise.

On aarch64 and ARM targets, this causes

PASS->FAIL: gcc.dg/vect/O3-pr36098.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0

That is, we now vectorize using SLP, when previously we did not.

On aarch64 (and I expect ARM too), previously we used a VEC_LOAD_LANES, without 
unrolling, but now we unroll * 4, and vectorize using 3 loads and permutes:

../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt: vect__31.15_94 = VEC_PERM_EXPR <vect__31.11_87, vect__31.12_89, { 0, 1, 2, 4 }>;
../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt: vect__31.16_95 = VEC_PERM_EXPR <vect__31.12_89, vect__31.13_91, { 1, 2, 4, 5 }>;
../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt: vect__31.17_96 = VEC_PERM_EXPR <vect__31.13_91, vect__31.14_93, { 2, 4, 5, 6 }>

which *is* a valid vectorization strategy...


--Alan


