[PATCH][1/2] Fix PR68553
Alan Lawrence
alan.lawrence@arm.com
Fri Dec 4 15:32:00 GMT 2015
On 27/11/15 08:30, Richard Biener wrote:
>
> This is part 1 of a fix for PR68533 which shows that some targets
> cannot can_vec_perm_p on an identity permutation. I chose to fix
> this in the vectorizer by detecting the identity itself but with
> the current structure of vect_transform_slp_perm_load this is
> somewhat awkward. Thus the following no-op patch simplifies it
> greatly (from the times it was restricted to do interleaving-kind
> of permutes). It turned out to not be 100% no-op as we now can
> handle non-adjacent source operands so I split it out from the
> actual fix.
>
> The two adjusted testcases no longer fail to vectorize because
> of "need three vectors" but unadjusted would fail because there
> are simply not enough scalar iterations in the loop. I adjusted
> that and now we vectorize it just fine (running into PR68559
> which I filed).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>
> Richard.
>
> 2015-11-27 Richard Biener <rguenther@suse.de>
>
> PR tree-optimization/68553
> * tree-vect-slp.c (vect_get_mask_element): Remove.
> (vect_transform_slp_perm_load): Implement in a simpler way.
>
> * gcc.dg/vect/pr45752.c: Adjust.
> * gcc.dg/vect/slp-perm-4.c: Likewise.
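A minimal sketch of the identity check the quoted mail describes. It is written as a hypothetical C helper: perm_is_identity is an illustrative name, not a function in tree-vect-slp.c. A single-input selector that maps every output lane i to input lane i needs no VEC_PERM_EXPR at all. Detecting it up front means the vectorizer never has to ask the target, through can_vec_perm_p, whether it supports that permute.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not GCC's actual code: return true when the
   selector SEL of length NELTS picks lane i of the first input for
   every output lane i, i.e. the permute is an identity and the loaded
   vector can be used as-is.  */
static bool
perm_is_identity (const unsigned *sel, size_t nelts)
{
  for (size_t i = 0; i < nelts; i++)
    if (sel[i] != i)
      return false;
  return true;
}
```

For example, the selector {0, 1, 2, 3} is reported as an identity. The selector {0, 1, 2, 4} takes its last lane from the second input and is not.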
On aarch64 and ARM targets, this causes
PASS->FAIL: gcc.dg/vect/O3-pr36098.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0
That is, we now vectorize using SLP, when previously we did not.
On aarch64 (and I expect ARM too), we previously used a VEC_LOAD_LANES without
unrolling, but now we unroll by a factor of 4 and vectorize using 3 loads and permutes:
../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
vect__31.15_94 = VEC_PERM_EXPR <vect__31.11_87, vect__31.12_89, { 0, 1, 2, 4 }>;
../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
vect__31.16_95 = VEC_PERM_EXPR <vect__31.12_89, vect__31.13_91, { 1, 2, 4, 5 }>;
../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
vect__31.17_96 = VEC_PERM_EXPR <vect__31.13_91, vect__31.14_93, { 2, 4, 5, 6 }>
which *is* a valid vectorization strategy...
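A scalar C model of the three permutes may make the dump easier to read. It follows the VEC_PERM_EXPR <v0, v1, mask> semantics: lane i of the result is lane (mask[i] mod 2N) of the 2N-lane concatenation of v0 and v1. It assumes 4 x int vectors and that vect__31.11 through vect__31.14 are consecutive loads from one array. Neither assumption is stated in the dump itself. vec_perm and replay_dump_permutes are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Assumed number of lanes (4 x int), matching the 4-entry masks.  */
#define NLANES 4

/* Scalar model of VEC_PERM_EXPR <v0, v1, mask>: lane i of OUT is lane
   mask[i] (taken modulo 2*NLANES) of the concatenation of V0 and V1.  */
static void
vec_perm (const int v0[NLANES], const int v1[NLANES],
          const unsigned mask[NLANES], int out[NLANES])
{
  for (size_t i = 0; i < NLANES; i++)
    {
      unsigned m = mask[i] % (2 * NLANES);
      out[i] = m < NLANES ? v0[m] : v1[m - NLANES];
    }
}

/* Replay the three permutes from the O3-pr36098.c dump.  LOADED holds
   the four vectors laid out back to back (assumed to be consecutive
   loads); OUT receives the three permuted vectors.  */
static void
replay_dump_permutes (const int loaded[4 * NLANES], int out[3 * NLANES])
{
  static const unsigned masks[3][NLANES] = {
    { 0, 1, 2, 4 },
    { 1, 2, 4, 5 },
    { 2, 4, 5, 6 },
  };
  /* Permute k reads vectors k and k + 1, as in the dump.  */
  for (size_t k = 0; k < 3; k++)
    vec_perm (&loaded[k * NLANES], &loaded[(k + 1) * NLANES],
              masks[k], &out[k * NLANES]);
}
```

With loaded = {0, 1, ..., 15}, the output is {0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14}. That is the first three lanes of each group of four. Under the assumptions above, this packs four unrolled iterations of a three-element group into three full vectors.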
--Alan