[Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed Sep 27 07:15:42 GMT 2023
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rdapp at gcc dot gnu.org
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #8)
> It's because of the order of the operations we are doing:
>
> For code as follows:
>
> result += mask ? a[i] + x : 0;
>
> GCC:
> result_ssa_1 = PHI <result_ssa_2, 0>
> ...
> STMT 1. tmp = a[i] + x;
> STMT 2. tmp2 = tmp + result_ssa_1;
> STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;
>
> Here we can see that both STMT 2 and STMT 3 use 'result_ssa_1', so
> we end up with 2 uses of the PHI result and then fail to vectorize.
>
> Whereas LLVM:
>
> result_ssa_1 = PHI <result_ssa_2, 0>
> ...
> IR 1. tmp = a[i] + x;
> IR 2. tmp2 = mask ? tmp : 0;
> IR 3. result_ssa_2 = tmp2 + result_ssa_1;

For floating point these are not equivalent: adding zero isn't a no-op
when signed zeros are honored (-0.0 + 0.0 yields +0.0).

> LLVM only has 1 use.
>
> Is it reasonable to swap the order in match.pd ?

If-conversion could be taught to swap this when valid (it is
if-conversion that creates the IL for conditional reductions). IIRC
Robin Dapp also has a patch to make if-conversion emit .COND_ADD
instead, which should make this even easier to vectorize.
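
To illustrate the floating-point point, here is a minimal scalar sketch in C
of the two orderings from comment #8, plus a scalar model of a conditional
add. The function names, the `init` parameter (the PHI in the dumps starts
at 0) and `cond_add_model` are made up for illustration. The model assumes
the per-element semantics "mask ? a + b : else" and is not GCC's
implementation.

```c
#include <math.h>
#include <stddef.h>

/* GCC-style if-converted form: add first, then select between the
   updated sum and the old one.  'result' is used twice per iteration.  */
double reduce_select_sum(const double *a, const int *mask, size_t n,
                         double x, double init)
{
    double result = init;
    for (size_t i = 0; i < n; i++) {
        double tmp = a[i] + x;
        double tmp2 = tmp + result;
        result = mask[i] ? tmp2 : result;
    }
    return result;
}

/* LLVM-style form: select the addend (or 0.0), then add unconditionally.
   'result' is used once, but a masked-off lane still performs "+ 0.0".  */
double reduce_select_addend(const double *a, const int *mask, size_t n,
                            double x, double init)
{
    double result = init;
    for (size_t i = 0; i < n; i++) {
        double tmp = a[i] + x;
        double tmp2 = mask[i] ? tmp : 0.0;
        result = tmp2 + result;
    }
    return result;
}

/* Hypothetical scalar model of a conditional add (assumed semantics):
   returns a + b when the mask is set, otherwise the fallback value.  */
double cond_add_model(int mask, double a, double b, double fallback)
{
    return mask ? a + b : fallback;
}

/* Reduction written with the conditional-add model: masked-off lanes
   pass the old sum through untouched, so no "+ 0.0" is performed.  */
double reduce_cond_add(const double *a, const int *mask, size_t n,
                       double x, double init)
{
    double result = init;
    for (size_t i = 0; i < n; i++)
        result = cond_add_model(mask[i], result, a[i] + x, result);
    return result;
}
```

With `init` set to -0.0 and every mask element false, the first and third
functions return -0.0 while the second returns +0.0 (compiled without
-ffast-math or -fno-signed-zeros, in the default rounding mode). Selecting
-0.0 rather than 0.0 as the masked-off addend would avoid that particular
difference, since x + (-0.0) == x for every x in round-to-nearest.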