This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC][vect]PR: 65930 teach vectorizer to handle SUM reductions with sign-change casts
- From: Richard Biener <rguenther at suse dot de>
- To: "Andre Vieira (lists)" <andre dot simoesdiasvieira at arm dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 27 Aug 2019 15:28:22 +0200 (CEST)
- Subject: Re: [RFC][vect]PR: 65930 teach vectorizer to handle SUM reductions with sign-change casts
- References: <780e0f8d-2b31-a9e8-1701-4a895e95d9b6@arm.com>
On Fri, 23 Aug 2019, Andre Vieira (lists) wrote:
> Hi Richard,
>
> I have come up with a way to teach the vectorizer to handle sign-changing
> reductions, restricted to SUM operations as I'm not sure other reductions are
> equivalent with different signs.
>
> The main nature of this approach is to let it recognize reductions of the
> form: Phi->NopConversion?->Plus/Minus-reduction->NopConversion?->Phi. Then
> vectorize the statements normally, with some extra workarounds to handle the
> conversions. This is mainly needed where it looks for uses of the result of
> the reduction, we now need to check the uses of the result of the conversion
> instead.
>
> I am curious to know what you think of this approach. I have regression tested
> this on aarch64 and x86_64 with AVX512 and it shows no regressions. On the 1
> month old version of trunk I tested on it even seems to make
> gcc.dg/vect/pr89268.c pass, where it used to fail with an ICE complaining
> about a definition not dominating a use.
Aww. Yeah, I had a half-way working patch along this line as well
and threw it away because of ugliness.
So I was hoping we can at some point refactor the reduction detection
code to use the path discovery in check_reduction_path (which is basically
a lame SCC finding algorithm), massage the detected reduction path
and in the reduction PHI meta-data record something like
"this reduction SUMs _1, _4, _3 and _5" plus for the conversions
"do the reduction in SIGN" and during code-generation just look at
the PHI node and the backedge def which we'd replace.
But of course I stopped short trying that because the reduction code
is a big mess. And I threw away the attempt that looked like yours
because I didn't want to make an even bigger mess out of it :/
On the branch throwing away the non-SLP paths I started to
refactor^Wrewrite all this but got stuck as well. One thing I
realized on the branch is that nested cycle handling should be
more straight-forward and done in a separate vectorizable_*
routine. Not sure it simplified things a lot, but well. Maybe
also simply always building a SLP graph for reductions only
helps.
Well.
Maybe you can try experimenting with amending check_reduction_path
with conversion support - from a quick look your patch wouldn't
handle
_1 = PHI <.., _4>
_2 = (unsigned) _1;
_3 = _2 + ...;
_4 = (signed) _3;
since the last stmt you expect is still a PLUS?
> The initial benchmarks I did also show a 14% improvement on x264_r on SPEC2017
> for aarch64.
Interesting, on x86 IIRC I didn't see any such big effect on x264_r
but it was the testcase I ran into this first.
Richard.