This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC][vect]PR: 65930 teach vectorizer to handle SUM reductions with sign-change casts


On Fri, 23 Aug 2019, Andre Vieira (lists) wrote:

> Hi Richard,
> 
> I have come up with a way to teach the vectorizer to handle sign-changing
> reductions, restricted to SUM operations as I'm not sure other reductions are
> equivalent with different signs.
> 
> The main nature of this approach is to let it recognize reductions of the
> form: Phi->NopConversion?->Plus/Minus-reduction->NopConversion?->Phi. Then
> vectorize the statements normally, with some extra workarounds to handle the
> conversions. This is mainly needed where it looks for uses of the result of
> the reduction, we now need to check the uses of the result of the conversion
> instead.
> 
> I am curious to know what you think of this approach. I have regression tested
> this on aarch64 and x86_64 with AVX512 and it shows no regressions. On the 1
> month old version of trunk I tested on it even seems to make
> gcc.dg/vect/pr89268.c pass, where it used to fail with an ICE complaining
> about a definition not dominating a use.

Aww.  Yeah, I had a half-way working patch along this line as well
and threw it away because of ugliness.

So I was hoping we can at some point refactor the reduction detection
code to use the path discovery in check_reduction_path (which is basically
a lame SCC finding algorithm), massage the detected reduction path
and in the reduction PHI meta-data record something like
"this reduction SUMs _1, _4, _3 and _5" plus for the conversions
"do the reduction in SIGN" and during code-generation just look at
the PHI node and the backedge def which we'd replace.

But of course I stopped short trying that because the reduction code
is a big mess.  And I threw away the attempt that looked like yours
because I didn't want to make an even bigger mess out of it :/

On the branch throwing away the non-SLP paths I started to 
refactor^Wrewrite all this but got stuck as well.  One thing I
realized on the branch is that nested cycle handling should be
more straight-forward and done in a separate vectorizable_*
routine.  Not sure it simplified things a lot, but well.  Maybe
also simply always building a SLP graph for reductions only
helps.

Well.

Maybe you can try experimenting with amending check_reduction_path
with conversion support - from a quick look your patch wouldn't
handle

_1 = PHI <.., _4>
_2 = (unsigned) _1;
_3 = _2 + ...;
_4 = (signed) _3;

since the last stmt you expect is still a PLUS?

> The initial benchmarks I did also show a 14% improvement on x264_r on SPEC2017
> for aarch64.

Interesting, on x86 IIRC I didn't see any such big effect on x264_r
but it was the testcase I ran into this first.

Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]