[PATCH/RFC v2 3/14] Add new optabs for reducing vectors to scalars

Thu Sep 25 16:12:00 GMT 2014

Well, even that C source, you'd need to be careful and ensure that the 
vectorized loop never went round more than once, or else the additions within 
the loop would be performed in 8 bits, different from the final reduction...

So: original patch with updated commenting attached...Segher, is there any 
chance you could test this on powerpc too? (in combination with patch 2/14, 
which will need to be applied first; you can skip patch 1, and >=4.)

--Alan

Richard Biener wrote:
> On Thu, Sep 25, 2014 at 4:32 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>> Ok, so, I've tried making reduc_plus optab take two modes: that of the
>> vector to reduce, and the result; thus allowing platforms to provide a
>> widening reduction. However, I'm keeping reduc_[us](min|max)_optab with only
>> a single mode, as widening makes no sense there.
>>
>> I've not gone as far as making the vectorizer use any such a widening
>> reduction, however: as previously stated, I'm not really sure what the input
>> source code for that even looks like (maybe in a language other than C?). If
>> we wanted to do a non-widening reduction using such an instruction (by
>> discarding the extra bits), strikes me the platform can/should provide a
>> non-widening optab for that case...
> 
> I expect it to apply to sth like
> 
> int foo (char *in, int n)
> {
>    int res = 0;
>    for (int i = 0; i < n; ++i)
>      res += *in;
>    return res;
> }
> 
> where you'd see
> 
>   temc = *in;
>   tem = (int)temc;
>   res += tem;
> 
> we probably handle this by widening the chars to ints and unrolling
> the loop enough to make that work (thus for n == 16 it would maybe
> fail to vectorize?).  It should be more efficient to pattern-detect
> this as widening reduction.
> 
>> Testing: bootstrapped on x86_64 linux + check-gcc; cross-tested
>> aarch64-none-elf check-gcc; cross-tested aarch64_be-none-elf aarch64.exp +
>> vect.exp.
>>
>> So, my feeling is that the extra complexity here doesn't really buy us
>> anything; and that if we do want to support / use widening reductions in the
>> future, we should do so with a separate, reduc_plus_widen... optab, and
>> stick with the original patch/formulation for now. (In other words: this
>> patch is a guide to how I think a dual-mode reduc_plus_optab looks, but I
>> don't honestly like it!).
>>
>> If you agree, I shall transplant the comments on scalar_reduc_to_vector from
>> this patch into the original, and then post that revised version?
> 
> I agree.  We can come back once a target implements such widening
> reduction.
> 
> Richard.
> 
>> Cheers, Alan
>>
>>
>> Richard Biener wrote:
>>> On Mon, Sep 22, 2014 at 3:26 PM, Alan Lawrence <alan.lawrence@arm.com>
>>> wrote:
>>>> Richard Biener wrote:
>>>>>
>>>>> scalar_reduc_to_vector misses a comment.
>>>>
>>>> Ok to reuse the comment in optabs.h in optabs.c also?
>>>
>>> Sure.
>>>
>>>>> I wonder if at the end we wouldn't transition all backends and then
>>>>> renaming reduc_*_scal_optab back to reduc_*_optab makes sense.
>>>>
>>>> Yes, that sounds like a plan, the _scal is a bit of a mouthful.
>>>>
>>>>> The optabs have only one mode - I wouldn't be surprised if an ISA
>>>>> invents for example v4si -> di reduction?  So do we want to make
>>>>> reduc_plus_scal_optab a little bit more future proof (maybe there
>>>>> is already an ISA that supports this kind of reduction?).
>>>>
>>>> That sounds like a plausible thing for an ISA to do, indeed. However
>>>> given
>>>> these names are only used by the autovectorizer rather than directly, the
>>>> question is what the corresponding source code looks like, and/or what
>>>> changes to the autovectorizer we might have to make to (look for code to)
>>>> exploit such an instruction.
>>>
>>> Ah, indeed.  Would be sth like a REDUC_WIDEN_SUM_EXPR or so.
>>>
>>>> At this point I could go for a
>>>> reduc_{plus,min_max}_scal_<mode><mode> which reduces from the first
>>>> vector
>>>> mode to the second scalar mode, and then make the vectorizer look only
>>>> for
>>>> cases where the second mode was the element type of the first; but I'm
>>>> not
>>>> sure I want to do anything more complicated than that at this stage.
>>>> (However, indeed it would leave the possibility open for the future.)
>>>
>>> Yeah, agreed.  For the min/max case a widen variant isn't useful anyway.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> --Alan
>>>>
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3_add_new_optabs.patch
Type: text/x-patch
Size: 9062 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20140925/ea01672c/attachment.bin>