[PATCH] arm: Auto-vectorization for MVE and Neon: vhadd/vrhadd
Wed Jun 9 15:49:18 GMT 2021
On Wed, 9 Jun 2021 at 17:39, Richard Sandiford
> Christophe Lyon <email@example.com> writes:
> > This patch adds support for auto-vectorization of average value
> > computation using vhadd or vrhadd, for both MVE and Neon.
> > The patch adds the needed [u]avg<mode>3_[floor|ceil] patterns to
> > vec-common.md, I'm not sure how to factorize them without introducing
> > an unspec iterator?
> > It also adds tests for 'floor' and for 'ceil', each for MVE and Neon.
> > Vectorization works with 8-bit and 16 bit input/output vectors, but
> > not with 32-bit ones because the vectorizer expects wider types
> > availability for the intermediate values, but int32_t + int32_t does
> > not involve wider types in the IR.
> > The testcases use a cast to int64_t to workaround this and enable
> > vectorization with vectors of 32-bit elements.
> We're probably in violent agreement here and just phrasing it
> differently :-), but IMO this isn't a workaround. It would be actively
> wrong to use V(R)HADD for the (u)int32_t version without the cast:
> V(R)HADD effectively computes a 33-bit addition result, instead of
> the 32-bit addition result in the original (cast-free) source code.
> I guess one could argue that overflow is undefined for int32_t + int32_t
> and so we could compute the full 33-bit result instead of using modulo
> arithmetic. That might be too surprising though. It also wouldn't
> help with uint32_t, since we do need to use modulo arithmetic for
> uint32_t + uint32_t.
Right. It would have been clearer if I had written different functions
for the different types, only needing the cast for the int32_t.
I used "workaround" because the int64_t cast looks strange with int8_t
> So personally I think it would be better to drop the last two paragraphs.
> The patch LGTM though, thanks.
More information about the Gcc-patches