Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

On Sun, 4 Aug 2019, Uros Bizjak wrote:

> On Sun, Aug 4, 2019 at 7:23 PM Jakub Jelinek <> wrote:
> >
> > On Sun, Aug 04, 2019 at 07:11:01PM +0200, Uros Bizjak wrote:
> > > Yes, the approach looks OK to me. It makes chain building
> > > mode-agnostic, and the chain building can be used for
> > > a) DImode x86_32 (as is now), but maybe a 64-bit minmax operation can be added.
> > > b) SImode x86_32 and x86_64 (this will be mainly used for SImode
> > > minmax and surrounding SImode operations)
> > > c) DImode x86_64 (also, mainly used for DImode minmax and surrounding
> > > DImode operations)
> > >
> > > > Still need help with the actual patterns for minmax and with what the
> > > > splitters should look like.
> > >
> > > Please look at the attached patch. Maybe we can add memory_operand as
> > > operand 1 and operand 2 predicate, but let's keep things simple for
> > > now.
> >
> > Shouldn't it be used also for p{min,max}ud rather than just p{min,max}sd?
> > What about p{min,max}{s,u}{b,w,q}?  Some of those are already in SSE.
> Sure, unsigned ops will also be added. I just went through
> Richard's patch and looked for RTXes that Richard's patch handles. I'm
> not sure about HImode and QImode minmax operations. While these can be
> added, we would need to re-run STV in HImode and QImode - I wonder if
> it is worth it.
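
For reference, a small C sketch of which x86 ISA level first provides the xmm form of each packed p{min,max}{s,u}{b,w,d,q} variant discussed above. The helpers minmax_isa and minmax_mnemonic are hypothetical, and the table reflects my reading of the Intel SDM, so please double-check it before relying on it. The 64-bit element forms exist only as EVEX-encoded vpmin/vpmax{s,u}q, and their xmm forms additionally need AVX-512VL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ISA extension that first provides the xmm packed integer min/max
   instruction (min and max always arrive together).  */
enum isa { ISA_SSE2, ISA_SSE4_1, ISA_AVX512F, ISA_NONE };

/* Hypothetical helper: element width in bits plus signedness -> ISA.  */
static enum isa
minmax_isa (int elem_bits, int is_unsigned)
{
  switch (elem_bits)
    {
    case 8:  return is_unsigned ? ISA_SSE2 : ISA_SSE4_1;  /* pminub / pminsb */
    case 16: return is_unsigned ? ISA_SSE4_1 : ISA_SSE2;  /* pminuw / pminsw */
    case 32: return ISA_SSE4_1;                           /* pminud / pminsd */
    case 64: return ISA_AVX512F;  /* vpminuq / vpminsq; xmm form also needs AVX-512VL */
    default: return ISA_NONE;
    }
}

/* Hypothetical helper: build the mnemonic, e.g. "pmaxud", into BUF.
   ELEM_BITS must be 8, 16, 32 or 64.  */
static void
minmax_mnemonic (char *buf, size_t len, int is_max, int is_unsigned,
                 int elem_bits)
{
  assert (len >= 8);  /* longest result is "vpmaxuq" plus the NUL */
  assert (elem_bits == 8 || elem_bits == 16 || elem_bits == 32
          || elem_bits == 64);
  strcpy (buf, elem_bits == 64 ? "vp" : "p");
  strcat (buf, is_max ? "max" : "min");
  strcat (buf, is_unsigned ? "u" : "s");
  strcat (buf, elem_bits == 8 ? "b" : elem_bits == 16 ? "w"
               : elem_bits == 32 ? "d" : "q");
}
```

So only pminsw/pmaxsw and pminub/pmaxub are plain SSE2; the SImode variants this patch targets need SSE4.1, matching the pattern requirement mentioned later in the thread.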

I think we can always extend later, for now I'm trying to do {SI,DI}mode
only, but yes, u{min,max} would be nice to not miss.
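
Why u{min,max} needs its own patterns rather than reusing smax: the same SImode bit patterns order differently under signed and unsigned comparison, so pmaxsd and pmaxud produce different lanes. A minimal scalar C illustration (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar SImode smax: signed comparison, as pmaxsd does per lane.  */
static int32_t smax32 (int32_t a, int32_t b) { return a > b ? a : b; }

/* Scalar SImode umax: unsigned comparison, as pmaxud does per lane.  */
static uint32_t umax32 (uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Same bits 0xffffffff and 0x00000001, different maxima.  */
static void
check_signedness (void)
{
  assert (smax32 (-1, 1) == 1);                        /* -1 < 1 when signed */
  assert (umax32 ((uint32_t) -1, 1u) == 0xffffffffu);  /* 0xffffffff > 1 when unsigned */
}
```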

> > If the conversion of the chain fails, couldn't the STV pass split those
> > SImode etc. min/max patterns into code with branches, rather than turn them
> > into cmovs?
> Since these patterns require SSE4.1, we are sure that we can split
> back to cmov. But IMO, the cmov/jcc issue is orthogonal to minmax
> conversion and should be handled by some other machine-specific pass
> that would analyse cmov insertion and possibly split unwanted cmovs back
> to jcc (based on some yet unknown metrics). Please note that there is no
> definite proof that it is beneficial to convert cmovs to jcc for all
> x86 targets.

I guess a tunable plus (micro-)benchmarking could make this decision.
But yes, this is largely independent - and if we split to jumps
then RTL if-conversion will happily turn it back to cmoves anyway.
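
To illustrate that jcc versus cmov is a code-generation choice rather than a property of the minmax itself, here is a C sketch of two equivalent SImode max formulations: one written with a branch, and one branch-free select that mirrors the data flow of a cmov. Which instructions a compiler actually emits for either form depends on the target, the tuning and if-conversion, so the source shape alone does not decide it. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form: may become cmp + jcc, but if-conversion is free to
   turn it into a cmov.  */
static int32_t
max_branch (int32_t a, int32_t b)
{
  if (a > b)
    return a;
  return b;
}

/* Branch-free select: computes the same result without a jump.  */
static int32_t
max_select (int32_t a, int32_t b)
{
  int32_t mask = -(int32_t) (a > b);  /* -1 (all ones) if a > b, else 0 */
  return b ^ ((a ^ b) & mask);        /* a when mask is all ones, else b */
}

/* Check that both forms agree on a set of edge values.  */
static void
check_max_forms (void)
{
  static const int32_t v[] = { INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX };
  const int n = sizeof v / sizeof v[0];
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      assert (max_branch (v[i], v[j]) == max_select (v[i], v[j]));
}
```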


