This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

On Sat, Aug 3, 2019 at 7:26 PM Richard Biener <> wrote:
> On Thu, 1 Aug 2019, Uros Bizjak wrote:
> > On Thu, Aug 1, 2019 at 11:28 AM Richard Biener <> wrote:
> >
> >>>> So you unconditionally add a smaxdi3 pattern - indeed this looks
> >>>> necessary even when going the STV route.  The actual regression
> >>>> for the testcase could also be solved by turing the smaxsi3
> >>>> back into a compare and jump rather than a conditional move sequence.
> >>>> So I wonder how you'd do that given that there's pass_if_after_reload
> >>>> after pass_split_after_reload and I'm not sure we can split
> >>>> as late as pass_split_before_sched2 (there's also a split _after_
> >>>> sched2 on x86 it seems).
> >>>>
> >>>> So how would you go implement {s,u}{min,max}{si,di}3 for the
> >>>> case STV doesn't end up doing any transform?
> >>>
> >>> If STV doesn't transform the insn, then a pre-reload splitter splits
> >>> the insn back to compare+cmove.
> >>
> >> OK, that would work.  But there's no way to force a jumpy sequence then
> >> which we know is faster than compare+cmove because later RTL
> >> if-conversion passes happily re-discover the smax (or conditional move)
> >> sequence.
> >>
> >>> However, considering the SImode move
> >>> from/to int/xmm register is relatively cheap, the cost function should
> >>> be tuned so that STV always converts smaxsi3 pattern.
> >>
> >> Note that on both Zen and even more so bdverN the int/xmm transition
> >> makes it no longer profitable but a _lot_ slower than the cmp/cmov
> >> sequence... (for the loop in hmmer which is the only one I see
> >> any effect of any of my patches).  So identifying chains that
> >> start/end in memory is important for cost reasons.
> >
> > Please note that the cost function also considers the cost of move
> > from/to xmm. So, the cost of the whole chain would disable the
> > transformation.
> >
> >> So I think the splitting has to happen after the last if-conversion
> >> pass (and thus we may need to allocate a scratch register for this
> >> purpose?)
> >
> > I really hope that the underlying issue will be solved by a machine
> > dependant pass inserted somewhere after the pre-reload split. This
> > way, we can split unconverted smax to the cmove, and this later pass
> > would handle jcc and cmove instructions. Until then... yes your
> > proposed approach is one of the ways to avoid unwanted if-conversion,
> > although sometimes we would like to split to cmove instead.
> So the following makes STV also consider SImode chains, re-using the
> DImode chain code.  I've kept a simple incomplete smaxsi3 pattern
> and also did not alter the {SI,DI}mode chain cost function - it's
> quite off for TARGET_64BIT.  With this I get the expected conversion
> for the testcase derived from hmmer.
> No further testing sofar.
> Is it OK to re-use the DImode chain code this way?  I'll clean things
> up some more of course.

Yes, the approach looks OK to me. It makes chain building mode
agnostic, and the chain building can be used for
a) DImode x86_32 (as is now), but maybe 64bit minmax operation can be added.
b) SImode x86_32 and x86_64 (this will be mainly used for SImode
minmax and surrounding SImode operations)
c) DImode x86_64 (also, mainly used for DImode minmax and surrounding
DImode operations)

> Still need help with the actual patterns for minmax and how the splitters
> should look like.

Please look at the attached patch. Maybe we can add memory_operand as
operand 1 and operand 2 predicate, but let's keep things simple for


Attachment: minmax-md.diff.txt
Description: Text document

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]