[PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

Uros Bizjak ubizjak@gmail.com
Thu Aug 1 09:38:00 GMT 2019


On Thu, Aug 1, 2019 at 11:28 AM Richard Biener <rguenther@suse.de> wrote:

> > > So you unconditionally add a smaxdi3 pattern - indeed this looks
> > > necessary even when going the STV route.  The actual regression
> > > for the testcase could also be solved by turning the smaxsi3
> > > back into a compare and jump rather than a conditional move sequence.
> > > So I wonder how you'd do that given that there's pass_if_after_reload
> > > after pass_split_after_reload and I'm not sure we can split
> > > as late as pass_split_before_sched2 (there's also a split _after_
> > > sched2 on x86 it seems).
> > >
> > > So how would you go implement {s,u}{min,max}{si,di}3 for the
> > > case STV doesn't end up doing any transform?
> >
> > If STV doesn't transform the insn, then a pre-reload splitter splits
> > the insn back to compare+cmove.
>
> OK, that would work.  But there's no way to force a jumpy sequence then
> which we know is faster than compare+cmove because later RTL
> if-conversion passes happily re-discover the smax (or conditional move)
> sequence.
>
> > However, considering the SImode move
> > from/to int/xmm register is relatively cheap, the cost function should
> > be tuned so that STV always converts smaxsi3 pattern.
>
> Note that on both Zen and, even more so, bdverN the int/xmm transition
> makes it not just unprofitable but a _lot_ slower than the cmp/cmov
> sequence... (for the loop in hmmer, which is the only one where I see
> any effect from any of my patches).  So identifying chains that
> start/end in memory is important for cost reasons.

Please note that the cost function also considers the cost of moves
from/to xmm registers, so the cost of the whole chain would disable the
transformation.
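The chain-cost argument can be illustrated with a toy model: the per-instruction saving from converting a chain to SSE is weighed against the cost of the int/xmm moves at the chain's boundaries. This is a minimal sketch under stated assumptions; the cost constants and the names `chain_gain` and `should_convert` are hypothetical, not GCC's. The real STV pass uses per-CPU tuning costs and a more detailed accounting.

```c
/* Hypothetical per-instruction costs. Real values come from per-CPU
   tuning tables and differ between, e.g., Zen and bdverN. */
enum { SCALAR_SMAX_COST = 2, VECTOR_SMAX_COST = 1, GPR_XMM_MOVE_COST = 3 };

/* Toy model of the STV decision: the saving from converting a chain
   of n_insns smax instructions, minus the cost of the int<->xmm
   moves needed where the chain meets general registers. A chain that
   starts and ends in memory needs no such moves. */
static int chain_gain(int n_insns, int n_boundary_moves)
{
  int gain = n_insns * (SCALAR_SMAX_COST - VECTOR_SMAX_COST);
  return gain - n_boundary_moves * GPR_XMM_MOVE_COST;
}

/* Convert only if the whole chain is profitable. */
static int should_convert(int n_insns, int n_boundary_moves)
{
  return chain_gain(n_insns, n_boundary_moves) > 0;
}
```

With these made-up numbers, `should_convert(1, 2)` rejects a lone smax whose input and output live in general registers (gain 1 - 6 = -5), while `should_convert(1, 0)` accepts the same smax in a chain that loads from and stores to memory.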

> So I think the splitting has to happen after the last if-conversion
> pass (and thus we may need to allocate a scratch register for this
> purpose?)

I really hope that the underlying issue will be solved by a
machine-dependent pass inserted somewhere after the pre-reload split. This
way, we can split unconverted smax to the cmove, and this later pass
would handle jcc and cmove instructions. Until then... yes, your
proposed approach is one of the ways to avoid unwanted if-conversion,
although sometimes we would like to split to cmove instead.
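For reference, the kind of source construct at stake is a running maximum whose operands come from memory and whose result is stored back. The loop below is a hypothetical sketch, not the actual hmmer code. The smax at its core could end up as cmp+jcc, cmp+cmov, or (if STV converts it) an SSE max; which one GCC emits depends on the tuning and on the if-conversion and splitting passes discussed above, not on how the C is written.

```c
#include <limits.h>

/* Hypothetical loop (not taken from hmmer): dst[i] receives the
   maximum of a[j] + b[j] over j <= i. The loop-carried dependence on
   m likely keeps it from being vectorized, so the max is handled as
   a scalar smax. */
void prefix_max(int *dst, const int *a, const int *b, int n)
{
  int m = INT_MIN;
  for (int i = 0; i < n; i++)
    {
      int sc = a[i] + b[i];
      /* Written as a branch, but RTL if-conversion may still turn
         this into a conditional move. */
      if (sc > m)
        m = sc;
      dst[i] = m;
    }
}
```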

Uros.


