This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs


Uros Bizjak <ubizjak@gmail.com> writes:
> On Mon, Aug 5, 2019 at 12:12 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Uros Bizjak <ubizjak@gmail.com> writes:
>> > On Mon, Aug 5, 2019 at 11:13 AM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> Uros Bizjak <ubizjak@gmail.com> writes:
>> >> > On Sat, Aug 3, 2019 at 7:26 PM Richard Biener <rguenther@suse.de> wrote:
>> >> >>
>> >> >> On Thu, 1 Aug 2019, Uros Bizjak wrote:
>> >> >>
>> >> >> > On Thu, Aug 1, 2019 at 11:28 AM Richard Biener <rguenther@suse.de> wrote:
>> >> >> >
>> >> >> >>>> So you unconditionally add a smaxdi3 pattern - indeed this looks
>> >> >> >>>> necessary even when going the STV route.  The actual regression
>> >> >> >>>> for the testcase could also be solved by turning the smaxsi3
>> >> >> >>>> back into a compare and jump rather than a conditional move sequence.
>> >> >> >>>> So I wonder how you'd do that given that there's pass_if_after_reload
>> >> >> >>>> after pass_split_after_reload and I'm not sure we can split
>> >> >> >>>> as late as pass_split_before_sched2 (there's also a split _after_
>> >> >> >>>> sched2 on x86 it seems).
>> >> >> >>>>
>> >> >> >>>> So how would you go about implementing {s,u}{min,max}{si,di}3
>> >> >> >>>> for the case where STV doesn't end up doing any transform?
>> >> >> >>>
>> >> >> >>> If STV doesn't transform the insn, then a pre-reload splitter splits
>> >> >> >>> the insn back to compare+cmove.
>> >> >> >>
>> >> >> >> OK, that would work.  But there's no way to force a jumpy sequence then
>> >> >> >> which we know is faster than compare+cmove because later RTL
>> >> >> >> if-conversion passes happily re-discover the smax (or conditional move)
>> >> >> >> sequence.
>> >> >> >>
>> >> >> >>> However, considering the SImode move
>> >> >> >>> from/to int/xmm register is relatively cheap, the cost function should
>> >> >> >>> be tuned so that STV always converts smaxsi3 pattern.
>> >> >> >>
>> >> >> >> Note that on both Zen and even more so bdverN the int/xmm transition
>> >> >> >> makes it no longer profitable but a _lot_ slower than the cmp/cmov
>> >> >> >> sequence... (for the loop in hmmer which is the only one I see
>> >> >> >> any effect of any of my patches).  So identifying chains that
>> >> >> >> start/end in memory is important for cost reasons.
>> >> >> >
>> >> >> > Please note that the cost function also considers the cost of move
>> >> >> > from/to xmm. So, the cost of the whole chain would disable the
>> >> >> > transformation.
>> >> >> >
>> >> >> >> So I think the splitting has to happen after the last if-conversion
>> >> >> >> pass (and thus we may need to allocate a scratch register for this
>> >> >> >> purpose?)
>> >> >> >
>> >> >> > I really hope that the underlying issue will be solved by a machine
>> >> >> > dependent pass inserted somewhere after the pre-reload split. This
>> >> >> > way, we can split unconverted smax to the cmove, and this later pass
>> >> >> > would handle jcc and cmove instructions. Until then... yes your
>> >> >> > proposed approach is one of the ways to avoid unwanted if-conversion,
>> >> >> > although sometimes we would like to split to cmove instead.
>> >> >>
>> >> >> So the following makes STV also consider SImode chains, re-using the
>> >> >> DImode chain code.  I've kept a simple incomplete smaxsi3 pattern
>> >> >> and also did not alter the {SI,DI}mode chain cost function - it's
>> >> >> quite off for TARGET_64BIT.  With this I get the expected conversion
>> >> >> for the testcase derived from hmmer.
>> >> >>
>> >> >> No further testing so far.
>> >> >>
>> >> >> Is it OK to re-use the DImode chain code this way?  I'll clean things
>> >> >> up some more of course.
>> >> >
>> >> > Yes, the approach looks OK to me. It makes chain building mode
>> >> > agnostic, and the chain building can be used for
>> >> > a) DImode x86_32 (as is now), but maybe 64bit minmax operation can be added.
>> >> > b) SImode x86_32 and x86_64 (this will be mainly used for SImode
>> >> > minmax and surrounding SImode operations)
>> >> > c) DImode x86_64 (also, mainly used for DImode minmax and surrounding
>> >> > DImode operations)
>> >> >
>> >> >> Still need help with the actual patterns for minmax and what the
>> >> >> splitters should look like.
>> >> >
>> >> > Please look at the attached patch. Maybe we can add memory_operand as
>> >> > operand 1 and operand 2 predicate, but let's keep things simple for
>> >> > now.
>> >> >
>> >> > Uros.
>> >> >
>> >> > Index: i386.md
>> >> > ===================================================================
>> >> > --- i386.md   (revision 274008)
>> >> > +++ i386.md   (working copy)
>> >> > @@ -17721,6 +17721,27 @@
>> >> >      std::swap (operands[4], operands[5]);
>> >> >  })
>> >> >
>> >> > +;; min/max patterns
>> >> > +
>> >> > +(define_code_attr smaxmin_rel [(smax "ge") (smin "le")])
>> >> > +
>> >> > +(define_insn_and_split "<code><mode>3"
>> >> > +  [(set (match_operand:SWI48 0 "register_operand")
>> >> > +     (smaxmin:SWI48 (match_operand:SWI48 1 "register_operand")
>> >> > +                    (match_operand:SWI48 2 "register_operand")))
>> >> > +   (clobber (reg:CC FLAGS_REG))]
>> >> > +  "TARGET_STV && TARGET_SSE4_1
>> >> > +   && can_create_pseudo_p ()"
>> >> > +  "#"
>> >> > +  "&& 1"
>> >> > +  [(set (reg:CCGC FLAGS_REG)
>> >> > +     (compare:CCGC (match_dup 1)(match_dup 2)))
>> >> > +   (set (match_dup 0)
>> >> > +     (if_then_else:SWI48
>> >> > +       (<smaxmin_rel> (reg:CCGC FLAGS_REG)(const_int 0))
>> >> > +       (match_dup 1)
>> >> > +       (match_dup 2)))])
>> >> > +
>> >>
>> >> The pattern could in theory be matched after the last pre-RA split pass
>> >> has run, so I think the pattern still needs to have constraints and be
>> >> matchable even without can_create_pseudo_p.  It looks like the split
>> >> above should work post-RA.
>> >>
>> >> A bit pedantic, because the pattern's probably fine in practice...
>> >
>> > Currently, all unmatched STV patterns split before reload, and there
>> > were no problems. If the pattern matches after last pre-RA split, then
>> > the post-reload splitter will fail, since can_create_pseudo_p also
>> > applies to the part that splits the insn.
>>
>> But what I meant was: you should be able to remove the
>> can_create_pseudo_p () and add constraints.  (You'd have to remove
>> can_create_pseudo_p () with constraints anyway, since the insn
>> wouldn't match after RA otherwise.)
>
> I was under the impression that it is better to split pseudo->pseudo, so
> reload has some more freedom on what register to choose, especially
> with matched and earlyclobbered DImode regs in x86_32 DImode patterns.
> There were some complications with andn pattern (that needed
> earlyclobber on a register to avoid clobbering registers in a memory
> address), and it was necessary to clobber the whole DImode register
> pair, wasting a SImode register. We can avoid all these complications
> by splitting before the RA, where also a pseudo can be allocated.

Yeah, splitting before RA is fine.  All I meant was that:

(define_insn_and_split "<code><mode>3"
  [(set (match_operand:SWI48 0 "register_operand" "=r")
	(smaxmin:SWI48 (match_operand:SWI48 1 "register_operand" "r")
		       (match_operand:SWI48 2 "register_operand" "r")))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_STV && TARGET_SSE4_1"
  "#"
  "&& 1"
  [(set (reg:CCGC FLAGS_REG)
	(compare:CCGC (match_dup 1) (match_dup 2)))
   (set (match_dup 0)
	(if_then_else:SWI48
	  (<smaxmin_rel> (reg:CCGC FLAGS_REG) (const_int 0))
	  (match_dup 1)
	  (match_dup 2)))])

seems like it should be correct too and avoids the theoretical
problem I mentioned.  If the instruction does survive until RA then
the split should work correctly on the reloaded instruction.
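
For readers less used to RTL, the split can be read as a signed compare
followed by a conditional select, with smax using the "ge" condition and
smin using "le" (the smaxmin_rel mapping).  A minimal C sketch of the value
computed, for the two SWI48 modes (SImode and DImode).  The function names
smax_si, smin_si, smax_di and smin_di are hypothetical and do not appear in
the patch; this models the semantics only, not GCC's implementation:

```c
#include <stdint.h>

/* Hypothetical helpers modelling the split pattern:
     flags  = compare (op1, op2)
     result = if_then_else (<rel> flags) op1 op2
   where <rel> is "ge" for smax and "le" for smin.  */

/* SImode smax: select op1 when op1 >= op2 (signed), else op2.  */
static int32_t smax_si(int32_t op1, int32_t op2)
{
    return (op1 >= op2) ? op1 : op2;
}

/* SImode smin: select op1 when op1 <= op2 (signed), else op2.  */
static int32_t smin_si(int32_t op1, int32_t op2)
{
    return (op1 <= op2) ? op1 : op2;
}

/* DImode variants: same selection rule on 64-bit signed operands.  */
static int64_t smax_di(int64_t op1, int64_t op2)
{
    return (op1 >= op2) ? op1 : op2;
}

static int64_t smin_di(int64_t op1, int64_t op2)
{
    return (op1 <= op2) ? op1 : op2;
}
```

How a compiler actually lowers such functions (compare and jump, cmp/cmov,
or an SSE instruction after STV) is the tuning question discussed in this
thread; the sketch only pins down the result.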

Thanks,
Richard

