This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH rs6000]Fix PR92132

From: Segher Boessenkool <segher at kernel dot crashing dot org>
To: "Kewen.Lin" <linkw at linux dot ibm dot com>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Bill Schmidt <wschmidt at linux dot ibm dot com>
Date: Wed, 6 Nov 2019 17:49:51 -0600
Subject: Re: [PATCH rs6000]Fix PR92132
References: <0eacc4b3-8de4-b373-2f92-bf91bf1f2b14@linux.ibm.com> <a12fd8a6-dab1-b242-2122-0145833f155c@linux.ibm.com> <20191101231750.GG28442@gate.crashing.org> <9c16cecd-74ef-5815-e899-92fade2ff9fe@linux.ibm.com>

Hi Ke Wen,

On Tue, Nov 05, 2019 at 04:35:05PM +0800, Kewen.Lin wrote:
> >>  ;; 128-bit one's complement
> >> -(define_insn_and_split "*one_cmpl<mode>3_internal"
> >> +(define_insn_and_split "one_cmpl<mode>3_internal"
> > 
> > Instead, rename it to "one_cmpl<mode>3" and delete the define_expand that
> > serves no function?
> 
> Renamed.  Sorry, what's the "define_expand" specified here.  I thought it's
> for existing one_cmpl<mode>3 but I didn't find it. 

The expander named "one_cmpl<mode>3":

Erm.  2, not 3 :-)

(define_expand "one_cmpl<mode>2"
  [(set (match_operand:BOOL_128 0 "vlogical_operand")
        (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")))]
  ""
  "")

while the define_insn is

(define_insn_and_split "*one_cmpl<mode>3_internal"
  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
        (not:BOOL_128
          (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_UNARY>")))]
  ""
{

etc., so you can just delete the expand and rename the insn to the proper
name (one_cmpl<mode>2).  It sometimes is useful to have an expand like
this if there are multiple insns that could implement this, but that is
not the case here.

> >> +(define_code_iterator fpcmpun [ungt unge unlt unle])
> > 
> > Why these four?  Should there be more?  Should this be added to some
> > existing iterator?
> 
> For floating point comparison operator and vector type, currently rs6000
> supports eq, gt, ge, *ltgt, *unordered, *ordered, *uneq (* for unnamed).
> We can leverage gt, ge, eq for lt, le, ne, then these four left.

There are four conditions for FP: lt/gt/eq/un.  For every comparison,
exactly one of the four is true.  If not HONOR_NANS for this mode you
never have un, so it is one of lt/gt/eq then, just like with integers.

If we have HONOR_NANS(mode) (or !flag_finite_math_only), there are 14
possible combinations to test for (testing for any of the four or none
of the four is easy ;-) )

Four test just if lt, gt, eq, or un is set.  Another four test if one of
the flags is *not* set, or said differently, if one of three flags is set:
ordered, ne, unle, unge.  The remaining six test two flags each: ltgt, le,
unlt, ge, ungt, uneq.

> I originally wanted to merge them into the existing unordered or uneq, but
> I found it's hard to share their existing patterns.  For example, the uneq
> looks like:
> 
>   [(set (match_dup 3)
> 	(gt:VEC_F (match_dup 1)
> 		  (match_dup 2)))
>    (set (match_dup 4)
> 	(gt:VEC_F (match_dup 2)
> 		  (match_dup 1)))
>    (set (match_dup 0)
> 	(and:VEC_F (not:VEC_F (match_dup 3))
> 		   (not:VEC_F (match_dup 4))))]

Or ge/ge/eqv, etc. -- there are multiple options.

> While ungt looks like:
> 
>   [(set (match_dup 3)
> 	(ge:VEC_F (match_dup 1)
> 		  (match_dup 2)))
>    (set (match_dup 4)
> 	(ge:VEC_F (match_dup 2)
> 		  (match_dup 1)))
>    (set (match_dup 3)
> 	(ior:VEC_F (not:VEC_F (match_dup 3))
> 		   (not:VEC_F (match_dup 4))))
>    (set (match_dup 4)
> 	(gt:VEC_F (match_dup 1)
> 		  (match_dup 2)))
>    (set (match_dup 3)
> 	(ior:VEC_F (match_dup 3)
> 		   (match_dup 4)))]

(set (match_dup 3)
     (ge:VEC_F (match_dup 2)
               (match_dup 1)))
(set (match_dup 0)
     (not:VEC_F (match_dup 3)))

should be enough?


So we have only gt/ge/eq.

I think the following are ooptimal (not tested!):

lt(a,b) = gt(b,a)
gt(a,b) = gt(a,b)
eq(a,b) = eq(a,b)
un(a,b) = ~(ge(a,b) | ge(b,a))

ltgt(a,b) = ge(a,b) ^ ge(b,a)
le(a,b)   = ge(b,a)
unlt(a,b) = ~ge(a,b)
ge(a,b)   = ge(a,b)
ungt(a,b) = ~ge(b,a)
uneq(a,b) = ~(ge(a,b) ^ ge(b,a))

ord(a,b)  = ge(a,b) | ge(b,a)
ne(a,b)   = ~eq(a,b)
unle(a,b) = ~gt(a,b)
unge(a,b) = ~gt(b,a)

This is quite regular :-)  5 are done with one cmp; 5 are done with a cmp
and an inversion; 4 are done with two compares and one xor/eqv/or/nor.


Half are pretty simple:

lt(a,b) = gt(b,a)
gt(a,b) = gt(a,b)
eq(a,b) = eq(a,b)
le(a,b) = ge(b,a)
ge(a,b) = ge(a,b)

ltgt(a,b) = ge(a,b) ^ ge(b,a)
ord(a,b)  = ge(a,b) | ge(b,a)

The other half are the negations of those:

unge(a,b) = ~gt(b,a)
unle(a,b) = ~gt(a,b)
ne(a,b)   = ~eq(a,b)
ungt(a,b) = ~ge(b,a)
unlt(a,b) = ~ge(a,b)

uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
un(a,b) = ~(ge(a,b) | ge(b,a))


And please remember to test everythin with -ffast-math :-)  That is, when
flag_finite_math_only is set.  You cannot get unordered results, then,
making the optimal sequences different in some cases (and changing what
"ne" means!)

8 codes, ordered:    never     lt   gt   ltgt eq   le   ge   ordered
8 codes, unordered:  unordered unlt ungt ne   uneq unle unge always
8 codes, fast-math:  never     lt   gt   ne   eq   le   ge   always
8 codes, non-fp:     never     lt   gt   ne   eq   le   ge   always


> >> +;; Same mode for condition true/false values and predicate operand.
> >> +(define_expand "vcond_mask_<mode><mode>"
> >> +  [(match_operand:VEC_I 0 "vint_operand")
> >> +   (match_operand:VEC_I 1 "vint_operand")
> >> +   (match_operand:VEC_I 2 "vint_operand")
> >> +   (match_operand:VEC_I 3 "vint_operand")]
> >> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
> >> +{
> >> +  emit_insn (gen_vector_select_<mode> (operands[0], operands[2], operands[1],
> >> +				  operands[3]));
> >> +  DONE;
> >> +})
> > 
> > So is this exactly the same as vsel/xxsel?
> 
> Yes, expanded into if_then_else and ne against zero, can match their patterns.

Ah, so vector_select is not the canonical name.

> > You shouldn't allow those for signed comparisons, that will only hide
> > problems.
> 
> OK, moved into vec_cmpu*.

> > Why *are* there separate vec_cmp and vec_cmpu patterns, in the first place?
> 
> If I understood the question correctly, you were asking why not have one
> unique pattern for them?

Yes, it is redundant, the comparison code already says if it is an
unsigned comparison.  So this a question about the generic patterns, not
your implementation of them :-)

And if it is *one* pattern then handling LTU etc. makes perfect sense.

> I noticed some vectorization related SPNs have
> separate signed and unsigned patterns, I guess it's due to that sign matters
> for some vector instructions, some platform may only support some of them,
> using sign for fine grain queries and checks?

I think it is because one particular implementation has different machine
insns for both, the one this interface was implemented for first.

> Updated patch attached by addressing above comments.

I'll review it later, sorry.


Segher

Follow-Ups:
- Re: [PATCH rs6000]Fix PR92132
  - From: Kewen.Lin

References:
- Re: [PATCH rs6000]Fix PR92132
  - From: Segher Boessenkool
- Re: [PATCH rs6000]Fix PR92132
  - From: Kewen.Lin

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]