This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH rs6000]Fix PR92132
- From: Segher Boessenkool <segher at kernel dot crashing dot org>
- To: "Kewen.Lin" <linkw at linux dot ibm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Bill Schmidt <wschmidt at linux dot ibm dot com>
- Date: Wed, 6 Nov 2019 17:49:51 -0600
- Subject: Re: [PATCH rs6000]Fix PR92132
- References: <0eacc4b3-8de4-b373-2f92-bf91bf1f2b14@linux.ibm.com> <a12fd8a6-dab1-b242-2122-0145833f155c@linux.ibm.com> <20191101231750.GG28442@gate.crashing.org> <9c16cecd-74ef-5815-e899-92fade2ff9fe@linux.ibm.com>
Hi Ke Wen,
On Tue, Nov 05, 2019 at 04:35:05PM +0800, Kewen.Lin wrote:
> >> ;; 128-bit one's complement
> >> -(define_insn_and_split "*one_cmpl<mode>3_internal"
> >> +(define_insn_and_split "one_cmpl<mode>3_internal"
> >
> > Instead, rename it to "one_cmpl<mode>3" and delete the define_expand that
> > serves no function?
>
> Renamed. Sorry, what's the "define_expand" specified here. I thought it's
> for existing one_cmpl<mode>3 but I didn't find it.
The expander named "one_cmpl<mode>3":
Erm. 2, not 3 :-)
(define_expand "one_cmpl<mode>2"
[(set (match_operand:BOOL_128 0 "vlogical_operand")
(not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")))]
""
"")
while the define_insn is
(define_insn_and_split "*one_cmpl<mode>3_internal"
[(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
(not:BOOL_128
(match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_UNARY>")))]
""
{
etc., so you can just delete the expand and rename the insn to the proper
name (one_cmpl<mode>2). It sometimes is useful to have an expand like
this if there are multiple insns that could implement this, but that is
not the case here.
> >> +(define_code_iterator fpcmpun [ungt unge unlt unle])
> >
> > Why these four? Should there be more? Should this be added to some
> > existing iterator?
>
> For floating point comparison operator and vector type, currently rs6000
> supports eq, gt, ge, *ltgt, *unordered, *ordered, *uneq (* for unnamed).
> We can leverage gt, ge, eq for lt, le, ne, then these four left.
There are four conditions for FP: lt/gt/eq/un. For every comparison,
exactly one of the four is true. If not HONOR_NANS for this mode you
never have un, so it is one of lt/gt/eq then, just like with integers.
If we have HONOR_NANS(mode) (or !flag_finite_math_only), there are 14
possible combinations to test for (testing for any of the four or none
of the four is easy ;-) )
Four test just if lt, gt, eq, or un is set. Another four test if one of
the flags is *not* set, or said differently, if one of three flags is set:
ordered, ne, unle, unge. The remaining six test two flags each: ltgt, le,
unlt, ge, ungt, uneq.
> I originally wanted to merge them into the existing unordered or uneq, but
> I found it's hard to share their existing patterns. For example, the uneq
> looks like:
>
> [(set (match_dup 3)
> (gt:VEC_F (match_dup 1)
> (match_dup 2)))
> (set (match_dup 4)
> (gt:VEC_F (match_dup 2)
> (match_dup 1)))
> (set (match_dup 0)
> (and:VEC_F (not:VEC_F (match_dup 3))
> (not:VEC_F (match_dup 4))))]
Or ge/ge/eqv, etc. -- there are multiple options.
> While ungt looks like:
>
> [(set (match_dup 3)
> (ge:VEC_F (match_dup 1)
> (match_dup 2)))
> (set (match_dup 4)
> (ge:VEC_F (match_dup 2)
> (match_dup 1)))
> (set (match_dup 3)
> (ior:VEC_F (not:VEC_F (match_dup 3))
> (not:VEC_F (match_dup 4))))
> (set (match_dup 4)
> (gt:VEC_F (match_dup 1)
> (match_dup 2)))
> (set (match_dup 3)
> (ior:VEC_F (match_dup 3)
> (match_dup 4)))]
(set (match_dup 3)
(ge:VEC_F (match_dup 2)
(match_dup 1)))
(set (match_dup 0)
(not:VEC_F (match_dup 3)))
should be enough?
So we have only gt/ge/eq.
I think the following are ooptimal (not tested!):
lt(a,b) = gt(b,a)
gt(a,b) = gt(a,b)
eq(a,b) = eq(a,b)
un(a,b) = ~(ge(a,b) | ge(b,a))
ltgt(a,b) = ge(a,b) ^ ge(b,a)
le(a,b) = ge(b,a)
unlt(a,b) = ~ge(a,b)
ge(a,b) = ge(a,b)
ungt(a,b) = ~ge(b,a)
uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
ord(a,b) = ge(a,b) | ge(b,a)
ne(a,b) = ~eq(a,b)
unle(a,b) = ~gt(a,b)
unge(a,b) = ~gt(b,a)
This is quite regular :-) 5 are done with one cmp; 5 are done with a cmp
and an inversion; 4 are done with two compares and one xor/eqv/or/nor.
Half are pretty simple:
lt(a,b) = gt(b,a)
gt(a,b) = gt(a,b)
eq(a,b) = eq(a,b)
le(a,b) = ge(b,a)
ge(a,b) = ge(a,b)
ltgt(a,b) = ge(a,b) ^ ge(b,a)
ord(a,b) = ge(a,b) | ge(b,a)
The other half are the negations of those:
unge(a,b) = ~gt(b,a)
unle(a,b) = ~gt(a,b)
ne(a,b) = ~eq(a,b)
ungt(a,b) = ~ge(b,a)
unlt(a,b) = ~ge(a,b)
uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
un(a,b) = ~(ge(a,b) | ge(b,a))
And please remember to test everythin with -ffast-math :-) That is, when
flag_finite_math_only is set. You cannot get unordered results, then,
making the optimal sequences different in some cases (and changing what
"ne" means!)
8 codes, ordered: never lt gt ltgt eq le ge ordered
8 codes, unordered: unordered unlt ungt ne uneq unle unge always
8 codes, fast-math: never lt gt ne eq le ge always
8 codes, non-fp: never lt gt ne eq le ge always
> >> +;; Same mode for condition true/false values and predicate operand.
> >> +(define_expand "vcond_mask_<mode><mode>"
> >> + [(match_operand:VEC_I 0 "vint_operand")
> >> + (match_operand:VEC_I 1 "vint_operand")
> >> + (match_operand:VEC_I 2 "vint_operand")
> >> + (match_operand:VEC_I 3 "vint_operand")]
> >> + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
> >> +{
> >> + emit_insn (gen_vector_select_<mode> (operands[0], operands[2], operands[1],
> >> + operands[3]));
> >> + DONE;
> >> +})
> >
> > So is this exactly the same as vsel/xxsel?
>
> Yes, expanded into if_then_else and ne against zero, can match their patterns.
Ah, so vector_select is not the canonical name.
> > You shouldn't allow those for signed comparisons, that will only hide
> > problems.
>
> OK, moved into vec_cmpu*.
> > Why *are* there separate vec_cmp and vec_cmpu patterns, in the first place?
>
> If I understood the question correctly, you were asking why not have one
> unique pattern for them?
Yes, it is redundant, the comparison code already says if it is an
unsigned comparison. So this a question about the generic patterns, not
your implementation of them :-)
And if it is *one* pattern then handling LTU etc. makes perfect sense.
> I noticed some vectorization related SPNs have
> separate signed and unsigned patterns, I guess it's due to that sign matters
> for some vector instructions, some platform may only support some of them,
> using sign for fine grain queries and checks?
I think it is because one particular implementation has different machine
insns for both, the one this interface was implemented for first.
> Updated patch attached by addressing above comments.
I'll review it later, sorry.
Segher