This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH rs6000]Fix PR92132


Hi!

On Thu, Nov 07, 2019 at 06:17:53PM +0800, Kewen.Lin wrote:
> on 2019/11/7 上午7:49, Segher Boessenkool wrote:
> > The expander named "one_cmpl<mode>3":
> > 
> > Erm.  2, not 3 :-)

> Ah, sorry I didn't notice we have one cmpl<mode>**3** but actually for one
> cmpl<mode>**2** expand, a bit surprised.  Done.  Thanks for pointing that.

Yeah, I suddenly couldn't find it myself either.  Real head-scratcher :-)

> > etc., so you can just delete the expand and rename the insn to the proper
> > name (one_cmpl<mode>2).  It sometimes is useful to have an expand like
> > this if there are multiple insns that could implement this, but that is
> > not the case here.
> 
> OK, example like vector_select?  :)

Sure, like that.  There are many examples where you are required to have
just one define_expand, it is called by name after all, but you want to
have different define_insns (for different cpus, say).

> > So we have only gt/ge/eq.
> > 
> > I think the following are ooptimal (not tested!):
> > 
> > lt(a,b) = gt(b,a)
> yes, this is what I used for that operator.
> 
> > gt(a,b) = gt(a,b)
> > eq(a,b) = eq(a,b)
> > un(a,b) = ~(ge(a,b) | ge(b,a))
> > 
> 
> existing code uses (~ge(a,b) & ~ge(b,a))
> but should be the same.

Yup, it's just ge/ge/nor, whatever way you write it :-)  (RTL requires
you write the expression in your form, with all the NOTs "pushed in").

> > ltgt(a,b) = ge(a,b) ^ ge(b,a)
> 
> existing code uses gt(a,b) | gt(b,a)
> but should be the same.

Yup, computes exactly the same, and exactly the same execution speeds.

Your form might be slightly easier to optimise with (it has no XOR).

> > Half are pretty simple:
> > 
> > lt(a,b) = gt(b,a)
> > gt(a,b) = gt(a,b)
> > eq(a,b) = eq(a,b)
> > le(a,b) = ge(b,a)
> > ge(a,b) = ge(a,b)
> > 
> > ltgt(a,b) = ge(a,b) ^ ge(b,a)
> > ord(a,b)  = ge(a,b) | ge(b,a)
> > 
> > The other half are the negations of those:
> > 
> > unge(a,b) = ~gt(b,a)
> > unle(a,b) = ~gt(a,b)
> > ne(a,b)   = ~eq(a,b)
> > ungt(a,b) = ~ge(b,a)
> > unlt(a,b) = ~ge(a,b)
> > 
> > uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
> > un(a,b) = ~(ge(a,b) | ge(b,a))
> 
> Awesome!  Do you suggest refactoring on them?  :)

I'd do the first five in one pattern (which then swaps two ops and the
condition in the lt and le case), and the other five in another pattern.
And the rest in two or four patterns?  Just try it out, see what works
well.  It helps to do a bunch together in one pattern, but if that then
turns into special cases for everything, more might be lost than gained.

> > And please remember to test everythin with -ffast-math :-)  That is, when
> > flag_finite_math_only is set.  You cannot get unordered results, then,
> > making the optimal sequences different in some cases (and changing what
> > "ne" means!)
> 
> Thanks for the remind!  On RTL pattern, I think we won't get any un*
> related operators with -ffast-math, so that part on un* expansion
> would be fine?

Yeah, but look what you should do for "ne" :-)

> > 8 codes, ordered:    never     lt   gt   ltgt eq   le   ge   ordered
> > 8 codes, unordered:  unordered unlt ungt ne   uneq unle unge always
> > 8 codes, fast-math:  never     lt   gt   ne   eq   le   ge   always
> > 8 codes, non-fp:     never     lt   gt   ne   eq   le   ge   always
> 
> Sorry, I don't quite follow this table.  What's the column heads?

The first row is the eight possible fp conditions that are not always
true if unordered is set; the second row is those that *are* always true
if it is set.  The other two rows (which are the same) is just the eight
conditions that do not test unordered at all.

The tricky one is "ne": for FP *with* NaNs, "ne" means "less than, or
greater than, or unordered", while without NaNs (i.e. -ffast-math) it
means "less than, or greater than".

You could write the column heads as
--/--/--  lt/--/--  --/gt/--  lt/gt/--  --/--/eq  lt/--/eq  --/gt/eq  lt/gt/eq
if that helps?  Just the eight combinations of the first free flags.

> > Yes, it is redundant, the comparison code already says if it is an
> > unsigned comparison.  So this a question about the generic patterns, not
> > your implementation of them :-)
> > 
> > And if it is *one* pattern then handling LTU etc. makes perfect sense.
> 
> Fully agree, but it separates for now.  :)

Sure :-)

> Thanks again!  I've updated a new version as some comments, you can review this
> one to save your time.  :)


> +;; Return 1 if OP is a signed comparison or an equality operator.
> +(define_predicate "signed_or_equality_comparison_operator"
> +  (ior (match_operand 0 "equality_operator")
> +       (match_operand 0 "signed_comparison_operator")))
> +
> +;; Return 1 if OP is an unsigned comparison or an equality operator.
> +(define_predicate "unsigned_or_equality_comparison_operator"
> +  (ior (match_operand 0 "equality_operator")
> +       (match_operand 0 "unsigned_comparison_operator")))

Hrm.  Unpleasant.

> +(define_expand "vcond_mask_<mode><mode>"
> +  [(match_operand:VEC_I 0 "vint_operand")
> +   (match_operand:VEC_I 1 "vint_operand")
> +   (match_operand:VEC_I 2 "vint_operand")
> +   (match_operand:VEC_I 3 "vint_operand")]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
> +{
> +  emit_insn (gen_vector_select_<mode> (operands[0], operands[2], operands[1],
> +				  operands[3]));

Yeah, with the args swapped, good point.  Details details ;-)

> +;; For signed integer vectors comparison.
> +(define_expand "vec_cmp<mode><mode>"
> +  [(set (match_operand:VEC_I 0 "vint_operand")
> +	(match_operator 1 "signed_or_equality_comparison_operator"
> +	  [(match_operand:VEC_I 2 "vint_operand")
> +	   (match_operand:VEC_I 3 "vint_operand")]))]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
> +{
> +  enum rtx_code code = GET_CODE (operands[1]);
> +  rtx tmp = gen_reg_rtx (<MODE>mode);
> +  switch (code)
> +    {
> +    case NE:
> +      emit_insn (gen_vector_eq<mode> (operands[0], operands[2], operands[3]));
> +      emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0]));
> +      break;
> +    case EQ:
> +      emit_insn (gen_vector_eq<mode> (operands[0], operands[2], operands[3]));
> +      break;
> +    case GE:
> +      emit_insn (gen_vector_nlt<mode> (operands[0],operands[2], operands[3],
> +				       tmp));
> +      break;
> +    case GT:
> +      emit_insn (gen_vector_gt<mode> (operands[0], operands[2], operands[3]));
> +      break;
> +    case LE:
> +      emit_insn (gen_vector_ngt<mode> (operands[0], operands[2], operands[3],
> +				       tmp));
> +      break;
> +    case LT:
> +      emit_insn (gen_vector_gt<mode> (operands[0], operands[3], operands[2]));
> +      break;
> +    default:
> +      gcc_unreachable ();
> +      break;
> +    }
> +  DONE;
> +})

I would think this can be done easier, but it is alright for now, it can
be touched up later if we want.

> +;; For float point vectors comparison.
> +(define_expand "vec_cmp<mode><VEC_int>"

This, too.

> +  [(set (match_operand:<VEC_INT> 0 "vint_operand")
> +	 (match_operator 1 "comparison_operator"

If you make an iterator for this instead, it is simpler code (you can then
use <code> to do all these cases in one statement).

But that can be done later.  Okay for trunk.  Thanks!


Segher


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]