This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC PATCH for 9] rs6000: Ordered comparisons (PR56864)


On Tue, Mar 27, 2018 at 7:20 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> Hi!
>
> On Tue, Mar 27, 2018 at 09:30:35AM +0200, Uros Bizjak wrote:
>> +(define_insn "*cmpdd_cmpo"
>> +  [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
>> + (compare:CCFP (match_operand:DD 1 "gpc_reg_operand" "d")
>> +      (match_operand:DD 2 "gpc_reg_operand" "d")))
>> +   (unspec [(match_dup 1) (match_dup 2)] UNSPEC_CMPO)]
>> +  "TARGET_DFP"
>> +  "dcmpo %0,%1,%2"
>> +  [(set_attr "type" "dfp")])
>>
>> I have had some problems when adding UNSPEC tags as a parallel to a
>> compare for x86. For the testcase:
>>
>> int testo (double a, double b)
>> {
>>   return a == b;
>> }
>>
>> the middle end emits a sequence like:
>
> [ snip ]
>
>> and postreload pass removes (insn 10). This was not the case when the
>> compare was implemented with a parallel.
>
> For us this works fine:
>
>         fcmpu 7,1,2
>         mfcr 3,1
>         rlwinm 3,3,31,1
>         blr
>
> (eq is not expanded as an ordered compare, only lt gt le ge are, not the
> other twelve).
>
> But say
>
> int testo (double a, double b)
> {
>   if (a < b) return -1;
>   if (a > b) return 1;
>   return 0;
> }
>
> gives with -ffast-math
>
>         fcmpu 7,1,2
>         li 3,-1
>         bltlr 7
>         mfcr 3,1
>         rlwinm 3,3,30,1
>         blr
>
> (the two compares were combined, by fwprop1) but without the flag we get
>
>         fcmpo 5,1,2
>         li 3,-1
>         bltlr 5
>         mfcr 3,4
>         rlwinm 3,3,22,1
>         fcmpo 7,1,2
>         blr
>
> (it's still combined, but the redundant compare isn't deleted).

Yes, I think this case will be fixed by wrapping the compare inside an UNSPEC.
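
To make the source-level distinction concrete, here is a minimal C sketch of
the quoted testcase next to a quiet variant. The function names
three_way_ordered and three_way_quiet are made up for illustration and do not
come from the patch. The assumption is C Annex F semantics: the relational
operators are signaling (ordered) comparisons, as fcmpo provides, while
isless/isgreater from <math.h> are quiet, as fcmpu provides. Whether a given
compiler actually raises the invalid exception for the first function is
exactly what this PR is about, so the sketch only relies on the returned
values.

```c
#include <math.h>

/* Three-way compare with the relational operators.  Under Annex F these
   are ordered (signaling) comparisons: a NaN operand is expected to raise
   the invalid exception, in addition to comparing false.  */
int three_way_ordered (double a, double b)
{
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

/* Same results, but isless/isgreater are quiet comparisons: they do not
   raise invalid for quiet NaN operands.  */
int three_way_quiet (double a, double b)
{
  if (isless (a, b)) return -1;
  if (isgreater (a, b)) return 1;
  return 0;
}
```

Both functions return 0 when either operand is a NaN, since every one of
these comparisons is false for unordered operands; they differ only in the
exception flags they are allowed to set.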

>> Also, -ffast-math on x86 emits trapping compares for all cases. For
>> that reason, unordered (non-trapping) compares were wrapped in an
>> unspec, with the expectation that -ffast-math can perform some more
>> optimizations with patterns using naked compare RTX without unspec.
>
> My patch expands with:
>
> +         if (SCALAR_FLOAT_MODE_P (mode) && HONOR_NANS (mode)
> +             && (code == LT || code == GT || code == LE || code == GE))
> +           {
> +             rtx unspec = gen_rtx_UNSPEC (VOIDmode, gen_rtvec (2, op0, op1),
> +                                          UNSPEC_CMPO);
> +             compare = gen_rtx_PARALLEL (VOIDmode,
> +                                         gen_rtvec (2, compare, unspec));
> +           }
>
> so we use only unordered compares with -ffast-math (exactly as before
> the patch, in all cases).
>
> It would be ideal if there were two separate compare codes in RTL, or
> some other way to flag it.  Or something that deletes unused ordered
> compares (if they are expressed as a parallel with an unspec).
>
> Are ordered compares faster than unordered on x86?  Strange stuff.

Not faster, but on x87 the unordered compares operate only on registers,
while some (legacy) ordered compares can also take memory operands.
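
For reference, the expansion condition quoted above can be modeled outside of
GCC as a small standalone predicate. This is only a sketch: the enum
cmp_code, its members and the function wants_ordered_compare are hypothetical
stand-ins for rtx_code, SCALAR_FLOAT_MODE_P and HONOR_NANS, not GCC
interfaces, and only a few of the comparison codes are listed.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a few of GCC's comparison rtx codes.  */
enum cmp_code
{
  CMP_EQ, CMP_NE, CMP_LT, CMP_GT, CMP_LE, CMP_GE,
  CMP_UNORDERED, CMP_ORDERED
};

/* Model of the quoted condition: request an ordered (signaling) compare
   only for a scalar float mode that honors NaNs, and only for LT, GT,
   LE and GE.  With -ffast-math, honor_nans would be false, so every
   comparison stays an unordered compare.  */
bool wants_ordered_compare (bool scalar_float, bool honor_nans,
                            enum cmp_code code)
{
  return scalar_float && honor_nans
         && (code == CMP_LT || code == CMP_GT
             || code == CMP_LE || code == CMP_GE);
}
```

With this model, wants_ordered_compare (true, true, CMP_LT) is true, while
passing CMP_EQ, or passing honor_nans as false, gives false.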

Uros.

