This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0


On Thu, Jun 29, 2017 at 1:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Richard Biener wrote:
>> Hurugalawadi, Naveen wrote:
>> > The code (m1 > m2) * d should be optimized as m1 > m2 ? d : 0.
>
>> What's the reason for this transform?  I expect that the HW multiplier
>> is quite fast given one operand is either zero or one, and a multiplication
>> is a gimple operation that's better handled in optimizations than
>> COND_EXPRs, which eventually expand to conditional code that
>> would be much slower.
>
> Even really fast multipliers have several cycles of latency, and this is generally
> fixed irrespective of the inputs. Maybe you were thinking about division?
>
> Additionally, integer multiply typically has much lower throughput than other
> ALU operations like conditional move - a modern CPU may have 4 ALUs
> but only 1 multiplier, so removing redundant integer multiplies is always good.
>
> Note (m1 > m2) is also a conditional expression which will result in branches
> for floating point expressions and on some targets even for integers. Moving
> the multiply into the conditional expression generates the best code:
>
> Integer version:
> f1:
>         cmp    w0, 100
>         csel   w0, w1, wzr, gt
>         ret
> f2:
>         cmp    w0, 100
>         cset   w0, gt
>         mul    w0, w0, w1
>         ret
>
> Float version:
> f3:
>         movi   v1.2s, #0
>         cmp    w0, 100
>         fcsel  s0, s0, s1, gt
>         ret
> f4:
>         cmp    w0, 100
>         bgt    .L8
>         movi   v1.2s, #0
>         fmul   s0, s0, s1  // eh???
> .L8:
>         ret
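
The C sources behind f1 to f4 are not shown in the quoted message. A plausible reconstruction, assuming the threshold 100 seen in the `cmp w0, 100` instructions and hypothetical parameter names `m` and `d`, might look like this:

```c
/* Hypothetical reconstruction of the test functions; the original message
   shows only the generated AArch64 code, so names and types are assumed. */

/* Select form: matches the cmp + csel sequence shown for f1. */
int f1(int m, int d) { return m > 100 ? d : 0; }

/* Multiply form: matches the cmp + cset + mul sequence shown for f2. */
int f2(int m, int d) { return (m > 100) * d; }

/* Float select form: matches the movi + cmp + fcsel sequence shown for f3. */
float f3(int m, float d) { return m > 100 ? d : 0.0f; }

/* Float multiply form: the comparison result (0 or 1) is converted to
   float and multiplied, as in the branchy code shown for f4. */
float f4(int m, float d) { return (m > 100) * d; }
```

The integer pair agrees for every input. The float pair agrees only when `d` is finite and the result's zero sign does not matter, since multiplying zero by an infinity or a NaN yields NaN rather than zero.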

But then, compare these two on x86_64:

int f (int m, int c)
{
  return (m & 1) * c;
}
int g (int m, int c)
{
  if ((m & 1) != 0)
    return c;
  return 0;
}

f:
.LFB0:
        .cfi_startproc
        andl    $1, %edi
        movl    %edi, %eax
        imull   %esi, %eax
        ret
g:
.LFB1:
        .cfi_startproc
        movl    %edi, %eax
        andl    $1, %eax
        cmovne  %esi, %eax
        ret

Anyway, as a general comment on the patch, please implement it as
a pattern in match.pd:

(match boolean_value_range_p
 @0
 (if (INTEGRAL_TYPE_P (type)
      && TYPE_PRECISION (type) == 1)))
(match boolean_value_range_p
 INTEGER_CST
 (if (integer_zerop (t) || integer_onep (t))))
(match boolean_value_range_p
 SSA_NAME
 (if (INTEGRAL_TYPE_P (type)
      && ~get_nonzero_bits (t) == 1)))

(simplify
 (mult:c boolean_value_range_p@0 @1)
 (cond @0 @1 @0))

or something like that.
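
The reasoning behind the pattern can be modelled in plain C. This is only a sketch of the intent, not GCC code: `has_boolean_range`, `mult_as_cond` and `transform_is_safe` are hypothetical helpers, and the mask argument stands in for whatever a value-range analysis would report about the possibly-nonzero bits of an operand.

```c
#include <stdbool.h>

/* Model of the predicate's intent: if no bit other than bit 0 can be
   nonzero, the value can only be 0 or 1. The mask is an assumed input,
   not the result of any real GCC query. */
bool has_boolean_range(unsigned nonzero_mask)
{
  return (nonzero_mask & ~1u) == 0;
}

/* Model of (cond b x b): yields x when b is nonzero, otherwise b itself,
   which is 0 whenever b has boolean range. */
int mult_as_cond(int b, int x)
{
  return b ? x : b;
}

/* True when the multiplication and the select agree for this pair. */
bool transform_is_safe(int b, int x)
{
  return b * x == mult_as_cond(b, x);
}
```

For `b` equal to 0 or 1 the two forms agree for any `x`. A wider operand such as `b = 2` breaks the equivalence, which is why the rewrite has to be guarded by a range predicate.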

Richard.

> Wilco

