This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Cc: "Naveen dot Hurugalawadi at cavium dot com" <Naveen dot Hurugalawadi at cavium dot com>, nd <nd at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 29 Jun 2017 13:41:36 +0200
- Subject: Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
- Authentication-results: sourceware.org; auth=none
- References: <AM5PR0802MB2610F55321C5107F420AA98183D20@AM5PR0802MB2610.eurprd08.prod.outlook.com>
On Thu, Jun 29, 2017 at 1:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Richard Biener wrote:
>> Hurugalawadi, Naveen wrote:
> >> > The code (m1 > m2) * d should be optimized as m1 > m2 ? d : 0.
>
> >> What's the reason for this transform? I expect that the HW multiplier
>> is quite fast given one operand is either zero or one and a multiplication
>> is a gimple operation that's better handled in optimizations than
>> COND_EXPRs which eventually expand to conditional code which
>> would be much slower.
>
> Even really fast multipliers have several cycles latency, and this is generally
> fixed irrespective of the inputs. Maybe you were thinking about division?
>
> Additionally, integer multiply typically has much lower throughput than other
> ALU operations like conditional move - a modern CPU may have 4 ALUs
> but only 1 multiplier, so removing redundant integer multiplies is always good.
>
> Note (m1 > m2) is also a conditional expression which will result in branches
> for floating point expressions and on some targets even for integers. Moving
> the multiply into the conditional expression generates the best code:
>
> Integer version:
> f1:
>         cmp     w0, 100
>         csel    w0, w1, wzr, gt
>         ret
> f2:
>         cmp     w0, 100
>         cset    w0, gt
>         mul     w0, w0, w1
>         ret
>
> Float version:
> f3:
>         movi    v1.2s, #0
>         cmp     w0, 100
>         fcsel   s0, s0, s1, gt
>         ret
> f4:
>         cmp     w0, 100
>         bgt     .L8
>         movi    v1.2s, #0
>         fmul    s0, s0, s1    // eh???
> .L8:
>         ret
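The C source behind these listings is not quoted in the thread. A minimal sketch of what f1 to f4 might look like, assuming an int first argument, the constant 100 seen in the cmp instructions, and a second argument d of type int or float (all of this is inferred from the register usage, not taken from the original test case):

```c
#include <assert.h>

/* Assumed source for f1: the select form (matches cmp + csel). */
int f1(int m, int d) { return m > 100 ? d : 0; }

/* Assumed source for f2: the multiply form (matches cmp + cset + mul). */
int f2(int m, int d) { return (m > 100) * d; }

/* Assumed float counterparts: f3 is the select form, f4 the multiply form. */
float f3(int m, float d) { return m > 100 ? d : 0.0f; }
float f4(int m, float d) { return (m > 100) * d; }

/* The two integer forms agree on every input tried here. */
void check_integer_forms(void)
{
    for (int m = 98; m <= 102; m++)
        for (int d = -3; d <= 3; d++)
            assert(f1(m, d) == f2(m, d));
}
```

Calling check_integer_forms() exercises both sides of the m > 100 boundary. The float pair is only comparable on finite d: under IEEE arithmetic 0 * d is NaN when d is infinite or NaN, so f3 and f4 are not interchangeable for those inputs.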
But then
int f (int m, int c)
{
  return (m & 1) * c;
}
int g (int m, int c)
{
  if ((m & 1) != 0)
    return c;
  return 0;
}
f:
.LFB0:
        .cfi_startproc
        andl    $1, %edi
        movl    %edi, %eax
        imull   %esi, %eax
        ret
g:
.LFB1:
        .cfi_startproc
        movl    %edi, %eax
        andl    $1, %eax
        cmovne  %esi, %eax
        ret
anyway. As a general comment on the patch, please do it as
a pattern in match.pd
(match boolean_value_range_p
 @0
 (if (INTEGRAL_TYPE_P (type)
      && TYPE_PRECISION (type) == 1)))
(match boolean_value_range_p
 INTEGER_CST
 (if (integer_zerop (t) || integer_onep (t))))
(match boolean_value_range_p
 SSA_NAME
 (if (INTEGRAL_TYPE_P (type)
      && ~get_nonzero_bits (t) == 1)))
(simplify
 (mult:c boolean_value_range_p@0 @1)
 (cond @0 @1 @0))
or something like that.
Richard.
> Wilco
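The match.pd sketch above rests on one arithmetic fact: when an operand can only be 0 or 1, multiplying by it is the same as selecting between the other operand and zero. A minimal C illustration follows, assuming 32-bit unsigned values. The helper names boolean_value_p, mul_form and select_form are hypothetical, and the predicate here inspects a concrete value, whereas the pattern would rely on GCC's type and nonzero-bits information:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the predicate: true when x is 0 or 1,
   i.e. no bit other than bit 0 is set. */
static int boolean_value_p(uint32_t x) { return (x & ~UINT32_C(1)) == 0; }

/* Multiply form: b * c. */
static uint32_t mul_form(uint32_t b, uint32_t c) { return b * c; }

/* Select form: c when b is nonzero, otherwise zero. */
static uint32_t select_form(uint32_t b, uint32_t c) { return b ? c : 0; }

void check_boolean_multiply(void)
{
    const uint32_t samples[] = { 0, 1, 2, 7, 0x80000000u, 0xffffffffu };
    const int n = (int)(sizeof samples / sizeof samples[0]);

    /* For b = m & 1 the two forms agree for every sampled c. */
    for (uint32_t m = 0; m < 8; m++) {
        uint32_t b = m & 1;
        assert(boolean_value_p(b));
        for (int i = 0; i < n; i++)
            assert(mul_form(b, samples[i]) == select_form(b, samples[i]));
    }

    /* Outside the 0/1 range the forms differ: 2 * 7 is 14, the select gives 7. */
    assert(!boolean_value_p(2));
    assert(mul_form(2, 7) != select_form(2, 7));
}
```

The last two assertions show why the rewrite has to be guarded by a range predicate. The else arm in the pattern is written as @0 rather than a literal zero, which is equivalent here because @0 is known to be zero whenever that arm is taken.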