This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>, "Naveen dot Hurugalawadi at cavium dot com" <Naveen dot Hurugalawadi at cavium dot com>
- Cc: nd <nd at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 29 Jun 2017 11:20:22 +0000
- Subject: Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
Richard Biener wrote:
> Hurugalawadi, Naveen wrote:
> > The code (m1 > m2) * d should be optimized as m1 > m2 ? d : 0.
> What's the reason for this transform? I expect that the HW multiplier
> is quite fast given one operand is either zero or one and a multiplication
> is a gimple operation that's better handled in optimizations than
> COND_EXPRs which eventually expand to conditional code which
> would be much slower.
Even really fast multipliers have several cycles of latency, and this is generally
fixed irrespective of the inputs. Maybe you were thinking about division?
Additionally, integer multiply typically has much lower throughput than other
ALU operations like conditional move: a modern CPU may have 4 ALUs
but only 1 multiplier, so removing redundant integer multiplies is always good.
Note that (m1 > m2) is also a conditional expression, which will result in branches
for floating point expressions and, on some targets, even for integers. Moving
the multiply into the conditional expression generates the best code:
Integer version:

f1:
        cmp     w0, 100
        csel    w0, w1, wzr, gt
        ret
f2:
        cmp     w0, 100
        cset    w0, gt
        mul     w0, w0, w1
        ret

Float version:

f3:
        movi    v1.2s, #0
        cmp     w0, 100
        fcsel   s0, s0, s1, gt
        ret
f4:
        cmp     w0, 100
        bgt     .L8
        movi    v1.2s, #0
        fmul    s0, s0, s1    // eh???
.L8:
        ret
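
The C source for f1-f4 is not included in this message. A minimal sketch of
what it might look like, assuming the first argument is an int compared
against the constant 100 (as the listings suggest) and that f1/f3 are the
select forms while f2/f4 are the multiply forms. The parameter names x and y
are hypothetical:

```c
/* Hypothetical reconstruction: the original C source is not shown
   in the message, so signatures and names here are assumptions. */

/* Integer select form: matches the shape of the f1 listing (cmp + csel). */
int f1(int x, int y)
{
    return x > 100 ? y : 0;
}

/* Integer multiply form: the comparison yields 0 or 1, which scales y. */
int f2(int x, int y)
{
    return (x > 100) * y;
}

/* Float select form: matches the shape of the f3 listing (cmp + fcsel). */
float f3(int x, float y)
{
    return x > 100 ? y : 0.0f;
}

/* Float multiply form: the 0/1 result is converted to float, then multiplied. */
float f4(int x, float y)
{
    return (x > 100) * y;
}
```

For finite y each pair returns equal values. The float pair can differ for
non-finite y, since 0 * infinity is NaN.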
Wilco