This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
- From: Jeff Law <law at redhat dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, Richard Biener <richard dot guenther at gmail dot com>, "Naveen dot Hurugalawadi at cavium dot com" <Naveen dot Hurugalawadi at cavium dot com>
- Cc: nd <nd at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 29 Jun 2017 07:43:10 -0600
- Subject: Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
- References: <AM5PR0802MB2610F55321C5107F420AA98183D20@AM5PR0802MB2610.eurprd08.prod.outlook.com>
On 06/29/2017 05:20 AM, Wilco Dijkstra wrote:
> Richard Biener wrote:
>> Hurugalawadi, Naveen wrote:
> >>> The code (m1 > m2) * d should be optimized as m1 > m2 ? d : 0.
>> What's the reason for this transform? I expect that the HW multiplier
>> is quite fast given one operand is either zero or one and a multiplication
>> is a gimple operation that's better handled in optimizations than
>> COND_EXPRs which eventually expand to conditional code which
>> would be much slower.
> Even really fast multipliers have several cycles of latency, and this is generally
> fixed irrespective of the inputs. Maybe you were thinking about division?
And on some targets, just getting the arguments into the right register
bank is many cycles. Think HPPA where integer multiply occurs in the
floating point unit. Though I don't think that oddity should drive this
> Additionally, integer multiply typically has much lower throughput than other
> ALU operations like conditional move - a modern CPU may have 4 ALUs
> but only 1 multiplier, so removing redundant integer multiplies is always good.
I'd tend to agree in general.
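
For reference, a minimal C sketch of the two source forms under discussion. The function names mul_form and select_form are illustrative only and do not come from the patch. Both return the same value for every input, because the comparison evaluates to 0 or 1; which form is cheaper depends on the target, as discussed above.

```c
/* Multiply form: the comparison yields 0 or 1, which then scales d. */
int mul_form(int m1, int m2, int d)
{
    return (m1 > m2) * d;
}

/* Select form: the same value written as a conditional expression,
   which a target may expand to a conditional move or select. */
int select_form(int m1, int m2, int d)
{
    return m1 > m2 ? d : 0;
}
```

With m1 = 3, m2 = 2, d = 7 both functions return 7; with m1 and m2 swapped both return 0.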