This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
- From: Steven Bosscher <stevenb dot gcc at gmail dot com>
- To: Richard Guenther <richard dot guenther at gmail dot com>
- Cc: Andreas Krebbel <krebbel at linux dot vnet dot ibm dot com>, gcc-patches at gcc dot gnu dot org, Richard Henderson <rth at redhat dot com>
- Date: Wed, 13 Jul 2011 23:49:10 +0200
- Subject: Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
- References: <20110713131305.GA5348@bart> <CAFiYyc0OpHGr8_45xXq7=Xmxp1uApGFkRyt-f5_yXwOrKzgZzw@mail.gmail.com>
On Wed, Jul 13, 2011 at 4:34 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel
> <krebbel@linux.vnet.ibm.com> wrote:
>> Hi,
>>
>> the widening_mul pass might increase the number of multiplications in
>> the code by transforming
>>
>> a = b * c
>> d = a + 2
>> e = a + 3
>>
>> into:
>>
>> d = b * c + 2
>> e = b * c + 3
>>
>> under the assumption that an FMA instruction is not more expensive
>> than a simple add.  This certainly isn't always true.  While e.g. on
>> s390 an FMA is indeed no slower than an add execution-wise, it has
>> disadvantages regarding instruction grouping.  It doesn't group with
>> any other instruction, which has a major impact on the instruction
>> dispatch bandwidth.
>>
>> The following patch tries to figure out the costs for adds, mults and
>> fmas by building an RTX and asking the backend's cost function in order
>> to estimate whether the transformation is worthwhile.
>>
>> With that patch the 436.cactus hot loop contains 28 fewer
>> multiplications than before, increasing performance slightly (~2%).
>>
>> Bootstrapped and regtested on x86_64 and s390x.
>
> Ick ;)
+1
> Maybe this is finally the time to introduce target hook(s) to
> get us back costs for trees?  For this case we'd need two
> actually, or just one - dependent on what fine-grained information
> we pass.  Choices:
>
>   tree_code_cost (enum tree_code)
>   tree_code_cost (enum tree_code, enum machine_mode mode)
>   unary_cost (enum tree_code, tree actual_arg0) // args will be mostly
> SSA names or constants, but at least they are typed - works for
> mixed-typed operations
>   binary_cost (...)
>   ...
>   unary_cost (enum tree_code, enum tree_code arg0_kind) // constant
> vs. non-constant arg, but lacks type/mode
Or maybe add a cost function for all named insns (i.e.
http://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#Standard-Names)?
I think that any form of lower GIMPLE will not be so low level that
more combinations will exist than there are named patterns. It
should be possible to write a gen* tool using rtx_costs to compute
some useful cost metric for all named patterns. How complicated that
could be (modes, reg vs. mem, etc.), I don't know... But at least that
way we don't end up with multiple target costs depending on the IR in
use.
Ciao!
Steven