This is the mail archive of the
mailing list for the GCC project.
Re: [csl-asm?] PR middle-end/11821: Tweak arm_rtx_costs_1
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: Richard dot Earnshaw at arm dot com, gcc-patches at gcc dot gnu dot org
- Date: Tue, 18 Nov 2003 17:17:20 +0000
- Subject: Re: [csl-asm?] PR middle-end/11821: Tweak arm_rtx_costs_1
- Organization: ARM Ltd.
- Reply-to: Richard dot Earnshaw at arm dot com
> Hi Richard,
> > If you've examined the code to check that what is happening is reasonable,
> > and have benchmarks that show that on average a cost of 2 insns is better
> > than a cost of 3, I'm completely happy for your patch to go in on the
> > trunk.
> Here are the results of the CSiBE analysis for my patch on arm-elf:
>                      total    delta
> mainline           1155445
> COSTS_N_INSNS(1)   1155069     -376
> COSTS_N_INSNS(2)   1155069     -376
> COSTS_N_INSNS(3)   1155153     -292
> COSTS_N_INSNS(4)   1155185     -260
> COSTS_N_INSNS(5)   1155241     -204
> COSTS_N_INSNS(6)   1155445        0
> So claiming that the function call to __modsi3 costs, on
> average, two instructions does provide a better approximation
> (improvement) than estimating that it costs three instructions.
> What's strange is that estimating function calls as one instruction
> produces identical results to two instructions. This would seem to
> indicate that we're currently underestimating the size of shifts,
> additions and multiplications on the COSTS_N_INSNS scale.
> I'll take your advice and apply the patch as is to mainline to
> resolve PR 11821. Hopefully, csl-arm will provide much better
> approximations, so that the relative costs of an addition and a
> function call are more realistic, which should make the number of
> instructions per function call more intuitive.
> Perhaps this is the perfect combinatorial optimization problem for
> solution by a GA, such as the one used by Scott Robert Ladd for
> determining "optimal" compiler flag combinations. Just a thought.
It's possible that other effects are coming into play when the cost is
described as a single insn. For example, it may be that at that point the
compiler has already decided that an unsigned shift will always be less
expensive than division by a power of two.
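
To make the equivalence concrete, here is a minimal sketch in plain C. It is not GCC code, and the function names are invented for this example. For unsigned operands, dividing by 2^k gives the same value as a logical right shift by k, and the remainder is a simple mask, so once the shift is judged cheaper the cost assigned to the library call may no longer influence the choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of GCC. */

/* Unsigned division by 2^k, written as a division. */
uint32_t udiv_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);                 /* keep the shift well defined */
    return x / ((uint32_t)1 << k);
}

/* The same value computed with a logical right shift. */
uint32_t ushift_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x >> k;
}

/* Unsigned remainder by 2^k reduces to masking the low k bits. */
uint32_t umod_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x & (((uint32_t)1 << k) - 1);
}
```

For example, udiv_pow2(100, 3) and ushift_pow2(100, 3) both give 12, and umod_pow2(100, 3) gives 4. The signed case needs extra care with rounding and is deliberately left out of this sketch.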
Ultimately, the problem here is that the expanders often base their cost
metrics on the number of insns that they will emit at that stage in the
compilation, taking no account of how later stages may combine
instructions to produce more efficient code. This is a particular problem
on ARM where most shift instructions will end up being combined with
either an arithmetic or a logical operation, such as shift-and-add. The
multiplication synthesis algorithm is particularly bad in this respect: it
makes describing the costs accurately almost impossible.
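
The mismatch can be sketched with a toy model in plain C. It is not GCC's multiplication synthesis algorithm, which is considerably smarter, and the names mult_cost and mul_shift_add are invented for this example. It multiplies by a constant using one shifted term per set bit and counts the operations twice: once with each shift and add as a separate insn, as a naive count at expand time would see them, and once assuming every shift folds into the following add, as in the ARM form "add r0, r0, r0, lsl #2".

```c
#include <assert.h>
#include <stdint.h>

/* Toy model for illustration only; not GCC code. */
struct mult_cost {
    unsigned separate;  /* shifts and adds counted as individual insns */
    unsigned combined;  /* each shift folded into the add that uses it */
};

/* Compute x * c as a sum of shifted copies of x, one per set bit of c,
   recording both insn counts in *cost.  Requires c != 0. */
uint32_t mul_shift_add(uint32_t x, uint32_t c, struct mult_cost *cost)
{
    uint32_t acc = 0;
    int first = 1;

    assert(c != 0);
    cost->separate = 0;
    cost->combined = 0;

    for (unsigned bit = 0; bit < 32; bit++) {
        if (!(c & ((uint32_t)1 << bit)))
            continue;

        uint32_t term = x << bit;
        if (first) {
            /* The first term is a plain copy (free here) or one shift. */
            acc = term;
            if (bit != 0) {
                cost->separate += 1;
                cost->combined += 1;
            }
            first = 0;
        } else {
            /* Later terms: shift + add separately, or one shifted add. */
            acc += term;
            cost->separate += 2;
            cost->combined += 1;
        }
    }
    return acc;
}
```

With c = 5 the model counts 2 insns when the shift and add are separate but 1 when combined; with c = 10 it counts 3 versus 2. A cost comparison made against the first figure will look worse than the code that is eventually emitted.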
Thanks for the analysis.