This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [csl-asm?] PR middle-end/11821: Tweak arm_rtx_costs_1


> 
> Hi Richard,
> > If you've examined the code to check that what is happening is reasonable,
> > and have benchmarks that show that on average a cost of 2 insns is better
> > than a cost of 3, I'm completely happy for your patch to go in on the
> > trunk.
> 
> Here are the results of the CSiBE analysis for my patch on arm-elf:
> 
>                     total      delta
> mainline            1155445
> COSTS_N_INSNS(1)    1155069    -376
> COSTS_N_INSNS(2)    1155069    -376
> COSTS_N_INSNS(3)    1155153    -292
> COSTS_N_INSNS(4)    1155185    -260
> COSTS_N_INSNS(5)    1155241    -204
> COSTS_N_INSNS(6)    1155445       0
> 
> 
> So claiming that the function call to __modsi3 costs, on
> average, two instructions does provide a better approximation
> (improvement) than estimating that it costs three instructions.
> 
> What's strange is that estimating function calls as one instruction
> produces identical results to two instructions.  This would seem to
> indicate that we're currently underestimating the size of shifts,
> additions and multiplications on the COSTS_N_INSNS scale.
> 
> 
> I'll take your advice and apply the patch as is to mainline to
> resolve PR 11821.  Hopefully, csl-arm will provide much better
> approximations, so that the relative cost of an addition to a
> function call is more realistic, which should make the number of
> instructions per function call more intuitive.
> 
> Perhaps this is the perfect combinatorial optimization problem for
> solution by a GA, such as the one used by Scott Robert Ladd for
> determining "optimal" compiler flag combinations.  Just a thought.
> 

It's possible that other effects are coming into play when the cost is 
described as a single insn.  For example, it may be that at that point the 
compiler has already decided that an unsigned shift will always be less 
expensive than division by a power of two.
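
As a rough illustration of why costs of one and two insns could give 
identical results, here is a toy model in C.  It is only a sketch: 
prefer_inline() is a hypothetical helper, not a GCC function, and the 
tie-breaking rule (inline wins on equal cost) is an assumption.  The 
COSTS_N_INSNS definition is reproduced from memory of rtl.h and should 
also be treated as an assumption.

```c
/* Assumption: GCC's cost scale, where one insn is worth 4 units. */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical helper, not part of GCC: returns 1 when the inline
   sequence is chosen over the library call, 0 otherwise.  Ties are
   assumed to go to the inline sequence. */
static int prefer_inline(int inline_cost, int libcall_cost)
{
    return inline_cost <= libcall_cost;
}
```

Under this model a one-insn shift is preferred for every libcall cost 
from COSTS_N_INSNS(1) upwards, so lowering the libcall cost from two 
insns to one cannot change that particular decision.  A three-insn 
inline sequence, by contrast, loses to a libcall costed at two insns.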

Ultimately, the problem here is that the expanders often do cost metrics 
based on the number of insns that they will emit at that stage in the 
compilation, with no account made of how later stages may combine 
instructions to produce more efficient code.  This is a particular problem 
on ARM where most shift instructions will end up being combined with 
either an arithmetic or a logical operation, such as shift-and-add.  The 
multiplication synthesis algorithm is particularly bad in this respect: it 
makes describing the costs accurately almost impossible.
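
The gap between the expand-time count and what finally gets emitted can 
be sketched with a small C model.  This is not GCC code: the enum and 
both functions are hypothetical, and the model assumes that every shift 
immediately followed by an add or subtract feeds that operation and can 
be folded into it as a shifted operand.  It does not track data flow.

```c
#include <stddef.h>

/* Hypothetical operation kinds in a synthesized multiply sequence. */
enum op { OP_SHIFT, OP_ADD, OP_SUB };

/* At expand time every operation is counted as one insn. */
static size_t expand_time_insns(const enum op *seq, size_t n)
{
    (void)seq;
    return n;
}

/* After combining: a shift directly followed by an add or subtract is
   assumed to fold into it (e.g. add r0, r1, r1, lsl #2 on ARM). */
static size_t combined_insns(const enum op *seq, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (seq[i] == OP_SHIFT && i + 1 < n && seq[i + 1] != OP_SHIFT)
            i++;        /* skip the arithmetic op; it absorbs the shift */
        count++;
    }
    return count;
}
```

For x * 10 synthesized as ((x << 2) + x) << 1, the sequence is 
{ OP_SHIFT, OP_ADD, OP_SHIFT }: three insns at expand time, but two in 
this model once the first shift is folded into the add.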

Thanks for the analysis.

R.

