This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: [RFC] Improving GCSE to reduce constant splits on ARM

From: Jeff Law <law at redhat dot com>
To: Dmitry Melnik <dm at ispras dot ru>
Cc: gcc at gcc dot gnu dot org, abel at ispras dot ru, dplotnikov at ispras dot ru
Date: Mon, 03 Jan 2011 09:23:46 -0700
Subject: Re: [RFC] Improving GCSE to reduce constant splits on ARM
References: <4D1372A3.1090803@ispras.ru>

On 12/23/10 09:02, Dmitry Melnik wrote:

Hi,

We've found that constant splitting on ARM can be very inefficient, if it's done inside a loop. For example, the expression

a = a & 0xff00ff00;

will be translated into the following code (on ARM, only 8-bit values rotated by an even number of bits can be used as immediate arguments):
  bic     r0, r0, #16711680
  bic     r0, r0, #255
This makes perfect sense, unless this code is in a loop and there are many instructions using the same bit mask. In that case, we would want to put the 0xff00ff00 constant into a register, let pass_rtl_move_loop_invariants move it out of the loop, and reuse it for every appropriate bitwise AND inside the loop.
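The immediate restriction mentioned above can be checked mechanically. The following is a minimal sketch in C, not code from GCC; the names rotl32 and is_arm_immediate are made up for illustration. It assumes the classic ARM data-processing encoding: an 8-bit value rotated right by an even amount from 0 to 30.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n in 0..31). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* Hypothetical helper: returns 1 if x can be encoded as a single ARM
   data-processing immediate (8-bit value rotated right by an even
   amount), 0 otherwise. */
static int is_arm_immediate(uint32_t x)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot; the result must fit in 8 bits. */
        if ((rotl32(x, rot) & ~0xffu) == 0)
            return 1;
    }
    return 0;
}
```

Under this check neither 0xff00ff00 nor its complement 0x00ff00ff is encodable, while 0x00ff0000 (16711680) and 0x000000ff (255) both are, which matches the two bic instructions shown above.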

This is a real-life example (from the evas rasterization library), where fixing this issue speeds up the expedite test suite by 6% on average and by up to 20% on several tests.

Why does the splitting happen? On 4.4, the only problem was GCSE, which propagated a constant held in a separate pseudo register into the consumer insn, i.e.

  r123 = 0xff00ff00; r124 = r125 & r123

was transformed into

  r124 = r125 & 0xff00ff00

After that, the constant within the AND expression is no longer considered loop invariant and is not moved out of the loop. This can be fixed by checking whether the insn transformed by GCSE will require splitting; if it does, the transformation should not be done in the earlier GCSE passes. We may check this by comparing the rtx_cost of the constant we're going to propagate with the rtx_cost of const_int(1). If moving the loop invariant fails (e.g. due to register pressure), pass_combine can still propagate the constant into the AND, and in that case the result is the same code as before.
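The proposed cost comparison can be modelled outside the compiler. The sketch below is a standalone toy, not GCC's rtx_cost and not the actual patch: it approximates the cost of a constant by the number of 8-bit, even-aligned chunks needed to cover its set bits, and allows propagation only when that count does not exceed the count for the constant 1. The names rotr32, chunks_from, imm_chunks and worth_propagating are hypothetical.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n in 0..31). */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    return n == 0 ? x : (x >> n) | (x << (32 - n));
}

/* Greedy count of 8-bit chunks, scanning upward from bit 'start'
   in steps of two bits. */
static unsigned chunks_from(uint32_t x, unsigned start)
{
    uint32_t v = rotr32(x, start);
    unsigned count = 0;
    for (unsigned bit = 0; bit < 32; ) {
        if ((v >> bit) & 3u) {   /* a set bit in this even-aligned pair */
            count++;
            bit += 8;            /* one immediate covers the next 8 bits */
        } else {
            bit += 2;
        }
    }
    return count;
}

/* Rough estimate of how many immediates are needed to cover x,
   taking the best of all even starting rotations. */
static unsigned imm_chunks(uint32_t x)
{
    unsigned best = 32;
    for (unsigned start = 0; start < 32; start += 2) {
        unsigned n = chunks_from(x, start);
        if (n < best)
            best = n;
    }
    return best;
}

/* Toy stand-in for the proposed guard: propagate a constant into its
   use only if it is no more expensive than the constant 1. */
static int worth_propagating(uint32_t c)
{
    return imm_chunks(c) <= imm_chunks(1u);
}
```

In this model 0xff00ff00 needs two chunks, so worth_propagating rejects it and the constant would stay in its pseudo register, where loop invariant motion can still see it.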

After this patch (http://gcc.gnu.org/ml/gcc-patches/2009-08/msg01032.html), such constants are split as early as the expand pass, so the loop invariant code motion pass has no chance to deal with them.

So, the questions are: 1) Is it really necessary to split constants on ARM at the time of expand? At least, loop invariant code motion can work better if splitting happens later.

There is a general tension between splitting the constant early and late, i.e. there are cases where early splitting produces better code and cases where it produces worse code. GCC tries to find a balance which generally generates good code. Further refinement of the heuristics is often helpful.

2) Is there any reason we shouldn't prevent GCSE from propagating constants that we know will be split?

Propagating the constant and keeping it at the use site generally reduces register pressure. Like many things in GCC, it's a tradeoff and GCC attempts to do the right thing.

I would think that we're generally going to get the best code by forcing the constant into a register and only allowing it to appear within the AND insn after cse/loop are complete. I think you can achieve that by changing the operand predicate on the andXX insns within arm.md.

That way the constant will be made available to cse, licm and similar optimizations, but in the case where the constant is used once in a hunk of straightline code it can be combined into the AND insn. It's not perfect since cse/licm of the constant can increase register pressure, but I think the tradeoff is reasonable.
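The tradeoff can be made concrete with a back-of-the-envelope instruction count. This is only an illustrative sketch: the function names and the setup parameter (the number of instructions assumed to materialize the constant in a register) are hypothetical, and the model ignores register pressure entirely.

```c
/* Instructions executed when the mask is split at every use:
   each AND becomes 'chunks' instructions. */
static unsigned long split_and_count(unsigned long iterations,
                                     unsigned long uses,
                                     unsigned long chunks)
{
    return iterations * uses * chunks;
}

/* Instructions executed when the mask is loaded into a register once
   ('setup' instructions, assumed) and each use is a single AND. */
static unsigned long hoisted_and_count(unsigned long iterations,
                                       unsigned long uses,
                                       unsigned long setup)
{
    return setup + iterations * uses;
}
```

With two chunks and an assumed two-instruction setup, a single use in straightline code costs 2 instructions split versus 3 hoisted, while 4 uses over 100 iterations cost 800 versus 402. That is the shape of the argument above: keep the constant in the AND insn for one-off uses, and let cse/licm see it in a register otherwise.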

Jeff
