This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: RFA: autoincrement patches for gcc 4 - updated patch


Jeffrey Law wrote:

On Sat, 2006-07-15 at 16:02 +0100, Paul Brook wrote:


When I look at this patch, it seems to me that it turns
pseudo-assembly code like this:

   r1 = r0 + 8
   r2 = (r1)
   r3 = r0 + 12
   r4 = (r3)
...
However, it seems to me that we can view this as two separate
optimizations.  The first one is to change the first code above into
this:

   r1 = r0 + 8
   r2 = (r1)
   r1 = r1 + 4
   r4 = (r1)

On machines without register offset addressing and with relatively few
registers, this is a useful optimization because it decreases register
pressure.
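
A minimal sketch of the register-pressure point, assuming a toy machine whose only operations are "add immediate" and "load from register address". The tuple encoding, the `run` and `written_registers` helpers, and the memory contents are all hypothetical and exist only for illustration; none of this is taken from the patch itself.

```python
def run(program, regs, memory):
    """Execute (dest, op, src, imm) tuples on a toy machine and return the registers."""
    regs = dict(regs)
    for dest, op, src, imm in program:
        if op == "add":       # dest = src + immediate
            regs[dest] = regs[src] + imm
        elif op == "load":    # dest = memory[src]
            regs[dest] = memory[regs[src]]
    return regs

def written_registers(program):
    """Set of registers the sequence writes, a crude proxy for register pressure."""
    return {dest for dest, _, _, _ in program}

# First sequence: each address is computed from r0 into a fresh register.
original = [
    ("r1", "add", "r0", 8),
    ("r2", "load", "r1", None),
    ("r3", "add", "r0", 12),
    ("r4", "load", "r3", None),
]

# Second sequence: the second address is derived from r1, so r3 is not needed.
transformed = [
    ("r1", "add", "r0", 8),
    ("r2", "load", "r1", None),
    ("r1", "add", "r1", 4),
    ("r4", "load", "r1", None),
]

memory = {108: "a", 112: "b"}   # hypothetical memory contents
before = run(original, {"r0": 100}, memory)
after = run(transformed, {"r0": 100}, memory)
```

Both sequences load the same two values into r2 and r4; the transformed one writes three registers instead of four.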


This can also be useful on machines with limited immediate ranges (e.g. most RISC machines). Typically it occurs with large structures, i.e. when "+ 12" requires multiple instructions.

We've seen evidence that this transformation would help Thumb code on CSiBE.
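
The limited-immediate case can be sketched with a rough cost model. The limit `IMM_MAX = 255` and the rule "an out-of-range offset costs one extra instruction to build the constant" are assumptions made for illustration, not the encoding rules of Thumb or any other real target; the function names are likewise hypothetical.

```python
IMM_MAX = 255  # hypothetical largest immediate a single add can encode

def add_cost(offset):
    """Instructions needed to add `offset` to a register on the toy machine."""
    # In range: one add. Out of range: build the constant first, then add.
    return 1 if 0 <= offset <= IMM_MAX else 2

def independent_cost(offsets):
    """Each address is computed from the base register separately."""
    return sum(add_cost(o) for o in offsets)

def chained_cost(offsets):
    """Each address is derived from the previous one (offsets assumed increasing)."""
    cost, prev = 0, 0
    for o in offsets:
        cost += add_cost(o - prev)
        prev = o
    return cost

# Two fields deep inside a large structure.
offsets = [1000, 1004]
```

Under these assumptions, computing both addresses from the base costs four address instructions, while deriving the second from the first costs three, because the "+ 4" step fits in one instruction.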


If you look at PRE+strength reduction that's exactly what it will
do. It considers r0 + 8 and r0 + 12 as equivalent and thus
removes the r0 + 12 expression evaluation as it's redundant.


combine wants r1 and r3 separate so that it can generate
r2 = (r0+8)
r4 = (r0+12)
on processors where this is possible. So will you be running this PRE+strength reduction
pass between combine and flow (or its replacement)?
And what are you going to do about the increased scheduling rigidity?
What are you going to do when r0+8 is used in more than one place, but separated by another
r0 use? Applying PRE naively will increase the instruction count.


