RFA: autoincrement patches for gcc 4 - updated patch

Mon Jul 17 14:47:00 GMT 2006

Jeffrey Law wrote:

>On Sat, 2006-07-15 at 16:02 +0100, Paul Brook wrote:
>
>>>When I look at this patch, it seems to me that it turns
>>>pseudo-assembly code like this:
>>>
>>>    r1 = r0 + 8
>>>    r2 = (r1)
>>>    r3 = r0 + 12
>>>    r4 = (r3)
>>>...
>>>However, it seems to me that we can view this as two separate
>>>optimizations.  The first one is to change the first code above into
>>>this:
>>>
>>>    r1 = r0 + 8
>>>    r2 = (r1)
>>>    r1 = r1 + 4
>>>    r4 = (r1)
>>>
>>>On machines without register offset addressing and with relatively few
>>>registers, this is a useful optimization because it decreases register
>>>pressure.
>>>
>>This can also be useful on machines with limited immediate ranges (e.g. most
>>RISC machines). Typically it occurs with large structures, i.e. when "+ 12"
>>requires multiple instructions.
>>
>>We've seen evidence that this transformation would help Thumb code on CSiBE.
>>
>If you look at PRE+strength reduction that's exactly what it will
>do.  It considers r0 + 8 and r0 + 12 as equivalent and thus
>removes the r0 + 12 expression evaluation as it's redundant.
>
combine wants r1 and r3 separate so that it can generate
    r2 = (r0+8)
    r4 = (r0+12)
on processors where this is possible.  So will you be running this
PRE+strength reduction pass between combine and flow (or its replacement)?
And what are you going to do about the increased scheduling rigidity?
What are you going to do when r0+8 is used in more than one place, but
separated by another r0 use?  Applying PRE naively will increase the
instruction count.
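
For illustration only (this is not taken from the patch under discussion):
source of roughly the following shape is one way the r0 + 8 / r0 + 12
accesses could arise.  The struct layout and the names rec and sum_c_d are
hypothetical, and the offsets assume no padding between the 32-bit fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: four 32-bit fields, so c sits at byte offset 8
   and d at byte offset 12 (assuming no padding between them). */
struct rec {
    int32_t a;
    int32_t b;
    int32_t c;
    int32_t d;
};

/* Loads the fields at p + 8 and p + 12.
   With reg+offset addressing, the two loads can use the base register
   directly, with displacements 8 and 12.
   Without it, each address must first be computed into a register,
   which is where the choice between a fresh r3 = r0 + 12 and
   r1 = r1 + 4 arises. */
int32_t sum_c_d(const struct rec *p)
{
    return p->c + p->d;
}
```

Which form is preferable depends on the target's addressing modes, so the
same source can argue for either side of the discussion above.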