This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



RE: Short Displacement Optimizations.


Hi,

On 3rd April, 2002, Naveen Sharma wrote:
> I would like to discuss problems arising from the limited
> displacement in the "register+offset" addressing mode.
> E.g., the SH architecture is limited to a 4-bit displacement
> in the "register+offset" addressing mode.
> Many architectures, such as ARM, MIPS16 and HPPA, have similar limitations.
> 
> Practical programs generally create large displacements.
> 
> Consider a simple scenario like this:
> void func()
> {
> 	int A[256];
> 	int i;
> 
> 	...........
>       ...........
> }
> 
> Variable "i" is allocated after A on the stack. To refer to "i" in
> the function, the frame pointer has to be adjusted so that "i" lies
> within the available displacement.
> The situation would be better if "i" were allocated nearer to the frame
> pointer (before A).
> 
> In general it is desirable to minimize the number of instructions used
> for setting up appropriate locations in the frame. Although I am still
> investigating the problems further, I would like to know the views of
> the community on this, and would be keen to implement a solution.

Stack re-ordering is only part of a more general problem. GCC already
optimizes references into large structures/classes, except for frame
pointer based references.
Originally, GCC generated code like this when accessing large structures
(sh-elf target):

	mov	r1,r2  <-- r1 is base
	add	#64,r2 <-- accessing some member at offset 64
	mov.l	@r2,r3
	...
	mov	r1,r4
	add	#68,r4 <-- next member and so on.
	mov.l	@r4,r3
	...
	mov	r1,r5
	add	#72,r5
	mov.l	@r5,r3

The regmove pass can optimize the above sequence to:

	add	#64,r1
	mov.l	@r1,r3
	...
	add	#4,r1
	mov.l	@r1,r3
	...
	add	#4,r1
	mov.l	@r1,r3
	add	#-72,r1


So we might also consider generating RTL assuming an infinite virtual
displacement. Just prior to register allocation, we could map it to the
actual hardware displacement and re-adjust the stack there.
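To make the idea concrete, here is a rough sketch of what such a late mapping step has to decide. It is not GCC code: `count_base_setups` and `MAX_DISP` are hypothetical names, and the reach of 0..60 bytes assumes the SH `mov.l @(disp,Rn)` form with a 4-bit displacement scaled by 4. It walks "virtual" frame offsets in access order and counts how often a base register would have to be re-pointed. It is deliberately naive (it re-anchors at whatever access falls out of range), so it only gives an upper bound.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed reach of a long-word access: 4-bit displacement scaled by 4. */
#define MAX_DISP 60

/* Hypothetical helper, not part of GCC: walk frame offsets in access
   order and count how many times a base register has to be re-pointed
   so that each access fits in 0..MAX_DISP from the current base.
   The base starts at the frame pointer itself (offset 0).  */
static int count_base_setups(const int *offsets, size_t n)
{
    int setups = 0;
    int base = 0;

    for (size_t i = 0; i < n; i++) {
        /* This sketch only models non-negative, word-aligned offsets. */
        assert(offsets[i] >= 0 && offsets[i] % 4 == 0);

        int disp = offsets[i] - base;
        if (disp < 0 || disp > MAX_DISP) {
            base = offsets[i];   /* naive: re-anchor at this access */
            setups++;
        }
    }
    return setups;
}
```

For the structure accesses at offsets 64, 68 and 72 shown earlier, this counts a single base setup, while accesses that all lie within the first 60 bytes need none.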

For stack re-ordering specifically, we might:

1. Sort local variables on the stack by increasing size.
2. For locals of the same size, arrange them on the stack such that
   adjacent references are at "short displacements"
   (the Stack Offset Assignment (SOA) problem, I think).
3. If we organize the stack early, at RTL generation time, we need to
   make sure that it works with the reload pass, which also allocates
   stack slots.
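Step 1 can be sketched roughly as follows. This is only an illustration, not what function.c does: `struct local`, `layout_by_size` and `MAX_DISP` are made-up names, offsets are assumed to grow upwards from the frame pointer, sizes are assumed to be multiples of 4 (so alignment is ignored), and the SOA ordering of step 2 is not attempted.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed reach of a long-word access: 4-bit displacement scaled by 4. */
#define MAX_DISP 60

/* Hypothetical description of one local variable. */
struct local {
    const char *name;
    int size;     /* in bytes */
    int offset;   /* filled in by layout_by_size() */
};

/* qsort comparator: smaller locals first. */
static int by_size(const void *a, const void *b)
{
    const struct local *x = a;
    const struct local *y = b;
    return (x->size > y->size) - (x->size < y->size);
}

/* Sort locals by increasing size, assign consecutive frame offsets,
   and return how many locals start within the short displacement.  */
static int layout_by_size(struct local *v, size_t n)
{
    int offset = 0;
    int reachable = 0;

    qsort(v, n, sizeof *v, by_size);
    for (size_t i = 0; i < n; i++) {
        assert(v[i].size > 0 && v[i].size % 4 == 0);
        v[i].offset = offset;
        if (offset <= MAX_DISP)
            reachable++;
        offset += v[i].size;
    }
    return reachable;
}
```

With the earlier example (`int A[256]` and `int i`), `i` ends up at offset 0 and `A` at offset 4, so both start within reach of the frame pointer, although elements of `A` beyond the first 60 bytes still need a base adjustment.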

Additionally, consider:
 
void func(void)
{
	int a[128];

	func2(a);
	a[30] = a[31] + a[32];
	func3(a);
}

Looking at the output code for the sh-elf target:

	mov	r14,r3       <---r14 is base
	add	#64,r3  
	mov.l	@(60,r3),r1  <--- Accessing a[31]
	mov	r14,r2
	add	#124,r2
	mov.l	@(4,r2),r2   <----Accessing a[32]
	add	r2,r1
	mov.l	r1,@(56,r3)  <--- Storing to a[30]

Ideally, it should generate code like this:

	mov	r14,r2
	add	#120,r2
	mov.l	@(4,r2),r1
	mov.l	@(8,r2),r3   <--- a[32]; keep r2 intact as the base
	add	r3,r1
	mov.l	r1,@r2       <--- Storing to a[30]

The current implementation (LEGITIMIZE_RELOAD_ADDRESS() etc.) only
has the ability to factor multiples of 64 (actually 60 + 64n)
out of the displacement, so if two stack slots are adjacent but
lie in different 64-byte banks, the compiler generates TWO separate pointers.
So rearranging the stack slots for locality of usage is insufficient on
its own, as the rest of the compiler will fail to follow the hints.
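The effect can be shown with a small model. This is a simplification and not the actual LEGITIMIZE_RELOAD_ADDRESS() logic: it assumes the base is simply the offset rounded down to a multiple of 64, and the names `bank_base`, `bases_bank` and `bases_anchored` are invented for this sketch. It compares the number of base pointers needed under that bank split with the number needed when each base is anchored at the lowest slot not yet covered.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DISP 60   /* assumed reach: 4-bit displacement scaled by 4 */
#define BANK     64   /* assumed bank size used when factoring the offset */

/* Simplified model: the base is the offset rounded down to a bank. */
static int bank_base(int offset)
{
    assert(offset >= 0);
    return offset - offset % BANK;
}

/* Count distinct base pointers for ascending offsets under the bank model. */
static int bases_bank(const int *off, size_t n)
{
    int count = 0;
    int last = -1;

    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || off[i] >= off[i - 1]);   /* must be ascending */
        int b = bank_base(off[i]);
        if (b != last) {
            last = b;
            count++;
        }
    }
    return count;
}

/* Count base pointers when each base is anchored at the lowest offset
   that the previous base cannot reach.  */
static int bases_anchored(const int *off, size_t n)
{
    int count = 0;
    int base = 0;

    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || off[i] >= off[i - 1]);   /* must be ascending */
        if (count == 0 || off[i] - base > MAX_DISP) {
            base = off[i];
            count++;
        }
    }
    return count;
}
```

For a[30], a[31] and a[32] at offsets 120, 124 and 128, the bank model needs two bases (64 and 128), whereas a single base anchored at 120 reaches all three slots with displacements 0, 4 and 8.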

I am working further on this, and views are welcome, especially from
maintainers of the parts where modifications might be necessary
(function.c, the reload passes, etc.).

Best Regards,
	Naveen Sharma.

