This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug optimization/11753] %o7 register being used immediately in the delay slot before a call with -O2

From: "ebotcazou at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 5 Sep 2003 14:17:33 -0000
Subject: [Bug optimization/11753] %o7 register being used immediately in the delay slot before a call with -O2
References: <20030731185137.11753.warren_baird@cimmetry.com>
Reply-to: gcc-bugzilla at gcc dot gnu dot org

PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11753



------- Additional Comments From ebotcazou at gcc dot gnu dot org  2003-09-05 14:17 -------
It is my understanding that the optimization (which can be selectively disabled
by passing -fno-peephole to the compiler) is valid per se on SPARC:

with -fno-peephole:
	mov	1, %i0
	call	free, 0
	 nop
	b,a	.LL1

without -fno-peephole:
	mov	1, %i0
	call	free, 0
	add	%o7, (.LL1-.-4), %o7

Both sequences call free() and then continue at the .LL1 label. This is obvious
for the former; the latter works as follows: the 'call' copies its own address
(the PC) into %o7 and executes the 'add' in its delay slot before jumping to its
target. The operand of the 'add' is essentially '.LL1-.', i.e. the difference
between the address of .LL1 and the current address (that of the 'add'). Added
to the address of the 'add', this operand would give .LL1; added to the address
of the 'call' (which is what %o7 holds, 4 bytes lower), it gives .LL1-4. With
the final -4 correction, it gives .LL1-8. So %o7 holds .LL1-8 when the jump is
executed. Now, the return-from-subroutine insn on SPARC is 'ret', which is an
alias for

   jmp %i7+8

(after its 'save', the callee sees the caller's %o7 as %i7; a leaf routine uses
'retl', i.e. jmp %o7+8, instead), so the final return address is just .LL1, QED.
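
The same arithmetic can be written out as a small C sketch. The function names
are hypothetical helpers made up for illustration (nothing here comes from GCC);
the only facts relied on are that 'call' leaves its own address in %o7, that the
'add' sits 4 bytes after the 'call', and that the callee returns to that
register value plus 8.

```c
#include <stdint.h>

/* Hypothetical helper: the immediate the assembler computes for (.LL1-.-4),
   where '.' is the address of the 'add' in the delay slot (call_addr + 4).
   The unsigned-to-signed cast assumes the usual two's complement behaviour. */
int32_t delay_slot_imm(uint32_t call_addr, uint32_t label_addr)
{
    uint32_t add_addr = call_addr + 4;
    return (int32_t)(label_addr - add_addr - 4);
}

/* Hypothetical helper: the address at which execution resumes once the
   callee has returned. */
uint32_t resume_addr(uint32_t call_addr, int32_t imm)
{
    uint32_t o7 = call_addr;   /* written by the 'call' itself */
    o7 += (uint32_t)imm;       /* the 'add' executed in the delay slot */
    return o7 + 8;             /* 'ret'/'retl' jump to that value + 8 */
}
```

For instance, with a 'call' at the made-up address 0x1000 and .LL1 at 0x1020,
delay_slot_imm() gives 24 and resume_addr(0x1000, 24) gives 0x1020, i.e. .LL1.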

To help you convince yourself, you can compile the following little program

#include <stdio.h>

int main(int argc, char *argv[])
{
   if (argc > 1)
      printf("at least one argument\n");
   else
      printf("no argument\n");
   
   return 0;
}

with no optimization, then, in the assembly file, manually turn

	call	printf, 0
	 nop
	b	.LL3
	 nop

into

	call	printf, 0
	add	%o7, (.LL3-.-4), %o7

and assemble. The program should work flawlessly.

Here's what I get when it is assembled with the Sun assembler:

        106a8:  40 00 40 66        call         printf
        106ac:  9e 03 e0 10        add          %o7, 16, %o7
        106b0:  11 00 00 41        sethi        %hi(0x10400), %o0
        106b4:  90 12 23 90        or           %o0, 0x390, %o0 ! 0x10790
        106b8:  40 00 40 62        call         printf
        106bc:  01 00 00 00        nop
        106c0:  90 10 20 00        clr          %o0
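
The listing above can be checked mechanically. The following is only a sketch
with hypothetical function names; it assumes the standard SPARC format-3 layout,
in which the low 13 bits of the instruction word hold the signed immediate. It
pulls the immediate out of the word in the delay slot and adds it, plus 8, to
the address of the 'call'.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extended simm13 field (bits 12..0) of a SPARC
   format-3 instruction word such as 'add %o7, imm, %o7'. */
int32_t simm13_of(uint32_t insn)
{
    int32_t field = (int32_t)(insn & 0x1fff);
    return (field & 0x1000) ? field - 0x2000 : field;
}

/* Hypothetical helper: where the callee returns to, given the address of the
   'call' and the instruction word found in its delay slot. */
uint32_t return_target(uint32_t call_addr, uint32_t delay_slot_insn)
{
    /* %o7 = call_addr, then the 'add' adjusts it, then the return adds 8. */
    return call_addr + (uint32_t)simm13_of(delay_slot_insn) + 8;
}
```

With the bytes shown above, simm13_of(0x9e03e010) is 16 and
return_target(0x106a8, 0x9e03e010) is 0x106c0, the address of the 'clr %o0'
that follows the second call to printf.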

One big restriction is, of course, that the difference of addresses be
representable as an immediate operand (signed 13-bit field, i.e. -4096 to
4095). With your testcase, I get with the Sun assembler

	 59c:  40 00 00 00         call    	0x59c
	 5a0:  9e 03 ef 50         add     	%o7, 3920, %o7

3920 is a valid signed 13-bit operand. Moreover, doing the math:
   0x59c + 3920 + 8 = 0x14F4

and

	14f4:  01 00 00 00         nop     	
	14f8:  81 c7 e0 08         ret     	
	14fc:  81 e8 00 00         restore 	

is just 

.LL1:
	nop
	ret
	restore

So everything seems to be fine again.
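
The range restriction mentioned above can be sketched in C as well. The names
are hypothetical and this is not how GCC itself implements the check; it only
restates the condition that the value of (.LL1-.-4) must fit in a signed 13-bit
immediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if value fits in a signed 13-bit field. */
bool fits_simm13(int32_t value)
{
    return value >= -4096 && value <= 4095;
}

/* Hypothetical helper: true if the branch to label_addr can be folded into
   the delay slot of the 'call' at call_addr. */
bool can_fold_branch(uint32_t call_addr, uint32_t label_addr)
{
    /* (.LL1-.-4) with '.' being the delay slot at call_addr + 4. */
    int32_t imm = (int32_t)(label_addr - (call_addr + 4) - 4);
    return fits_simm13(imm);
}
```

For the testcase, can_fold_branch(0x59c, 0x14f4) is true, since the immediate
is 3920; a label 4096 bytes or more beyond call_addr + 8 would not qualify.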


Could you check why the optimization seems to be failing with your original
code? Where does the final return address point?
