Bug 13585

Summary: Incorrect optimisation of call to sfunc
Product: gcc Reporter: stuart.menefy
Component: targetAssignee: Jorn Wolfgang Rennecke <amylaar>
Severity: normal CC: gcc-bugs
Priority: P2 Keywords: wrong-code
Version: 3.3.2   
Target Milestone: 3.3.3   
Host: i686-pc-linux-gnu Target: sh4-linux
Build: i686-pc-linux-gnu Known to work:
Known to fail: Last reconfirmed: 2004-01-09 18:17:53
Attachments: Test file
Proposed patch for 3.3 and 3.4

Description stuart.menefy 2004-01-06 15:33:23 UTC
The problem occurs when building the attached code. The command line
   % sh4-linux-gcc -O2 -c pcm_params.i -fPIC -o pcm_params.lo       

The generated code for line 8252:
         tmp = *params;
which is a struct assignment, is:

gcc version 3.2 20020529 (experimental) / 3.2.1 20021022 (prerelease)
and 3.3.4:
        mov.l   .L22,r0
        mov     r14,r4
        mov.w   .L32,r7
        mov     r10,r5
        mov.l   @(r0,r12),r0
        mov     #74,r6
        add     r14,r7
        mov.l   r0,@(36,r7)
        jsr     @r7
        mov     r0,r7

The problem is that the compiler has migrated the "mov r0,r7" which sets
up the address to jump to, into the delay slot.

This problem is seen in SuperH's 3.2/3.2.1 compiler, and a 3.3.2
compiler, but not in a 3.4 snapshot, however this may just be a side
effect of slightly different optimisations.
Comment 1 stuart.menefy 2004-01-06 15:34:31 UTC
Created attachment 5420 [details]
Test file
Comment 2 Jorn Wolfgang Rennecke 2004-01-09 18:17:08 UTC
The macro INSN_REFERENCES_ARE_DELAYED is used to indicate that for some
instruction patterns, which are actually function calls with reduced register
usage, the references are delayed like for a normal function call.
(called sfuncs for the SH).
Unfortunately, the effect applies to all references, including the function
address that is called - there is no interface to selectively exclude it.
The pa port hides the problem by hiding the function address, but that is
not suitable for the SH because the function address load has some latency,
and moreover the SH4 is superscalar, so scheduling is required for decent

A solution by fixing the interface could be adding a new macro/hook that
allows the target to extract the function address from the instruction,
or a convention for writing rtl so that reorg can tell what reference is
not delayed, e.g. we could use a (use (call (addr))) pattern to denote that
the reference to addr is not delayed.

On the other hand, delayed branch scheduling is the penultimate optimization
that is not under close control of the machine description (instruction
splitting and final peepholing are done according to specfic patterns in the
md file), and the last one is branch shortening, and both these passes have
traditionally been not very aggressive in reordering instructions.  The SH
uses a special pattern, use_sfunc_addr, to prevent unwanted use of a function
address computing instruction in the sfunc delay slot.  Most of the time,
this pattern isn't even needed, because sfunc addresses for non-PIC code
are loaded pc-relatively, and a good schedule requires these loads to be
separated from the sfunc call by a few instructions.
However, for PIC, on the testcase we currently generate inefficient code
which does not only do a PIC load of the function address and stores it
into the stack, but also does a register-register copy at the end.
This register-register copy has zero latency, so ends up separated from
the sfunc call only by the use_sfunc_addr pattern.
There is a bit of code in reorg.c:fill_slots_from_thread after this comment:

      /* If this insn is a register-register copy and the next insn has
         a use of our destination, change it to use our source.  That way,
         it will become a candidate for our delay slot the next time
         through this loop.  This case occurs commonly in loops that
         scan a list.

which changes the use_sfunc_addr pattern to make it ineffective.
This change also happens in 3.4 20040108, thus potentially enabling
the invalid use of the move instruction in the delay slot there too,
3.4 just happens to pick a different delay slot insn, but we should
not rely on it do do that in call cases.

I am currently testing a patch that makes the recognition of
use_sfunc_addr fail if its register no longer agrees with the guarded
sfunc, thus disabling the transformation in fill_slots_from_thread.
Comment 3 Jorn Wolfgang Rennecke 2004-01-09 18:23:13 UTC
Created attachment 5447 [details]
Proposed patch for 3.3 and 3.4
Comment 4 CVS Commits 2004-01-12 12:59:47 UTC
Subject: Bug 13585

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	amylaar@gcc.gnu.org	2004-01-12 12:59:40

Modified files:
	gcc            : ChangeLog 
	gcc/config/sh  : sh-protos.h sh.c sh.md 

Log message:
	PR target/13585
	* sh-protos.h (check_use_sfunc_addr): Declare.
	* sh.c (extract_sfunc_addr, check_use_sfunc_addr): New functions.
	* sh.md (use_sfunc_addr): Use check_use_sfunc_addr in insn predicate.


Comment 5 CVS Commits 2004-01-12 17:03:30 UTC
Subject: Bug 13585

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_3-branch
Changes by:	amylaar@gcc.gnu.org	2004-01-12 17:03:26

Modified files:
	gcc            : ChangeLog 
	gcc/config/sh  : sh-protos.h sh.c sh.md 

Log message:
	PR target/13585
	* sh-protos.h (check_use_sfunc_addr): Declare.
	* sh.c (extract_sfunc_addr, check_use_sfunc_addr): New functions.
	* sh.md (use_sfunc_addr): Use check_use_sfunc_addr in insn predicate.


Comment 6 Jorn Wolfgang Rennecke 2004-01-12 17:19:22 UTC
Applied patch to 3.3 branch and mainline.