The problem occurs when building the attached code. The command line is: % sh4-linux-gcc -O2 -c pcm_params.i -fPIC -o pcm_params.lo The generated code for line 8252: tmp = *params; which is a struct assignment, is: gcc version 3.2 20020529 (experimental) / 3.2.1 20021022 (prerelease) and 3.3.4: mov.l .L22,r0 mov r14,r4 mov.w .L32,r7 mov r10,r5 mov.l @(r0,r12),r0 mov #74,r6 add r14,r7 mov.l r0,@(36,r7) jsr @r7 mov r0,r7 The problem is that the compiler has migrated the "mov r0,r7" which sets up the address to jump to, into the delay slot. This problem is seen in SuperH's 3.2/3.2.1 compiler, and a 3.3.2 compiler, but not in a 3.4 snapshot, however this may just be a side effect of slightly different optimisations.
Created attachment 5420 [details] Test file
The macro INSN_REFERENCES_ARE_DELAYED is used to indicate that for some instruction patterns, which are actually function calls with reduced register usage, the references are delayed like for a normal function call. (called sfuncs for the SH). Unfortunately, the effect applies to all references, including the function address that is called - there is no interface to selectively exclude it. The pa port hides the problem by hiding the function address, but that is not suitable for the SH because the function address load has some latency, and moreover the SH4 is superscalar, so scheduling is required for decent performance. A solution by fixing the interface could be adding a new macro/hook that allows the target to extract the function address from the instruction, or a convention for writing rtl so that reorg can tell what reference is not delayed, e.g. we could use a (use (call (addr))) pattern to denote that the reference to addr is not delayed. On the other hand, delayed branch scheduling is the penultimate optimization that is not under close control of the machine description (instruction splitting and final peepholing are done according to specfic patterns in the md file), and the last one is branch shortening, and both these passes have traditionally been not very aggressive in reordering instructions. The SH uses a special pattern, use_sfunc_addr, to prevent unwanted use of a function address computing instruction in the sfunc delay slot. Most of the time, this pattern isn't even needed, because sfunc addresses for non-PIC code are loaded pc-relatively, and a good schedule requires these loads to be separated from the sfunc call by a few instructions. However, for PIC, on the testcase we currently generate inefficient code which does not only do a PIC load of the function address and stores it into the stack, but also does a register-register copy at the end. This register-register copy has zero latency, so ends up separated from the sfunc call only by the use_sfunc_addr pattern. There is a bit of code in reorg.c:fill_slots_from_thread after this comment: /* If this insn is a register-register copy and the next insn has a use of our destination, change it to use our source. That way, it will become a candidate for our delay slot the next time through this loop. This case occurs commonly in loops that scan a list. which changes the use_sfunc_addr pattern to make it ineffective. This change also happens in 3.4 20040108, thus potentially enabling the invalid use of the move instruction in the delay slot there too, 3.4 just happens to pick a different delay slot insn, but we should not rely on it do do that in call cases. I am currently testing a patch that makes the recognition of use_sfunc_addr fail if its register no longer agrees with the guarded sfunc, thus disabling the transformation in fill_slots_from_thread.
Created attachment 5447 [details] Proposed patch for 3.3 and 3.4
Subject: Bug 13585 CVSROOT: /cvs/gcc Module name: gcc Changes by: amylaar@gcc.gnu.org 2004-01-12 12:59:40 Modified files: gcc : ChangeLog gcc/config/sh : sh-protos.h sh.c sh.md Log message: PR target/13585 * sh-protos.h (check_use_sfunc_addr): Declare. * sh.c (extract_sfunc_addr, check_use_sfunc_addr): New functions. * sh.md (use_sfunc_addr): Use check_use_sfunc_addr in insn predicate. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.2236&r2=2.2237 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/sh/sh-protos.h.diff?cvsroot=gcc&r1=1.51&r2=1.52 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/sh/sh.c.diff?cvsroot=gcc&r1=1.242&r2=1.243 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/sh/sh.md.diff?cvsroot=gcc&r1=1.163&r2=1.164
Subject: Bug 13585 CVSROOT: /cvs/gcc Module name: gcc Branch: gcc-3_3-branch Changes by: amylaar@gcc.gnu.org 2004-01-12 17:03:26 Modified files: gcc : ChangeLog gcc/config/sh : sh-protos.h sh.c sh.md Log message: PR target/13585 * sh-protos.h (check_use_sfunc_addr): Declare. * sh.c (extract_sfunc_addr, check_use_sfunc_addr): New functions. * sh.md (use_sfunc_addr): Use check_use_sfunc_addr in insn predicate. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.16114.2.875&r2=1.16114.2.876 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/sh/sh-protos.h.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.34.2.1&r2=1.34.2.2 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/sh/sh.c.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.188.2.7&r2=1.188.2.8 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/sh/sh.md.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.133.2.1&r2=1.133.2.2
Applied patch to 3.3 branch and mainline.