This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug optimization/13585] Incorrect optimisation of call to sfunc
- From: "amylaar at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 9 Jan 2004 18:17:18 -0000
- Subject: [Bug optimization/13585] Incorrect optimisation of call to sfunc
- References: <20040106153323.13585.stuart.menefy@st.com>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Additional Comments From amylaar at gcc dot gnu dot org 2004-01-09 18:17 -------
The macro INSN_REFERENCES_ARE_DELAYED is used to indicate that for some
instruction patterns (called sfuncs on the SH), which are actually function
calls with reduced register usage, the references are delayed just as they
are for a normal function call.
Unfortunately, the effect applies to all references, including the function
address that is called - there is no interface to selectively exclude it.
The pa port hides the problem by hiding the function address, but that is
not suitable for the SH because the function address load has some latency,
and moreover the SH4 is superscalar, so scheduling is required for decent
performance.
One way to fix the interface would be to add a new macro/hook that
allows the target to extract the function address from the instruction,
or to adopt a convention for writing rtl so that reorg can tell which
reference is not delayed; e.g. we could use a (use (call (addr))) pattern
to denote that the reference to addr is not delayed.
On the other hand, delayed branch scheduling is the penultimate optimization
that is not under close control of the machine description (instruction
splitting and final peepholing are done according to specific patterns in the
md file); the last one is branch shortening. Both of these passes have
traditionally not been very aggressive in reordering instructions. The SH
uses a special pattern, use_sfunc_addr, to prevent unwanted use of a function
address computing instruction in the sfunc delay slot. Most of the time,
this pattern isn't even needed, because sfunc addresses for non-PIC code
are loaded pc-relatively, and a good schedule requires these loads to be
separated from the sfunc call by a few instructions.
However, for PIC, on the testcase we currently generate inefficient code
which not only does a PIC load of the function address and stores it
on the stack, but also does a register-register copy at the end.
This register-register copy has zero latency, so it ends up separated from
the sfunc call only by the use_sfunc_addr pattern.
There is a bit of code in reorg.c:fill_slots_from_thread after this comment:
/* If this insn is a register-register copy and the next insn has
a use of our destination, change it to use our source. That way,
it will become a candidate for our delay slot the next time
through this loop. This case occurs commonly in loops that
scan a list.
which changes the use_sfunc_addr pattern to make it ineffective.
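To make the failure mode concrete, here is a toy model in Python. It is not
GCC code: the dict-based insns, the register names r1/r2, and the helpers
conflicts and forward_copy are all made up for illustration. It only sketches
why the guard stops working once its register has been replaced by the
source of the copy.

```python
# Toy model, not GCC code. Insns are dicts; register names are made up.

def conflicts(mover, other):
    """True if `mover` sets a register that `other` visibly uses."""
    return any(reg in other["uses"] for reg in mover["sets"])

def forward_copy(copy, next_insn):
    """Rewrite next_insn to use the copy's source instead of its destination."""
    src, dest = copy["uses"][0], copy["sets"][0]
    uses = [src if reg == dest else reg for reg in next_insn["uses"]]
    return dict(next_insn, uses=uses)

# mov r1,r2 ; use_sfunc_addr r2 ; sfunc via r2
copy = {"name": "mov", "uses": ["r1"], "sets": ["r2"]}
guard = {"name": "use_sfunc_addr", "uses": ["r2"], "sets": []}
# The sfunc's address reference counts as delayed, so it is not a visible use.
sfunc = {"name": "sfunc", "uses": [], "sets": [], "delayed_uses": ["r2"]}

assert conflicts(copy, guard)                # guard keeps the mov ahead of the call
rewritten_guard = forward_copy(copy, guard)
assert rewritten_guard["uses"] == ["r1"]
assert not conflicts(copy, rewritten_guard)  # guard no longer mentions r2
assert not conflicts(copy, sfunc)            # and the sfunc hides its use of r2
```

In this model nothing visible is left to keep the mov out of the delay slot
after the rewrite, even though the sfunc still needs r2 to be set before
the call.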
This change also happens in 3.4 20040108, thus potentially enabling
the invalid use of the move instruction in the delay slot there too.
3.4 just happens to pick a different delay slot insn, but we should
not rely on it to do that in all cases.
I am currently testing a patch that makes the recognition of
use_sfunc_addr fail if its register no longer agrees with the guarded
sfunc, thus disabling the transformation in fill_slots_from_thread.
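The idea can be sketched with a toy model in Python. This is not the patch
and not GCC code: the dict-based insns, the register names, and the helpers
recognizes and try_forward_copy are hypothetical. It assumes only that a
replacement is kept when the changed insn is still recognized, and is
dropped otherwise.

```python
# Toy model, not GCC code. Insns are dicts; register names are made up.

def recognizes(insn, sfunc_addr_reg):
    """Stand-in for pattern recognition: the guard only matches while it
    still names the register the following sfunc calls through."""
    if insn["name"] != "use_sfunc_addr":
        return True
    return insn["uses"] == [sfunc_addr_reg]

def try_forward_copy(copy, next_insn, sfunc_addr_reg):
    """Replace the copy's destination by its source in next_insn, but keep
    the change only if the result is still recognized."""
    uses = [copy["src"] if reg == copy["dest"] else reg
            for reg in next_insn["uses"]]
    candidate = dict(next_insn, uses=uses)
    return candidate if recognizes(candidate, sfunc_addr_reg) else next_insn

copy = {"name": "mov", "src": "r1", "dest": "r2"}
guard = {"name": "use_sfunc_addr", "uses": ["r2"]}
add = {"name": "add", "uses": ["r2", "r3"]}

kept = try_forward_copy(copy, guard, "r2")
assert kept["uses"] == ["r2"]            # change rejected, guard stays effective
changed = try_forward_copy(copy, add, "r2")
assert changed["uses"] == ["r1", "r3"]   # ordinary insns are still rewritten
```

The point of the design is that the transformation stays available for
ordinary insns and is refused only where it would defeat the guard.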
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2004-01-09 18:17:16
               date|                            |
   Target Milestone|---                         |3.3.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13585