This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug optimization/13585] Incorrect optimisation of call to sfunc


------- Additional Comments From amylaar at gcc dot gnu dot org  2004-01-09 18:17 -------
The macro INSN_REFERENCES_ARE_DELAYED is used to indicate that for some
instruction patterns, which are actually function calls with reduced register
usage (called sfuncs for the SH), the references are delayed like for a
normal function call.
Unfortunately, the effect applies to all references, including the function
address that is called - there is no interface to selectively exclude it.
The pa port hides the problem by hiding the function address, but that is
not suitable for the SH because the function address load has some latency,
and moreover the SH4 is superscalar, so scheduling is required for decent
performance.

One way to fix the interface would be to add a new macro/hook that
allows the target to extract the function address from the instruction,
or to adopt a convention for writing rtl so that reorg can tell which
reference is not delayed, e.g. we could use a (use (call (addr))) pattern to
denote that the reference to addr is not delayed.

On the other hand, delayed branch scheduling is the penultimate optimization
that is not under close control of the machine description (instruction
splitting and final peepholing are done according to specific patterns in the
md file); the last one is branch shortening.  Both of these passes have
traditionally not been very aggressive in reordering instructions.  The SH
uses a special pattern, use_sfunc_addr, to prevent unwanted use of a function
address computing instruction in the sfunc delay slot.  Most of the time,
this pattern isn't even needed, because sfunc addresses for non-PIC code
are loaded pc-relatively, and a good schedule requires these loads to be
separated from the sfunc call by a few instructions.
However, for PIC, on the testcase we currently generate inefficient code
which not only does a PIC load of the function address and stores it
on the stack, but also does a register-register copy at the end.
This register-register copy has zero latency, so it ends up separated from
the sfunc call only by the use_sfunc_addr pattern.
There is a bit of code in reorg.c:fill_slots_from_thread after this comment:

      /* If this insn is a register-register copy and the next insn has
         a use of our destination, change it to use our source.  That way,
         it will become a candidate for our delay slot the next time
         through this loop.  This case occurs commonly in loops that
         scan a list.

which changes the register used by the use_sfunc_addr pattern, making the
pattern ineffective.
This change also happens in 3.4 20040108, thus potentially enabling
the invalid use of the move instruction in the delay slot there too;
3.4 just happens to pick a different delay slot insn, but we should
not rely on it to do that in all cases.
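
To illustrate how the transformation quoted above defeats the guard, here is
a toy model in C.  It is only a sketch and not GCC code: the toy_insn type,
the toy_* functions and the register numbers are all hypothetical and merely
stand in for rtl insns and the relevant logic in fill_slots_from_thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical and are not GCC's rtl.  */
enum toy_kind { TOY_COPY, TOY_USE_SFUNC_ADDR, TOY_SFUNC_CALL, TOY_OTHER };

struct toy_insn {
    enum toy_kind kind;
    int dest;   /* register written, or -1 if none */
    int src;    /* register read, or -1 if none */
};

/* Mimic the quoted transformation: if COPY is a register-register copy
   and NEXT reads COPY's destination, make NEXT read COPY's source
   instead.  Returns true if NEXT was changed.  */
static bool toy_forward_copy(const struct toy_insn *copy,
                             struct toy_insn *next)
{
    assert(copy != NULL && next != NULL);
    if (copy->kind != TOY_COPY || next->src != copy->dest)
        return false;
    next->src = copy->src;
    return true;
}

/* In this model the guard protects the call only while both insns
   name the same address register.  */
static bool toy_guard_effective(const struct toy_insn *guard,
                                const struct toy_insn *call)
{
    return guard->kind == TOY_USE_SFUNC_ADDR
           && call->kind == TOY_SFUNC_CALL
           && guard->src == call->src;
}
```

With a copy r2 = r1 followed by a guard and an sfunc call that both read r2,
toy_forward_copy rewrites the guard to read r1.  The guard then no longer
mentions r2, so in this model nothing stops the copy from being considered
for the delay slot of the call.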

I am currently testing a patch that makes the recognition of
use_sfunc_addr fail if its register no longer agrees with the guarded
sfunc, thus disabling the transformation in fill_slots_from_thread.
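
The idea can be sketched with a toy model in C.  This is not the patch
itself: toy_insn, toy_recog_use_sfunc_addr and the register numbers are
hypothetical, and the sketch assumes that the guarded sfunc is simply the
next sfunc call following the use_sfunc_addr insn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins for rtl insns.  */
enum toy_kind { TOY_USE_SFUNC_ADDR, TOY_SFUNC_CALL, TOY_OTHER };

struct toy_insn {
    enum toy_kind kind;
    int addr_reg;   /* function address register, or -1 if none */
};

/* Return true if the insn at GUARD_IDX is a use_sfunc_addr whose
   register agrees with the next sfunc call after it.  A false result
   models "recognition fails", so a rewrite of the guard's register
   would be rejected.  */
static bool toy_recog_use_sfunc_addr(const struct toy_insn *insns,
                                     size_t n, size_t guard_idx)
{
    assert(insns != NULL && guard_idx < n);
    if (insns[guard_idx].kind != TOY_USE_SFUNC_ADDR)
        return false;
    for (size_t i = guard_idx + 1; i < n; i++)
        if (insns[i].kind == TOY_SFUNC_CALL)
            return insns[i].addr_reg == insns[guard_idx].addr_reg;
    return false;   /* no sfunc call follows the guard */
}
```

In this model, a guard on r2 followed by an sfunc call through r2 is
recognized, while a guard that has been rewritten to r1 in front of the
same call is not.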


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2004-01-09 18:17:16
               date|                            |
   Target Milestone|---                         |3.3.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13585

