[Bug target/64305] New: [SH] Add support for fschg insn and 64 bit FP moves

olegendo at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Sun Dec 14 13:56:00 GMT 2014


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64305

            Bug ID: 64305
           Summary: [SH] Add support for fschg insn and 64 bit FP moves
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: olegendo at gcc dot gnu.org
            Target: sh*-*-*

Currently, 64 bit FP moves are utilized only for handling DFmode types when the
option -mfmovd is specified.  The way it's done right now works only on
SH4A and SH2A, since FPSCR.SZ is tied to FPSCR.PR.

On SH4A and SH2A loading DFmode types from memory using 64 bit FP moves
(FPSCR.SZ = 1) performs little/big endian swapping if FPSCR.PR = 1.  If
FPSCR.PR = 0 the two 32 bit halves are loaded as a pair of SFmode values in big
endian order.

On SH4 64 bit FP moves are only defined for FPSCR.SZ = 1 and FPSCR.PR = 0,
which allows loading of DFmode values in big endian ordering only.


64 bit FP moves can be used for accessing DFmode types in memory on SH4 little
endian, but the memory layout for those values would have to be half
little-endian, half big-endian.  This could be realized with some optional -m
setting.
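
As a rough illustration of such a mixed layout, the following host-side C
sketch swaps the two 32 bit words of a DFmode memory image while keeping the
byte order inside each word.  It assumes an IEEE 754 'double' of 64 bits; the
helper names 'swap_words' and 'df_to_mixed_layout' are hypothetical and are
not part of GCC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: exchange the two 32 bit words of a 64 bit memory
   image.  The byte order inside each word is left unchanged.  */
static uint64_t
swap_words (uint64_t image)
{
  return (image << 32) | (image >> 32);
}

/* Hypothetical helper: return the memory image of D with its two 32 bit
   halves exchanged, i.e. the word order that differs from the normal
   layout of the host.  */
static uint64_t
df_to_mixed_layout (double d)
{
  uint64_t image;

  /* Assumption: double is a 64 bit type.  */
  assert (sizeof (double) == sizeof (uint64_t));
  memcpy (&image, &d, sizeof image);
  return swap_words (image);
}
```

Applying the swap twice gives back the original image, so the same operation
would convert in both directions.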


64 bit FP moves (FPSCR.SZ = 1, FPSCR.PR = 0) can be utilized on SH4, SH4A and
SH2A for doing SFmode vector loads, since the order of the vector elements is
endian invariant.  E.g. the following

typedef float v4sf __attribute__ ((vector_size (16)));

float test (v4sf* x)
{
  return (*x)[0];
}

compiles to 

        rts
        fmov.s  @r4,fr0

regardless of the endian mode.


What is currently lacking to realize the above is FPSCR.SZ mode switching. 
Notice that the 'fschg' insn is only valid when FPSCR.PR = 0 on all FPU enabled
cores (SH2A, SH4, SH4A).  Thus FPSCR.SZ mode switching depends on FPSCR.PR mode
switching to some extent.  On SH2A and SH4 FPSCR.PR mode switching is done
using sts-modify-lds sequences of FPSCR, since there is no fpchg insn.  If SZ
and PR mode switching is done independently, multiple FPSCR mode switches might
need combining for better efficiency.
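
The dependency between the two mode bits can be illustrated with a small
host-side C model.  It is only a sketch: the bit positions (PR = bit 19,
SZ = bit 20) follow the SH4 FPSCR layout, and the function names
'model_fschg' and 'model_set_sz_pr' are made up for illustration and do not
exist in GCC.

```c
#include <assert.h>
#include <stdint.h>

#define FPSCR_PR (UINT32_C (1) << 19)   /* precision mode bit */
#define FPSCR_SZ (UINT32_C (1) << 20)   /* transfer size mode bit */

/* Model of the 'fschg' insn: toggles FPSCR.SZ.  The insn is only valid
   when FPSCR.PR = 0, which is checked here with an assertion.  */
static uint32_t
model_fschg (uint32_t fpscr)
{
  assert ((fpscr & FPSCR_PR) == 0);
  return fpscr ^ FPSCR_SZ;
}

/* Model of one combined sts-modify-lds sequence that sets SZ and PR at
   the same time.  Whether a given SZ/PR combination is defined on a
   particular core is not checked here.  */
static uint32_t
model_set_sz_pr (uint32_t fpscr, int sz, int pr)
{
  fpscr &= ~(FPSCR_SZ | FPSCR_PR);
  if (sz)
    fpscr |= FPSCR_SZ;
  if (pr)
    fpscr |= FPSCR_PR;
  return fpscr;
}
```

Going from PR = 1, SZ = 0 to PR = 0, SZ = 1 would take a PR switch followed by
an fschg if done independently, whereas the combined model above does it with
a single FPSCR update.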


In some cases FP register-to-register moves and loads/stores of adjacent SFmode
values can also be done via 64 bit FP moves.  Recently a new pass
'pass_sched_fusion' has been added, which tries to fuse such adjacent
loads/stores.  On SH, 64 bit FP moves can only operate on even register
numbers, thus fusing loads/stores has an impact on the register allocation.
However, 'pass_sched_fusion' runs before peephole2, i.e. after register
allocation/reload, and thus will probably not be that useful on SH.
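
The even register constraint can be sketched as a simple predicate.  The
struct 'sf_access' and the function 'can_fuse_sf_pair' below are hypothetical
and only illustrate the condition; they do not correspond to actual GCC
code, and alignment requirements are not modeled.

```c
#include <stdbool.h>

/* Hypothetical description of one SFmode load or store.  */
struct sf_access
{
  int fp_regno;     /* FP register number, e.g. 2 for fr2 */
  int base_regno;   /* address base register number */
  long offset;      /* byte offset from the base register */
};

/* Return true if FIRST and SECOND could be replaced by one 64 bit FP
   move: the first FP register must be even, the second one must be the
   next register, and the two memory locations must be adjacent.  */
static bool
can_fuse_sf_pair (struct sf_access first, struct sf_access second)
{
  return first.fp_regno % 2 == 0
         && second.fp_regno == first.fp_regno + 1
         && second.base_regno == first.base_regno
         && second.offset == first.offset + 4;
}
```

If the register allocator has already picked e.g. fr1/fr2 for two adjacent
values, the predicate fails, which is why doing the fusion after register
allocation is of limited use.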


Whether using a 64 bit FP move (either for SFmode vectors, DFmode types or
fused SFmode access) will be beneficial or not depends on the surrounding code
and the number of FPSCR.SZ mode switches that need to be inserted.  If insns
can't be grouped to minimize mode switches (see PR 64299) it might be better to
split 64 bit FP moves into 32 bit FP moves.
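
To make the effect of grouping concrete, the sketch below counts how many
FPSCR.SZ switches a straight-line insn sequence would need, given the SZ
setting each insn requires.  The function 'count_sz_switches' is a
hypothetical helper for illustration only; it ignores control flow and any
switch needed to restore the mode afterwards.

```c
#include <stddef.h>

/* Count the FPSCR.SZ switches needed to execute N insns in order.
   SZ_NEEDED[i] is the SZ value (0 or 1) insn i requires and INITIAL_SZ
   is the SZ value on entry.  */
static size_t
count_sz_switches (const int *sz_needed, size_t n, int initial_sz)
{
  size_t switches = 0;
  int current = initial_sz;

  for (size_t i = 0; i < n; i++)
    if (sz_needed[i] != current)
      {
        switches++;
        current = sz_needed[i];
      }

  return switches;
}
```

With SZ = 0 on entry, the interleaved sequence 1, 0, 1, 0 needs four switches,
while the grouped sequence 0, 0, 1, 1 needs only one.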


