This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/53513] SH Target: Add support for fschg and fpchg insns
- From: "olegendo at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 11 Oct 2014 22:09:37 +0000
- Subject: [Bug target/53513] SH Target: Add support for fschg and fpchg insns
- Auto-submitted: auto-generated
- References: <bug-53513-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513
--- Comment #16 from Oleg Endo <olegendo at gcc dot gnu.org> ---
I've tried a modified example from PR 5360, using floats instead of doubles:
void loop_p (int np, int non0, float coeff[][2048], float tmp1)
{
  int j, k;
  for (j = non0; j < np; j++)
    for (k = 0; k < j; k++)
      coeff[j][j] -= tmp1 * coeff[j][k];
}
with -O2 -m4a (double mode default) and the patch from comment #15 applied:
(loop setup code omitted)
...
.L6:
        cmp/pl  r5              ! outer loop, set to single
        bf/s    .L7
        sts     fpscr,r7
        mov.l   .L16,r4
        mov     r0,r2
        fmov.s  @r3,fr1
        mov     r5,r1
        and     r4,r7
        lds     r7,fpscr
        .align 2
.L5:
        fmov.s  @r2+,fr0        ! inner loop, no switch
        dt      r1
        fneg    fr0
        fmac    fr0,fr5,fr1
        bf/s    .L5
        fmov.s  fr1,@r3
.L7:
        dt      r6
        add     #1,r5
        add     r9,r0
        bf/s    .L6
        add     r8,r3
        sts     fpscr,r1        ! function return, set to double
        mov.l   .L17,r2
        mov.l   @r15+,r9
        or      r2,r1
        mov.l   @r15+,r8
        rts
        lds     r1,fpscr
Obviously, if the inner loop count is small, the mode set in the outer loop will
dominate. Something seems to be missing in the mode-switch optimization. The
mode switch should simply be hoisted above all loops, where it could then use
the fpchg insn on SH4A.
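
To put rough numbers on this, here is a small counting sketch in C. It is only
a model read off the listing above, not compiler code. It assumes that r5 holds
j, that the outer loop runs for every j from non0 to np - 1, and that the
switch back to double on the return path executes exactly once. The function
names are made up for illustration.

```c
/* Hypothetical model: counts FPSCR mode writes implied by the listing.
   It does not execute or inspect real SH code.  */

/* Placement as currently generated: one switch to single per outer
   iteration that enters the inner loop, plus one switch back to double
   at function return.  */
long switches_as_generated (int np, int non0)
{
  long n = 0;
  int j;

  for (j = non0; j < np; j++)
    if (j > 0)          /* cmp/pl r5: the inner loop is entered only for j > 0 */
      n++;              /* and r4,r7 / lds r7,fpscr: set to single */

  return n + 1;         /* or r2,r1 / lds r1,fpscr: set to double */
}

/* Placement with the switch hoisted above all loops: one switch before
   the loops and one after them, regardless of the trip counts.  */
long switches_hoisted (void)
{
  return 2;
}
```

Under these assumptions, np = 2048 and non0 = 0 gives 2048 mode writes for the
current placement versus 2 for the hoisted one, and each of the hoisted
switches is a candidate for a single fpchg.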