This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: mips-elf-gcc -fno-delayed-branch problem
- From: Richard Sandiford <rdsandiford at googlemail dot com>
- To: Toshi Morita <tm314159 at yahoo dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Sat, 21 May 2011 08:37:12 +0100
- Subject: Re: mips-elf-gcc -fno-delayed-branch problem
- References: <333311.53106.qm@web114715.mail.gq1.yahoo.com>
Toshi Morita <tm314159@yahoo.com> writes:
> Maybe GAS could recognize -fno-delayed-branch to selectively disable
> branch slot filling?
I'd agree if it were -mno-delayed-branch. I think -f* options are
generally compiler options, while -m* options are target options that
could in principle be passed down to either the assembler or the linker.
> Is there a list of optimizations performed by MIPS GAS listed somewhere?
> It would be nice if these could be selectively enabled.
The only other optimisation (if it can even be called that) is increased
accuracy regarding nop insertion. Suppose we have something like:
        .text
        lw      $4,foo
        addiu   $5,$5,1
        jr      $31
        .data
    foo:
        .word   1
When GAS sees the LW, it doesn't know whether the LW should use a
HI/LO pair or a GP-relative access. It therefore creates a variant
"frag" that describes both possibilities. As far as GAS is concerned,
the following ADDIU starts a new subblock of code.
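To make the "variant frag" idea concrete, here is a toy Python model of the two possible expansions of `lw $4,foo`. It is only a sketch, not GAS's actual data structure: the `VariantFrag` class and its methods are hypothetical names, and the rule that the GP-relative form is chosen only when `foo` ends up in a small-data section near `$gp` is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VariantFrag:
    """Toy model (hypothetical, not GAS internals) of a variant frag.

    Records both possible expansions of `lw $dest,symbol` and defers the
    choice until the location of `symbol` is known.
    """
    dest: int
    symbol: str
    chosen: Optional[List[str]] = None

    def hi_lo(self) -> List[str]:
        # Two instructions (8 bytes): LUI loads the high half of the
        # address, then LW uses the low half as its offset.
        return [
            f"lui ${self.dest},%hi({self.symbol})",
            f"lw ${self.dest},%lo({self.symbol})(${self.dest})",
        ]

    def gp_relative(self) -> List[str]:
        # One instruction (4 bytes): offset from the global pointer ($28).
        return [f"lw ${self.dest},%gp_rel({self.symbol})($28)"]

    def resolve(self, in_small_data: bool) -> List[str]:
        # Assumption: GP-relative access is only usable if the symbol
        # ends up in a small-data section close to $gp.
        self.chosen = self.gp_relative() if in_small_data else self.hi_lo()
        return self.chosen

    def size_bytes(self) -> int:
        # Each MIPS instruction is 4 bytes. Before resolution, this model
        # reports the larger (HI/LO) expansion as the worst case.
        insns = self.chosen if self.chosen is not None else self.hi_lo()
        return 4 * len(insns)


frag = VariantFrag(dest=4, symbol="foo")
print(frag.size_bytes())                  # prints 8
print(frag.resolve(in_small_data=False))  # the LUI/LW pair, as in the listings
```

In the example above, `foo` is defined in `.data` rather than a small-data section, so the model resolves to the HI/LO pair, which matches the `lui`/`lw` sequence in the listings.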
With -Wa,-O0, GAS doesn't try to handle dependencies between these subblocks,
and just assumes the worst. So if you assemble with -mips1, GAS has to
assume that the next subblock after the LW might use $4 straight away,
and that a nop is needed:
    00000000 <.text>:
       0:   3c040000        lui     a0,0x0
                            0: R_MIPS_HI16  .data
       4:   8c840000        lw      a0,0(a0)
                            4: R_MIPS_LO16  .data
       8:   00000000        nop
       c:   24a50001        addiu   a1,a1,1
      10:   03e00008        jr      ra
      14:   00000000        nop
At -Wa,-O1 and above it does the sensible thing:
    00000000 <.text>:
       0:   3c040000        lui     a0,0x0
                            0: R_MIPS_HI16  .data
       4:   8c840000        lw      a0,0(a0)
                            4: R_MIPS_LO16  .data
       8:   24a50001        addiu   a1,a1,1
       c:   03e00008        jr      ra
      10:   00000000        nop
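The difference between the two listings can be modelled as a simple load-delay check. The Python sketch below is hypothetical (`Insn` and `insert_load_delay_nops` are illustrative names, not GAS code) and covers a single rule: on MIPS I, the instruction immediately after a load must not read the loaded register. At level 0 the check gives up whenever the next instruction starts a new subblock and assumes the worst; at level 1 and above it compares the registers. Branch delay slots (the `nop` after `jr`) are deliberately not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Insn:
    text: str
    dest: Optional[int] = None        # register written, if any
    srcs: Tuple[int, ...] = ()        # registers read
    is_load: bool = False
    starts_subblock: bool = False     # True if a new frag begins here


def insert_load_delay_nops(insns: List[Insn], opt_level: int) -> List[str]:
    """Return the instruction texts with MIPS I load-delay nops inserted."""
    out: List[str] = []
    prev: Optional[Insn] = None
    for insn in insns:
        if prev is not None and prev.is_load:
            if opt_level == 0 and insn.starts_subblock:
                # Level 0: don't look into the next subblock; assume it
                # might read the loaded register straight away.
                need_nop = True
            else:
                # Otherwise check the real dependency.
                need_nop = prev.dest in insn.srcs
            if need_nop:
                out.append("nop")
        out.append(insn.text)
        prev = insn
    return out


# The LW expansion from the listings, followed by the ADDIU subblock.
program = [
    Insn("lui $4,%hi(foo)", dest=4),
    Insn("lw $4,%lo(foo)($4)", dest=4, srcs=(4,), is_load=True),
    Insn("addiu $5,$5,1", dest=5, srcs=(5,), starts_subblock=True),
]
print(insert_load_delay_nops(program, opt_level=0))  # nop after the lw
print(insert_load_delay_nops(program, opt_level=1))  # no nop
```

Only the level-0 call emits a `nop` between the `lw` and the `addiu`, mirroring the -Wa,-O0 and -Wa,-O1 listings; if the ADDIU actually read `$4`, both levels would insert it.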
TBH, I think the cases where you'd want the -O0 behaviour are
vanishingly rare. In principle it needs less memory and assembles
slightly faster, but I doubt anyone would notice unless they looked
hard.
So, as a way of stopping GAS from filling delay slots, -Wa,-O1 is
better than the -Wa,-O0 that I suggested previously.
Richard