g++ optimization issue / useless instructions for stack access
Niklas Gürtler
profclonk@gmail.com
Sun May 25 12:32:00 GMT 2014
Hello GCC list,
I am currently working on a hardware API in C++11 for ARM Cortex-M3
microcontrollers. It provides an object-oriented way of accessing
hardware registers. The idea is that the user need not worry about
individual registers and how they are composed of bit fields, but can
access them by symbolic names.
The API uses temporary objects and call chaining for syntactic sugar.
The problem is that GCC produces code that is correct, but much slower
and larger than necessary.
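To give an idea of the style of API meant here, the following is a minimal sketch. It is not the attached main.cc: the names Port, FieldRef and BitField, and the field positions, are made up for illustration only. It just shows how short-lived proxy objects and chained calls can wrap the shift-and-mask work on a register.

```cpp
#include <cstdint>

// Hypothetical description of one bit field inside a 32-bit register.
struct BitField {
    std::uint8_t offset; // position of the lowest bit of the field
    std::uint8_t width;  // number of bits in the field
};

// Hypothetical temporary proxy referring to one field of one register.
class FieldRef {
public:
    FieldRef(volatile std::uint32_t* reg, BitField field)
        : reg_(reg), field_(field) {}

    // Read the field: shift it down to bit 0, then mask off the rest.
    std::uint32_t get() const {
        return (*reg_ >> field_.offset) & mask();
    }

    // Read-modify-write the field; returns a proxy so calls can be chained.
    FieldRef set(std::uint32_t value) const {
        const std::uint32_t m = mask() << field_.offset;
        *reg_ = (*reg_ & ~m) | ((value << field_.offset) & m);
        return *this;
    }

private:
    // Mask with the lowest `width` bits set (width 32 handled separately
    // because shifting a 32-bit value by 32 is undefined).
    std::uint32_t mask() const {
        return field_.width >= 32 ? 0xFFFFFFFFu
                                  : ((1u << field_.width) - 1u);
    }

    volatile std::uint32_t* reg_;
    BitField field_;
};

// Hypothetical stand-in for a peripheral: hands out field proxies by name.
class Port {
public:
    explicit Port(volatile std::uint32_t* base) : base_(base) {}

    FieldRef input() const { return FieldRef(base_, BitField{18, 1}); }
    FieldRef mode() const { return FieldRef(base_, BitField{4, 2}); }

private:
    volatile std::uint32_t* base_;
};
```

With this sketch, user code would look like Port(&reg).mode().set(3) or Port(&reg).input().get(), where reg is a volatile std::uint32_t. Every temporary in such a chain holds only compile-time constants and one pointer, so one would hope the optimizer reduces the whole expression to a single load or store plus the bit manipulation.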
See the attached simplified testcase (with a dummy linker script to
shorten disassembler output) and the function getInput. When compiling
with gcc-arm-embedded ( https://launchpad.net/gcc-arm-embedded ), this
is the code generated by GCC:
00000000 <getInput()>:
0: b470 push {r4, r5, r6}
2: 2300 movs r3, #0
4: b097 sub sp, #92 ; 0x5c
6: 2204 movs r2, #4
8: 2101 movs r1, #1
a: 2603 movs r6, #3
c: 2511 movs r5, #17
e: 2412 movs r4, #18
10: 480f ldr r0, [pc, #60] ; (50 <getInput()+0x50>)
12: f88d 3000 strb.w r3, [sp]
16: 9303 str r3, [sp, #12]
18: 9304 str r3, [sp, #16]
1a: 9307 str r3, [sp, #28]
1c: 9308 str r3, [sp, #32]
1e: f88d 302c strb.w r3, [sp, #44] ; 0x2c
22: f88d 6001 strb.w r6, [sp, #1]
26: f88d 202d strb.w r2, [sp, #45] ; 0x2d
2a: f88d 2038 strb.w r2, [sp, #56] ; 0x38
2e: f88d 2039 strb.w r2, [sp, #57] ; 0x39
32: f88d 5044 strb.w r5, [sp, #68] ; 0x44
36: f88d 1045 strb.w r1, [sp, #69] ; 0x45
3a: f88d 1051 strb.w r1, [sp, #81] ; 0x51
3e: f88d 4050 strb.w r4, [sp, #80] ; 0x50
42: 6800 ldr r0, [r0, #0]
44: f3c0 4080 ubfx r0, r0, #18, #1
48: b017 add sp, #92 ; 0x5c
4a: bc70 pop {r4, r5, r6}
4c: 4770 bx lr
4e: bf00 nop
50: 00c0ffee .word 0x00c0ffee
GCC allocates a 92-byte stack frame ("sub sp, #92") and fills it with
some values ("strX rY, [sp, #...]"). The frame is then discarded ("add
sp, #92") without the data written there ever being read. So most of the
instructions in the generated code (0-e, 12-3e, 48-4a) could simply be
left out, resulting in functionally equivalent, but faster and smaller
code that doesn't use the stack at all:
00000000 <getInput()>:
0: 4b02 ldr r3, [pc, #8] ; (c <getInput()+0xc>)
2: 6818 ldr r0, [r3, #0]
4: f3c0 4080 ubfx r0, r0, #18, #1
8: 4770 bx lr
a: bf00 nop
c: 00c0ffee .word 0x00c0ffee
The actual read access here occurs in function
"RegContainer<RegType>::getReg".
The same applies to the other functions setOutput() and main(), while
configure() looks OK.
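For comparison, the short version of getInput() shown above corresponds roughly to the following hand-written code. This is only a sketch: readBit and getInputDirect are hypothetical names that do not appear in the testcase, while the address 0x00c0ffee and bit 18 are taken from the literal pool entry and the ubfx instruction in the disassembly.

```cpp
#include <cstdint>

// Hypothetical helper: read a single bit of a memory-mapped register.
// The volatile access forces exactly one load of the register.
inline bool readBit(const volatile std::uint32_t* reg, unsigned bit) {
    return ((*reg >> bit) & 1u) != 0;
}

// Hypothetical direct equivalent of getInput(): one load from the dummy
// register address, then extraction of bit 18. Only meaningful on the
// target; do not call this on a hosted system.
inline bool getInputDirect() {
    return readBit(
        reinterpret_cast<const volatile std::uint32_t*>(0x00c0ffee), 18);
}
```

Nothing in this version needs a stack frame, which is why I would expect the object-oriented variant to collapse to the same few instructions once everything is inlined.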
The problem occurs with arm-none-eabi-gcc 4.8.3 and 4.9.0, and also with
x86_64-linux-gnu-gcc 4.8.1 (although that code is of course of little
use on x86_64).
So the question is: is this behaviour the result of a bug in my code, or
is it a GCC bug? Is it a known problem?
Thanks in advance,
Niklas Gürtler
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.cc
Type: text/x-c
Size: 10086 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-help/attachments/20140525/caf0dc2a/attachment.bin>
-------------- next part --------------
PREFIX=arm-none-eabi-
CXXFLAGS=-Wall -Wextra -std=c++11 -O3 -mcpu=cortex-m3 -mthumb
LDFLAGS =$(CXXFLAGS) -nostdlib
CXX=$(PREFIX)g++
OBJDUMP=$(PREFIX)objdump
test.S : test.elf
	$(OBJDUMP) -C -d $< > $@
test.elf : main.o test.ld
	$(CXX) $(LDFLAGS) -o $@ $< -T test.ld
main.o : main.cc
	$(CXX) $(CXXFLAGS) -c -o $@ $<
clean :
	rm test.S test.elf main.o
-------------- next part --------------
ENTRY(main)
/* Define output sections */
SECTIONS
{
  /* The program code and other data goes into FLASH */
  .text :
  {
    *(.text)   /* .text sections (code) */
    *(.text*)  /* .text* sections (code) */
  }
}