g++ optimization issue / useless instructions for stack access

Niklas Gürtler profclonk@gmail.com
Sun May 25 12:32:00 GMT 2014


Hello GCC List,

I am currently working on a hardware API in C++11 for ARM Cortex-M3
microcontrollers. It provides an object oriented way of accessing
hardware registers. The idea is that the user need not worry about
individual registers and their composition of bit fields but can access
these with symbolic names.
The API uses temporary objects and call chaining for syntactic sugar.
The problem is that GCC generates code that is correct, but much slower
and larger than necessary.
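
For readers without the attachment, the following is a minimal sketch of what
"temporary objects and call chaining" for register access can look like. It is
not the code from main.cc: the names Field and FieldChain, and the
accumulate-then-apply design, are hypothetical and chosen only for
illustration. The register is passed in as a reference so the sketch can be
tried on a host machine with an ordinary variable.

```cpp
#include <cstdint>

// Hypothetical descriptor of a bit field: `width` bits starting at bit `offset`.
struct Field {
    std::uint8_t offset;
    std::uint8_t width;
};

// Hypothetical chaining helper. Each set() returns a new temporary that
// carries the accumulated mask and value; apply() performs one
// read-modify-write on the register.
class FieldChain {
public:
    explicit FieldChain(volatile std::uint32_t& reg) : reg_(reg) {}

    // Record a new value for field `f` and return the extended chain.
    FieldChain set(Field f, std::uint32_t v) const {
        FieldChain next(*this);
        const std::uint32_t m = mask(f);
        next.mask_ |= m;
        next.value_ = (next.value_ & ~m) | ((v << f.offset) & m);
        return next;
    }

    // Commit all recorded fields with a single register write.
    void apply() const { reg_ = (reg_ & ~mask_) | value_; }

    // Read one field from the register.
    std::uint32_t get(Field f) const { return (reg_ & mask(f)) >> f.offset; }

private:
    // Bit mask covering field `f` in register position.
    static std::uint32_t mask(Field f) {
        const std::uint32_t low =
            (f.width >= 32) ? 0xFFFFFFFFu : ((1u << f.width) - 1u);
        return low << f.offset;
    }

    volatile std::uint32_t& reg_;
    std::uint32_t mask_ = 0;
    std::uint32_t value_ = 0;
};
```

A call such as FieldChain(reg).set(mode, 3).set(speed, 5).apply() creates
several short-lived objects. Ideally the optimizer folds them into constants,
so that only the final register access remains in the generated code.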

See the attached simplified testcase (with a dummy linker script to
shorten disassembler output) and the function getInput. When compiling
with gcc-arm-embedded ( https://launchpad.net/gcc-arm-embedded ), this
is the code generated by GCC:

00000000 <getInput()>:
   0:    b470          push    {r4, r5, r6}
   2:    2300          movs    r3, #0
   4:    b097          sub    sp, #92    ; 0x5c
   6:    2204          movs    r2, #4
   8:    2101          movs    r1, #1
   a:    2603          movs    r6, #3
   c:    2511          movs    r5, #17
   e:    2412          movs    r4, #18
  10:    480f          ldr    r0, [pc, #60]    ; (50 <getInput()+0x50>)
  12:    f88d 3000     strb.w    r3, [sp]
  16:    9303          str    r3, [sp, #12]
  18:    9304          str    r3, [sp, #16]
  1a:    9307          str    r3, [sp, #28]
  1c:    9308          str    r3, [sp, #32]
  1e:    f88d 302c     strb.w    r3, [sp, #44]    ; 0x2c
  22:    f88d 6001     strb.w    r6, [sp, #1]
  26:    f88d 202d     strb.w    r2, [sp, #45]    ; 0x2d
  2a:    f88d 2038     strb.w    r2, [sp, #56]    ; 0x38
  2e:    f88d 2039     strb.w    r2, [sp, #57]    ; 0x39
  32:    f88d 5044     strb.w    r5, [sp, #68]    ; 0x44
  36:    f88d 1045     strb.w    r1, [sp, #69]    ; 0x45
  3a:    f88d 1051     strb.w    r1, [sp, #81]    ; 0x51
  3e:    f88d 4050     strb.w    r4, [sp, #80]    ; 0x50
  42:    6800          ldr    r0, [r0, #0]
  44:    f3c0 4080     ubfx    r0, r0, #18, #1
  48:    b017          add    sp, #92    ; 0x5c
  4a:    bc70          pop    {r4, r5, r6}
  4c:    4770          bx    lr
  4e:    bf00          nop
  50:    00c0ffee     .word    0x00c0ffee

GCC allocates a 92-byte stack frame ("sub sp, #92") and fills it with
some values ("strX rY, [sp, #...]"). The frame is then discarded ("add
sp, #92") without the stored data ever being read. Most of the
instructions in the generated code (0-e, 12-3e, 48-4a) could therefore be
omitted, resulting in functionally equivalent but faster and smaller code
that doesn't use the stack at all:

00000000 <getInput()>:
   0:    4b02          ldr    r3, [pc, #8]    ; (c <getInput()+0xc>)
   2:    6818          ldr    r0, [r3, #0]
   4:    f3c0 4080     ubfx    r0, r0, #18, #1
   8:    4770          bx    lr
   a:    bf00          nop
   c:    00c0ffee     .word    0x00c0ffee

The actual read access here occurs in function
"RegContainer<RegType>::getReg".
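
For comparison, the short listing corresponds to roughly the following
hand-written C++: one volatile load followed by a bit-field extract
(ubfx with lsb 18 and width 1). The function names readField and readBit18
are hypothetical and do not come from main.cc. On the target, the pointer
would presumably be the address held in the literal word 0x00c0ffee; here it
is a parameter so the sketch can be run on a host.

```cpp
#include <cstdint>

// Load the register once and extract `width` bits starting at bit `lsb`.
inline std::uint32_t readField(const volatile std::uint32_t* reg,
                               unsigned lsb, unsigned width) {
    const std::uint32_t value = *reg;  // single volatile load ("ldr")
    const std::uint32_t mask =
        (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lsb) & mask;      // bit-field extract ("ubfx")
}

// Equivalent of "ubfx r0, r0, #18, #1": test bit 18 of the register.
inline bool readBit18(const volatile std::uint32_t* reg) {
    return readField(reg, 18u, 1u) != 0u;
}
```

Nothing in this version needs stack storage, which is why the stores to the
discarded frame in the longer listing look redundant.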

The same applies to the other functions setOutput() and main(), while
configure() looks OK.

The problem occurs with arm-none-eabi-gcc 4.8.3 and 4.9.0, and also with
x86_64-linux-gnu-gcc 4.8.1 (although the code is of course of little use
on x86_64).

So the question is: is this behaviour the result of a bug in my code or
a GCC bug? Is it a known problem?

Thanks in advance,
Niklas Gürtler
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.cc
Type: text/x-c
Size: 10086 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-help/attachments/20140525/caf0dc2a/attachment.bin>
-------------- next part --------------
PREFIX=arm-none-eabi-
CXXFLAGS=-Wall -Wextra -std=c++11 -O3 -mcpu=cortex-m3 -mthumb
LDFLAGS =$(CXXFLAGS) -nostdlib

CXX=$(PREFIX)g++
OBJDUMP=$(PREFIX)objdump


test.S : test.elf
	$(OBJDUMP) -C -d $< > $@

test.elf : main.o test.ld
	$(CXX) $(LDFLAGS) -o $@ $< -T test.ld

main.o : main.cc
	$(CXX) $(CXXFLAGS) -c -o $@ $<

clean :
	rm test.S test.elf main.o
-------------- next part --------------
ENTRY(main)

/* Define output sections */
SECTIONS
{
  /* The program code and other data goes into FLASH */
  .text :
  {
	*(.text)           /* .text sections (code) */
	*(.text*)          /* .text* sections (code) */
  }
}

