given the following function: unsigned char check(float x) { return (0.0 < x); } in avr-gcc 8.3.0 the following code is generated: 00000098 <_Z5checkf>: 98: cf 93 push r28 9a: c1 e0 ldi r28, 0x01 ; 1 9c: 20 e0 ldi r18, 0x00 ; 0 9e: 30 e0 ldi r19, 0x00 ; 0 a0: a9 01 movw r20, r18 a2: 0e 94 a8 00 call 0x150 ; 0x150 <__gesf2> a6: 18 16 cp r1, r24 a8: 0c f0 brlt .+2 ; 0xac <_Z5checkf+0x14> aa: c0 e0 ldi r28, 0x00 ; 0 ac: 8c 2f mov r24, r28 ae: cf 91 pop r28 b0: 08 95 ret I don't see any room for improvements here. avr-gcc 9.1.0 compiles to the following. I've marked the lines that don't make sense to me. 00000098 <_Z5checkf>: 98: cf 93 push r28 9a: df 93 push r29 * 9c: 00 d0 rcall .+0 ; 0x9e <_Z5checkf+0x6> * 9e: 00 d0 rcall .+0 ; 0xa0 <_Z5checkf+0x8> * a0: 0f 92 push r0 * a2: cd b7 in r28, 0x3d ; 61 * a4: de b7 in r29, 0x3e ; 62 a6: 21 e0 ldi r18, 0x01 ; 1 a8: 2d 83 std Y+5, r18 ; 0x05 aa: 20 e0 ldi r18, 0x00 ; 0 ac: 30 e0 ldi r19, 0x00 ; 0 ae: a9 01 movw r20, r18 * b0: 69 83 std Y+1, r22 ; 0x01 * b2: 7a 83 std Y+2, r23 ; 0x02 * b4: 8b 83 std Y+3, r24 ; 0x03 * b6: 9c 83 std Y+4, r25 ; 0x04 * b8: 69 81 ldd r22, Y+1 ; 0x01 * ba: 7a 81 ldd r23, Y+2 ; 0x02 * bc: 8b 81 ldd r24, Y+3 ; 0x03 * be: 9c 81 ldd r25, Y+4 ; 0x04 c0: 0e 94 dc 00 call 0x1b8 ; 0x1b8 <__gesf2> c4: 18 16 cp r1, r24 c6: 0c f0 brlt .+2 ; 0xca <_Z5checkf+0x32> c8: 1d 82 std Y+5, r1 ; 0x05 ca: 8d 81 ldd r24, Y+5 ; 0x05 cc: 0f 90 pop r0 ce: 0f 90 pop r0 d0: 0f 90 pop r0 d2: 0f 90 pop r0 d4: 0f 90 pop r0 * d6: df 91 pop r29 * d8: cf 91 pop r28 da: 08 95 ret The value is put to Y+1 to Y+4 and then immediately read back. why? pushing r0 does not make sense at all since it is by definition a temporary register that can freely be modified. Maybe it's just pushed to make room for the stack operations? compilation: "D:\AVR\Toolchain\9.1.0\bin\avr-g++.exe" -funsigned-char -funsigned-bitfields -DNDEBUG -I"C:\Program Files (x86)\Atmel\Studio\7.0\Packs\atmel\ATmega_DFP\1.2.272\include" -Os -ffunction-sections -fdata-sections -fpack-struct -fshort-enums -Wall -mmcu=atmega644 -c -MD -MP -MF "main.d" -MT"main.d" -MT"main.o" -o "main.o" ".././main.cpp" linker: "D:\AVR\Toolchain\9.1.0\bin\avr-g++.exe" -o BugTest.elf main.o -Wl,-Map="BugTest.map" -Wl,--start-group -Wl,-lm -Wl,--end-group -Wl,--gc-sections -mmcu=atmega644
(In reply to Berni from comment #0) > pushing r0 does not make sense at all since it is by definition a temporary > register that can freely be modified. Maybe it's just pushed to make room > for the stack operations? Yes. The code from v8 is already suboptimal: It's nonsense to load R28 with 0x1 just to survive a function call. Better use a call-used register and load it after the function call to where the return value is computed. Then there would be no need to push/pop R28. Does -fno-caller-saves improve v9 code? This is definitely not a target issue. It's likely a register-allocation problem. And the v8 problem is because some (tree) passes move setters away from their consumers.
(In reply to Georg-Johann Lay from comment #1) > (In reply to Berni from comment #0) > > pushing r0 does not make sense at all since it is by definition a temporary > > register that can freely be modified. Maybe it's just pushed to make room > > for the stack operations? > > Yes. > > The code from v8 is already suboptimal: It's nonsense to load R28 with 0x1 > just to survive a function call. Better use a call-used register and load it > after the function call to where the return value is computed. Then there > would be no need to push/pop R28. > > Does -fno-caller-saves improve v9 code? > > This is definitely not a target issue. It's likely a register-allocation > problem. And the v8 problem is because some (tree) passes move setters away > from their consumers. With option -fno-caller-saves there is no change in v9 code!
There is no improvement with gcc 9.2.0
Still an issue with v10.
Created attachment 47173 [details] bloat.c: A trivial test case demonstrating the problem. A (small) part of the overhead can be worked around with -fsplit-wide-types-early, but the major problem is from the register allocator, ira specifically. compile $ avr-gcc -S bloat.c -Os -mmcu=atmega128 -dp -da -fsplit-wide-types-early Generated code: call: push r28 ; 17 [c=4 l=1] pushqi1/0 push r29 ; 18 [c=4 l=1] pushqi1/0 ; SP -= 4 ; 22 [c=4 l=2] *addhi3_sp rcall . rcall . in r28,__SP_L__ ; 23 [c=4 l=2] *movhi/7 in r29,__SP_H__ /* prologue: function */ /* frame size = 4 */ /* stack size = 6 */ .L__stack_usage = 6 std Y+1,r22 ; 14 [c=4 l=4] *movsf/3 std Y+2,r23 std Y+3,r24 std Y+4,r25 /* epilogue start */ ; SP += 4 ; 34 [c=4 l=4] *addhi3_sp pop __tmp_reg__ pop __tmp_reg__ pop __tmp_reg__ pop __tmp_reg__ pop r29 ; 35 [c=4 l=1] popqi pop r28 ; 36 [c=4 l=1] popqi jmp func ; 7 [c=24 l=2] call_value_insn/3 Optimal code: call: jmp func The problem is that IRA concludes that register moves are always more expensive than memory moves, i.e. whatever costs you assign to TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST, memory will *always* win. From bloat.c.278r.ira: Pass 0 for finding pseudo/allocno costs a1 (r44,l0) best NO_REGS, allocno NO_REGS a0 (r43,l0) best NO_REGS, allocno NO_REGS a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000 a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000 == configure == --target=avr --disable-shared --disable-nls --with-dwarf2 --enable-target-optspace=yes --with-gnu-as --with-gnu-ld --enable-checking=release --enable-languages=c,c++
*** Bug 91189 has been marked as a duplicate of this bug. ***
GCC 9.3.0 has been released, adjusting target milestone.
I just compiled AVR gcc 9.3.0 and tested the code again. Still no improvement!
(In reply to Berni from comment #8) > I just compiled AVR gcc 9.3.0 and tested the code again. Still no > improvement! Don't expect anything from v9 (or from v10 for that matter). The problem is in the middle-end, and problems there that affect targets like avr will not be fixed -- except in the rare case you manage to show that the issue affects a target that is important enough and report it for that target. And don't expect anything from v11+ either. The avr backend will likely be removed from the compiler, see PR92729. The depreciation is for v11 and wasn't even worth a mention in the v10 release notes caveats https://gcc.gnu.org/gcc-10/changes.html The general recommendation is to switch to clang / llvm where the respective backend is improving and has left experimental status;and is not suffering from self-destruction...
GCC 9.4 is being released, retargeting bugs to GCC 9.5.