Created attachment 57616
pi-sigma.c: C99 test case

Compile the attached test case with:

$ avr-gcc pi-sigma.c -c -Os -mmcu=atmega8 -fstack-usage && avr-size pi-sigma.o

The code sizes for the respective compiler versions are then:

avr-gcc-v8:   624
avr-gcc-v14: 1008

which is an increase in code size of more than 60%!

The stack usage also increases considerably. According to pi-sigma.su:

avr-gcc-v8:
-----------
pi-sigma.c:80:7:sigma   30   static
pi-sigma.c:86:7:pi_n    14   static

avr-gcc-v14:
------------
pi-sigma.c:80:7:sigma   86   static
pi-sigma.c:86:7:pi_n    36   static

That is, for the first function the stack usage almost triples!

With -fno-split-wide-types, the performance of the v14 code is similar to v8.

Target: avr
Configured with: ../../source/gcc-master/configure --target=avr --disable-nls --with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared --enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 14.0.1 20240303 (experimental) (GCC)
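As a quick sanity check of the figures above, this small C sketch recomputes the growth ratios from the numbers reported by avr-size and pi-sigma.su. The helper names growth_percent, growth_factor and report are made up for this illustration; only the numbers come from the report.

```c
#include <stdio.h>

/* Relative growth in percent from an old to a new measurement. */
double growth_percent(double old_value, double new_value)
{
    return (new_value - old_value) * 100.0 / old_value;
}

/* Ratio new/old, i.e. how many times a value grew. */
double growth_factor(double old_value, double new_value)
{
    return new_value / old_value;
}

/* Figures copied from the avr-size and pi-sigma.su output quoted above. */
void report(void)
{
    printf("code size:   %+.1f%%\n", growth_percent(624, 1008)); /* +61.5% */
    printf("sigma stack: x%.2f\n", growth_factor(30, 86));       /* x2.87  */
    printf("pi_n stack:  x%.2f\n", growth_factor(14, 36));       /* x2.57  */
}
```

So the code grows by about 61.5%, sigma's stack by a factor of about 2.87 and pi_n's by about 2.57.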
May be related to PR110093. As Vladimir noted in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110093#c5, the problem is that data-flow analysis cannot cope with the subregs generated by the lower-subreg pass, and register allocation chokes on them.
Subreg improvements to RA are planned for GCC 15, as the RISC-V folks are running into this problem with vector modes in some cases. Maybe that will improve the situation here.
Still present in master.
Created attachment 58483
sfmode.c: C test case

This is a test case with simpler functions like

float add2 (float a, float b)
{
    return a + b;
}

v8 compiles this with -Os -dp to:

add2:
    rcall __addsf3   ; 9 [c=24 l=1] call_value_insn/1
    ret              ; 21 [c=0 l=1] return

but the current compiler does:

add2:
    push r4          ; 76 [c=4 l=1] pushqi1/0
    push r5          ; 77 [c=4 l=1] pushqi1/0
    push r6          ; 78 [c=4 l=1] pushqi1/0
    push r7          ; 79 [c=4 l=1] pushqi1/0
    push r8          ; 80 [c=4 l=1] pushqi1/0
    push r9          ; 81 [c=4 l=1] pushqi1/0
    push r10         ; 82 [c=4 l=1] pushqi1/0
    push r11         ; 83 [c=4 l=1] pushqi1/0
    push r12         ; 84 [c=4 l=1] pushqi1/0
    push r13         ; 85 [c=4 l=1] pushqi1/0
    push r14         ; 86 [c=4 l=1] pushqi1/0
    push r15         ; 87 [c=4 l=1] pushqi1/0
/* prologue: function */
/* frame size = 0 */
/* stack size = 12 */
.L__stack_usage = 12
    mov r4,r18       ; 61 [c=4 l=1] movqi_insn/0
    mov r5,r19       ; 62 [c=4 l=1] movqi_insn/0
    mov r6,r20       ; 63 [c=4 l=1] movqi_insn/0
    mov r7,r21       ; 64 [c=4 l=1] movqi_insn/0
    mov r21,r7       ; 65 [c=4 l=4] *movsf/0
    mov r20,r6
    mov r19,r5
    mov r18,r4
    mov r8,r22       ; 66 [c=4 l=1] movqi_insn/0
    mov r9,r23       ; 67 [c=4 l=1] movqi_insn/0
    mov r10,r24      ; 68 [c=4 l=1] movqi_insn/0
    mov r11,r25      ; 69 [c=4 l=1] movqi_insn/0
    mov r25,r11      ; 70 [c=4 l=4] *movsf/0
    mov r24,r10
    mov r23,r9
    mov r22,r8
    rcall __addsf3   ; 9 [c=24 l=1] call_value_insn/1
    mov r12,r22      ; 71 [c=4 l=1] movqi_insn/0
    mov r13,r23      ; 72 [c=4 l=1] movqi_insn/0
    mov r14,r24      ; 73 [c=4 l=1] movqi_insn/0
    mov r15,r25      ; 74 [c=4 l=1] movqi_insn/0
    mov r25,r15      ; 75 [c=4 l=4] *movsf/0
    mov r24,r14
    mov r23,r13
    mov r22,r12
/* epilogue start */
    pop r15          ; 90 [c=4 l=1] popqi
    pop r14          ; 91 [c=4 l=1] popqi
    pop r13          ; 92 [c=4 l=1] popqi
    pop r12          ; 93 [c=4 l=1] popqi
    pop r11          ; 94 [c=4 l=1] popqi
    pop r10          ; 95 [c=4 l=1] popqi
    pop r9           ; 96 [c=4 l=1] popqi
    pop r8           ; 97 [c=4 l=1] popqi
    pop r7           ; 98 [c=4 l=1] popqi
    pop r6           ; 99 [c=4 l=1] popqi
    pop r5           ; 100 [c=4 l=1] popqi
    pop r4           ; 101 [c=4 l=1] popqi
    ret              ; 102 [c=0 l=1] return_from_epilogue
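For anyone wanting to reproduce the add2 case without the attachment, here is a minimal sketch of a similar test file. Only add2 is taken from the report; sub2 and mul2 are hypothetical additions in the same spirit and are not claimed to be the contents of attachment 58483. The file name sfmode-min.c is likewise made up.

```c
/* sfmode-min.c: hypothetical reduced test case (not attachment 58483).
   Build for AVR and inspect the annotated assembly, e.g.
       avr-gcc sfmode-min.c -S -Os -mmcu=atmega8 -dp
   then repeat with -fno-split-wide-types added and compare. */

/* Taken from the report: a single soft-float addition. */
float add2 (float a, float b)
{
    return a + b;
}

/* Hypothetical companions (not from the report): each is a single
   soft-float operation, so one would expect little more than a call
   to the corresponding libgcc routine. */
float sub2 (float a, float b)
{
    return a - b;
}

float mul2 (float a, float b)
{
    return a * b;
}
```

If splitting of the 4-byte SFmode values is the culprit, as -fno-split-wide-types suggests for pi-sigma.c, the byte-wise copies into r4 to r15 would be expected to disappear from the second build.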