Bug 114243 - [13/14/15 Regression][avr] -fsplit-wide-types bloats code by more than 50%
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 14.0
Importance: P3 normal
Target Milestone: 13.4
Assignee: Not yet assigned to anyone
URL:
Keywords: code-size, missed-optimization, ra
Depends on:
Blocks: avr+ra
Reported: 2024-03-05 15:56 UTC by Georg-Johann Lay
Modified: 2024-10-25 10:01 UTC
CC List: 0 users

See Also:
Host:
Target: avr
Build:
Known to work: 8.5.0
Known to fail: 14.1.0
Last reconfirmed: 2024-06-21 00:00:00


Attachments
pi-sigma.c: C99 test case (570 bytes, text/plain)
2024-03-05 15:56 UTC, Georg-Johann Lay
sfmode.c: C test case (222 bytes, text/plain)
2024-06-21 19:55 UTC, Georg-Johann Lay

Description Georg-Johann Lay 2024-03-05 15:56:00 UTC
Created attachment 57616
pi-sigma.c: C99 test case

Compile the attached test case with:

$ avr-gcc pi-sigma.c -c -Os -mmcu=atmega8 -fstack-usage && avr-size pi-sigma.o

The resulting code sizes for the respective compiler versions are:

avr-gcc-v8:   624
avr-gcc-v14: 1008

which is a code size increase of more than 60%!

The stack usage also increases by a lot. According to pi-sigma.su:

avr-gcc-v8:
-----------
pi-sigma.c:80:7:sigma	30	static
pi-sigma.c:86:7:pi_n	14	static

avr-gcc-v14:
------------
pi-sigma.c:80:7:sigma	86	static
pi-sigma.c:86:7:pi_n	36	static

That is, for the first function the stack usage almost triples!

With -fno-split-wide-types the performance of v14 code is similar to v8.
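
As a quick sanity check of the ratios quoted above, here is a minimal host-side C sketch. It is not part of the attached test case, and the helper names percent_increase and growth_factor are made up for illustration. It only recomputes the figures from the numbers reported here:

```c
/* Hypothetical helpers (not from the attachment) for checking the
   ratios quoted in this report. */

/* Relative growth from old_val to new_val, in percent. */
double percent_increase(double old_val, double new_val)
{
    return (new_val - old_val) / old_val * 100.0;
}

/* Growth factor new_val / old_val. */
double growth_factor(double old_val, double new_val)
{
    return new_val / old_val;
}
```

With the numbers from this report, percent_increase (624, 1008) is roughly 61.5 for the code size, and growth_factor (30, 86) is roughly 2.87 for the stack usage of sigma.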

Target: avr
Configured with: ../../source/gcc-master/configure --target=avr --disable-nls --with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared --enable-languages=c,c++ 
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 14.0.1 20240303 (experimental) (GCC)
Comment 1 Georg-Johann Lay 2024-03-05 20:31:43 UTC
May be related to PR110093.  As Vladimir noted in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110093#c5

the problem is that data flow analysis cannot cope with the subregs generated by lower-subregs, and the register allocator chokes on them.
Comment 2 Andrew Pinski 2024-03-05 20:40:41 UTC
Subreg improvements to RA are planned for GCC 15, as the RISC-V folks are running into this for vector modes in some cases. Maybe that will improve the situation here.
Comment 3 Georg-Johann Lay 2024-06-21 13:17:00 UTC
Still present in master.
Comment 4 Georg-Johann Lay 2024-06-21 19:55:49 UTC
Created attachment 58483
sfmode.c: C test case

This is a test case with simpler functions like

float add2 (float a, float b)
{
    return a + b;
}

v8 compiles this with -Os -dp to:

add2:
	rcall __addsf3	 ;  9	[c=24 l=1]  call_value_insn/1
	ret		 ;  21	[c=0 l=1]  return

but the current compiler generates:

add2:
	push r4		 ;  76	[c=4 l=1]  pushqi1/0
	push r5		 ;  77	[c=4 l=1]  pushqi1/0
	push r6		 ;  78	[c=4 l=1]  pushqi1/0
	push r7		 ;  79	[c=4 l=1]  pushqi1/0
	push r8		 ;  80	[c=4 l=1]  pushqi1/0
	push r9		 ;  81	[c=4 l=1]  pushqi1/0
	push r10		 ;  82	[c=4 l=1]  pushqi1/0
	push r11		 ;  83	[c=4 l=1]  pushqi1/0
	push r12		 ;  84	[c=4 l=1]  pushqi1/0
	push r13		 ;  85	[c=4 l=1]  pushqi1/0
	push r14		 ;  86	[c=4 l=1]  pushqi1/0
	push r15		 ;  87	[c=4 l=1]  pushqi1/0
/* prologue: function */
/* frame size = 0 */
/* stack size = 12 */
.L__stack_usage = 12
	mov r4,r18	 ;  61	[c=4 l=1]  movqi_insn/0
	mov r5,r19	 ;  62	[c=4 l=1]  movqi_insn/0
	mov r6,r20	 ;  63	[c=4 l=1]  movqi_insn/0
	mov r7,r21	 ;  64	[c=4 l=1]  movqi_insn/0
	mov r21,r7	 ;  65	[c=4 l=4]  *movsf/0
	mov r20,r6
	mov r19,r5
	mov r18,r4
	mov r8,r22	 ;  66	[c=4 l=1]  movqi_insn/0
	mov r9,r23	 ;  67	[c=4 l=1]  movqi_insn/0
	mov r10,r24	 ;  68	[c=4 l=1]  movqi_insn/0
	mov r11,r25	 ;  69	[c=4 l=1]  movqi_insn/0
	mov r25,r11	 ;  70	[c=4 l=4]  *movsf/0
	mov r24,r10
	mov r23,r9
	mov r22,r8
	rcall __addsf3	 ;  9	[c=24 l=1]  call_value_insn/1
	mov r12,r22	 ;  71	[c=4 l=1]  movqi_insn/0
	mov r13,r23	 ;  72	[c=4 l=1]  movqi_insn/0
	mov r14,r24	 ;  73	[c=4 l=1]  movqi_insn/0
	mov r15,r25	 ;  74	[c=4 l=1]  movqi_insn/0
	mov r25,r15	 ;  75	[c=4 l=4]  *movsf/0
	mov r24,r14
	mov r23,r13
	mov r22,r12
/* epilogue start */
	pop r15		 ;  90	[c=4 l=1]  popqi
	pop r14		 ;  91	[c=4 l=1]  popqi
	pop r13		 ;  92	[c=4 l=1]  popqi
	pop r12		 ;  93	[c=4 l=1]  popqi
	pop r11		 ;  94	[c=4 l=1]  popqi
	pop r10		 ;  95	[c=4 l=1]  popqi
	pop r9		 ;  96	[c=4 l=1]  popqi
	pop r8		 ;  97	[c=4 l=1]  popqi
	pop r7		 ;  98	[c=4 l=1]  popqi
	pop r6		 ;  99	[c=4 l=1]  popqi
	pop r5		 ;  100	[c=4 l=1]  popqi
	pop r4		 ;  101	[c=4 l=1]  popqi
	ret		 ;  102	[c=0 l=1]  return_from_epilogue