This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



lower-subreg.c: Extreme code bloat for MEM splits


For the following small test case, lower-subreg.c produces unbelievable code
bloat.

The code reads a 4-byte value from AVR's address spaces:


long readx (const __memx long *p)
{
    return *p;
}

long read1 (const __flash1 long *p)
{
    return *p;
}


Compiled with GCC 4.8.0:


$ avr-gcc flash.c -S -dp -Os -mmcu=avr51 -fno-split-wide-types

This yields

readx:
/* prologue: function */
	movw r30,r22
	mov r21,r24
	call __xload_4
	ret

read1:
/* prologue: function */
	movw r30,r24
	ldi r18,1
	out __RAMPZ__,r18
	elpm r22,Z+
	elpm r23,Z+
	elpm r24,Z+
	elpm r25,Z+
	ret


That is reasonable: loads from address space __memx are expensive and are
outsourced to the libgcc function __xload_4.

But without -fno-split-wide-types the code is:

readx:
	push r12
	push r13
	push r14
/* prologue: function */
	mov r26,r24
	movw r24,r22
	movw r18,r24
	mov r20,r26
	subi r18,-1
	sbci r19,-1
	sbci r20,-1
	movw r30,r18
	mov r21,r20
	call __xload_1
	mov r23,r22
	ldi r18,lo8(2)
	mov r12,r18
	mov r13,__zero_reg__
	mov r14,__zero_reg__
	add r12,r24
	adc r13,r25
	adc r14,r26
	movw r18,r24
	mov r20,r26
	subi r18,-3
	sbci r19,-1
	sbci r20,-1
	movw r30,r12
	mov r21,r14
	call __xload_1
	mov r24,r22
	movw r30,r18
	mov r21,r20
	call __xload_1
	mov r25,r22
/* epilogue start */
	pop r14
	pop r13
	pop r12
	ret

read1:
/* prologue: function */
	movw r30,r24
	ldi r18,1
	out __RAMPZ__,r18
	elpm r22,Z+
	ldi r18,1
	out __RAMPZ__,r18
	elpm r23,Z
	movw r18,r24
	subi r18,-2
	sbci r19,-1
	movw r20,r24
	subi r20,-3
	sbci r21,-1
	movw r30,r18
	ldi r24,1
	out __RAMPZ__,r24
	elpm r24,Z
	movw r30,r20
	ldi r25,1
	out __RAMPZ__,r25
	elpm r25,Z
	ret

You don't need to know anything about AVR to see that the code is *really* bad
and bloated to the maximum.

Besides that, the code for readx is wrong: there are just 3 __xload_1 calls
instead of 4. But that appears to be a different issue, PR52484.
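
For reference, a correct byte-wise split of a 4-byte load needs one load per
byte, four in total. The following is only a portable sketch of that idea in
plain C: it uses an ordinary pointer instead of __memx, and the name read_split
is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of a correct split: one load per byte, four in total.
   AVR is little-endian, so byte 0 is the least significant one.
   read_split is a hypothetical name, not part of GCC or libgcc.  */
uint32_t
read_split (const uint8_t *p)
{
  assert (p != 0);
  return (uint32_t) p[0]
         | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16
         | (uint32_t) p[3] << 24;
}
```

In the split readx above, each of these four byte loads would correspond to one
__xload_1 call.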

The reason is that lower-subreg.c does not care about costs at all and greedily
splits everything it gets hold of.
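
To make clear what "caring about costs" could mean, here is a standalone toy
sketch; it is not GCC code. The function name and its parameters are
hypothetical, and where the cost values would come from in the backend is left
open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost gate, not taken from lower-subreg.c: split one wide
   load into `parts` narrow loads only if that is strictly cheaper than
   keeping the single wide load.  */
bool
should_split_load (int wide_cost, int narrow_cost, int parts)
{
  assert (parts > 0);
  return parts * narrow_cost < wide_cost;
}
```

With such a gate, a wide load would be kept whenever a narrow load is about as
expensive as the wide one.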

And a second reason is that GCC is completely afraid of pre/post
increment/modify/decrement addressing modes.

Any idea how to fix this in the backend?

There is TARGET_MODE_DEPENDENT_ADDRESS_P and it can fix the first case which
uses PSImode as pointer mode.

The second case, however, uses Pmode. In that hook there is no way to tell
whether an address refers to the generic address space or to a special one:
the hook hides this information from the backend, and there is no
address-space flavour of the hook.
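
What an address-space flavour of the hook might decide can be sketched as a
standalone toy model. Everything here is hypothetical: the enum and the
function are stand-ins, not GCC types or hooks, and the sketch assumes that an
access whose address is reported as mode-dependent is left unsplit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the address spaces in the test case; not GCC types.  */
enum toy_addr_space { TOY_AS_GENERIC, TOY_AS_FLASH1, TOY_AS_MEMX };

/* Hypothetical address-space flavour of the hook: report addresses in
   non-generic spaces as mode-dependent, so that (under the assumption
   above) accesses to them are kept whole.  */
bool
toy_mode_dependent_address_p (enum toy_addr_space as)
{
  assert (as == TOY_AS_GENERIC || as == TOY_AS_FLASH1 || as == TOY_AS_MEMX);
  return as != TOY_AS_GENERIC;
}
```

The point is only that the decision needs the address space as an input; the
pointer mode alone cannot distinguish __flash1 from the generic space.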

Any ideas what to do about that?

Is it a reasonable hack to make
   TARGET_MODE_DEPENDENT_ADDRESS_P (PSImode) = false?

Why does lower-subreg not care about costs at all?
...and even if it did, MEMORY_MOVE_COST is not sensitive to address spaces
either.

Johann

