Bug 34789 - [avr] sometimes the compiler keeps addresses in registers unnecessarily
Summary: [avr] sometimes the compiler keeps addresses in registers unnecessarily
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.2.2
: P3 enhancement
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2008-01-15 08:23 UTC by David Brown
Modified: 2011-10-09 20:49 UTC (History)
3 users (show)

See Also:
Host:
Target: avr
Build:
Known to work:
Known to fail:
Last reconfirmed: 2008-04-04 21:31:23


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description David Brown 2008-01-15 08:23:09 UTC
On RISC cpus with many registers, it is often helpful to keep constants (such as addresses) in registers for later use, since loading a register with a constant takes longer than moving data between registers, and since register-constant operations are seldom available.  On the AVR, however, register-constant operations are just as small and fast as register-register operations (excluding the movw operation), so it is often better to use the constants directly - it saves a few instructions (time and code space), and avoids tying up a register pair unnecessarily.

Example:

extern uint16_t data[64];
extern uint16_t foo(uint16_t x);
extern uint16_t a;

uint16_t bar(void) {
	uint16_t x;
	x = foo(data[a]);
	return foo(data[x]);
}

In this case, the compiler caches the address of "data" in r16:r17 :

  59               	bar:
  60               	/* prologue: frame size=0 */
  61 001a 0F93      		push r16
  62 001c 1F93      		push r17
  63               	/* prologue end (size=2) */
  64 001e E091 0000 		lds r30,a	 ;  a, a
  65 0022 F091 0000 		lds r31,(a)+1	 ;  a, a
  66 0026 00E0      		ldi r16,lo8(data)	 ;  tmp45,
  67 0028 10E0      		ldi r17,hi8(data)	 ;  tmp45,
  68 002a EE0F      		lsl r30	 ;  a
  69 002c FF1F      		rol r31	 ;  a
  70 002e E00F      		add r30,r16	 ;  a, tmp45
  71 0030 F11F      		adc r31,r17	 ;  a, tmp45
  72 0032 8081      		ld r24,Z	 ; , data
  73 0034 9181      		ldd r25,Z+1	 ; , data
  74 0036 0E94 0000 		call foo	 ; 
  75 003a 880F      		lsl r24	 ;  tmp50
  76 003c 991F      		rol r25	 ;  tmp50
  77 003e 080F      		add r16,r24	 ;  tmp45, tmp50
  78 0040 191F      		adc r17,r25	 ;  tmp45, tmp50
  79 0042 F801      		movw r30,r16	 ; , tmp45
  80 0044 8081      		ld r24,Z	 ; , data
  81 0046 9181      		ldd r25,Z+1	 ; , data
  82 0048 0E94 0000 		call foo	 ; 
  83               	/* epilogue: frame size=0 */
  84 004c 1F91      		pop r17
  85 004e 0F91      		pop r16
  86 0050 0895      		ret

Better code could be generated by using "data" directly:

			; Prologue avoided - saved 2 words code
				lds r30, a
				lds r31, (a) + 1
			; Load of r16:r17 avoided - saved 2 words code
				lsl r30 ; a
				rol r31 ; a
				subi r30, lo8(-(data))
				sbci r31, hi8(-(data))
				ld r24, z
				ldd r25, Z+1
				call foo
				lsl r24 ; x
				rol r25 ; x
			; Using constant for data is just as small as register
				subi r24, lo8(-(data))
				sbci r25, hi8(-(data))
				movw r30, r16
				ld r24, z
				ldd r25, Z+1
				call foo
			; Epilogue avoided - saved 2 words code
				ret

This saves 6 code words, and corresponding time.
Comment 1 Andy Hutchinson 2008-02-22 01:22:11 UTC
This appears to be due to avr_rtx_costs not assigning same cost to SYMBOL_REF and CONST_INT. So SYMBOL_REF looks expensive - so is held in register to avoid "recalculating" it.

Quick change to make SYMBOL_REF same cost as CONST_INT in both avr_rtx_cost and avr_operand_rtx_cost gave the desired result 

  22               	/* prologue: function */
  23               	/* frame size = 0 */
  24               	.LM2:
  25 0000 E091 0000 		lds r30,a
  26 0004 F091 0000 		lds r31,(a)+1
  27 0008 EE0F      		lsl r30
  28 000a FF1F      		rol r31
  29 000c E050      		subi r30,lo8(-(data))
  30 000e F040      		sbci r31,hi8(-(data))
  31 0010 8081      		ld r24,Z
  32 0012 9181      		ldd r25,Z+1
  33 0014 0E94 0000 		call foo
  34               	.LM3:
  35 0018 FC01      		movw r30,r24
  36 001a EE0F      		lsl r30
  37 001c FF1F      		rol r31
  38 001e E050      		subi r30,lo8(-(data))
  39 0020 F040      		sbci r31,hi8(-(data))
  40 0022 8081      		ld r24,Z
  41 0024 9181      		ldd r25,Z+1
  42 0026 0E94 0000 		call foo
  43               	/* epilogue start */
  44               	.LM4:
  45 002a 0895      		ret

CONST and LABEL_REF might also have same problem as they cost same as MEM.

But note, if SYMBOL_REF were part of memory address, then it might be better held in register (like Y or Z) - this should be done by checking outer code. With outer code of "MEM" SYMBOL_REF would be more expensive than register. (which might same a few LDS/STS that appear in code)

Avr_rtx_cost needs some serious work done to correct these are other anomalies in cost assumptions and the recursion on operand costs.


Comment 2 Georg-Johann Lay 2011-10-09 20:49:20 UTC
This is compiled with avr-gcc-4.7.0 -Os -mmcu=avr4 as expected:

bar:
	lds r30,a
	lds r31,a+1
	lsl r30
	rol r31
	subi r30,lo8(-(data))
	sbci r31,hi8(-(data))
	ld r24,Z
	ldd r25,Z+1
	rcall foo
	movw r30,r24
	lsl r30
	rol r31
	subi r30,lo8(-(data))
	sbci r31,hi8(-(data))
	ld r24,Z
	ldd r25,Z+1
	rjmp foo

Thus closing this PR as RESOLVED. For the tail call, see PR34790.