Bug 46278 - avr-gcc 4.5.1 doing suboptimal reloads using X
Summary: avr-gcc 4.5.1 doing suboptimal reloads using X
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.5.1
: P3 normal
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 50775
Blocks:
  Show dependency treegraph
 
Reported: 2010-11-02 21:04 UTC by Georg-Johann Lay
Modified: 2011-12-15 19:09 UTC (History)
3 users (show)

See Also:
Host: i686-linux-gnu
Target: avr
Build: i686-linux-gnu
Known to work: 3.4.6
Known to fail: 4.6.1, 4.7.0
Last reconfirmed: 2011-07-19 13:55:18


Attachments
vektor-zeichen-i.c (2.34 KB, text/plain)
2010-11-02 21:04 UTC, Georg-Johann Lay
Details
vektor-zeichen-i.s (5.37 KB, text/plain)
2010-11-02 21:07 UTC, Georg-Johann Lay
Details
snake.c (4.93 KB, text/plain)
2011-06-22 20:35 UTC, Georg-Johann Lay
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Georg-Johann Lay 2010-11-02 21:04:00 UTC
Created attachment 22240 [details]
vektor-zeichen-i.c

Using avr-gcc-4.5.1, I observed the following issue concerning both
code density and speed.

avr-gcc-4.5.1 prefers to use X-reg to reload addresses. Just note the
*movqi/3 and *movqi/4 sequences in the asm-printout attached below.

All these memory accesses could be made by means of Y+const and most
of the code consists of adding/subtracting constants to/from X.
Hacking source so that Y gets used to address
SRAM leads to code deflation from ~1200 bytes to ~950 bytes.

Without tracking it down, I think it's an avr backend issue because it
is lying at IRA and reports wrong reload costs: Both X+const, Y+const
and Z+const are handled by the same constraint alternative(s), but the
costs of X+const are considerably higher because there is no native
instruction for it. IRA uses X and not Y because Y would cause
prologue/epilogue costs. However, the accumulated additional costs of
using X are much higher than the prologue/epilogue costs that would
arise if Y was used.

So the backend badly needs new memory constraints that report costs to
IRA. I would highly recommend to use multi-letter constraints for that
so that the backend won't run out of constraint letters sooner or later...

Being not familiar with LEGITIMIZE_RELOAD_ADDRESS, I wonder if there
is a way to describe costs therein, or maybe that hook is an
anachronism that it no more needed with IRA?

To reproduce this, I attached a precompiled source. It basically does indirect
accesses to struct components of one struct in SRAM, so nothing
complicated. Register pressure is quite low so that is no issue.

Compiled with

avr-gcc-4.5.1 vektor-zeichen-i.c -S -Os -mmcu=atmega168 -ffixed-6
-ffixed-7 -mmcu=atmega168 -dp -fverbose-asm
Comment 1 Georg-Johann Lay 2010-11-02 21:07:59 UTC
Created attachment 22242 [details]
vektor-zeichen-i.s

assembler output, FYI
Comment 2 Georg-Johann Lay 2010-11-02 21:28:03 UTC
configured as

../source/configure --target=avr --prefix=/home/georg/gcc/install/gcc-4.5.1 --enable-languages=c,c++ --disable-libssp --disable-libada --disable-nls --disable-shared --with-dwarf2

sources as of

http://gcc.gnu.org/viewcvs/tags/gcc_4_5_1_release/
Comment 3 Georg-Johann Lay 2011-06-22 20:35:30 UTC
Created attachment 24582 [details]
snake.c

Yet another source file to test. Compile with, e.g. -Os -mmcu=atmega128 -std=gnu99

Compiled with avr-gcc 4.6.1 (prerelease SVN 175201) is shows size ca. 15% more compared to avr-gcc 3.4.6.
Comment 4 Georg-Johann Lay 2011-07-07 12:19:13 UTC
For the Created attachment 22240 [details] "vektor-zeichen-i.c"
   http://gcc.gnu.org/bugzilla/attachment.cgi?id=22240
I get the following sizes (-Os -mmcu=atmega128):

avr-gcc-4.6.1:
   text    data     bss     dec     hex filename
   1076       0     190    1266     4f2 vektor-zeichen-i.o

avr-gcc-4.7.0:
   text    data     bss     dec     hex filename
   1088       0     190    1278     4fe vektor-zeichen-i.o

avr-gcc-4.7.0 with patch:
   text    data     bss     dec     hex filename
    728       0     190     918     396 vektor-zeichen-i.o


For the Created attachment 24582 [details] "snake.c"
   http://gcc.gnu.org/bugzilla/attachment.cgi?id=24582
I get the following sizes (-Os -mmcu=atmega168 -std=gnu99):

avr-gcc-4.6.1:
   text    data     bss     dec     hex filename
   1549       0       0    1549     60d snake-i.o

avr-gcc-4.7.0
   text    data     bss     dec     hex filename
   1625       0       0    1625     659 snake-i.o

avr-gcc-4.7.0 with patch:
   text    data     bss     dec     hex filename
   1507       0       0    1507     5e3 snake-i.o

the "with patch" version patches by the tentative patch attached in
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00437.html

This shows that the fake X addresses increase the code size and that avoiding fake X addresses as proposed by the patch are pointing into the right direction.

BTW: With avr-gcc-3.4.6 I get a codesize of ~1400 bytes for the "snake.c" testcase which is because the old compiler is smarter with register allocation and no function needs a frame whereas 4.7 needs frame for some functions.

But that's a different story.
Comment 5 Eric Weddington 2011-07-19 13:55:18 UTC
There's enough evidence to confirm this missed optimization.
Comment 6 Georg-Johann Lay 2011-10-14 15:42:40 UTC
Author: gjl
Date: Fri Oct 14 15:42:33 2011
New Revision: 179993

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179993
Log:
	PR target/46278
	* doc/invoke.texi (AVR Options): Document -mstrict-X.
	* config/avr/avr.opt (-mstrict-X): New option.
	(avr_strict_X): New variable reflecting -mstrict-X.
	* config/avr/avr.c (avr_reg_ok_for_addr_p): Add parameter
	outer_code and pass it down to avr_regno_mode_code_ok_for_base_p.
	(avr_legitimate_address_p): Pass outer_code to
	avr_reg_ok_for_addr_p and use that function in case PLUS.
	(avr_mode_code_base_reg_class): Depend on avr_strict_X.
	(avr_regno_mode_code_ok_for_base_p): Ditto, and depend on outer_code.
	(avr_option_override): Disable -fcaller-saves if -mstrict-X is on.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/avr/avr.c
    trunk/gcc/config/avr/avr.opt
    trunk/gcc/doc/invoke.texi
Comment 7 Georg-Johann Lay 2011-12-15 19:09:19 UTC
Fixed for 4.7.

But there is no general clue if -mstrict-X is better or default -mno-strict-X.