avr-gcc 4.0 (20050116) replaces multiplication by a constant to a combination of more simple operations. Thus it does not pay attention to efficiency of such replacement. For example: int foo (int x) { return 24011 * x; } produce 49 words (-mmcu=atmega8 -Os). Older GCC branches -- only 11 words in this case.
The problem is that middle-end expands it to shifts and such but really it might be that rtx_cost for avr does not take into account size at all.
Bernie, can you confirm this? Roger, you tweaked the middle-end in this regard. Can you have a look at this?
Confirmed: --- 3.4 output --- ldi r18,lo8(24011) ldi r19,hi8(24011) mul r24,r18 movw r20,r0 mul r24,r19 add r21,r0 mul r25,r18 add r21,r0 clr r1 movw r24,r20 ------------------ --- 4.0 output --- movw r22,r24 movw r18,r24 add r18,r24 adc r19,r25 add r18,r18 adc r19,r19 add r18,r24 adc r19,r25 movw r24,r18 add r24,r18 adc r25,r19 add r24,r24 adc r25,r25 add r24,r24 adc r25,r25 sub r24,r18 sbc r25,r19 movw r18,r24 add r18,r24 adc r19,r25 add r18,r18 adc r19,r19 add r18,r18 adc r19,r19 sub r18,r24 sbc r19,r25 movw r20,r18 add r20,r18 adc r21,r19 add r20,r20 adc r21,r21 add r20,r20 adc r21,r21 sub r20,r18 sbc r21,r19 movw r24,r20 add r24,r20 adc r25,r21 add r24,r24 adc r25,r25 add r24,r24 adc r25,r25 sub r24,r20 sbc r25,r21 add r24,r24 adc r25,r25 add r24,r22 adc r25,r23 ------------------
The function avr_rtx_costs needs fixing. For a start, it doesn't even use COSTS_N_INSNS, indeed COSTS_N_INSNS isn't used anywhere in the AVR backend. I'm only mildly surprised at the code sequence we generate, it looks like the avr backend reports the costs of shifts and multiplications as free, but the code of additions as GET_MODE_SIZE (mode). i.e. SImode additions are COSTS_N_INSNS(1), but it can perform four QImode additions in a single cycle. And I don't think I even have to mention that avr_rtx_costs ignores optimize_size. Many of the middle-end improvements to make better use of rtx_costs should benefit AVR, for example, where the cost of a shift is dependent upon the number of bits by which the operand is being shifted, and where the cost of almost all operations is highly dependent upon the machine mode. But to summarise, this is a target bug.
(In reply to comment #4) > But to summarise, this is a target bug. That is what I thought. Anyways Roger posted a patch to rewrite rtx_cost for AVR to fix this bug here: <http://gcc.gnu.org/ml/ gcc-patches/2005-01/msg01698.html> It might also improve other things too because of the mentioned items in that mail.
Marek, can you review this patch please?
Subject: Re: [4.0 Regression] avr-gcc 4.0, multiplication by constant, very long code On Mon, Jan 24, 2005 at 08:56:46AM -0000, giovannibajo at libero dot it wrote: > Marek, can you review this patch please? > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19597 Thanks. Reviewing this will take some time - I agree the current rtx costs are not perfect, but changing them can affect generated code in unexpected ways. It would be good to test it a lot on various test cases, to make sure it doesn't introduce new code size regressions... Marek
Subject: Re: [4.0 Regression] avr-gcc 4.0, multiplication by constant, very long code marekm at amelek dot gda dot pl wrote: > ------- Additional Comments From marekm at amelek dot gda dot pl 2005-01-24 09:24 ------- > Subject: Re: [4.0 Regression] avr-gcc 4.0, multiplication by constant, very long code > > On Mon, Jan 24, 2005 at 08:56:46AM -0000, giovannibajo at libero dot it wrote: > > >>Marek, can you review this patch please? > > >>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19597 > > > Thanks. Reviewing this will take some time - I agree the current rtx > costs are not perfect, but changing them can affect generated code in > unexpected ways. It would be good to test it a lot on various test > cases, to make sure it doesn't introduce new code size regressions... I'm building avr-gcc right now with your two patches and this one applied. I'll let you know shortly. btw, how do you test the AVR backend? Can you execute the dejagnu testsuite on simulavr or something similar?
Subject: Re: [4.0 Regression] avr-gcc 4.0, multiplication by constant, very long code Bernardo Innocenti wrote: > marekm at amelek dot gda dot pl wrote: > >>------- Additional Comments From marekm at amelek dot gda dot pl 2005-01-24 09:24 ------- >>Subject: Re: [4.0 Regression] avr-gcc 4.0, multiplication by constant, very long code >> >>On Mon, Jan 24, 2005 at 08:56:46AM -0000, giovannibajo at libero dot it wrote: >> >> >> >>>Marek, can you review this patch please? >> >> >>>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19597 >> >> >>Thanks. Reviewing this will take some time - I agree the current rtx >>costs are not perfect, but changing them can affect generated code in >>unexpected ways. It would be good to test it a lot on various test >>cases, to make sure it doesn't introduce new code size regressions... > > > I'm building avr-gcc right now with your two patches and > this one applied. I'll let you know shortly. Not good. With these two patches applied, the size of four big AVR applications increased slightly. These were built with -Os (the second one shows a minor improvement): text data bss dec hex filename 8008 136 401 8545 2161 images-orig/dspslave.elf 8032 136 401 8569 2179 images-patched/dspslave.elf text data bss dec hex filename 18448 536 692 19676 4cdc images-orig/dspmaster.elf 18428 536 692 19656 4cc8 images-patched/dspmaster.elf These with -O2: text data bss dec hex filename 60454 1832 1562 63848 f968 images-orig/kfront.elf 60488 1832 1562 63882 f98a images-patched/kfront.elf text data bss dec hex filename 36160 900 1713 38773 9775 images-orig/kcntrl.elf 36344 900 1713 38957 982d images-patched/kcntrl.elf Would you like to see some diffs?
(In reply to comment #9) > Not good. With these two patches applied, the size of four > big AVR applications increased slightly. Although it would have been nicer if all 4 got smaller, it's not clear that a < 0.5% code size variance means much of anything good or bad.
I'm currently working on an updated and improved patch that should address some of this size regression. Failing that there are a number of middle-end improvements that can be made to address the code size increase. For example, GCC's middle-end considers any constant with an rtx_cost greater than COSTS_N_INSNS(1) to be expensive, and therefore worth keeping in a register for CSE and GCSE. Unfortunately, this plays badly with AVR, where the cost of a register-register copy is often greater than COSTS_N_INSNS(1)! The solution to which is to introduce a REG_REG_MOVE_COST(mode) target macro that can be used to report the cost of copying a pseudo and use this instead of COSTS_N_INSNS(1) where appropriate. This should even help non-AVR platforms, for example DImode typically requires two move instructions. The change described above should avoid AVR keeping HImode integer constants in registers and then copying them when required (its as cheap to load an immediate constant as it is to copy registers, but the later increases register pressure, stack frame size, etc...)
(In reply to comment #11) > The change described above should avoid AVR keeping HImode integer constants > in registers and then copying them when required (its as cheap to load an > immediate constant as it is to copy registers, but the later increases register > pressure, stack frame size, etc...) Just as a minor clarification, HI-mode register-to-register moves are relativly cheap on avr (i.e. 1-instruction/1-cycle if allocated in an register pair), so there may often be circumstances when keeping frequently-recently used HI constants in registers may actually be cheaper than loading them upon demand (as QI-mode constant loads are limited to r16-r31), and although an HI-mode constant may be arbitrarity loaded from data-memory, they would need to have been stored there first in anticipation of their requirement, unfortunatly further complecated by data-memory itself tending to be a very scarce/limited resource (nor is it clear how to best express this potential method or cost to GCC if considered worth while)? (however, more accurrately describing true inst. costs should only help improve code effeciency)
Fixed. Thanks Roger for looking into this testcase. And thanks for all people who tested the patch to show it actually helps the code size.
Actually Roger has not checked this in yet. I was misled by: <http://gcc.gnu.org/ml/gcc-patches/2005-02/msg00344.html>.
Subject: Bug 19597 CVSROOT: /cvs/gcc Module name: gcc Changes by: sayle@gcc.gnu.org 2005-02-09 14:43:29 Modified files: gcc : ChangeLog gcc/config/avr : avr.c Log message: PR target/19597 * config/avr/avr.c (default_rtx_costs): Delete. (avr_operand_rtx_cost): New function. (avr_rtx_costs): Completely rewrite. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7419&r2=2.7420 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/avr/avr.c.diff?cvsroot=gcc&r1=1.128&r2=1.129
Fixed for real this time.