Bug 19597 - [4.0 Regression] avr-gcc 4.0, multiplication by constant, very long code
Summary: [4.0 Regression] avr-gcc 4.0, multiplication by constant, very long code
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.0.0
: P3 minor
Target Milestone: 4.0.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, patch
Depends on:
Blocks:
 
Reported: 2005-01-23 22:55 UTC by Dmitry K.
Modified: 2005-02-09 14:49 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-01-24 00:49:29


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry K. 2005-01-23 22:55:38 UTC
avr-gcc 4.0 (20050116) replaces multiplication by a constant to a combination of more simple 
operations. Thus it does not pay attention to efficiency of such replacement.  
 
For example: 
	int foo (int x) { return 24011 * x; } 
produce 49 words (-mmcu=atmega8 -Os). 
Older GCC branches -- only 11 words in this case.
Comment 1 Andrew Pinski 2005-01-23 23:01:57 UTC
The problem is that middle-end expands it to shifts and such but really it might be that rtx_cost for avr 
does not take into account size at all.
Comment 2 Giovanni Bajo 2005-01-23 23:56:48 UTC
Bernie, can you confirm this?
Roger, you tweaked the middle-end in this regard. Can you have a look at this?
Comment 3 Bernardo Innocenti 2005-01-24 00:49:29 UTC
Confirmed:

--- 3.4 output ---
        ldi r18,lo8(24011)
        ldi r19,hi8(24011)
        mul r24,r18
        movw r20,r0
        mul r24,r19
        add r21,r0
        mul r25,r18
        add r21,r0
        clr r1
        movw r24,r20
------------------

--- 4.0 output ---
        movw r22,r24
        movw r18,r24
        add r18,r24
        adc r19,r25
        add r18,r18
        adc r19,r19
        add r18,r24
        adc r19,r25
        movw r24,r18
        add r24,r18
        adc r25,r19
        add r24,r24
        adc r25,r25
        add r24,r24
        adc r25,r25
        sub r24,r18
        sbc r25,r19
        movw r18,r24
        add r18,r24
        adc r19,r25
        add r18,r18
        adc r19,r19
        add r18,r18
        adc r19,r19
        sub r18,r24
        sbc r19,r25
        movw r20,r18
        add r20,r18
        adc r21,r19
        add r20,r20
        adc r21,r21
        add r20,r20
        adc r21,r21
        sub r20,r18
        sbc r21,r19
        movw r24,r20
        add r24,r20
        adc r25,r21
        add r24,r24
        adc r25,r25
        add r24,r24
        adc r25,r25
        sub r24,r20
        sbc r25,r21
        add r24,r24
        adc r25,r25
        add r24,r22
        adc r25,r23
------------------
Comment 4 roger 2005-01-24 02:53:56 UTC
The function avr_rtx_costs needs fixing.  For a start, it doesn't even use
COSTS_N_INSNS, indeed COSTS_N_INSNS isn't used anywhere in the AVR backend.

I'm only mildly surprised at the code sequence we generate, it looks like
the avr backend reports the costs of shifts and multiplications as free,
but the code of additions as GET_MODE_SIZE (mode).  i.e. SImode additions
are COSTS_N_INSNS(1), but it can perform four QImode additions in a single
cycle.  And I don't think I even have to mention that avr_rtx_costs ignores
optimize_size.

Many of the middle-end improvements to make better use of rtx_costs should
benefit AVR, for example, where the cost of a shift is dependent upon the
number of bits by which the operand is being shifted, and where the cost of
almost all operations is highly dependent upon the machine mode.

But to summarise, this is a target bug.
Comment 5 Andrew Pinski 2005-01-24 08:16:53 UTC
(In reply to comment #4) 
> But to summarise, this is a target bug.

That is what I thought.
Anyways Roger posted a patch to rewrite rtx_cost for AVR to fix this bug here: <http://gcc.gnu.org/ml/
gcc-patches/2005-01/msg01698.html>  It might also improve other things too because of the 
mentioned items in that mail.
Comment 6 Giovanni Bajo 2005-01-24 08:56:44 UTC
Marek, can you review this patch please?
Comment 7 Marek Michalkiewicz 2005-01-24 09:24:16 UTC
Subject: Re:  [4.0 Regression] avr-gcc 4.0, multiplication by constant, very long code

On Mon, Jan 24, 2005 at 08:56:46AM -0000, giovannibajo at libero dot it wrote:

> Marek, can you review this patch please?

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19597

Thanks.  Reviewing this will take some time - I agree the current rtx
costs are not perfect, but changing them can affect generated code in
unexpected ways.  It would be good to test it a lot on various test
cases, to make sure it doesn't introduce new code size regressions...

Marek

Comment 8 Bernardo Innocenti 2005-01-24 10:28:49 UTC
Subject: Re:  [4.0 Regression] avr-gcc 4.0, multiplication
 by constant, very long code

marekm at amelek dot gda dot pl wrote:
> ------- Additional Comments From marekm at amelek dot gda dot pl  2005-01-24 09:24 -------
> Subject: Re:  [4.0 Regression] avr-gcc 4.0, multiplication by constant, very long code
> 
> On Mon, Jan 24, 2005 at 08:56:46AM -0000, giovannibajo at libero dot it wrote:
> 
> 
>>Marek, can you review this patch please?
> 
> 
>>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19597
> 
> 
> Thanks.  Reviewing this will take some time - I agree the current rtx
> costs are not perfect, but changing them can affect generated code in
> unexpected ways.  It would be good to test it a lot on various test
> cases, to make sure it doesn't introduce new code size regressions...

I'm building avr-gcc right now with your two patches and
this one applied.  I'll let you know shortly.

btw, how do you test the AVR backend?  Can you execute
the dejagnu testsuite on simulavr or something similar?

Comment 9 Bernardo Innocenti 2005-01-24 13:15:51 UTC
Subject: Re:  [4.0 Regression] avr-gcc 4.0, multiplication
 by constant, very long code

Bernardo Innocenti wrote:
> marekm at amelek dot gda dot pl wrote:
> 
>>------- Additional Comments From marekm at amelek dot gda dot pl  2005-01-24 09:24 -------
>>Subject: Re:  [4.0 Regression] avr-gcc 4.0, multiplication by constant, very long code
>>
>>On Mon, Jan 24, 2005 at 08:56:46AM -0000, giovannibajo at libero dot it wrote:
>>
>>
>>
>>>Marek, can you review this patch please?
>>
>>
>>>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19597
>>
>>
>>Thanks.  Reviewing this will take some time - I agree the current rtx
>>costs are not perfect, but changing them can affect generated code in
>>unexpected ways.  It would be good to test it a lot on various test
>>cases, to make sure it doesn't introduce new code size regressions...
> 
> 
> I'm building avr-gcc right now with your two patches and
> this one applied.  I'll let you know shortly.

Not good.  With these two patches applied, the size of four
big AVR applications increased slightly.

These were built with -Os (the second one shows a minor improvement):

   text    data     bss     dec     hex filename
   8008     136     401    8545    2161 images-orig/dspslave.elf
   8032     136     401    8569    2179 images-patched/dspslave.elf

   text    data     bss     dec     hex filename
  18448     536     692   19676    4cdc images-orig/dspmaster.elf
  18428     536     692   19656    4cc8 images-patched/dspmaster.elf

These with -O2:

   text    data     bss     dec     hex filename
  60454    1832    1562   63848    f968 images-orig/kfront.elf
  60488    1832    1562   63882    f98a images-patched/kfront.elf

   text    data     bss     dec     hex filename
  36160     900    1713   38773    9775 images-orig/kcntrl.elf
  36344     900    1713   38957    982d images-patched/kcntrl.elf


Would you like to see some diffs?

Comment 10 Paul Schlie 2005-01-25 20:33:16 UTC
(In reply to comment #9)
> Not good.  With these two patches applied, the size of four
> big AVR applications increased slightly.

Although it would have been nicer if all 4 got smaller,  it's not clear
that a < 0.5% code size variance means much of anything good or bad.
Comment 11 roger 2005-01-25 21:23:47 UTC
I'm currently working on an updated and improved patch that should address
some of this size regression.  Failing that there are a number of middle-end
improvements that can be made to address the code size increase.  For example,
GCC's middle-end considers any constant with an rtx_cost greater than 
COSTS_N_INSNS(1) to be expensive, and therefore worth keeping in a register
for CSE and GCSE.  Unfortunately, this plays badly with AVR, where the cost
of a register-register copy is often greater than COSTS_N_INSNS(1)!  The
solution to which is to introduce a REG_REG_MOVE_COST(mode) target macro that
can be used to report the cost of copying a pseudo and use this instead of 
COSTS_N_INSNS(1) where appropriate.  This should even help non-AVR platforms,
for example DImode typically requires two move instructions.

The change described above should avoid AVR keeping HImode integer constants
in registers and then copying them when required (its as cheap to load an
immediate constant as it is to copy registers, but the later increases register
pressure, stack frame size, etc...)
Comment 12 Paul Schlie 2005-01-25 23:19:57 UTC
(In reply to comment #11)
> The change described above should avoid AVR keeping HImode integer constants
> in registers and then copying them when required (its as cheap to load an
> immediate constant as it is to copy registers, but the later increases register
> pressure, stack frame size, etc...)

Just as a minor clarification,  HI-mode register-to-register moves are relativly cheap on avr
(i.e. 1-instruction/1-cycle if allocated in an register pair), so there may often be circumstances
when keeping frequently-recently used HI constants in registers may actually be cheaper than
loading them upon demand (as QI-mode constant loads are limited to r16-r31), and although
an HI-mode constant may be arbitrarity loaded from data-memory, they would need to have
been stored there first in anticipation of their requirement, unfortunatly further complecated
by data-memory itself tending to be a very scarce/limited resource (nor is it clear how to best
express this potential method or cost to GCC if considered worth while)?

(however, more accurrately describing true inst. costs should only help improve code effeciency)
Comment 13 Andrew Pinski 2005-02-09 07:02:00 UTC
Fixed. Thanks Roger for looking into this testcase.  And thanks for all people who tested the patch to 
show it actually helps the code size.
Comment 14 Andrew Pinski 2005-02-09 07:16:13 UTC
Actually Roger has not checked this in yet.
I was misled by: <http://gcc.gnu.org/ml/gcc-patches/2005-02/msg00344.html>.
Comment 15 GCC Commits 2005-02-09 14:43:43 UTC
Subject: Bug 19597

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	sayle@gcc.gnu.org	2005-02-09 14:43:29

Modified files:
	gcc            : ChangeLog 
	gcc/config/avr : avr.c 

Log message:
	PR target/19597
	* config/avr/avr.c (default_rtx_costs): Delete.
	(avr_operand_rtx_cost): New function.
	(avr_rtx_costs): Completely rewrite.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7419&r2=2.7420
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/avr/avr.c.diff?cvsroot=gcc&r1=1.128&r2=1.129

Comment 16 Andrew Pinski 2005-02-09 14:49:04 UTC
Fixed for real this time.