Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug
Bug#: 11180
Product:  
Component:  
Status: NEW
Resolution:
Assigned To: Not yet assigned to anyone <unassigned@gcc.gnu.org>
Host:
Reported against  
Priority:  
Severity:  
Target Milestone:  
 
 
Target:
Reporter: Dmitry K. <dmixm@marine.febras.ru>
Add CC:
CC:
Remove selected CCs
Build:
URL:
Summary:
Keywords:
Known to work:
Known to fail:

Attachment Description Type Created Size Actions
avr-pr11180.patch quick and dirty patch to reduce code size patch 2007-09-16 13:38 1.04 KB Edit | Diff
avr-pr11180.patch quick and dirty patch to reduce code size patch 2007-09-16 17:30 1.04 KB Edit | Diff
gcc-4.3-bug-11180-experimental.patch Rask's patch modified from comments. patch 2007-09-19 17:57 1.05 KB Edit | Diff
Create a New Attachment (proposed patch, testcase, etc.) View All

Bug 11180 depends on: Show dependency tree
Show dependency graph
Bug 11180 blocks: 16996

Additional Comments:





Mark bug as waiting for feedback
Mark bug as suspended




View Bug Activity   |   Format For Printing   |   Clone This Bug


Description:   Last confirmed: 2005-06-26 02:25 Opened: 2003-06-13 02:35
Any optimization level increase size and decrease speed 
for next program: 
 
	typedef struct { long lo; int in; } lo_in; 
	lo_in foo (void) 
	{ 
	    lo_in x; 
	    x.lo = 1; 
	    x.in = 2; 
	    return x; 
	} 
 
"avr-gcc -W -Wall -S -O0" produced: 
	/* File "lo_in.c": code   75 = 0x004b (  40), prologues  18, epilogues  17 */ 
"avr-gcc -W -Wall -S -Os" produced: 
	/* File "lo_in.c": code   80 = 0x0050 (  29), prologues  26, epilogues  25 */ 
"avr-gcc -W -Wall -S -O3" produced: 
	/* File "lo_in.c": code   79 = 0x004f (  28), prologues  26, epilogues  25 */ 
 
Compiler: 
	avr-gcc (GCC) 3.3.1 20030519 (prerelease)

------- Comment #1 From Andrew Pinski 2003-07-06 01:24 -------
I can confirm this on the mainline (20030706).

------- Comment #2 From Steven Bosscher 2003-07-10 13:38 -------
On i686-pc-linux-gnu I get identical asm for -O2 and -O3 and also for -O1 and
-Os.   The only difference between -O[23] and -O[1s] is:
<       .p2align 2,,3

        .file   "11180.c"
        .text
        .p2align 2,,3
.globl foo
        .type   foo, @function
foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        movl    $1, (%eax)
        movl    $2, 4(%eax)
        leave
        ret     $4
        .size   foo, .-foo
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4 20030710 (experimental)"

So -Os is optimal AFAICT (obviously it gets even smaller with -fomit-frame-pointer).

Apparently this is something target specific.  What target is avr-gcc?

Gr.
Steven

------- Comment #3 From Andrew Pinski 2003-07-10 21:03 -------
I was not the reporter but I was the one who confirmed it; avr-elf is the
target, I will post the asm 
(or rtl) if you want.

------- Comment #4 From Andrew Pinski 2004-01-03 19:00 -------
This is a target problem of not showing how moves are done.

------- Comment #5 From Andrew Pinski 2005-01-09 01:30 -------
One issue I find is that avr does not define_insn_and_split (or just
define_split) which will greatly 
improve the code generation because the hi part of the QI would be zero and you
don't need to set it 
four times.
        ldi r25,hi8(2)
        ldi r25,hi8(1)
        ldi r26,hlo8(1)
        ldi r27,hhi8(1)

The main difference between -Os and -O0 (at least before 4.0.0) is that we use
more registers at -Os 
which causes the prologue and eplogue to do more so my recommendation will help
a lot.

------- Comment #6 From andy hutchinson 2005-03-27 14:33 -------
The problem here is that gcc is using  a DImode register to handle 6 byte
(int+long) structure. Why I have no idea!

Since the target has no insn for DI move, gcc turns this into individual QImode
byte moves (subregs all over the place!). 

The 'stacked' 6 byte structure is 'popped' into DI register (6 bytes ). Two
other byte registers are explicitely cleared (making our 8 byte DI register) 

What then  follows is a large amount of shuffling. i.e. Moving from intermediate
virtual DI register (8 bytes) into correct place for a 6 byte return. Which
seems to surpass the abilities of the register allocator (DI and return
registers overlap).

Smaller structures (<=4 bytes) are optimally handled. Larger structure >8 are
also much better since they are returned in memory.

So in summary, it would appear that the root cause is allocation of a DI mode
register for structures >4 and <=8 bytes.

A secondary factor is the use of QImode moves (when SI,HImode are available and
more  efficient) 

The problem can be partially alleviated by defining DImode moves (that a hell of
a change though). Poor code still remains - for example clearing unused padding
bytes and extra register usage.

PS -fpack-struct does not change this bug.


------- Comment #7 From Andrew Pinski 2005-06-26 02:25 -------
(In reply to comment #6)
> The problem here is that gcc is using  a DImode register to handle 6 byte
> (int+long) structure. Why I have no idea!
This is so it does not store it on the stack.  As I said in comment #5, this is a target issue and have 
nothing to do with DImode.

------- Comment #8 From Paul Schlie 2005-06-26 15:06 -------
(In reply to comment #7)
> (In reply to comment #6)
> > The problem here is that gcc is using  a DImode register to handle 6 byte
> > (int+long) structure. Why I have no idea!
> This is so it does not store it on the stack.  As I said in comment #5, this is a target issue and have 
> nothing to do with DImode.

It would seem more desireable given the intended purpose to avoid pushing it on the stack
so that it's elements may be more effeciencly accessed, that it's coresponding elements be
allocated within the register file (as opposed to the whole struct remaining packed into an
alllocated DI mode integer), so that it's elements may be effeciently accessed without
needing to rip them out or reassemble them into the otherwise packed monolithic structure?

------- Comment #9 From Eric Weddington 2007-07-24 23:50 -------
Version 4.2.1 offers somewhat better results:

With -O0:
        .file   "test.c"
/* File "test.c": code  109 = 0x006d (  74), prologues  18, epilogues  17 */

With -O[123s]:
        .file   "test.c"
/* File "test.c": code   79 = 0x004f (  28), prologues  26, epilogues  25 */

At least the code no longer increases in size when optimization is turned on.

Dmitry, are the results for 4.2.1 acceptable for this bug report?

------- Comment #10 From Dmitry K. 2007-07-27 01:24 -------
Yes, results are:

avr-gcc-3.3.6:  O0 --> 75,  O1,O2,O3,Os --> 79
avr-gcc-4.2.1:  O0 --> 109, O1,O2,O3,Os --> 79

The mistake is corrected?  It is possible to tell and so as now
application of keys of optimization shortens a code. However,
"correction" does not improve quality of optimization, it only worsens
not optimized code.

------- Comment #11 From Eric Weddington 2007-07-27 14:23 -------
Subject: RE:  [avr-gcc] Optimization decrease performance of
 struct assignment.



> ------- Comment #10 from dmixm at marine dot febras dot ru
> 2007-07-27 01:24 -------
> Yes, results are:
>
> avr-gcc-3.3.6:  O0 --> 75,  O1,O2,O3,Os --> 79
> avr-gcc-4.2.1:  O0 --> 109, O1,O2,O3,Os --> 79
>
> The mistake is corrected?  It is possible to tell and so as now
> application of keys of optimization shortens a code. However,
> "correction" does not improve quality of optimization, it only worsens
> not optimized code.

Agreed. But is this sufficient to close this bug report?

Eric

------- Comment #12 From Dmitry K. 2007-09-09 21:59 -------
Andy Hutchinson wrote (comment #6) that addition a 'movdi' instruction improves
the result. I have try to add a very simple 'movdi' (which split into 2 SImode
instuctions). In result:
   -O0  --> 85 words,
   -O1,2,3,s  -->  50 words.
Version is 4.2.1

------- Comment #13 From Eric Weddington 2007-09-11 16:10 -------
(In reply to comment #12)
> Andy Hutchinson wrote (comment #6) that addition a 'movdi' instruction improves
> the result. I have try to add a very simple 'movdi' (which split into 2 SImode
> instuctions). In result:
>    -O0  --> 85 words,
>    -O1,2,3,s  -->  50 words.
> Version is 4.2.1
> 

That's great! Dmitry, could you attach your patch?

------- Comment #14 From Dmitry K. 2007-09-16 03:22 -------
It was:

(define_insn "movdi"
  [(set (match_operand:DI 0 "nonimmediate_operand" "")
        (match_operand:DI 1 "general_operand" ""))]
  ""
  "#")
(define_split
  [(set (match_operand:DI 0 "nonimmediate_operand" "")
        (match_operand:DI 1 "general_operand" ""))]
  "reload_completed"
  [(set (match_dup 2) (match_dup 4))
   (set (match_dup 3) (match_dup 5))]
  "{
     operands[2] = simplify_gen_subreg (SImode, operands[0], DImode, 0);
     operands[3] = simplify_gen_subreg (SImode, operands[0], DImode, 4);
     operands[4] = simplify_gen_subreg (SImode, operands[1], DImode, 0);
     operands[5] = simplify_gen_subreg (SImode, operands[1], DImode, 4);
   }")

Alas, this elementary addition has appeared erroneous.
The following program leads to emergency end of compilation:

int main ()
{
    volatile long long x = 0x0102030405060708LL;

    if (x != 0x0102030405060708LL)
        exit (__LINE__);
    exit (0);
}

------- Comment #15 From Rask Ingemann Lambertsen 2007-09-16 12:57 -------
1. You should use define_insn_and_split.
2. If possible (which I think it is here), splitting before reload should
produc.e better code.

Btw, what is the ICE?

Also, it seems to me that avr.h defines MOVE_MAX incorrectly. Shouldn't it be 2
instead of 4?

------- Comment #16 From Rask Ingemann Lambertsen 2007-09-16 13:38 -------
Created an attachment (id=14211) [edit]
quick and dirty patch to reduce code size

A fundamental problem with the AVR back end is that it sabotages the RTL
optimizer's attempts to optimize away redundant move insns. With this patch, I
get better code (-O2):

foo:
        push r29         ;  73  *pushhi/1       [length = 2]
        push r28
        rcall .  ;  79  *addhi3_sp_R_pc2        [length = 3]
        rcall .
        rcall .
        in r28,__SP_L__  ;  90  *movqi/6        [length = 1]
        in r29,__SP_H__  ;  91  *movqi/6        [length = 1]
/* prologue: function */
/* frame size = 6 */
        ldi r24,lo8(2)   ;  68  *movqi/2        [length = 1]
        std Y+5,r24      ;  70  *movqi/3        [length = 1]
        std Y+6,__zero_reg__     ;  71  *movqi/3        [length = 1]
        ldi r22,lo8(2)   ;  21  *movqi/2        [length = 1]
        ldi r23,lo8(0)   ;  23  *movqi/2        [length = 1]
        ldi r18,lo8(1)   ;  42  *movqi/2        [length = 1]
        ldi r19,lo8(0)   ;  43  *movqi/2        [length = 1]
        ldi r20,lo8(0)   ;  44  *movqi/2        [length = 1]
        ldi r21,lo8(0)   ;  45  *movqi/2        [length = 1]
        ldi r24,lo8(0)   ;  48  *movqi/2        [length = 1]
        ldi r25,lo8(0)   ;  49  *movqi/2        [length = 1]
/* epilogue start */
        adiw r28,6       ;  85  *addhi3/2       [length = 1]
        out __SP_L__,r28         ;  92  *movqi/5        [length = 1]
        out __SP_H__,r29         ;  93  *movqi/5        [length = 1]
        pop r28  ;  87  pophi   [length = 2]
        pop r29
        ret      ;  88  return_from_epilogue    [length = 1]
        .size   foo, .-foo
/* File "/home/rask/pr11180.c": code    0 = 0x0000 (   0), prologues   0,
epilogues   0 */

Perhaps a peephole optimization could replace insns 48 and 49 with "movw
r25:r24, r21:r20".
The only big further improvement I see would be to get rid of the stack var.

------- Comment #17 From Rask Ingemann Lambertsen 2007-09-16 13:54 -------
The patch is against mainline revision 128431.

------- Comment #18 From Eric Weddington 2007-09-16 15:44 -------
Subject: RE:  [avr-gcc] Optimization decrease performance of
 struct assignment.



> ------- Comment #15 from rask at gcc dot gnu dot org
> 2007-09-16 12:57 -------

> Also, it seems to me that avr.h defines MOVE_MAX incorrectly.
> Shouldn't it be 2
> instead of 4?

Hmm. Yes and no. There are variants that have the MOVW instruction, and for
those it should be 2. Otherwise, if they don't have the MOVW instruction, it
should be 1, I think.

------- Comment #19 From Rask Ingemann Lambertsen 2007-09-16 17:30 -------
Created an attachment (id=14213) [edit]
quick and dirty patch to reduce code size

Here's a patch which doesn't mess up the stack pointer update in the epilogue.

The code size output is broken on mainline, so here's what I use:
$ gcc/xgcc -Bgcc/ -W -Wall ~/pr11180.c -S -dp -o /dev/stdout -O2 | awk
'match($0, /\[length = ([[:digit:]]+)\]/, field) { sum += field[1]; } END {
print sum; }'
27

Compare that to unpatched mainline:
$ gcc/xgcc -Bgcc/ -W -Wall ~/pr11180.c -S -dp -o /dev/stdout -O2 | awk
'match($0, /\[length = ([[:digit:]]+)\]/, field) { sum += field[1]; } END {
print sum; }'
36

------- Comment #20 From Eric Weddington 2007-09-17 02:31 -------
Subject: RE:  [avr-gcc] Optimization decrease performance of
 struct assignment.



> Here's a patch which doesn't mess up the stack pointer update
> in the epilogue.
>

Hi Rask,

Your patch causes a regression. Sort of.

I have a small patch that enables Objective-C for the AVR (not that anyone
would or should use it), that hasn't been committed yet:

--- gcc/config/avr/avr.h.old    2007-08-23 15:18:31.015625000 -0600
+++ gcc/config/avr/avr.h        2007-08-23 15:19:17.687500000 -0600
@@ -53,7 +53,7 @@ extern int avr_mega_p;
 extern int avr_have_mul_p;
 extern int avr_asm_only_p;
 extern int avr_have_movw_lpmx_p;
-#ifndef IN_LIBGCC2
+#if !defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS)
 extern GTY(()) section *progmem_section;
 #endif

Well, your patch causes a new error in configuring libobjc (when using the
patch above):

checking for thread model used by GCC... single
checking for exception model to use... configure: error: unable to detect
exception model
make[1]: *** [configure-target-libobjc] Error 1
make[1]: Leaving directory `/c/avrdev/gcc/build'
make: *** [all] Error 2

Normally, configure detects the exception model as sjlj.

The portion of your patch that changes MOVE_MAX in avr.h is fine. The
problem seems to be something with your changes in avr.md.

Why this is happening, I don't really know. I'm certainly more concerned
with optimizing the C compiler than I am in having ObjC build.

Thanks,
Eric Weddington

------- Comment #21 From Rask Ingemann Lambertsen 2007-09-17 11:13 -------
It's probably someting simple, see config.log. Like I said, the patch is a
quick and dirty one and the AVR back end can use more work than that, most of
which means deleting patterns. Examples: All and, ior, xor, one_cmpl and sign
extend patterns larger than qimode, all zero_extend patterns, all movsi, movdi
and such. Along with that all the output_xxx functions in avr.c that they use.

------- Comment #22 From Eric Weddington 2007-09-17 22:53 -------
Subject: RE:  [avr-gcc] Optimization decrease performance of
 struct assignment.



> ------- Comment #21 from rask at gcc dot gnu dot org
> 2007-09-17 11:13 -------
> It's probably someting simple, see config.log.

Here it is:

configure:10398: error: unrecognizable insn:
(insn 105 104 106 2 (set (subreg:QI (reg/f:HI 52) 0)
        (subreg:QI (label_ref:HI 57) 0)) -1 (nil))
configure:10398: internal compiler error: in extract_insn, at recog.c:1990

------- Comment #23 From Rask Ingemann Lambertsen 2007-09-18 07:14 -------
> configure:10398: error: unrecognizable insn:
> (insn 105 104 106 2 (set (subreg:QI (reg/f:HI 52) 0)
>         (subreg:QI (label_ref:HI 57) 0)) -1 (nil))
> configure:10398: internal compiler error: in extract_insn, at recog.c:1990

In define_insn_and_split "*movhi", add the line

   && LABEL_REF != GET_CODE (operands[1])

where it already says

   && SYMBOL_REF != GET_CODE (operands[1])

and the error should go away.

------- Comment #24 From Eric Weddington 2007-09-18 19:06 -------
Subject: RE:  [avr-gcc] Optimization decrease performance of
 struct assignment.



> ------- Comment #23 from rask at gcc dot gnu dot org
>
> In define_insn_and_split "*movhi", add the line
>
>    && LABEL_REF != GET_CODE (operands[1])
>
> where it already says
>
>    && SYMBOL_REF != GET_CODE (operands[1])
>
> and the error should go away.

I *added* the line (not replace) as you suggested, and now it generates a
new error during build:

c:/avrdev/gcc/gcc-4.3-20070914/libobjc/Object.m:66: error: unrecognizable
insn:
(insn 54 4 55 2 c:/avrdev/gcc/gcc-4.3-20070914/libobjc/Object.m:65 (set
(reg:QI 22 r22 [ D.2345 ])
        (subreg:QI (const:HI (plus:HI (symbol_ref:HI
("_OBJC_SELECTOR_TABLE") [flags 0x2] <var_decl 013C4A20
_OBJC_SELECTOR_TABLE>)
                    (const_int 8 [0x8]))) 0)) -1 (nil))

------- Comment #25 From Rask Ingemann Lambertsen 2007-09-18 19:49 -------
> c:/avrdev/gcc/gcc-4.3-20070914/libobjc/Object.m:66: error: unrecognizable
> insn:
> (insn 54 4 55 2 c:/avrdev/gcc/gcc-4.3-20070914/libobjc/Object.m:65 (set
> (reg:QI 22 r22 [ D.2345 ])
>         (subreg:QI (const:HI (plus:HI (symbol_ref:HI
> ("_OBJC_SELECTOR_TABLE") [flags 0x2] <var_decl 013C4A20
> _OBJC_SELECTOR_TABLE>)
>                     (const_int 8 [0x8]))) 0)) -1 (nil))

That's a similar story: Add a line with
&& CONST != GET_CODE (operands[1])

------- Comment #26 From Eric Weddington 2007-09-19 05:28 -------
Subject: RE:  [avr-gcc] Optimization decrease performance of
 struct assignment.



> > c:/avrdev/gcc/gcc-4.3-20070914/libobjc/Object.m:66: error:
> unrecognizable
> > insn:
> > (insn 54 4 55 2
> c:/avrdev/gcc/gcc-4.3-20070914/libobjc/Object.m:65 (set
> > (reg:QI 22 r22 [ D.2345 ])
> >         (subreg:QI (const:HI (plus:HI (symbol_ref:HI
> > ("_OBJC_SELECTOR_TABLE") [flags 0x2] <var_decl 013C4A20
> > _OBJC_SELECTOR_TABLE>)
> >                     (const_int 8 [0x8]))) 0)) -1 (nil))
>
> That's a similar story: Add a line with
> && CONST != GET_CODE (operands[1])

That finally did the trick! At least it builds now...

------- Comment #27 From Eric Weddington 2007-09-19 17:57 -------
Created an attachment (id=14224) [edit]
Rask's patch modified from comments.

Here is Rask's patch again, but slightly modified from the comments.

------- Comment #28 From Andy Hutchinson 2008-02-02 15:44 -------
The patch and suggestions on this are valid. However, memory moves  -
particular with base pointers, may require additional instruction to be added
to reach required displacments. Splitting such moves may well incur the
overhead per BYTE!

So we need to get the pointers sorted so that (for example) 4 separate QI bytes
is just as good as 1 access to 4 SI bytes.

As for other pattern removal YES!

The reason to change MOVE_MAX is not made clear. I understood this to control
the threshold for using movmem rather than inline RTL. movmem wins (in size) at
about 8+ bytes. Does it have another use related to this problem?

------- Comment #29 From Eric Weddington 2008-02-19 02:47 -------
Rask's patch (gcc-4.3-bug-11180-experimental.patch) causes worse code for the
test case in bug #32871, than without the patch.

Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug