Bug 9723 - With -Os optimization increases size if the loop contains array element access
Status: RESOLVED WORKSFORME
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 3.3
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: 16996
Reported: 2003-02-17 15:26 UTC by gertom
Modified: 2019-03-06 07:52 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.0.0
Last reconfirmed: 2006-04-24 22:41:02


Attachments
array-in-loop.tar.gz (1.59 KB, application/x-gzip )
2003-05-21 15:17 UTC, gertom
Description gertom 2003-02-17 15:26:00 UTC
When an array element is accessed in a loop, gcc performs the following optimization: it computes the address of the first array element before the loop, and then increments or decrements this value by the element size within the loop. This makes the loop faster and shorter than in the case of -O1, but the overall size increases (contrary to the purpose of -Os).

Compare the result of arm-elf-gcc -S -g0 -Os with the output created using arm-elf-gcc -S -g0 -O1. GCC should produce the latter kind of output with -Os.
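
For illustration only, the transformation described above can be written out by hand at the source level. This is a rough sketch, not what the compiler emits: the names f1_indexed and f1_pointer are made up for this note, and the pointer form is written so that it never steps below the start of the array, which differs slightly from the generated loops. The point is that the second form needs extra setup before the loop (computing p + c) in exchange for a cheaper loop body, which is where the size increase comes from.

```c
/* Same layout as the struct in the test case (01.c). */
typedef struct {
        int si1;
        short ss1;
        char sc1;
        int si2;
        char sc2;
        short ss2;
        short ss3;
} st;

/* Indexed form: the element address is recomputed from i on every
   iteration (i * sizeof(st) + p), roughly the -O1 shape. */
int f1_indexed(st* p, int c, int n)
{
  int i;
  for (i = c-1; i >= 0; i--) {
    if (p[i].si1 == n) {
      return 1;
    }
  }
  return 0;
}

/* Pointer-stepping form (hypothetical hand-written equivalent): the
   end address is computed once before the loop, then the pointer is
   decreased by one element per iteration. */
int f1_pointer(st* p, int c, int n)
{
  st* q;
  if (c <= 0) {
    return 0;
  }
  for (q = p + c; q != p; ) {
    --q;
    if (q->si1 == n) {
      return 1;
    }
  }
  return 0;
}
```

Both functions return the same result for any valid p and c; only the shape of the address computation differs.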

Release:
gcc version 3.3 20030210 (prerelease)

Environment:
BUILD & HOST: Linux 2.4.20 i686 unknown
TARGET: arm-unknown-elf

How-To-Repeat:
arm-elf-gcc -S -g0 -Os

// 01.c:

# 1 "01.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "01.c"
typedef struct {
        int si1;
        short ss1;
        char sc1;
        int si2;
        char sc2;
        short ss2;
        short ss3;
} st;

int f1(st* p, int c, int n)
{
  int i;
  for (i = c-1; i >= 0; i--) {
    if (p[i].si1 == n) {
      return 1;
    }
  }
  return 0;
}
Comment 1 Andrew Pinski 2003-06-02 01:17:41 UTC
Same is also true on PPC on the mainline (20030529):
[omni:~/src/gccPRs] pinskia% gcc -O1 -S -o - pr9723.c
.text
        .align 2
        .globl _f1
_f1:
        mr r2,r3
L9:
        addic. r4,r4,-1
        blt- cr0,L8
        mulli r0,r4,20
        lwzx r0,r2,r0
        li r3,1
        cmpw cr7,r0,r5
        beqlr- cr7
        b L9
L8:
        li r3,0
        blr
[omni:~/src/gccPRs] pinskia% gcc -Os -S -o - pr9723.c
.text
        .align 2
        .globl _f1
_f1:
        addic. r0,r4,-1
        mr r2,r3
        blt- cr0,L8
        mulli r0,r0,20
        add r4,r0,r3
L6:
        lwz r0,0(r4)
        addi r4,r4,-20
        cmpw cr6,r4,r2
        li r3,1
        cmpw cr7,r0,r5
        beqlr- cr7
        bge+ cr6,L6
L8:
        li r3,0
        blr
Comment 2 Steven Bosscher 2005-01-23 15:17:56 UTC
On i686 I have the following code size (with -fomit-frame-pointer): 
 
   text    data     bss     dec     hex filename 
     66       0       0      66      42 O2.o 
     48       0       0      48      30 Os.o 
 
For AMD64 I have the following: 
   text    data     bss     dec     hex filename 
    118       0       0     118      76 O2.o 
     83       0       0      83      53 Os.o 
 
So, is there still a problem for ARM and PPC? 
 
Comment 3 Andrew Pinski 2005-01-23 15:22:49 UTC
(In reply to comment #2)
-Os is even worse on the mainline now:
_f1:
        addi r0,r4,-1
        addi r4,r4,1
        cmpwi cr7,r0,-1
        mulli r0,r0,20
        mtctr r4
        add r3,r3,r0
        bge+ cr7,L2
        li r0,1
        mtctr r0
        b L2
L3:
        lwz r0,0(r3)
        addi r3,r3,-20
        cmpw cr7,r0,r5
        bne+ cr7,L2
        li r3,1
        blr
L2:
        bdnz L3
        li r3,0
        blr
Comment 4 Andrew Pinski 2006-03-01 02:43:47 UTC
One more instruction (+1) on PPC:
_f1:
        addi r0,r4,-1
        cmpwi cr7,r0,-1
        mulli r0,r4,20
        addi r4,r4,1
        add r3,r3,r0
        mtctr r4
        addi r3,r3,-20
        bge+ cr7,L2
        li r0,1
        mtctr r0
        b L2
L3:
        lwz r0,0(r3)
        addi r3,r3,-20
        cmpw cr7,r0,r5
        bne+ cr7,L2
        li r3,1
        blr
L2:
        bdnz L3
        li r3,0
        blr
Comment 5 Steven Bosscher 2019-03-06 07:52:00 UTC
For arm and ppc, trunk today at -Os produces smaller code than -O1 and -O2.