This is the mail archive of the mailing list for the GCC project.


Re: gcc 3.4 > mainline performance regression

Hi all!
I'm not sure I understood all of your previous explanations (apart from the load & store motion part), but we pointed out two different problems:

Problem 1: at optimization level -O3, a[0] and a[1] are loaded and stored on each loop iteration.
Problem 2: at optimization level -Os, the loop limit value (1,000,000) is loaded on each loop iteration (previously, with gcc 3.4.2, it was loaded once, and the register holding it was then decremented by one until the zero flag was set).

It seems to me that your current remarks only apply to Problem 1. Am I wrong?

Anyway, thanks a lot for your help!

PS: my reference code
When compiled with -mthumb -Os, we get:
00000000 <foo>:
  0:    b510          push    {r4, lr}
  2:    6802          ldr    r2, [r0, #0]
  4:    6844          ldr    r4, [r0, #4]
  6:    2100          movs    r1, #0
  8:    4b03          ldr    r3, [pc, #12]    (18 <.text+0x18>)
  a:    3101          adds    r1, #1
  c:    1912          adds    r2, r2, r4
  e:    4299          cmp    r1, r3
 10:    d1fa          bne.n    8 <foo+0x8>
 12:    6002          str    r2, [r0, #0]
 14:    bd10          pop    {r4, pc}
 16:    0000          lsls    r0, r0, #0
 18:    4240          negs    r0, r0
 1a:    000f          lsls    r7, r1, #0

Problem 2: the load of the loop end value is performed inside the loop!
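
For readers following the listing, here is a minimal C sketch of the two loop shapes being compared. It is only an illustration under assumptions: the function names and the LOOP_COUNT macro are hypothetical and do not come from the reference code. The bound itself is read from the literal pool at offset 0x18 (halfwords 4240 and 000f, i.e. the word 0x000f4240, which is 1000000). The count-up form matches the compare against a constant seen above; the count-down form matches the gcc 3.4.2 behaviour described earlier, where one register is decremented until the zero flag is set, so no constant has to be reloaded.

```c
/* Hypothetical name for the loop bound; 0x000f4240 in the literal pool. */
#define LOOP_COUNT 1000000

/* Count-up shape: the counter is compared against the bound on every
 * iteration, so the bound must be available in a register (or reloaded,
 * as in the -Os listing). */
int sum_count_up(int start, int step)
{
    int i;
    for (i = 0; i != LOOP_COUNT; i++)
        start += step;
    return start;
}

/* Count-down shape: the bound is loaded once into the counter, which is
 * decremented to zero; the exit test only needs the zero flag. */
int sum_count_down(int start, int step)
{
    int n;
    for (n = LOOP_COUNT; n != 0; n--)
        start += step;
    return start;
}
```

Both functions perform the same number of additions and return the same value; only the form of the exit test differs.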

When compiled with -mthumb -O3, we get:
00000000 <foo>:
  0:    b530          push    {r4, r5, lr}
  2:    6802          ldr    r2, [r0, #0]
  4:    4d05          ldr    r5, [pc, #20]    (1c <.text+0x1c>)
  6:    1d04          adds    r4, r0, #4
  8:    2100          movs    r1, #0
  a:    6823          ldr    r3, [r4, #0]
  c:    3101          adds    r1, #1
  e:    18d3          adds    r3, r2, r3
 10:    1c1a          adds    r2, r3, #0
 12:    6003          str    r3, [r0, #0]
 14:    42a9          cmp    r1, r5
 16:    d1f8          bne.n    a <foo+0xa>
 18:    bd30          pop    {r4, r5, pc}
 1a:    0000          lsls    r0, r0, #0
 1c:    4240          negs    r0, r0
 1e:    000f          lsls    r7, r1, #0
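
In the -O3 listing, the instructions at offsets 0xa (ldr from a[1]) and 0x12 (str to a[0]) sit inside the loop, whereas the -Os listing keeps both values in registers and stores once after the loop. A minimal C sketch of that difference follows. It is a reconstruction inferred from the disassembly, not the original reference code: the function names and LOOP_COUNT are hypothetical, and the presumed source shape is an assumption.

```c
/* Hypothetical name for the loop bound (0x000f4240 in the listings). */
#define LOOP_COUNT 1000000

/* Presumed source shape (assumption): memory is named in the loop body,
 * so without load/store motion the compiler may touch it every iteration. */
void foo_as_written(int *a)
{
    int i;
    for (i = 0; i < LOOP_COUNT; i++)
        a[0] += a[1];
}

/* Manually hoisted shape: load both values once, accumulate in a local,
 * store once after the loop. This mirrors what the -Os listing does. */
void foo_hoisted(int *a)
{
    int sum = a[0];
    int step = a[1];
    int i;
    for (i = 0; i < LOOP_COUNT; i++)
        sum += step;
    a[0] = sum;
}
```

The two functions give the same result because a[0] and a[1] are distinct elements. Writing the hoisted form by hand is only a source-level workaround, not a fix for the compiler behaviour discussed in this thread.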


----- Original Message ----
From: Steven Bosscher
To: David Edelsohn
Cc: Andrew Haley
Sent: Friday, 5 January 2007, 17:55:47
Subject: Re: gcc 3.4 > mainline performance regression

On 1/5/07, David Edelsohn wrote:
> >>>>> Steven Bosscher writes:
> Steven> What does the code look like if you compile with -O2  -fgcse-sm?
>         Yep.  Mark and I recently discussed whether gcse-sm should be
> enabled by default at some optimization level.  We're hiding performance
> from GCC users.

The problem with it used to be that it was just very broken. When I
fixed PR24257, it was still not possible to bootstrap with gcse store
motion enabled.

Putting someone on fixing tree load&store motion is probably more
useful anyway, if you're going to do load&store motion for
performance.  In RTL, we can't move loads and stores that are not
simple loads or stores (i.e. reg <- mem, or mem <- reg). There are two
very popular targets where this is the common case ;-)

