Bug 38453 - Output code optimisation excessive use of builtins
Status: RESOLVED DUPLICATE of bug 32044
Alias: None
Product: gcc
Classification: Unclassified
Component: c
Version: 4.3.2
Importance: P3 normal
Target Milestone: ---
Assignee: Steven Bosscher
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-09 14:49 UTC by Vincent Sanders
Modified: 2008-12-10 11:25 UTC
CC List: 18 users

See Also:
Host: x86_64-build_unknown-linux-gnu
Target: arm-unknown-linux-gnu
Build: x86_64-build_unknown-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2008-12-10 10:51:37


Attachments
Trivial test code to show behaviour (209 bytes, text/x-csrc)
2008-12-09 14:51 UTC, Vincent Sanders
Description Vincent Sanders 2008-12-09 14:49:55 UTC
While compiling LZMA compression code for an embedded ARM target, I discovered a regression from previous versions of GCC.

I have pared this down to a trivial example (attached), which boils down to an application-specific modulus operation. Please note that this is the *minimal* test case; the original is obviously somewhat more complex and is buried in the middle of the compression system. The behavior exhibited is the same in both the large and the small case.

The simple test case is compiled with  
arm-unknown-linux-gnu-gcc -Os -o foo test.c

and the resulting objdump is:

000083fc <foo>:
    83fc:       e92d4010        push    {r4, lr}
    8400:       e5d11000        ldrb    r1, [r1]
    8404:       e1a04000        mov     r4, r0
    8408:       e1a02001        mov     r2, r1
    840c:       ea000002        b       841c <foo+0x20>
    8410:       e5943004        ldr     r3, [r4, #4]
    8414:       e2833001        add     r3, r3, #1      ; 0x1
    8418:       e5843004        str     r3, [r4, #4]
    841c:       e242302d        sub     r3, r2, #45     ; 0x2d
    8420:       e352002c        cmp     r2, #44 ; 0x2c
    8424:       e20320ff        and     r2, r3, #255    ; 0xff
    8428:       8afffff8        bhi     8410 <foo+0x14>
    842c:       e1a00001        mov     r0, r1
    8430:       e3a0102d        mov     r1, #45 ; 0x2d
    8434:       eb000003        bl      8448 <__umodsi3>
    8438:       e20000ff        and     r0, r0, #255    ; 0xff
    843c:       e5840000        str     r0, [r4]
    8440:       e8bd8010        pop     {r4, pc}

If a different optimisation level is used:

arm-unknown-linux-gnu-gcc -O2 -o foo test.c

000083fc <foo>:
    83fc:       e92d4070        push    {r4, r5, r6, lr}
    8400:       e5d14000        ldrb    r4, [r1]
    8404:       e354002c        cmp     r4, #44 ; 0x2c
    8408:       e1a06000        mov     r6, r0
    840c:       9a00000e        bls     844c <foo+0x50>
    8410:       e244402d        sub     r4, r4, #45     ; 0x2d
    8414:       e20440ff        and     r4, r4, #255    ; 0xff
    8418:       e5905004        ldr     r5, [r0, #4]
    841c:       e3a0102d        mov     r1, #45 ; 0x2d
    8420:       e1a00004        mov     r0, r4
    8424:       eb00004f        bl      8568 <__umodsi3>
    8428:       e3a0102d        mov     r1, #45 ; 0x2d
    842c:       e1a03000        mov     r3, r0
    8430:       e1a00004        mov     r0, r4
    8434:       e20340ff        and     r4, r3, #255    ; 0xff
    8438:       eb000006        bl      8458 <__aeabi_uidiv>
    843c:       e2855001        add     r5, r5, #1      ; 0x1
    8440:       e20000ff        and     r0, r0, #255    ; 0xff
    8444:       e0855000        add     r5, r5, r0
    8448:       e5865004        str     r5, [r6, #4]
    844c:       e5864000        str     r4, [r6]
    8450:       e8bd8070        pop     {r4, r5, r6, pc}

Several optimisation levels were tried, and all produced similar output.

GCC 4.2.2 and 4.2.4 (which are our current compilers), invoked as
arm-unknown-linux-gnueabi-gcc -Os -o foo test.c
produce:

00008328 <foo>:
    8328:       e5d12000        ldrb    r2, [r1]
    832c:       ea000003        b       8340 <foo+0x18>
    8330:       e5903004        ldr     r3, [r0, #4]
    8334:       e20120ff        and     r2, r1, #255    ; 0xff
    8338:       e2833001        add     r3, r3, #1      ; 0x1
    833c:       e5803004        str     r3, [r0, #4]
    8340:       e352002c        cmp     r2, #44 ; 0x2c
    8344:       e242102d        sub     r1, r2, #45     ; 0x2d
    8348:       8afffff8        bhi     8330 <foo+0x8>
    834c:       e5802000        str     r2, [r0]
    8350:       e12fff1e        bx      lr



As can be seen, the trivial loop is performed and the quotient and remainder are found, but then the __umodsi3 builtin is called to do the operation *again*, and its return value is what gets stored, even though the result is already available from the loop!
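
For readers without access to the attachment, the following is a hypothetical reconstruction of the kind of code involved, not the attached file itself. The names propsRes, propsData, prop0, lc and pb are taken from the GIMPLE dump in comment 2; the struct name, the helper function names and the exact field types are assumptions. It sketches the subtraction loop next to the closed-form division and modulus that it is equivalent to.

```c
#include <assert.h>

/* Hypothetical layout: lc at offset 0 and pb at offset 4, matching the
   str/ldr offsets in the disassembly. The real field types are assumed. */
struct props {
    int lc;
    int pb;
};

/* Loop form: repeated subtraction of 45. */
void foo_loop(struct props *propsRes, const unsigned char *propsData)
{
    unsigned char prop0 = propsData[0];

    while (prop0 > 44) {
        propsRes->pb++;
        prop0 -= 45;
    }
    propsRes->lc = prop0;
}

/* Closed form: same results, but the division and modulus become
   library calls on ARM cores without a divide instruction. */
void foo_closed(struct props *propsRes, const unsigned char *propsData)
{
    unsigned char prop0 = propsData[0];

    propsRes->pb += prop0 / 45;
    propsRes->lc = prop0 % 45;
}

/* Checks that both forms agree for every possible input byte. */
void check_forms_agree(void)
{
    for (int v = 0; v < 256; v++) {
        unsigned char byte = (unsigned char)v;
        struct props a = { 0, 7 };
        struct props b = { 0, 7 };

        foo_loop(&a, &byte);
        foo_closed(&b, &byte);
        assert(a.lc == b.lc && a.pb == b.pb);
    }
}
```

Since the input is a single byte, the loop runs at most five times, which is why the loop form is so much cheaper here than a call into a software division routine.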

This odd behavior is seen in cross-built (and native) GCC 4.3.2 but not in 4.2.4. It seems to be present in current development builds as well; however, I have had trouble building those reliably, so I cannot give definite results.

The behavior is especially noticeable as a large performance and code-size degradation in compression code on small embedded systems. The additional need to link in the __umodsi3 implementation costs further space.

This has also been observed in some circumstances within ARM kernels when using modulus on powers of two! The obvious optimisation using shifts is performed, and then the value is recomputed using __modsi3.

Just for completeness, here is the GCC 4.3.2 compiler used for the tests (the 4.3.4 produces identical compiled output but has other undesirable behaviors not relevant to this report):

arm-unknown-linux-gnu-gcc -v
Using built-in specs.
Target: arm-unknown-linux-gnu
Configured with: /opt/simtec/crosstool-ng/targets/src/gcc-4.3.2/configure --build=x86_64-build_unknown-linux-gnu --host=x86_64-build_unknown-linux-gnu --target=arm-unknown-linux-gnu --prefix=/opt/simtec/arm-unknown-linux-gnu --with-sysroot=/opt/simtec/arm-unknown-linux-gnu/arm-unknown-linux-gnu/sys-root --enable-languages=c,c++,fortran,java --disable-multilib --with-float=soft --with-gmp=/opt/simtec/arm-unknown-linux-gnu --with-mpfr=/opt/simtec/arm-unknown-linux-gnu --with-pkgversion=crosstool-NG-1.3.0 --enable-__cxa_atexit --with-local-prefix=/opt/simtec/arm-unknown-linux-gnu/arm-unknown-linux-gnu/sys-root --disable-nls --enable-threads=posix --enable-symvers=gnu --enable-c99 --enable-long-long --enable-target-optspace
Thread model: posix
gcc version 4.3.2 (crosstool-NG-1.3.0)
Comment 1 Vincent Sanders 2008-12-09 14:51:34 UTC
Created attachment 16854
Trivial test code to show behaviour
Comment 2 Andrew Pinski 2008-12-10 00:25:52 UTC
I don't see an issue here really, the code got optimized to just:
<bb 2>:
  prop0.24 = *propsData;
  prop0 = prop0.24;
  goto <bb 4>;

<bb 3>:
  propsRes->pb = [plus_expr] propsRes->pb + 1;
  prop0 = prop0 + 211;

<bb 4>:
  if (prop0 > 44)
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 5>:
  propsRes->lc = (int) (int) (prop0.24 % 45);
  return;

But since ARM has no modulus/divide instruction (which is sad, by the way), a call to __umodsi3/__aeabi_uidiv is used.
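
A note on the dump above: the `prop0 + 211` in <bb 3> is the same subtraction of 45 performed by the source loop, because 211 = 256 - 45 and the arithmetic wraps at 8 bits. A minimal sketch, assuming prop0 has type unsigned char (the function name is made up for illustration):

```c
#include <assert.h>

/* Returns 1 if adding 211 and subtracting 45 agree modulo 256
   for every possible byte value, 0 otherwise. */
int add211_equals_sub45(void)
{
    for (int v = 0; v < 256; v++) {
        unsigned char plus = (unsigned char)(v + 211);
        unsigned char minus = (unsigned char)(v - 45);

        if (plus != minus)
            return 0;
    }
    return 1;
}
```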
Comment 3 Steven Bosscher 2008-12-10 10:51:37 UTC
Investigating.
Comment 4 pinskia@gmail.com 2008-12-10 11:20:27 UTC
On Dec 10, 2008, at 2:51 AM, "steven at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> wrote:

> ------- Comment #3 from steven at gcc dot gnu dot org  2008-12-10 10:51 -------
> Investigating.

There is no reason to investigate. This change happened because the heuristic in scev-cp was removed, and the transformation is now always done. There is another bug about this with respect to the Linux kernel, too.
Thanks,
Andrew Pinski
Comment 5 Steven Bosscher 2008-12-10 11:24:59 UTC
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32044#c5
Comment 6 Steven Bosscher 2008-12-10 11:25:59 UTC

*** This bug has been marked as a duplicate of 32044 ***