[Bug rtl-optimization/52060] New: Incorrect mask/and (ARM "bic") instruction generated for shifted expression parameter, triggered by -O2 -finline-functions

Mon Jan 30 21:36:00 GMT 2012

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52060

             Bug #: 52060
           Summary: Incorrect mask/and (ARM "bic") instruction generated
                    for shifted expression parameter, triggered by -O2
                    -finline-functions
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: swarren@nvidia.com

Created attachment 26519
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26519
Test case source

Compiling the attached test case for ARM with inlining enabled produces an
executable that computes the wrong result.

(on x86-64 host)

    ${CROSS_COMPILE}gcc -O2 -o testprog-good testprog.c

(on ARM target)

    $ ./testprog-good
    clearValue: 1.000000 -> 65535
    Expected: 65535

(on x86-64 host)

    ${CROSS_COMPILE}gcc -O2 -finline-functions -o testprog-bad testprog.c

(on ARM target)

    $ ./testprog-bad
    clearValue: 1.000000 -> 57344
    Expected: 65535

gcc is gcc-4.6.1 compiled using crosstool-ng (obtained from hg on 2011/06/08),
built to target ARM Cortex-A9 by default. I can supply more details if needed,
but I don't think crosstool-ng is patching/munging/... gcc when building it.

(I also confirmed the bug is still present in gcc-4.6.2)

In both cases, quick_float_to_int() is compiled identically:

000083dc <quick_float_to_int>:
83dc: e1a03400 lsl r3, r0, #8
83e0: e7e72bd0 ubfx r2, r0, #23, #8
83e4: e352007d cmp r2, #125 ; 0x7d
83e8: c3833102 orrgt r3, r3, #-2147483648 ; 0x80000000
83ec: e262209d rsb r2, r2, #157 ; 0x9d
83f0: d3833102 orrle r3, r3, #-2147483648 ; 0x80000000
83f4: c1a01233 lsrgt r1, r3, r2
83f8: d1a01233 lsrle r1, r3, r2
83fc: e3e00000 mvn r0, #0
8400: e1c33210 bic r3, r3, r0, lsl r2
8404: c281c001 addgt ip, r1, #1
8408: e2011002 and r1, r1, #2
840c: c1a0c0ac lsrgt ip, ip, #1
8410: d3a0c000 movle ip, #0
8414: e1931001 orrs r1, r3, r1
8418: 03e00001 mvneq r0, #1
841c: e000000c and r0, r0, ip
8420: e12fff1e bx lr

When inlining is not enabled, convert() calls quick_float_to_int():

(good)
00008424 <convert>:
8424: e30f1ff0 movw r1, #65520 ; 0xfff0
8428: e92d4008 push {r3, lr}
842c: e344197f movt r1, #18815 ; 0x497f
8430: eb00001a bl 84a0 <__aeabi_fmul>
8434: ebffffe8 bl 83dc <quick_float_to_int>
8438: e3083808 movw r3, #34824 ; 0x8808
843c: e1a026a0 lsr r2, r0, #13
8440: e3403000 movt r3, #0
8444: e7933102 ldr r3, [r3, r2, lsl #2]
8448: e2632007 rsb r2, r3, #7
844c: e1a00230 lsr r0, r0, r2
8450: e1a00980 lsl r0, r0, #19
8454: e1a009a0 lsr r0, r0, #19
8458: e1800683 orr r0, r0, r3, lsl #13
845c: e8bd8008 pop {r3, pc}

When inlining is enabled, convert() inlines quick_float_to_int():

(bad)
00008424 <convert>:
8424: e30f1ff0 movw r1, #65520 ; 0xfff0
8428: e92d4008 push {r3, lr}
842c: e344197f movt r1, #18815 ; 0x497f
8430: eb000024 bl 84c8 <__aeabi_fmul>
8434: e7e73bd0 ubfx r3, r0, #23, #8
8438: e59f2044 ldr r2, [pc, #68] ; 8484 <convert+0x60>
843c: e353007d cmp r3, #125 ; 0x7d
8440: c1a00400 lslgt r0, r0, #8
8444: c263309d rsbgt r3, r3, #157 ; 0x9d
8448: d3a03000 movle r3, #0
844c: c3800102 orrgt r0, r0, #-2147483648 ; 0x80000000
8450: c1a03330 lsrgt r3, r0, r3
8454: c2833001 addgt r3, r3, #1
8458: c1a030a3 lsrgt r3, r3, #1
845c: e1a016a3 lsr r1, r3, #13
8460: e3c33d7f bic r3, r3, #8128 ; 0x1fc0
8464: e3c331fe bic r3, r3, #-2147483585 ; 0x8000003f
8468: e7920101 ldr r0, [r2, r1, lsl #2]
846c: e2602007 rsb r2, r0, #7
8470: e1a03233 lsr r3, r3, r2
8474: e1a03983 lsl r3, r3, #19
8478: e1a039a3 lsr r3, r3, #19
847c: e1830680 orr r0, r3, r0, lsl #13
8480: e8bd8008 pop {r3, pc}
8484: 00008830 .word 0x00008830

Here's the problem:

C variable "linear" is in:
Good code: Register r0 at address 0x843c.
Bad code: Register r3 at address 0x845c.

In both cases, the value is identical (checked with gdb).

This value is then used in the expression:

nl = (lOnes<<13) | ((linear>>(7-lOnes))&0x1fff); //[15:13] leading ones

This is implemented starting at 0x8448 in the good code and 0x846c in the bad
code.

In the good code, "linear" (r0) doesn't change between those two points. In the
bad code, "linear" (r3) does change, because of the two "bic" instructions at
addresses 0x8460 and 0x8464. Together they clear the bits in 0x80001fff, i.e.
they bitwise-and "linear" with 0x7fffe000. "linear" is 0xfffff, so this results
in linear==0xfe000. I think the compiler is synthesizing the bic instructions
because of the "& 0x1fff" in the expression, but it is failing to shift the
mask left by "7-lOnes" before applying it to the unshifted value of "linear".
In fact, the compiler doesn't even need to do the mask separately, since that's
what the lsl/lsr #19 at addresses 0x8474 and 0x8478 in the bad code are doing
anyway. The good code doesn't have any such bic instructions.

This issue did not occur using CodeSourcery's 2009q1-203 compiler, which is
gcc-4.3.3.

Note that putting a printf() inside convert() after the assignment to linear,
or assigning linear to some global variable, makes the bug go away.