[Bug rtl-optimization/52060] New: Incorrect mask/and (ARM "bic") instruction generated for shifted expression parameter, triggered by -O2 -finline-functions
swarren at nvidia dot com
gcc-bugzilla@gcc.gnu.org
Mon Jan 30 21:36:00 GMT 2012
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52060
Bug #: 52060
Summary: Incorrect mask/and (ARM "bic") instruction generated
for shifted expression parameter, triggered by -O2
-finline-functions
Classification: Unclassified
Product: gcc
Version: 4.6.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: swarren@nvidia.com
Created attachment 26519
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26519
Test case source
Compiling the attached test case for ARM with inlining enabled
(-O2 -finline-functions) produces an executable that prints the wrong result.
(on x86-64 host)
${CROSS_COMPILE}gcc -O2 -o testprog-good testprog.c
(on ARM target)
$ ./testprog-good
clearValue: 1.000000 -> 65535
Expected: 65535
(on x86-64 host)
${CROSS_COMPILE}gcc -O2 -finline-functions -o testprog-bad testprog.c
(on ARM target)
$ ./testprog-bad
clearValue: 1.000000 -> 57344
Expected: 65535
The compiler is gcc-4.6.1 built with crosstool-ng (obtained from hg on
2011/06/08), targeting ARM Cortex-A9 by default. I can supply more details if
needed, but I don't think crosstool-ng is patching/munging/... gcc when
building it. (I also confirmed the bug is still present in gcc-4.6.2.)
In both cases, quick_float_to_int() is compiled identically:
000083dc <quick_float_to_int>:
83dc: e1a03400 lsl r3, r0, #8
83e0: e7e72bd0 ubfx r2, r0, #23, #8
83e4: e352007d cmp r2, #125 ; 0x7d
83e8: c3833102 orrgt r3, r3, #-2147483648 ; 0x80000000
83ec: e262209d rsb r2, r2, #157 ; 0x9d
83f0: d3833102 orrle r3, r3, #-2147483648 ; 0x80000000
83f4: c1a01233 lsrgt r1, r3, r2
83f8: d1a01233 lsrle r1, r3, r2
83fc: e3e00000 mvn r0, #0
8400: e1c33210 bic r3, r3, r0, lsl r2
8404: c281c001 addgt ip, r1, #1
8408: e2011002 and r1, r1, #2
840c: c1a0c0ac lsrgt ip, ip, #1
8410: d3a0c000 movle ip, #0
8414: e1931001 orrs r1, r3, r1
8418: 03e00001 mvneq r0, #1
841c: e000000c and r0, r0, ip
8420: e12fff1e bx lr
When inlining is not enabled, convert() calls quick_float_to_int():
(good)
00008424 <convert>:
8424: e30f1ff0 movw r1, #65520 ; 0xfff0
8428: e92d4008 push {r3, lr}
842c: e344197f movt r1, #18815 ; 0x497f
8430: eb00001a bl 84a0 <__aeabi_fmul>
8434: ebffffe8 bl 83dc <quick_float_to_int>
8438: e3083808 movw r3, #34824 ; 0x8808
843c: e1a026a0 lsr r2, r0, #13
8440: e3403000 movt r3, #0
8444: e7933102 ldr r3, [r3, r2, lsl #2]
8448: e2632007 rsb r2, r3, #7
844c: e1a00230 lsr r0, r0, r2
8450: e1a00980 lsl r0, r0, #19
8454: e1a009a0 lsr r0, r0, #19
8458: e1800683 orr r0, r0, r3, lsl #13
845c: e8bd8008 pop {r3, pc}
When inlining is enabled, convert() inlines quick_float_to_int():
(bad)
00008424 <convert>:
8424: e30f1ff0 movw r1, #65520 ; 0xfff0
8428: e92d4008 push {r3, lr}
842c: e344197f movt r1, #18815 ; 0x497f
8430: eb000024 bl 84c8 <__aeabi_fmul>
8434: e7e73bd0 ubfx r3, r0, #23, #8
8438: e59f2044 ldr r2, [pc, #68] ; 8484 <convert+0x60>
843c: e353007d cmp r3, #125 ; 0x7d
8440: c1a00400 lslgt r0, r0, #8
8444: c263309d rsbgt r3, r3, #157 ; 0x9d
8448: d3a03000 movle r3, #0
844c: c3800102 orrgt r0, r0, #-2147483648 ; 0x80000000
8450: c1a03330 lsrgt r3, r0, r3
8454: c2833001 addgt r3, r3, #1
8458: c1a030a3 lsrgt r3, r3, #1
845c: e1a016a3 lsr r1, r3, #13
8460: e3c33d7f bic r3, r3, #8128 ; 0x1fc0
8464: e3c331fe bic r3, r3, #-2147483585 ; 0x8000003f
8468: e7920101 ldr r0, [r2, r1, lsl #2]
846c: e2602007 rsb r2, r0, #7
8470: e1a03233 lsr r3, r3, r2
8474: e1a03983 lsl r3, r3, #19
8478: e1a039a3 lsr r3, r3, #19
847c: e1830680 orr r0, r3, r0, lsl #13
8480: e8bd8008 pop {r3, pc}
8484: 00008830 .word 0x00008830
Here's the problem:
C variable "linear" is in:
Good code: Register r0 at address 0x843c.
Bad code: Register r3 at address 0x845c.
In both cases, the value is identical (checked with gdb).
This value is then used in the expression:
nl = (lOnes<<13) | ((linear>>(7-lOnes))&0x1fff); //[15:13] leading ones
This is implemented starting at 0x8448 in the good code and 0x846c in the bad
code.
In the good code, "linear" (r0) does not change between those two points. In
the bad code, "linear" (r3) does change, because of the two "bic" instructions
at addresses 0x8460 and 0x8464. Together they bitwise-and "linear" with
0x7fffe000. "linear" is 0xfffff, so the result is linear==0xfe000. I think
the compiler is synthesizing the bic instructions from the "& 0x1fff" in the
expression, but it applies the mask to the unshifted value of "linear" without
first shifting the mask left by "7-lOnes". In fact, the compiler does not need
a separate mask at all, since the lsl/lsr #19 pair at addresses 0x8474 and
0x8478 in the bad code already does that. The good code has no such bic
instructions.
This issue did not occur using CodeSourcery's 2009q1-203 compiler, which is
gcc-4.3.3.
Note that putting a printf() inside convert() after "linear" is computed, or
assigning "linear" to some global variable, makes the bug disappear.