This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/50717] New: Silent code gen fault with incorrect widening of multiply
- From: "mgretton at sourceware dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 13 Oct 2011 15:48:35 +0000
- Subject: [Bug tree-optimization/50717] New: Silent code gen fault with incorrect widening of multiply
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50717
Bug #: 50717
Summary: Silent code gen fault with incorrect widening of multiply
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: mgretton@sourceware.org
Host: x86_64-linux-gnu
Target: arm-none-eabi
Created attachment 25483
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25483
Executable test case.
The attached test case fails when compiled and executed as follows:
arm-none-eabi-gcc -O2 gen_exec.c -o gen_exec.axf -fno-expensive-optimizations
.../linaro-qemu/0.15.50/bin/qemu-arm ./gen_exec.axf
The test case contains two functions, f0a and f0b, which are identical except
that f0a is compiled with -fexpensive-optimizations off and f0b with it on. The
difference in the generated code produces an incorrect result.
The attached file gen_exec_simple.c contains only the f0b function, so that it
can be compiled in isolation.
The attached tree dumps show the first difference between compiling
gen_exec_simple.c with and without -fexpensive-optimizations. The main
difference seems to be the following:
--- gen_exec_simple.c.135t.tailc.cheap 2011-10-13 15:02:50.000000000 +0100
+++ gen_exec_simple.c.135t.tailc.expensive 2011-10-13 15:03:15.000000000 +0100
@@ -3,6 +3,7 @@
f0b (uint32_t * restrict arg1, uint64_t * restrict arg2, uint8_t * restrict arg3)
{
+ <unnamed-unsigned:32> D.8363;
void * D.8362;
sizetype D.8361;
void * D.8360;
@@ -67,7 +68,8 @@ f0b (uint32_t * restrict arg1, uint64_t
D.8255_41 = MEM[base: D.8362_127, offset: 0B];
D.8256_42 = D.8252_36 * D.8255_41;
D.8257_43 = (uint64_t) D.8256_42;
- D.8258_44 = D.8257_43 + temp_1_18;
+ D.8363_7 = (<unnamed-unsigned:32>) D.8245_16;
+ D.8258_44 = WIDEN_MULT_PLUS_EXPR <D.8255_41, D.8363_7, temp_1_18>;
D.8259_45 = D.8258_44 >> 1;
D.8260_46 = D.8259_45 >> 24;
D.8272_57 = D.8251_31 | 1;
That is, a widening multiply/accumulate has been added to the tree. This
ultimately becomes a UMLAL instruction in the output.
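For reference, a rough C model of what UMLAL computes: it multiplies two unsigned 32-bit registers into a full 64-bit product and adds that product to a 64-bit accumulator held in a register pair (RdHi:RdLo). The helper names `umlal_model`, `acc_lo` and `acc_hi` are hypothetical; this is a sketch of the instruction's architectural semantics, not of how GCC expands WIDEN_MULT_PLUS_EXPR.

```c
#include <stdint.h>

/* Hypothetical model of ARM "UMLAL RdLo, RdHi, Rn, Rm":
   {RdHi:RdLo} = {RdHi:RdLo} + (uint64_t)Rn * (uint64_t)Rm.
   Both 32-bit operands are widened BEFORE the multiply, so the full
   64-bit product is kept; only the final sum wraps modulo 2^64. */
uint64_t umlal_model(uint64_t acc, uint32_t rn, uint32_t rm)
{
    return acc + (uint64_t)rn * (uint64_t)rm;
}

/* Hypothetical helpers that split the 64-bit accumulator into the
   RdLo/RdHi register pair used by the instruction. */
uint32_t acc_lo(uint64_t acc) { return (uint32_t)acc; }
uint32_t acc_hi(uint64_t acc) { return (uint32_t)(acc >> 32); }
```

For example, `umlal_model(0, 0xFFFFFFFF, 2)` returns `0x1FFFFFFFE`, a value that needs both halves of the register pair (RdHi = 1, RdLo = 0xFFFFFFFE).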
This widening multiply/accumulate is incorrect. The source expression being
computed is:
result += ((((((arg3[idx] * arg1[idx]) + temp_1)/2))>>24) / (temp_2 | 1));
Here arg3[idx] is a uint8_t, arg1[idx] is a uint32_t and temp_1 is a uint64_t.
As written in C, the multiply is performed in 32-bit unsigned arithmetic, so its
result is truncated to 32 bits and only then added to the 64-bit value.
The widening multiply/accumulate, however, widens the inputs to 64 bits and
performs a 64-bit multiply before adding the product to the 64-bit accumulator.
The two produce different results whenever the product overflows 32 bits.
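To make the difference concrete, here is a minimal sketch that evaluates one term of the expression twice: once with the product truncated to 32 bits (as written in C, and as in the cheap dump) and once with the operands widened first (as the WIDEN_MULT_PLUS_EXPR/UMLAL sequence does). The helper names are hypothetical, and temp_2 is assumed to be a uint64_t because the report does not state its type.

```c
#include <stdint.h>

/* Hypothetical helpers for one term of
     result += ((((arg3[idx] * arg1[idx]) + temp_1) / 2) >> 24) / (temp_2 | 1);
   Assumption: temp_2 is uint64_t (its type is not given in the report). */

/* As written in C: with 32-bit int, the uint8_t operand is promoted and the
   multiply is done in 32-bit unsigned arithmetic, wrapping modulo 2^32.
   Only the truncated product is widened for the 64-bit add. */
uint64_t term_truncated(uint8_t a3, uint32_t a1, uint64_t temp_1, uint64_t temp_2)
{
    uint32_t prod = (uint32_t)a3 * a1;          /* wraps modulo 2^32 */
    uint64_t sum = (uint64_t)prod + temp_1;
    return ((sum / 2) >> 24) / (temp_2 | 1);
}

/* What the widening multiply/accumulate computes instead: both operands
   are widened to 64 bits first, so no bits of the product are lost. */
uint64_t term_widened(uint8_t a3, uint32_t a1, uint64_t temp_1, uint64_t temp_2)
{
    uint64_t sum = (uint64_t)a3 * (uint64_t)a1 + temp_1;
    return ((sum / 2) >> 24) / (temp_2 | 1);
}
```

With `a3 = 255`, `a1 = 0xFFFFFFFF` and `temp_1 = temp_2 = 0`, the first helper yields 127 and the second 32639. When the product fits in 32 bits (for example `a3 = 200`, `a1 = 1000000`), both return the same value.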
A bisect of the source leads me to believe that revision 177907 is responsible:
http://gcc.gnu.org/viewcvs?view=revision&revision=177907