This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/50717] New: Silent code gen fault with incorrect widening of multiply


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50717

             Bug #: 50717
           Summary: Silent code gen fault with incorrect widening of
                    multiply
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: mgretton@sourceware.org
              Host: x86_64-linux-gnu
            Target: arm-none-eabi


Created attachment 25483
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25483
Executable test case.

The attached test case fails when compiled and executed as follows:

arm-none-eabi-gcc -O2 gen_exec.c -o gen_exec.axf -fno-expensive-optimizations
.../linaro-qemu/0.15.50/bin/qemu-arm  ./gen_exec.axf

The two functions in the test case, f0a and f0b, are identical; they are just
compiled with -fexpensive-optimizations off (for f0a) and on (for f0b).  The
differences in the generated code produce an incorrect result.

The attached file gen_exec_simple.c contains the individual f0b function for
compilation.

The attached tree dumps show the first difference between compiling
gen_exec_simple.c with and without -fexpensive-optimizations.  The main
difference seems to be the following:


--- gen_exec_simple.c.135t.tailc.cheap  2011-10-13 15:02:50.000000000 +0100
+++ gen_exec_simple.c.135t.tailc.expensive      2011-10-13 15:03:15.000000000 +0100
@@ -3,6 +3,7 @@

 f0b (uint32_t * restrict arg1, uint64_t * restrict arg2, uint8_t * restrict arg3)
 {
+  <unnamed-unsigned:32> D.8363;
   void * D.8362;
   sizetype D.8361;
   void * D.8360;
@@ -67,7 +68,8 @@ f0b (uint32_t * restrict arg1, uint64_t 
   D.8255_41 = MEM[base: D.8362_127, offset: 0B];
   D.8256_42 = D.8252_36 * D.8255_41;
   D.8257_43 = (uint64_t) D.8256_42;
-  D.8258_44 = D.8257_43 + temp_1_18;
+  D.8363_7 = (<unnamed-unsigned:32>) D.8245_16;
+  D.8258_44 = WIDEN_MULT_PLUS_EXPR <D.8255_41, D.8363_7, temp_1_18>;
   D.8259_45 = D.8258_44 >> 1;
   D.8260_46 = D.8259_45 >> 24;
   D.8272_57 = D.8251_31 | 1;

That is, a widening multiply/accumulate has been added to the tree.  This
ultimately becomes a UMLAL instruction in the output.

This widening multiply/accumulate is incorrect.  The code it is trying to
implement is the following:

result += ((((((arg3[idx] * arg1[idx]) + temp_1)/2))>>24) / (temp_2 | 1));

where arg3[idx] is a uint8_t, arg1[idx] is a uint32_t and temp_1 is a uint64_t.

As written in C, the multiply is performed in 32 bits: arg3[idx] is converted
to uint32_t, the product is truncated to a 32-bit value, and only then is it
widened and added to the 64-bit value.

The widening multiply/accumulate, however, widens the inputs to 64 bits and
does a 64-bit multiply before adding the product to the 64-bit accumulator.

The two produce different results whenever the result of the multiply
overflows 32 bits.
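
To make the difference concrete, here is a minimal sketch of the two
computations.  It is not taken from the attached test case: the function names
mac_as_written and mac_widened are hypothetical, and it assumes a 32-bit int
(as on arm-none-eabi), so that the uint8_t operand is converted to uint32_t
before the multiply.

```c
#include <stdint.h>

/* Hypothetical helper: the C semantics of (arg3[idx] * arg1[idx]) + temp_1.
   Assuming a 32-bit int, the uint8_t operand is promoted and converted to
   uint32_t, the product wraps modulo 2^32, and only then is it widened and
   added to the 64-bit accumulator. */
uint64_t mac_as_written(uint8_t a, uint32_t b, uint64_t acc)
{
    uint32_t product = a * b;       /* 32-bit multiply, high bits discarded */
    return (uint64_t)product + acc;
}

/* Hypothetical helper: what a widening multiply/accumulate computes.
   Both inputs are widened first, so the full 64-bit product is kept. */
uint64_t mac_widened(uint8_t a, uint32_t b, uint64_t acc)
{
    return (uint64_t)a * (uint64_t)b + acc;
}
```

With a = 2, b = 0x80000000 and acc = 5, mac_as_written returns 5 because the
32-bit product wraps to 0, while mac_widened returns 0x100000005.  For inputs
whose product fits in 32 bits the two functions agree, which would explain why
the fault is silent.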

A bisect of the source leads me to believe that revision 177907 is responsible:
http://gcc.gnu.org/viewcvs?view=revision&revision=177907

