Bug 50717 - [4.7 Regression] Silent code gen fault with incorrect widening of multiply
Summary: [4.7 Regression] Silent code gen fault with incorrect widening of multiply
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.7.0
: P3 normal
Target Milestone: 4.7.0
Assignee: Andrew Stubbs
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2011-10-13 15:48 UTC by Matthew Gretton-Dann
Modified: 2011-10-19 14:41 UTC (History)
1 user (show)

See Also:
Host: x86_64-linux-gnu
Target: arm-none-eabi
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-10-14 00:00:00


Attachments
Executable test case. (629 bytes, text/x-csrc)
2011-10-13 15:48 UTC, Matthew Gretton-Dann
Details
Single function testcase (not executable) (255 bytes, text/x-csrc)
2011-10-13 15:49 UTC, Matthew Gretton-Dann
Details
Correct tree output (836 bytes, text/plain)
2011-10-13 15:49 UTC, Matthew Gretton-Dann
Details
Incorrect tree output (895 bytes, text/plain)
2011-10-13 15:50 UTC, Matthew Gretton-Dann
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Matthew Gretton-Dann 2011-10-13 15:48:35 UTC
Created attachment 25483 [details]
Executable test case.

The attached test case fails when compiled and executed as follows:

arm-none-eabi-gcc -O2 gen_exec.c -o gen_exec.axf -fno-expensive-optimizations
.../linaro-qemu/0.15.50/bin/qemu-arm  ./gen_exec.axf

The two functions in the test case f0a and f0b are identical, just compiled with -fexpensive-optimizations off (for f0a) and on (for f0b).  The code generation differences produce an incorrect result.

The attached file gen_exec_simple.c contains the individual f0b function for compilation.

The attached tree dumps show the first difference between compiling gen_exec_simple.c with and without -fexpensive-optimizations.  The main difference seems to be the following:


--- gen_exec_simple.c.135t.tailc.cheap  2011-10-13 15:02:50.000000000 +0100
+++ gen_exec_simple.c.135t.tailc.expensive      2011-10-13 15:03:15.000000000 +0100
@@ -3,6 +3,7 @@
 
 f0b (uint32_t * restrict arg1, uint64_t * restrict arg2, uint8_t * restrict arg3)
 {
+  <unnamed-unsigned:32> D.8363;
   void * D.8362;
   sizetype D.8361;
   void * D.8360;
@@ -67,7 +68,8 @@ f0b (uint32_t * restrict arg1, uint64_t 
   D.8255_41 = MEM[base: D.8362_127, offset: 0B];
   D.8256_42 = D.8252_36 * D.8255_41;
   D.8257_43 = (uint64_t) D.8256_42;
-  D.8258_44 = D.8257_43 + temp_1_18;
+  D.8363_7 = (<unnamed-unsigned:32>) D.8245_16;
+  D.8258_44 = WIDEN_MULT_PLUS_EXPR <D.8255_41, D.8363_7, temp_1_18>;
   D.8259_45 = D.8258_44 >> 1;
   D.8260_46 = D.8259_45 >> 24;
   D.8272_57 = D.8251_31 | 1;

That is a widening multiply/accumulate has been added to the tree.  This ultimately becomes a UMLAL in the output.

This widening multiply/accumulate is incorrect.  It is trying to do the following:

result += ((((((arg3[idx] * arg1[idx]) + temp_1)/2))>>24) / (temp_2 | 1));

Where arg3[idx] is a uint8_t, arg1[idx] is a uint32_t and temp_1 is a uint64_t.

As written in C, the result of the multiply is truncated to a 32-bit value, and then added to the 64-bit value.

The widening multiply/accumulate, however, widens the inputs to 64-bits, and does a 64-bit multiply before adding it to the 64-bit accumulator.

These produce a different result when the result of the multiply overflows 32-bits.

A bisect of the source leads me to believe that revision 177907 is responsible: http://gcc.gnu.org/viewcvs?view=revision&revision=177907
Comment 1 Matthew Gretton-Dann 2011-10-13 15:49:15 UTC
Created attachment 25484 [details]
Single function testcase (not executable)
Comment 2 Matthew Gretton-Dann 2011-10-13 15:49:46 UTC
Created attachment 25485 [details]
Correct tree output
Comment 3 Matthew Gretton-Dann 2011-10-13 15:50:48 UTC
Created attachment 25486 [details]
Incorrect tree output
Comment 4 Andrew Stubbs 2011-10-13 16:54:21 UTC
Hmmm, I thought we'd worked out all those kinds of problems.

I'll take a look.
Comment 5 Andrew Stubbs 2011-10-14 15:25:49 UTC
I think I've identified the issue.

Basically, we *want* to recognise cases like these:

  int
  f1 (signed char a, signed char b, int c)
  {
    return (short)(a * b) + c;
  }

  long long
  f2 (short a, short b, long long c)
  {
    return (a * b) + c;
  }

  long long
  f3 (char a, char b, long long c)
  {
    return (a * b) + c;
  }

These have an extend between the multiply and plus operations, and the old implementation couldn't cope with that.

The problem is that I've all caught all the cases where the user wanted (or C standard requires) that the multiply product be truncated.

The solution then is to only convert to widening multiply when we can prove the operation will never overflow.

I'll post a patch soon.
Comment 6 Andrew Stubbs 2011-10-15 21:29:12 UTC
Patch posted here:

http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01397.html
Comment 7 Andrew Stubbs 2011-10-18 19:57:19 UTC
Author: ams
Date: Tue Oct 18 19:57:15 2011
New Revision: 180164

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180164
Log:
2011-10-18  Andrew Stubbs  <ams@codesourcery.com>

	PR tree-optimization/50717

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_p): Remove the 'type'
	parameter.  Calculate 'type' from stmt.
	(convert_mult_to_widen): Update call the is_widening_mult_p.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.dg/pr50717-1.c: New file.
	* gcc.target/arm/wmul-12.c: Correct types.
	* gcc.target/arm/wmul-8.c: Correct types.



Added:
    trunk/gcc/testsuite/gcc.dg/pr50717-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/arm/wmul-12.c
    trunk/gcc/testsuite/gcc.target/arm/wmul-8.c
    trunk/gcc/tree-ssa-math-opts.c
Comment 8 Andrew Stubbs 2011-10-19 14:41:19 UTC
The bug is now fixed.