Created attachment 25483 [details] Executable test case. The attached test case fails when compiled and executed as follows: arm-none-eabi-gcc -O2 gen_exec.c -o gen_exec.axf -fno-expensive-optimizations .../linaro-qemu/0.15.50/bin/qemu-arm ./gen_exec.axf The two functions in the test case f0a and f0b are identical, just compiled with -fexpensive-optimizations off (for f0a) and on (for f0b). The code generation differences produce an incorrect result. The attached file gen_exec_simple.c contains the individual f0b function for compilation. The attached tree dumps show the first difference between compiling gen_exec_simple.c with and without -fexpensive-optimizations. The main difference seems to be the following: --- gen_exec_simple.c.135t.tailc.cheap 2011-10-13 15:02:50.000000000 +0100 +++ gen_exec_simple.c.135t.tailc.expensive 2011-10-13 15:03:15.000000000 +0100 @@ -3,6 +3,7 @@ f0b (uint32_t * restrict arg1, uint64_t * restrict arg2, uint8_t * restrict arg3) { + <unnamed-unsigned:32> D.8363; void * D.8362; sizetype D.8361; void * D.8360; @@ -67,7 +68,8 @@ f0b (uint32_t * restrict arg1, uint64_t D.8255_41 = MEM[base: D.8362_127, offset: 0B]; D.8256_42 = D.8252_36 * D.8255_41; D.8257_43 = (uint64_t) D.8256_42; - D.8258_44 = D.8257_43 + temp_1_18; + D.8363_7 = (<unnamed-unsigned:32>) D.8245_16; + D.8258_44 = WIDEN_MULT_PLUS_EXPR <D.8255_41, D.8363_7, temp_1_18>; D.8259_45 = D.8258_44 >> 1; D.8260_46 = D.8259_45 >> 24; D.8272_57 = D.8251_31 | 1; That is a widening multiply/accumulate has been added to the tree. This ultimately becomes a UMLAL in the output. This widening multiply/accumulate is incorrect. It is trying to do the following: result += ((((((arg3[idx] * arg1[idx]) + temp_1)/2))>>24) / (temp_2 | 1)); Where arg3[idx] is a uint8_t, arg1[idx] is a uint32_t and temp_1 is a uint64_t. As written in C, the result of the multiply is truncated to a 32-bit value, and then added to the 64-bit value. The widening multiply/accumulate, however, widens the inputs to 64-bits, and does a 64-bit multiply before adding it to the 64-bit accumulator. These produce a different result when the result of the multiply overflows 32-bits. A bisect of the source leads me to believe that revision 177907 is responsible: http://gcc.gnu.org/viewcvs?view=revision&revision=177907
Created attachment 25484 [details] Single function testcase (not executable)
Created attachment 25485 [details] Correct tree output
Created attachment 25486 [details] Incorrect tree output
Hmmm, I thought we'd worked out all those kinds of problems. I'll take a look.
I think I've identified the issue. Basically, we *want* to recognise cases like these: int f1 (signed char a, signed char b, int c) { return (short)(a * b) + c; } long long f2 (short a, short b, long long c) { return (a * b) + c; } long long f3 (char a, char b, long long c) { return (a * b) + c; } These have an extend between the multiply and plus operations, and the old implementation couldn't cope with that. The problem is that I've all caught all the cases where the user wanted (or C standard requires) that the multiply product be truncated. The solution then is to only convert to widening multiply when we can prove the operation will never overflow. I'll post a patch soon.
Patch posted here: http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01397.html
Author: ams Date: Tue Oct 18 19:57:15 2011 New Revision: 180164 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180164 Log: 2011-10-18 Andrew Stubbs <ams@codesourcery.com> PR tree-optimization/50717 gcc/ * tree-ssa-math-opts.c (is_widening_mult_p): Remove the 'type' parameter. Calculate 'type' from stmt. (convert_mult_to_widen): Update call the is_widening_mult_p. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.dg/pr50717-1.c: New file. * gcc.target/arm/wmul-12.c: Correct types. * gcc.target/arm/wmul-8.c: Correct types. Added: trunk/gcc/testsuite/gcc.dg/pr50717-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/arm/wmul-12.c trunk/gcc/testsuite/gcc.target/arm/wmul-8.c trunk/gcc/tree-ssa-math-opts.c
The bug is now fixed.