Bug 44976 - reductions with short variables do not get vectorized
Summary: reductions with short variables do not get vectorized
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.6.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2010-07-18 05:18 UTC by Roy Rosen
Modified: 2021-02-23 10:14 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work: 10.1.0
Known to fail:
Last reconfirmed: 2010-07-20 14:48:32


Attachments
preprocessed file (144 bytes, text/plain)
2010-07-18 05:22 UTC, Roy Rosen

Description Roy Rosen 2010-07-18 05:18:14 UTC
Reductions over short variables do not get vectorized, while the same reductions over unsigned short variables do.
Comment 1 Roy Rosen 2010-07-18 05:22:18 UTC
Created attachment 21240
preprocessed file

For the following code, if ts is short it does not get vectorized.
If ts is unsigned short it does.

#define ts short
ts xx(ts* __restrict__ a)
{
    ts i;
    ts sum = 0;
    for (i = 0; i < 16; i++)
            sum += a[i];
    return sum;
}
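As an illustration that is not part of the original report, the reproducer can be paired with a hypothetical variant, `xx_unsigned_acc`, that keeps the short interface but performs the reduction in unsigned short, mirroring the variant the report says does get vectorized. This is a sketch under two assumptions: GCC's modulo 2^16 behaviour for out-of-range conversions to short, and that a loop shaped like the unsigned short one vectorizes on the affected GCC versions (not verified here).

```c
/* Original reproducer from the report, with ts == short. */
short xx(short *__restrict__ a)
{
    short i;
    short sum = 0;
    for (i = 0; i < 16; i++)
        sum += a[i];            /* int addition, then narrowing to short */
    return sum;
}

/* Hypothetical workaround (not from the report): accumulate in
   unsigned short, whose wraparound is well defined, and convert back
   to short only once at the end.  The narrowing conversion is
   implementation-defined in ISO C; GCC reduces it modulo 2^16, so the
   result matches xx() bit for bit. */
short xx_unsigned_acc(short *__restrict__ a)
{
    unsigned short sum = 0;
    for (int i = 0; i < 16; i++)
        sum += (unsigned short)a[i];
    return (short)sum;
}
```

For example, with all sixteen elements set to 30000, both functions return 21248 (480000 reduced modulo 2^16).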

{lnxb2} /home/swproj/sw/users/eyalhar/ia64-46/gcc/ > ./xgcc -v -save-temps  -O3 ./a.c -ftree-vectorizer-verbose=2
Using built-in specs.
COLLECT_GCC=./xgcc
Target: ia64-elf-linux
Configured with: ../gcc-4.6-20100710/configure --target=ia64-elf-linux --enable-languages=c
Thread model: posix
gcc version 4.6.0 20100710 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-ftree-vectorizer-verbose=2'
 cc1 -E -quiet -v -iprefix ./../lib/gcc/ia64-elf-linux/4.6.0/ ./a.c -ftree-vectorizer-verbose=2 -O3 -fpch-preprocess -o a.i
ignoring nonexistent directory "./../lib/gcc/ia64-elf-linux/4.6.0/include"
ignoring nonexistent directory "./../lib/gcc/ia64-elf-linux/4.6.0/include-fixed"
ignoring nonexistent directory "./../lib/gcc/ia64-elf-linux/4.6.0/../../../../ia64-elf-linux/sys-include"
ignoring nonexistent directory "./../lib/gcc/ia64-elf-linux/4.6.0/../../../../ia64-elf-linux/include"
ignoring nonexistent directory "./../lib/gcc/../../lib/gcc/ia64-elf-linux/4.6.0/include"
ignoring nonexistent directory "./../lib/gcc/../../lib/gcc/ia64-elf-linux/4.6.0/include-fixed"
ignoring nonexistent directory "./../lib/gcc/../../lib/gcc/ia64-elf-linux/4.6.0/../../../../ia64-elf-linux/sys-include"
ignoring nonexistent directory "./../lib/gcc/../../lib/gcc/ia64-elf-linux/4.6.0/../../../../ia64-elf-linux/include"
#include "..." search starts here:
#include <...> search starts here:
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-ftree-vectorizer-verbose=2'
 cc1 -fpreprocessed a.i -quiet -dumpbase a.c -auxbase a -O3 -version -ftree-vectorizer-verbose=2 -o a.s
GNU C (GCC) version 4.6.0 20100710 (experimental) (ia64-elf-linux)
        compiled by GNU C version 4.1.2 20080704 (Red Hat 4.1.2-44), GMP version 4.3.2, MPFR version 2.4.2, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.6.0 20100710 (experimental) (ia64-elf-linux)
        compiled by GNU C version 4.1.2 20080704 (Red Hat 4.1.2-44), GMP version 4.3.2, MPFR version 2.4.2, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 151e5af8467c30bb3072214d4253f912

./a.c:79: note: not vectorized: unsupported use in stmt.
./a.c:75: note: vectorized 0 loops in function.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-ftree-vectorizer-verbose=2'
 as -x -o a.o a.s
as: unrecognized option `-x'
Comment 2 Richard Biener 2010-07-20 14:48:32 UTC
t.c:6: note: reduction: not commutative/associative: sum_13 = (short int) D.2726_12;

this is because

  sum = (short)((int)sum + (int)a[i]);

cannot be folded to

  sum = sum + a[i];

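as that would introduce undefined signed overflow that the original code does not have: the addition is done in int, where two short operands cannot overflow, and only the result is narrowed back to short.  Instead we fold it to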

  sum = (short)((unsigned short)sum + (unsigned short)a[i]);

which the vectorizer's pattern detection does not handle explicitly.
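To make the equivalence of the two forms concrete, the following sketch (illustrative only; the function names are hypothetical) models them in C and checks that they yield the same short value, including at the overflow boundaries. The third form, a plain short addition with undefined overflow, has no direct C spelling, because C always promotes short operands to int. The sketch assumes GCC's documented behaviour of reducing out-of-range conversions to short modulo 2^16.

```c
#include <assert.h>
#include <limits.h>

/* What the source means: promote to int, add (cannot overflow for
   short inputs), then narrow back to short. */
short add_promoted(short s, short x)
{
    return (short)((int)s + (int)x);
}

/* The folded form from the comment: the addition happens in
   unsigned short, where wraparound is well defined.  The inner cast
   reproduces unsigned short wrapping despite C's promotion to int. */
short add_unsigned(short s, short x)
{
    return (short)(unsigned short)((unsigned short)s + (unsigned short)x);
}

/* Compare both forms for every short s against a spread of x values,
   including the extremes that make the int sum leave short's range. */
void check_fold_equivalence(void)
{
    const short xs[] = { SHRT_MIN, -1, 0, 1, 12345, SHRT_MAX };
    for (int s = SHRT_MIN; s <= SHRT_MAX; s++)
        for (unsigned k = 0; k < sizeof xs / sizeof xs[0]; k++)
            assert(add_promoted((short)s, xs[k]) ==
                   add_unsigned((short)s, xs[k]));
}
```

Both forms wrap SHRT_MAX + 1 to SHRT_MIN under GCC, so the fold preserves the program's results while removing the signed overflow that a plain short addition would expose.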
Comment 3 Jorn Wolfgang Rennecke 2018-11-22 20:34:09 UTC
Ironically, this is a case where -fwrapv improves optimization.
Comment 4 Richard Biener 2021-02-23 10:14:30 UTC
Fixed in GCC 10.