This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug tree-optimization/71414] 2x slower than clang summing small float array, GCC should consider larger vectorization factor for "unrolling" reductions


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414

--- Comment #7 from Yichao Yu <yyc1992 at gmail dot com> ---
If I add `-fvariable-expansion-in-unroller` (omg this options is like half the
command line ;-p ...), the performance matches the clang one after the clang
3.8 regression.

```
% gcc -funroll-loops -fvariable-expansion-in-unroller -Ofast -march=core-avx2
benchmark.c -o benchmark2 
% ./benchmark2 
45.588861
% ./benchmark-gcc
80.518152
% ./benchmark-clang38
41.920054
% ./benchmark-clang37
25.093145
```

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]