This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug tree-optimization/71414] 2x slower than clang summing small float array, GCC should consider larger vectorization factor for "unrolling" reductions

From: "yyc1992 at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Tue, 07 Jun 2016 14:01:03 +0000
Subject: [Bug tree-optimization/71414] 2x slower than clang summing small float array, GCC should consider larger vectorization factor for "unrolling" reductions
Auto-submitted: auto-generated
References: <bug-71414-4 at http dot gcc dot gnu dot org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414

--- Comment #7 from Yichao Yu <yyc1992 at gmail dot com> ---
If I add `-fvariable-expansion-in-unroller` (omg this option is like half the
command line ;-p ...), the performance matches that of clang after the clang
3.8 regression.

```
% gcc -funroll-loops -fvariable-expansion-in-unroller -Ofast -march=core-avx2 benchmark.c -o benchmark2
% ./benchmark2 
45.588861
% ./benchmark-gcc
80.518152
% ./benchmark-clang38
41.920054
% ./benchmark-clang37
25.093145
```
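
For readers who have not used the flag: GCC documents `-fvariable-expansion-in-unroller` as creating multiple copies of some local variables when unrolling a loop. The sketch below is a hand-written illustration of that idea applied to a float sum, not the actual `benchmark.c` (which is not shown in this message) and not GCC's generated code. The function names and the choice of four accumulators are assumptions made for the example.

```c
#include <stddef.h>

/* Plain reduction: each addition depends on the previous one,
 * so the loop is limited by the latency of a single add chain. */
float sum_simple(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical hand-expanded version: the accumulator is split into
 * four independent copies (the count 4 is an assumption), so the
 * four add chains can overlap. The copies are combined at the end. */
float sum_expanded(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main loop: four elements per iteration, one per accumulator. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Combine the partial sums, then handle the leftover elements. */
    float s = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

Splitting the accumulator reorders the floating-point additions, so the two functions can round differently on general input. That reordering is only legal for the compiler under relaxed floating-point rules such as those enabled by `-Ofast`, which is used in the command line above.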
