This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/25621] Missed optimisation
- From: "jv244 at cam dot ac dot uk" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 1 Jan 2006 18:14:05 -0000
- Subject: [Bug tree-optimization/25621] Missed optimisation
- References: <bug-25621-6642@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #2 from jv244 at cam dot ac dot uk 2006-01-01 18:14 -------
(In reply to comment #1)
> What happens if you use -funroll-loops? It should get about the same
> improvement.
I have the following timings (for N=1024, calling each subroutine a number of
times, plus some external initialisation):
                                  S31           S32
-O2 -ffast-math -funroll-loops    0.0229959786  0.0119980276
-O2 -ffast-math                   0.0229960084  0.0119979978
I think the issue is not pure unrolling, but the fact that S32 has two
independent sums in the loop.
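For illustration, the difference between one sum and two independent sums can
be sketched in C as below. This is only a sketch under assumptions: the actual
S31 and S32 sources are not shown in this message, so the dot-product loop
body and the names dot_one_sum and dot_two_sums are hypothetical.

```c
#include <stddef.h>

/* One accumulator: every addition has to wait for the previous one. */
double dot_one_sum(const double *a, const double *b, size_t n)
{
    double c = 0.0;
    for (size_t i = 0; i < n; i++)
        c += a[i] * b[i];
    return c;
}

/* Two independent accumulators (hypothetical stand-in for the hand-written
 * variant). Assumes n is even, otherwise a[i + 1] reads past the end. */
double dot_two_sums(const double *a, const double *b, size_t n)
{
    double c = 0.0, tmp = 0.0;
    for (size_t i = 0; i < n; i += 2) {
        c   += a[i]     * b[i];
        tmp += a[i + 1] * b[i + 1];
    }
    return c + tmp;  /* combine the partial sums once, after the loop */
}
```

In the second loop the two additions do not depend on each other, so they can
overlap. Splitting the sum changes the order of the floating-point additions,
which is presumably why the compiler only does this on its own under
-ffast-math.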
In fact, I now find that
  -O2 -ffast-math -funroll-loops -ftree-loop-ivcanon -fivopts
  -fvariable-expansion-in-unroller
yields much improved code:
  S31           S32
  0.0119979978  0.0079990029
The last option does indeed seem to do what I did by hand. Still, routine S32
remains about 30% faster than S31.
> Also your two loops are not equal if N is odd.
I've added at least the comment ;-)
! assume N is even
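If N may be odd, one possible way to keep a two-sum loop correct is to add the
leftover element after the loop. Again this is a sketch with a hypothetical
name and an assumed dot-product body, not the routine from the bug report:

```c
#include <stddef.h>

/* Two-accumulator sum that also works for odd n (hypothetical helper). */
double dot_two_sums_any_n(const double *a, const double *b, size_t n)
{
    double c = 0.0, tmp = 0.0;
    size_t i = 0;

    /* Process pairs while a full pair (i, i + 1) is still in range. */
    for (; i + 1 < n; i += 2) {
        c   += a[i]     * b[i];
        tmp += a[i + 1] * b[i + 1];
    }

    /* At most one element is left over when n is odd. */
    if (i < n)
        c += a[i] * b[i];

    return c + tmp;
}
```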
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25621