This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

From: "bfriesen at simple dot dallas.tx.us" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Wed, 18 Jul 2012 14:28:04 +0000
Subject: [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
Auto-submitted: auto-generated
References: <bug-53967-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #14 from bfriesen at simple dot dallas.tx.us 2012-07-18 14:28:04 UTC ---
With

-m64 -mtune=generic -march=x86-64 -mfpmath=sse -O2 -funroll-loops -fschedule-insns

I see a whole-program performance jump from 0.047 iter/s to 0.156 iter/s (a
roughly 3.3x speedup).  That is huge!  Given the fundamental properties of this
algorithm (it is the image-processing algorithm most often recommended for
offloading to a GPU), the world would be a better place if this level of
performance were the norm.

With

-m64 -mtune=generic -march=x86-64 -mfpmath=sse -O2 -fschedule-insns

I see 0.101 iter/s.

Evidently these options are not enabled by -O3, since

-m64 -mtune=generic -march=x86-64 -mfpmath=sse -O3

produces only 0.048 iter/s.

References:
- [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  - From: bfriesen at simple dot dallas.tx.us
