[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
bfriesen at simple dot dallas.tx.us
gcc-bugzilla@gcc.gnu.org
Wed Jul 18 14:28:00 GMT 2012
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #14 from bfriesen at simple dot dallas.tx.us 2012-07-18 14:28:04 UTC ---
With
-m64 -mtune=generic -march=x86-64 -mfpmath=sse -O2 -funroll-loops -fschedule-insns
I see a whole-program performance jump from 0.047 iter/s to 0.156 iter/s
(about 3.3x the original rate). That is huge! Given the fundamental
properties of this algorithm (the image processing algorithm most often
recommended to be moved to a GPU), the world would be a better place if this
performance were the normal case.
With
-m64 -mtune=generic -march=x86-64 -mfpmath=sse -O2 -fschedule-insns
I see 0.101 iter/s.
The options that help here (-funroll-loops and -fschedule-insns) must not be
enabled by -O3, since
-m64 -mtune=generic -march=x86-64 -mfpmath=sse -O3
produces only 0.048 iter/s.
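
For anyone who wants to try the flag combinations above without the original
test case, here is a minimal sketch of the kind of loop involved. It is not the
code measured in this report: convolve_row and iterations_per_second are
hypothetical names, and a 1-D float convolution with clamped edges is assumed
purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the benchmarked code: convolve one row of
   floats with an odd-width kernel, clamping reads at the row edges. */
void convolve_row(const float *src, float *dst, size_t n,
                  const float *kernel, size_t kn)
{
    assert(n > 0 && kn % 2 == 1);
    const ptrdiff_t half = (ptrdiff_t)(kn / 2);
    for (size_t i = 0; i < n; i++) {
        float sum = 0.0f;
        /* Multiply-add inner loop: the part that loop unrolling and
           instruction scheduling can rearrange. */
        for (size_t k = 0; k < kn; k++) {
            ptrdiff_t j = (ptrdiff_t)i + half - (ptrdiff_t)k;
            if (j < 0)
                j = 0;
            if (j >= (ptrdiff_t)n)
                j = (ptrdiff_t)n - 1;
            sum += src[j] * kernel[k];
        }
        dst[i] = sum;
    }
}

/* Run `iters` passes over the row and return iterations per second of
   CPU time, or 0.0 if the run was too short for clock() to measure. */
double iterations_per_second(const float *src, float *dst, size_t n,
                             const float *kernel, size_t kn, int iters)
{
    clock_t start = clock();
    for (int it = 0; it < iters; it++)
        convolve_row(src, dst, n, kernel, kn);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    return secs > 0.0 ? iters / secs : 0.0;
}
```

Building the same file once per flag set and comparing the reported iter/s
would show whether a given machine sees a similar spread. Absolute numbers
from a toy loop like this will not match the whole-program figures above.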