This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

From: "evstupac at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Wed, 18 Jul 2012 09:45:15 +0000
Subject: [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
Auto-submitted: auto-generated
References: <bug-53967-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

Stupachenko Evgeny <evstupac at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |evstupac at gmail dot com

--- Comment #12 from Stupachenko Evgeny <evstupac at gmail dot com> 2012-07-18 09:45:15 UTC ---
I tried it at "-O2" and also got low performance with -mfpmath=sse. It looks like
it is caused by a register dependency on %xmm0 between these two instructions:

addss    %xmm0, %xmm1
cvtsi2ss    %eax, %xmm0

Renaming %xmm0 in the cvtsi2ss to another free register in all such cases
resolves the issue.

You can also try "-O2 -funroll-loops", which made the "sse" code even faster,
and "-O2 -fschedule-insns", which significantly reduced the performance loss in
the "sse" case.
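
As a rough illustration of where such an instruction pair can come from, here
is a minimal C sketch. It is not the test case from the bug report; the
function name sum_as_float and its parameters are hypothetical. On x86-64 with
-mfpmath=sse, the int-to-float conversion in the loop is typically emitted as
cvtsi2ss and the accumulation as addss. Because cvtsi2ss writes only the low 32
bits of its destination register, it has to wait for that register's previous
value, which is one plausible reading of the dependency described above.

```c
#include <stddef.h>

/* Hypothetical kernel (an assumption, not code from the bug report):
 * converts each integer sample to float and accumulates the result. */
float sum_as_float(const int *samples, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* The (float) cast is the int -> float conversion; with SSE math
         * it is usually a cvtsi2ss, and the += is usually an addss. */
        acc += (float)samples[i];
    }
    return acc;
}
```

Compiling this with "gcc -O2 -mfpmath=sse -S" and comparing against the same
command with -funroll-loops or -fschedule-insns added should show whether the
two instructions share a register. The exact output depends on the GCC version
and the target.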

References:
- [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  - From: bfriesen at simple dot dallas.tx.us
