This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
- From: "neleai at seznam dot cz" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 23 Aug 2012 15:45:54 +0000
- Subject: [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
- Auto-submitted: auto-generated
- References: <bug-54349-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
Ondrej Bilka <neleai at seznam dot cz> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |UNCONFIRMED
Resolution|INVALID |
--- Comment #2 from Ondrej Bilka <neleai at seznam dot cz> 2012-08-23 15:45:54 UTC ---
(In reply to comment #1)
> Not a bug. You need to tune for a CPU where inter-unit moves are desirable.
> The default is generic tuning, which is a compromise between Intel CPUs (where
> they are desirable) and AMD CPUs (where they are undesirable). In this
> particular case the generic tuning doesn't do inter-unit moves as part of the
> compromise. If you -mtune=corei7 or similar, you'll get an inter-unit move in
> both cases.
Which AMD processors?
Compile the following two files with -march=core2 and with -march=amdfam10.
The amdfam10 build was always at least 5% slower.
Tested on AMD Athlon(tm) 64 Processor 3200+, AMD Opteron(tm) Processor 6134,
AMD FX(tm)-8150 Eight-Core Processor, and AMD Phenom(tm) II X6 1090T Processor.
#include <emmintrin.h>
#include <stdint.h>
int64_t foo(int64_t a, int64_t c)
{
  __m128i b = _mm_cvtsi64_si128(a), d = _mm_cvtsi64_si128(c);
  return _mm_cvtsi128_si64(_mm_add_epi8(b, d));
}
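/* The files need to be split; otherwise both builds are simplified to
   identical code. */
#include <emmintrin.h>
#include <stdint.h>
int64_t foo(int64_t a, int64_t c); /* defined in the other file */
int main(void)
{
  int i;
  int64_t x = 0;
  for (i = 0; i < 100000000; i++)
    x = foo(x, 1);
  return x;
}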
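
For readers unfamiliar with the term, the difference between an inter-unit
move and the store to the stack discussed in comment #1 can be spelled out in
source form. The following is only a sketch: the function names
low64_direct and low64_via_memory are hypothetical and do not come from the
bug report, and the compiler remains free to generate the same instructions
for both, depending on tuning and optimization level.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Direct inter-unit move: on x86-64 this intrinsic can be compiled to a
   single movq from an XMM register to a general-purpose register. */
int64_t low64_direct(__m128i v)
{
  return _mm_cvtsi128_si64(v);
}

/* Explicit round trip through memory: store the low 64 bits of the vector
   to a stack variable, then read them back as an integer. This mimics in
   source what the store/reload sequence does. */
int64_t low64_via_memory(__m128i v)
{
  int64_t tmp;
  _mm_storel_epi64((__m128i *)&tmp, v);
  return tmp;
}
```

Both functions return the same value; comparing the assembly produced by
gcc -O2 -S under different -mtune settings shows which form the compiler
actually picks.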