This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/47825] SSE bitwise operations on floats work -g, fail -O3
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 21 Feb 2011 10:41:13 +0000
- Subject: [Bug target/47825] SSE bitwise operations on floats work -g, fail -O3
- Auto-submitted: auto-generated
- References: <bug-47825-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47825
Richard Guenther <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
             Status|RESOLVED                    |REOPENED
   Last reconfirmed|                            |2011.02.21 10:40:58
                 CC|                            |hjl at gcc dot gnu.org
          Component|c                           |target
         Resolution|INVALID                     |
     Ever Confirmed|0                           |1
--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-02-21 10:40:58 UTC ---
The issue is that maskarray is initialized as an array of ints but then you
load it as an array of floats. Since those two types are not considered to
alias, the scheduler re-orders the accesses, so you see a load from
uninitialized stack:
test_setelement:
.LFB546:
        subq    $40, %rsp
.LCFI0:
        movaps  16(%rsp), %xmm0
        movl    $0, 16(%rsp)
        movl    $0, 20(%rsp)
        movaps  %xmm0, %xmm1
        andnps  .LC1(%rip), %xmm0
        movl    $0, 24(%rsp)
        movl    $-1, 28(%rsp)
        ...
What is a bit inconsistent is that for _mm_load_pd we use a type that
allows aliasing:
/* The Intel API is flexible enough that we must allow aliasing with other
   vector types, and their scalar components.  */
typedef long long __m128i __attribute__ ((__vector_size__ (16),
                                          __may_alias__));
typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));
...
/* Load two DPFP values from P.  The address must be 16-byte aligned.  */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__,
                                       __artificial__))
_mm_load_pd (double const *__P)
{
  return *(__m128d *)__P;
}
but for _mm_load_ps we don't:
/* Internal data types for implementing the intrinsics.  */
typedef float __v4sf __attribute__ ((__vector_size__ (16)));
...
/* Load four SPFP values from P.  The address must be 16-byte aligned.  */
extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
                                      __artificial__))
_mm_load_ps (float const *__P)
{
  return (__m128) *(__v4sf *)__P;
}
re-opening to investigate that. HJ, are the SSE1 intrinsics not
aliasing in the Intel API? The above snippets are from trunk.
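
For reference, a minimal sketch of how user code can sidestep the int-store /
float-load mismatch described above, independent of what the intrinsics
headers end up doing. This is not the reporter's test case: the helper names
make_float_mask and and_lanes are hypothetical, the mask values 0, 0, 0, -1
are taken from the stores in the assembly listing, and the code assumes a
32-bit float. It uses memcpy to move bit patterns between integer and float
objects, which the compiler has to treat as a real dependency, so the loads
cannot be scheduled ahead of the initialization.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide, as on x86_64. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: build a 4-lane float mask from integer bit
   patterns.  memcpy copies the object representation, so no int object
   is ever read through a float lvalue.  */
static void make_float_mask(float out[4], const uint32_t bits[4])
{
    memcpy(out, bits, 4 * sizeof(float));
}

/* Hypothetical helper: AND each float lane with the matching mask lane,
   bit for bit.  This is a scalar stand-in for what andps does on a
   whole vector; the type punning again goes through memcpy.  */
static void and_lanes(float dst[4], const float src[4], const float mask[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t s, m, r;
        memcpy(&s, &src[i], sizeof s);
        memcpy(&m, &mask[i], sizeof m);
        r = s & m;
        memcpy(&dst[i], &r, sizeof r);
    }
}
```

With the mask bits {0, 0, 0, 0xFFFFFFFF} only the last lane of the source
survives the AND. The mask array filled this way is a genuine float array, so
passing it to a float-pointer load afterwards does not rely on int and float
being allowed to alias.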