Take the following code and compile with -O2 -m64 -maltivec -mabi=altivec: #define vector __attribute__((vector_size(16))) float f(vector float t) { union { vector float t1; float t2[4]; }t5; t5.t1 = t; return t5.t2[0]; } Currently we get: addi 12,1,-32 stvx 2,0,12 ld 11,0(12) srdi 9,11,32 stw 9,-16(1) lfs 1,-16(1) blr Which is bad for at least the Cell as we now hit two store load hazards. We can remove one if the compile uses the vec_extract patterns causing us to use stvewx and lfs. This does not effect x86 as the union is recorded as BLKmode but x86_64 has the same issue, maybe even worse as there is no need to go through memory there. X86_64 produces: f: .LFB2: movaps %xmm0, -24(%rsp) movss -24(%rsp), %xmm0 ret When really it should produce just a ret.
Note a related testcase is: #define vector __attribute__((vector_size(16))) float f(vector float t) { return *(float*)(&t); } Which is only mentioned here in get_alias_set in the GCC sources as being not violating aliasing rules (see also http://gcc.gnu.org/ml/gcc/2006-07/msg00250.html).
(In reply to comment #1) > Note a related testcase is: I have a patch to fix that related testcase and also fix some other issues related to BIT_FIELD_EXPR expansion with vec_extract (and why vec_set is still unused after my patch too).
(In reply to comment #2) > I have a patch to fix that related testcase and also fix some other issues > related to BIT_FIELD_EXPR expansion with vec_extract (and why vec_set is still > unused after my patch too). And I filed it as PR 28436.
This is not target specific.
comment #1 is fixed, the original problem persists. optimized dump: ;; Function f (f) f (vector(4) float t) { union { vector(4) float t1; float t2[4]; } t5; float D.2687; <bb 2>: t5.t1 = t_1(D); D.2687_2 = t5.t2[0]; return D.2687_2; } FRE does not know how to handle different sizes in its support for type-punning stores/loads (visit_reference_op_load). Mine.
Created attachment 33674 [details] rough patch It needs more work (seems more complicated than it should be), but at least it managed to produce what I expected for: #include <x86intrin.h> union u { __v4sf v; float a[4]; }; float f(__v4sf x){ u t; t.v=x; return t.a[2]; }
Yeah, something along that line would be needed. The issue is that the partial value is not "available" (as in: it doesn't have a SSA name) and thus there is nothing to CSE the load to. I suppose that we should generalize this support for real-/imag-part of complex values as well. Not sure if it is worth using BIT_FIELD_REF for getting at pieces of scalar registers (maybe it would pay off for targets that can efficiently compute lowpart subregs).
Author: rguenth Date: Tue May 10 08:20:43 2016 New Revision: 236066 URL: https://gcc.gnu.org/viewcvs?rev=236066&root=gcc&view=rev Log: 2016-05-10 Richard Biener <rguenther@suse.de> PR tree-optimization/70497 PR tree-optimization/28367 * tree-ssa-sccvn.c (vn_nary_build_or_lookup): New function split out from ... (visit_reference_op_load): ... here. (vn_reference_lookup_3): Use it to handle subreg-like accesses with simplified BIT_FIELD_REFs. * tree-ssa-pre.c (eliminate_insert): Handle inserting BIT_FIELD_REFs. * tree-complex.c (extract_component): Handle BIT_FIELD_REFs correctly. * gcc.dg/torture/20160404-1.c: New testcase. * gcc.dg/tree-ssa/ssa-fre-54.c: Likewise. * gcc.dg/tree-ssa/ssa-fre-55.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/torture/20160404-1.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-54.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-55.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-complex.c trunk/gcc/tree-ssa-pre.c trunk/gcc/tree-ssa-sccvn.c
Fixed on trunk.