Bug 28367 - accessing via union on a vector does not cause vec_extract to be used
Summary: accessing via union on a vector does not cause vec_extract to be used
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.2.0
: P3 enhancement
Target Milestone: 7.0
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks: 29756
  Show dependency treegraph
 
Reported: 2006-07-13 02:43 UTC by Andrew Pinski
Modified: 2023-05-15 06:07 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-11-22 00:00:00


Attachments
rough patch (2.16 KB, patch)
2014-10-10 00:25 UTC, Marc Glisse
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2006-07-13 02:43:47 UTC
Take the following code and compile with -O2 -m64 -maltivec -mabi=altivec:
#define vector __attribute__((vector_size(16)))
float f(vector float t)
{
  union {
    vector float t1;
    float t2[4];
  }t5;
  t5.t1 = t;
  return t5.t2[0];
}

Currently we get:
        addi 12,1,-32
        stvx 2,0,12
        ld 11,0(12)
        srdi 9,11,32
        stw 9,-16(1)
        lfs 1,-16(1)
        blr

Which is bad for at least the Cell as we now hit two store load hazards.  We can remove one if the compile uses the vec_extract patterns causing us to use stvewx and lfs.

This does not effect x86 as the union is recorded as BLKmode but x86_64 has the same issue, maybe even worse as there is no need to go through memory there.
X86_64 produces:
f:
.LFB2:
        movaps  %xmm0, -24(%rsp)
        movss   -24(%rsp), %xmm0
        ret
When really it should produce just a ret.
Comment 1 Andrew Pinski 2006-07-13 05:37:13 UTC
Note a related testcase is:
#define vector __attribute__((vector_size(16)))
float f(vector float t)
{
  return *(float*)(&t);
}

Which is only mentioned here in get_alias_set in the GCC sources as being not violating aliasing rules (see also http://gcc.gnu.org/ml/gcc/2006-07/msg00250.html).

Comment 2 Andrew Pinski 2006-07-14 07:58:22 UTC
(In reply to comment #1)
> Note a related testcase is:

I have a patch to fix that related testcase and also fix some other issues related to BIT_FIELD_EXPR expansion with vec_extract (and why vec_set is still unused after my patch too).
Comment 3 Andrew Pinski 2006-07-19 14:46:30 UTC
(In reply to comment #2)
> I have a patch to fix that related testcase and also fix some other issues
> related to BIT_FIELD_EXPR expansion with vec_extract (and why vec_set is still
> unused after my patch too).

And I filed it as PR 28436.
Comment 4 Andrew Pinski 2006-08-26 04:48:03 UTC
This is not target specific.
Comment 5 Richard Biener 2011-03-16 11:41:03 UTC
comment #1 is fixed, the original problem persists.  optimized dump:

;; Function f (f)

f (vector(4) float t)
{
  union
  {
    vector(4) float t1;
    float t2[4];
  } t5;
  float D.2687;

<bb 2>:
  t5.t1 = t_1(D);
  D.2687_2 = t5.t2[0];
  return D.2687_2;

}

FRE does not know how to handle different sizes in its support for
type-punning stores/loads (visit_reference_op_load).

Mine.
Comment 6 Marc Glisse 2014-10-10 00:25:04 UTC
Created attachment 33674 [details]
rough patch

It needs more work (seems more complicated than it should be), but at least it managed to produce what I expected for:

#include <x86intrin.h>
union u { __v4sf v; float a[4]; };

float f(__v4sf x){
  u t;
  t.v=x;
  return t.a[2];
}
Comment 7 Richard Biener 2014-11-24 09:03:20 UTC
Yeah, something along that line would be needed.  The issue is that
the partial value is not "available" (as in: it doesn't have a SSA name)
and thus there is nothing to CSE the load to.

I suppose that we should generalize this support for real-/imag-part of
complex values as well.  Not sure if it is worth using BIT_FIELD_REF
for getting at pieces of scalar registers (maybe it would pay off for
targets that can efficiently compute lowpart subregs).
Comment 8 Richard Biener 2016-05-10 08:21:16 UTC
Author: rguenth
Date: Tue May 10 08:20:43 2016
New Revision: 236066

URL: https://gcc.gnu.org/viewcvs?rev=236066&root=gcc&view=rev
Log:
2016-05-10  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/70497
	PR tree-optimization/28367
	* tree-ssa-sccvn.c (vn_nary_build_or_lookup): New function
	split out from ...
	(visit_reference_op_load): ... here.
	(vn_reference_lookup_3): Use it to handle subreg-like accesses
	with simplified BIT_FIELD_REFs.
	* tree-ssa-pre.c (eliminate_insert): Handle inserting BIT_FIELD_REFs.
	* tree-complex.c (extract_component): Handle BIT_FIELD_REFs
	correctly.

	* gcc.dg/torture/20160404-1.c: New testcase.
	* gcc.dg/tree-ssa/ssa-fre-54.c: Likewise.
	* gcc.dg/tree-ssa/ssa-fre-55.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/20160404-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-54.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-55.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-complex.c
    trunk/gcc/tree-ssa-pre.c
    trunk/gcc/tree-ssa-sccvn.c
Comment 9 Richard Biener 2016-05-10 08:21:25 UTC
Fixed on trunk.