This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/79709] Subobtimal code with -mavx and explicit vector
- From: "glisse at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 24 Feb 2017 21:58:11 +0000
- Subject: [Bug target/79709] Subobtimal code with -mavx and explicit vector
- Auto-submitted: auto-generated
- References: <bug-79709-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79709
Marc Glisse <glisse at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-02-24
     Ever confirmed|0                           |1
--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #2)
> In reload, subregs are extracted via the stack, whereas the low subreg
> should already be available (NOP) and the high one can be extracted by a
> single insn. That's probably the first thing to investigate. (-mtune doesn't
> change what happens)
To concentrate on this, compile the following with -O3 -mavx:
typedef long int v4i __attribute__((vector_size (32)));
v4i foo(v4i a, v4i b) { return a+b; }
GCC generates:
vmovdqa %ymm0, -80(%rbp)
vmovdqa %ymm1, -112(%rbp)
vmovdqa -80(%rbp), %xmm4
vmovdqa -64(%rbp), %xmm6
vpaddq -112(%rbp), %xmm4, %xmm3
vpaddq -96(%rbp), %xmm6, %xmm5
vmovaps %xmm3, -48(%rbp)
vmovaps %xmm5, -32(%rbp)
vmovdqa -48(%rbp), %ymm0
(plus overhead to align the stack, etc.)
compared to clang's output:
vextractf128 $1, %ymm0, %xmm2
vextractf128 $1, %ymm1, %xmm3
vpaddq %xmm2, %xmm3, %xmm2
vpaddq %xmm0, %xmm1, %xmm0
vinsertf128 $1, %xmm2, %ymm0, %ymm0
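
For reference, the split that both compilers perform here (without AVX2 there is no 256-bit vpaddq, so the addition is done on two 128-bit halves) can be written out by hand with the same vector extension. This is only a sketch: the names v2i and add_by_halves are hypothetical and not part of the report, and nothing guarantees that a given compiler turns the memcpy calls into vextractf128/vinsertf128 rather than stack traffic.

```c
#include <string.h>

/* Same 256-bit type as in the report, plus a 128-bit half type.
   v2i and add_by_halves are hypothetical names, not from the report. */
typedef long int v4i __attribute__((vector_size (32)));
typedef long int v2i __attribute__((vector_size (16)));

/* Add *a and *b as two independent 128-bit halves and store the result
   in *out.  Pointers are used so the sketch does not depend on how
   256-bit vectors are passed by value. */
static void add_by_halves(const v4i *a, const v4i *b, v4i *out)
{
    v2i a_lo, a_hi, b_lo, b_hi, r_lo, r_hi;

    /* Split each operand into its low and high half. */
    memcpy(&a_lo, (const char *)a, sizeof a_lo);
    memcpy(&a_hi, (const char *)a + sizeof a_lo, sizeof a_hi);
    memcpy(&b_lo, (const char *)b, sizeof b_lo);
    memcpy(&b_hi, (const char *)b + sizeof b_lo, sizeof b_hi);

    /* One 128-bit addition per half. */
    r_lo = a_lo + b_lo;
    r_hi = a_hi + b_hi;

    /* Recombine the two halves into the 256-bit result. */
    memcpy((char *)out, &r_lo, sizeof r_lo);
    memcpy((char *)out + sizeof r_lo, &r_hi, sizeof r_hi);
}
```

Comparing the assembly for this helper with that for foo above, at the same -O3 -mavx, may help to tell whether the stack round trip comes from the subreg extraction itself or from how the 256-bit addition is lowered.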