This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/79709] Suboptimal code with -mavx and explicit vector


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79709

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-02-24
     Ever confirmed|0                           |1

--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #2)
> In reload, subregs are extracted via the stack, whereas the low subreg
> should already be available (NOP) and the high one can be extracted by a
> single insn. That's probably the first thing to investigate. (-mtune doesn't
> change what happens)

To concentrate on this, compile the following with -O3 -mavx:
typedef long int v4i __attribute__((vector_size (32)));
v4i foo(v4i a, v4i b) { return a+b; }

GCC generates:

        vmovdqa %ymm0, -80(%rbp)
        vmovdqa %ymm1, -112(%rbp)
        vmovdqa -80(%rbp), %xmm4
        vmovdqa -64(%rbp), %xmm6
        vpaddq  -112(%rbp), %xmm4, %xmm3
        vpaddq  -96(%rbp), %xmm6, %xmm5
        vmovaps %xmm3, -48(%rbp)
        vmovaps %xmm5, -32(%rbp)
        vmovdqa -48(%rbp), %ymm0
(plus the overhead of aligning the stack, etc.)

compared to clang's output:

        vextractf128    $1, %ymm0, %xmm2
        vextractf128    $1, %ymm1, %xmm3
        vpaddq  %xmm2, %xmm3, %xmm2
        vpaddq  %xmm0, %xmm1, %xmm0
        vinsertf128     $1, %xmm2, %ymm0, %ymm0
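Plain AVX, without AVX2, has no 256-bit integer add, so the four 64-bit lanes must be added as two 128-bit halves. Clang's sequence above does exactly that. The split can be written portably with GCC vector extensions, as in the sketch below.

This is only an illustration. The v2i type and the add_by_halves name are hypothetical. The sketch assumes LP64 (64-bit long, as on x86-64). The comments map each step to the instruction clang uses, but GCC is not guaranteed to emit that sequence for this source.

```c
/* Hypothetical sketch: add two 256-bit vectors of longs as two 128-bit
   halves, mirroring clang's vextractf128 / vpaddq / vinsertf128 code.
   Assumes LP64 (64-bit long), as on x86-64. */
typedef long int v4i __attribute__((vector_size (32)));
typedef long int v2i __attribute__((vector_size (16)));

_Static_assert(sizeof(long int) == 8, "sketch assumes LP64, as on x86-64");

v4i add_by_halves(v4i a, v4i b)
{
    /* Low 128 bits: ideally a no-op subreg (%xmm0 / %xmm1). */
    v2i a_lo = { a[0], a[1] };
    v2i b_lo = { b[0], b[1] };

    /* High 128 bits: one vextractf128 $1 per operand in clang's output. */
    v2i a_hi = { a[2], a[3] };
    v2i b_hi = { b[2], b[3] };

    /* Two 128-bit additions (vpaddq on %xmm registers). */
    v2i lo = a_lo + b_lo;
    v2i hi = a_hi + b_hi;

    /* Recombine: vinsertf128 $1 places hi in the upper 128-bit lane. */
    v4i r = { lo[0], lo[1], hi[0], hi[1] };
    return r;
}
```

The result is element-wise identical to a+b. What matters is how the subregs are obtained: without going through the stack.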
