Bug 102021 - Redundant mov instruction for broadcast.
Summary: Redundant mov instruction for broadcast.
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 12.0
Importance: P3 normal
Target Milestone: 12.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2021-08-23 09:22 UTC by Hongtao.liu
Modified: 2022-02-03 09:07 UTC

See Also:
Host: x86_64-pc-linux-gnu
Target: x86_64-*-* i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-08-23 00:00:00


Description Hongtao.liu 2021-08-23 09:22:47 UTC
#include <immintrin.h>

__m256i
foo ()
{
  return _mm256_set1_epi16 (12);
}


foo():
        movabsq $3377751260921868, %rax
        vpbroadcastq    %rax, %ymm31
        vmovdqa64       %ymm31, %ymm0
        ret
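
For anyone reading the assembly above: the movabsq immediate appears to be the 16-bit value 12 replicated into all four 16-bit lanes of a 64-bit word, which vpbroadcastq then copies across the vector. A minimal sketch that checks this arithmetic in plain C follows. The helper names replicate_u16 and check_broadcast_constant are made up for illustration and are not taken from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not GCC code): replicate a 16-bit value into
   all four 16-bit lanes of a 64-bit word.  Multiplying by
   0x0001000100010001 places one copy of V in each lane; the lanes
   cannot carry into each other because V fits in 16 bits.  */
static uint64_t
replicate_u16 (uint16_t v)
{
  return (uint64_t) v * UINT64_C (0x0001000100010001);
}

/* Hypothetical self-check: the immediate loaded by movabsq in the
   assembly above equals 12 replicated four times.  */
static void
check_broadcast_constant (void)
{
  assert (replicate_u16 (12) == UINT64_C (0x000C000C000C000C));
  assert (replicate_u16 (12) == UINT64_C (3377751260921868));
}
```

Calling check_broadcast_constant () from a test driver should pass silently. This only explains the constant; it says nothing about why the extra vmovdqa64 is emitted.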

I guess the scratch SSE register somehow prevents LRA from merging the move instructions.

Maybe we should add a define_peephole2 for those if we still want to use ix86_gen_scratch_sse_rtx.
Comment 1 H.J. Lu 2021-08-23 12:44:39 UTC
(In reply to Hongtao.liu from comment #0)
> 
> Maybe we should add define_peephole2 for those if we still want to use
> ix86_gen_scratch_sse_rtx.

We should add a define_peephole2, since not using ix86_gen_scratch_sse_rtx
will cause many test failures.
Comment 2 GCC Commits 2021-08-24 12:56:48 UTC
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:6e5401e87d02919b0594e04f828892deef956407

commit r12-3117-g6e5401e87d02919b0594e04f828892deef956407
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Aug 23 14:47:03 2021 -0700

    x86: Broadcast from integer to a pseudo vector register
    
    Broadcast from integer to a pseudo vector register instead of a hard
    vector register to allow LRA to remove redundant move instruction after
    broadcast.
    
    gcc/
    
            PR target/102021
            * config/i386/i386-expand.c (ix86_expand_vector_move): Broadcast
            from integer to a pseudo vector register.
    
    gcc/testsuite/
    
            PR target/102021
            * gcc.target/i386/pr100865-10b.c: Expect vzeroupper.
            * gcc.target/i386/pr100865-4b.c: Likewise.
            * gcc.target/i386/pr100865-6b.c: Expect vmovdqu and vzeroupper.
            * gcc.target/i386/pr100865-7b.c: Likewise.
            * gcc.target/i386/pr102021.c: New test.
Comment 3 H.J. Lu 2021-08-24 13:03:17 UTC
Fixed.