#include <immintrin.h>

__m256i
foo ()
{
  return _mm256_set1_epi16 (12);
}

foo():
        movabsq $3377751260921868, %rax
        vpbroadcastq    %rax, %ymm31
        vmovdqa64       %ymm31, %ymm0
        ret

I guess the scratch SSE register somehow prevents LRA from merging the move instructions.

Maybe we should add a define_peephole2 for those if we still want to use ix86_gen_scratch_sse_rtx.
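For reference, the movabsq immediate in the assembly above is just the value 12 replicated into each of the four 16-bit lanes of a 64-bit integer, which is why a single vpbroadcastq is enough to fill the whole register. A minimal sketch to confirm that arithmetic; the helper names are hypothetical and are not part of GCC:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC function): replicate a 16-bit value
   into all four 16-bit lanes of a 64-bit integer.  */
uint64_t
replicate_epi16_to_u64 (uint16_t v)
{
  uint64_t lane = v;
  return lane | (lane << 16) | (lane << 32) | (lane << 48);
}

/* Check that the immediate from the report is 12 in every lane,
   i.e. 0x000C000C000C000C.  */
void
check_reported_immediate (void)
{
  assert (replicate_epi16_to_u64 (12) == 0x000C000C000C000CULL);
  assert (replicate_epi16_to_u64 (12) == 3377751260921868ULL);
}
```

So the broadcast itself is fine; the only issue is the extra vmovdqa64 from %ymm31 to %ymm0.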
(In reply to Hongtao.liu from comment #0)
> Maybe we should add define_peephole2 for those if we still want to use
> ix86_gen_scratch_sse_rtx.

We should add a define_peephole2, since not using ix86_gen_scratch_sse_rtx would cause many test failures.
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:6e5401e87d02919b0594e04f828892deef956407

commit r12-3117-g6e5401e87d02919b0594e04f828892deef956407
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Aug 23 14:47:03 2021 -0700

    x86: Broadcast from integer to a pseudo vector register

    Broadcast from integer to a pseudo vector register instead of a hard
    vector register to allow LRA to remove redundant move instruction after
    broadcast.

    gcc/

            PR target/102021
            * config/i386/i386-expand.c (ix86_expand_vector_move): Broadcast
            from integer to a pseudo vector register.

    gcc/testsuite/

            PR target/102021
            * gcc.target/i386/pr100865-10b.c: Expect vzeroupper.
            * gcc.target/i386/pr100865-4b.c: Likewise.
            * gcc.target/i386/pr100865-6b.c: Expect vmovdqu and vzeroupper.
            * gcc.target/i386/pr100865-7b.c: Likewise.
            * gcc.target/i386/pr102021.c: New test.
Fixed.