#include <immintrin.h>

__m512i
foo (__m512i a, short b)
{
  return _mm512_srlv_epi16 (a, _mm512_set1_epi16 (3));
}

LLVM generates

        vpsrlw  zmm0, zmm0, 3

but GCC generates

foo(long long __vector(8), short):
        movl    $3, %eax
        vpbroadcastw    %eax, %zmm31
        vpsrlvw %zmm31, %zmm0, %zmm0
        ret
There are two issues:
1. simplify_rtx should be able to simplify it.
2. The x86 backend uses ix86_gen_scratch_sse_rtx (mode), which prevents simplification.
> 2. x86 backend use ix86_gen_scratch_sse_rtx (mode) which prevent
> simplication.

Correction: ix86_gen_scratch_sse_rtx doesn't prevent optimization here.
Combine is able to do the combination, but it fails because the result does not match any pattern:

Trying 10, 9 -> 14:
   10: r92:HI=0x3
    9: r91:V32HI=vec_duplicate(r92:HI)
      REG_DEAD r92:HI
      REG_EQUAL const_vector
   14: r88:V32HI=r96:V8DI#0 0>>r91:V32HI
      REG_DEAD r96:V8DI
      REG_DEAD r91:V32HI
Failed to match this instruction:
(set (reg:V32HI 88)
    (lshiftrt:V32HI (subreg:V32HI (reg:V8DI 96) 0)
        (const_vector:V32HI [
                (const_int 3 [0x3]) repeated x32
            ])))

This instruction does not have an alternative for the dup/const_vector case, I think:

(define_insn "<avx2_avx512>_<insn>v<mode><mask_name>"
  [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v")
        (any_lshift:VI2_AVX512VL
          (match_operand:VI2_AVX512VL 1 "register_operand" "v")
          (match_operand:VI2_AVX512VL 2 "nonimmediate_operand" "vm")))]
  "TARGET_AVX512BW"
  "vp<vshift>v<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
  [(set_attr "type" "sseishft")
   (set_attr "prefix" "maybe_evex")
   (set_attr "mode" "<sseinsnmode>")])

Note that I don't think simplify-rtx will change the const_vector to just 3, though.
(In reply to Andrew Pinski from comment #3)
> Combine is able to do the combine but it fails as it does not match:
> Trying 10, 9 -> 14:
>    10: r92:HI=0x3
>     9: r91:V32HI=vec_duplicate(r92:HI)
>       REG_DEAD r92:HI
>       REG_EQUAL const_vector
>    14: r88:V32HI=r96:V8DI#0 0>>r91:V32HI
>       REG_DEAD r96:V8DI
>       REG_DEAD r91:V32HI
> Failed to match this instruction:
> (set (reg:V32HI 88)
>     (lshiftrt:V32HI (subreg:V32HI (reg:V8DI 96) 0)
>         (const_vector:V32HI [
>                 (const_int 3 [0x3]) repeated x32
>             ])))
>
> This instruction does not have alt for the dup/const_vect case I think:
> (define_insn "<avx2_avx512>_<insn>v<mode><mask_name>"
>   [(set (match_operand:VI2_AVX512VL 0 "register_operand" "=v")
>         (any_lshift:VI2_AVX512VL
>           (match_operand:VI2_AVX512VL 1 "register_operand" "v")
>           (match_operand:VI2_AVX512VL 2 "nonimmediate_operand" "vm")))]
>   "TARGET_AVX512BW"
>   "vp<vshift>v<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>,
> %1, %2}"
>   [(set_attr "type" "sseishft")
>    (set_attr "prefix" "maybe_evex")
>    (set_attr "mode" "<sseinsnmode>")])
>
> Note I don't think simplify-rtx will change const_vector to just 3 though.

Yes, it's rejected at
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576837.html
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:8f9fea41a767f6ad8cf3d521031048a2491f98b1

commit r12-5990-g8f9fea41a767f6ad8cf3d521031048a2491f98b1
Author: Haochen Jiang <haochen.jiang@intel.com>
Date:   Wed Dec 1 16:48:28 2021 +0800

    Add combine splitter to transform vashr/vlshr/vashl_optab to ashr/lshr/ashl_optab for const vector duplicate operand.

    gcc/ChangeLog:

            PR target/101796
            * config/i386/predicates.md (const_vector_operand):
            Add new predicate.
            * config/i386/sse.md(<insn><mode>3<mask_name>):
            Add new define_split below.

    gcc/testsuite/ChangeLog:

            PR target/101796
            * gcc.target/i386/pr101796-1.c: New test.
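
As a rough illustration of why rewriting the variable shift into a uniform-count shift is safe when the count operand is a duplicated constant, here is a minimal scalar sketch in plain C. It is not GCC code and not the committed test. It models the two shift forms on 32 16-bit lanes, assuming that a count of 16 or more yields zero in both. The names model_srlv_epi16, model_srli_epi16, uniform_count_forms_agree and check_all_counts are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 32 /* a 512-bit vector holds 32 lanes of 16 bits */

/* Scalar model of a per-lane variable logical right shift (vpsrlvw-like):
   every lane has its own count; a count of 16 or more gives 0.  */
void
model_srlv_epi16 (const uint16_t *a, const uint16_t *count, uint16_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = count[i] < 16 ? (uint16_t) (a[i] >> count[i]) : 0;
}

/* Scalar model of a uniform-count logical right shift (vpsrlw-like):
   one count for all lanes; a count of 16 or more gives 0.  */
void
model_srli_epi16 (const uint16_t *a, unsigned count, uint16_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = count < 16 ? (uint16_t) (a[i] >> count) : 0;
}

/* Return 1 if both models agree when COUNT is broadcast to every lane,
   which is the const_vector duplicate case from the combine dump.  */
int
uniform_count_forms_agree (unsigned count)
{
  uint16_t a[LANES], counts[LANES], v[LANES], u[LANES];

  for (int i = 0; i < LANES; i++)
    {
      a[i] = (uint16_t) (0x8001u + 0x1111u * (unsigned) i); /* arbitrary data */
      counts[i] = (uint16_t) count;                         /* broadcast */
    }

  model_srlv_epi16 (a, counts, v);
  model_srli_epi16 (a, count, u);

  for (int i = 0; i < LANES; i++)
    if (v[i] != u[i])
      return 0;
  return 1;
}

/* Check in-range and out-of-range counts alike.  */
void
check_all_counts (void)
{
  for (unsigned c = 0; c < 32; c++)
    assert (uniform_count_forms_agree (c));
}
```

Calling check_all_counts () exercises counts 0 through 31. The sketch covers only the logical right shift; the arithmetic right and left shifts named in the commit would need their own models.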
Fixed in GCC 12.