The following code should produce cvtps2pd and cvtpi2pd instructions that operate on vectors:

void test_fp (float *a, double *b)
{
  int i;

  for (i = 0; i < 4; i++)
    b[i] = (double) a[i];
}

void test_int (int *a, double *b)
{
  int i;

  for (i = 0; i < 4; i++)
    b[i] = (double) a[i];
}

Currently, gcc produces scalar instructions (gcc -O2 -march=pentium4 -mfpmath=sse -ftree-vectorize):

.L2:
        movss   -4(%ecx,%eax,4), %xmm0
        cvtss2sd        %xmm0, %xmm0
        movsd   %xmm0, -8(%edx,%eax,8)
        addl    $1, %eax
        cmpl    $5, %eax
        jne     .L2

and

.L9:
        cvtsi2sd        -4(%ecx,%eax,4), %xmm0
        movsd   %xmm0, -8(%edx,%eax,8)
        addl    $1, %eax
        cmpl    $5, %eax
        jne     .L9

(BTW: There is also one movss too many in the first example.)
Confirmed.
Vectorization of type conversions has recently been added to autovect-branch. It requires modeling the respective unpack and pack optabs in the machine description.
Created attachment 12862
vectorized assembly output from ICC v9.1

Generated from the "indefinite loop" variant of the testcase on OS X 10.4.7, using ICC v9.1:

% icc -O2 -S 24659.c
I ran the testcase through ICC, and it unrolled the loops without vectorizing them. However, making the loops indefinite gets us the desired, vectorized result.

Here is the modified, indefinite loop version of the testcase:

void test_fp (float *a, double *b, int count)
{
  int i;

  for (i = 0; i < count; i++)
    b[i] = (double) a[i];
}

void test_int (int *a, double *b, int count)
{
  int i;

  for (i = 0; i < count; i++)
    b[i] = (double) a[i];
}

(Note to Apple: this is Radar 4079267)
SPU has vectorizable conversion also.
(In reply to comment #2)
> vectorization of type conversions has recently been added to autovect-branch.
> It requires modeling the respective unpack and pack optabs in the machine
> description.

Hm, there is no infrastructure for int<->float conversions. vectorizable_operation() calls optab_for_tree_code() with the "code" argument set to FLOAT_EXPR and FIX_TRUNC_EXPR, and these always return NULL.

In the DFmode<->SFmode case, vectorizable_operation() calls optab_for_tree_code() with VEC_PACK_MOD_EXPR and VEC_UNPACK_LO/HI_EXPR. At least the latter case is intended for integer modes, as optab_for_tree_code() checks for TYPE_UNSIGNED on the "type" argument.
Right, the vectorizer currently supports conversions only between integral types. Support for type conversions that also involve floating-point types is in the works.
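For readers following along, the widening scheme being discussed can be modelled in plain C. This is only a sketch under stated assumptions: the v4sf/v2df structs and the unpack_lo/unpack_hi helpers are hypothetical stand-ins for the vector modes and the unpack patterns, widen_floats is an illustrative name, and the lo/hi element order assumes a little-endian target such as x86.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the V4SF and V2DF vector modes.  */
typedef struct { float f[4]; } v4sf;
typedef struct { double d[2]; } v2df;

/* Model of an "unpack lo" step: widen elements 0 and 1
   (little-endian element order is assumed here).  */
static v2df
unpack_lo (v4sf v)
{
  v2df r = { { (double) v.f[0], (double) v.f[1] } };
  return r;
}

/* Model of an "unpack hi" step: widen elements 2 and 3.  */
static v2df
unpack_hi (v4sf v)
{
  v2df r = { { (double) v.f[2], (double) v.f[3] } };
  return r;
}

/* Convert N floats to doubles.  One input vector of four floats
   yields two output vectors of two doubles each; leftover elements
   are handled by a scalar epilogue.  */
void
widen_floats (const float *a, double *b, size_t n)
{
  size_t i;

  for (i = 0; i + 4 <= n; i += 4)
    {
      v4sf in;
      v2df lo, hi;
      int k;

      for (k = 0; k < 4; k++)
        in.f[k] = a[i + k];

      lo = unpack_lo (in);
      hi = unpack_hi (in);

      b[i + 0] = lo.d[0];
      b[i + 1] = lo.d[1];
      b[i + 2] = hi.d[0];
      b[i + 3] = hi.d[1];
    }

  for (; i < n; i++)
    b[i] = (double) a[i];
}
```

The point of the model is the shape of the operation: the number of elements per vector halves on the way out, which is why a single conversion optab is not enough and a lo/hi pair is needed.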
Patch for double<->float conversions at http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01346.html.
Subject: Bug 24659

Author: uros
Date: Sun Apr 22 19:45:06 2007
New Revision: 124045

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124045
Log:
2007-04-22  Uros Bizjak  <ubizjak@gmail.com>

        PR tree-optimization/24659
        * optabs.h (enum optab_index) [OTI_vec_unpacks_hi,
        OTI_vec_unpacks_lo]: Update comment to mention floating point operands.
        (vec_pack_trunc_optab): Rename from vec_pack_mod_optab.
        * genopinit.c (optabs): Rename vec_pack_mod_optab to
        vec_pack_trunc_optab.
        * tree-vect-transform.c (vectorizable_type_demotion): Do not fail
        early for scalar floating point operands for NOP_EXPR.
        (vectorizable_type_promotion): Ditto.
        * optabs.c (optab_for_tree_code) [VEC_PACK_TRUNC_EXPR]: Return
        vec_pack_trunc_optab.
        (expand_binop): Rename vec_float_trunc_optab to vec_pack_mod_optab.
        * tree.def (VEC_PACK_TRUNC_EXPR): Rename from VEC_PACK_MOD_EXPR.
        * tree-pretty-print.c (dump_generic_node) [VEC_PACK_TRUNC_EXPR]:
        Rename from VEC_PACK_MOD_EXPR.
        (op_prio) [VEC_PACK_TRUNC_EXPR]: Ditto.
        * expr.c (expand_expr_real_1): Ditto.
        * tree-inline.c (estimate_num_insns_1): Ditto.
        * tree-vect-generic.c (expand_vector_operations_1): Ditto.
        * config/i386/sse.md (vec_unpacks_hi_v4sf): New expander.
        (vec_unpacks_lo_v4sf): Ditto.
        (vec_pack_trunc_v2df): Ditto.
        (vec_pack_trunc_v8hi): Rename from vec_pack_mod_v8hi.
        (vec_pack_trunc_v4si): Rename from vec_pack_mod_v4si.
        (vec_pack_trunc_v2di): Rename from vec_pack_mod_v2di.
        * config/rs6000/altivec.md (vec_pack_trunc_v8hi): Rename from
        vec_pack_mod_v8hi.
        (vec_pack_trunc_v4si): Rename from vec_pack_mod_v4si.
        * doc/c-tree.texi (Expression trees) [VEC_PACK_TRUNC_EXPR]:
        Rename from VEC_PACK_MOD_EXPR. This expression also represent
        packing of floating point operands.
        [VEC_UNPACK_HI_EXPR, VEC_UNPACK_LO_EXPR]: These expression also
        represent unpacking of floating point operands.
        * doc/md.texi (Standard Names) [vec_pack_trunc]: Update documentation.
        [vec_unpacks_hi]: Ditto.
        [vec_unpacks_lo]: Ditto.

testsuite/ChangeLog:

2007-04-22  Uros Bizjak  <ubizjak@gmail.com>

        PR tree-optimization/24659
        * gcc.dg/vect/vect-float-extend-1.c: New test.
        * gcc.dg/vect/vect-float-truncate-1.c: New test.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/config/rs6000/altivec.md
    trunk/gcc/doc/c-tree.texi
    trunk/gcc/doc/md.texi
    trunk/gcc/expr.c
    trunk/gcc/genopinit.c
    trunk/gcc/optabs.c
    trunk/gcc/optabs.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-inline.c
    trunk/gcc/tree-pretty-print.c
    trunk/gcc/tree-vect-generic.c
    trunk/gcc/tree-vect-transform.c
    trunk/gcc/tree-vectorizer.c
    trunk/gcc/tree.def
float->double and double->float conversions are now vectorized. For a slightly different test:

--cut here--
void test_fp (float *a, double *b)
{
  int i;

  for (i = 0; i < 4; i++)
    b[i] = (double) a[i];
}

void test_int (int *a, double *b)
{
  int i;

  for (i = 0; i < 4; i++)
    b[i] = (double) a[i];
}
--cut here--

we generate the following loops:

test_fd:

.L2:
        movaps  a(%rax), %xmm0
        movhlps %xmm0, %xmm2
        cvtps2pd        %xmm0, %xmm1
        movapd  %xmm1, c(%rax,%rax)
        cvtps2pd        %xmm2, %xmm0
        movapd  %xmm0, c+16(%rax,%rax)
        addq    $16, %rax
        cmpq    $64, %rax
        jne     .L2

test_df:

.L8:
        cvtpd2ps        c(%rax,%rax), %xmm0
        cvtpd2ps        c+16(%rax,%rax), %xmm1
        movlhps %xmm1, %xmm0
        movaps  %xmm0, a(%rax)
        addq    $16, %rax
        cmpq    $64, %rax
        jne     .L8

test_int (no vectorization):

.L13:
        cvtsi2sd        b(,%rax,4), %xmm0
        movsd   %xmm0, c(,%rax,8)
        addq    $1, %rax
        cmpq    $16, %rax
        jne     .L13

Note that we still don't vectorize double->int and int->double conversions.
(In reply to comment #10)
> float->double and double->float conversions are now vectorized. For a slightly
> different test:

The test is actually:

--cut here--
float a[16];
int b[16];
double c[16];

void test_fd (void)
{
  int i;

  for (i = 0; i < 16; i++)
    c[i] = (double) a[i];
}

void test_df (void)
{
  int i;

  for (i = 0; i < 16; i++)
    a[i] = (float) c[i];
}

void test_int (void)
{
  int i;

  for (i = 0; i < 16; i++)
    c[i] = (double) b[i];
}
--cut here--
Patch for several other missing conversions: http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00966.html
Subject: Bug 24659

Author: uros
Date: Thu May 17 06:31:05 2007
New Revision: 124784

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124784
Log:
        PR tree-optimization/24659
        * optabs.h (enum optab_index): Add OTI_vec_unpacks_float_hi,
        OTI_vec_unpacks_float_lo, OTI_vec_unpacku_float_hi,
        OTI_vec_unpacku_float_lo, OTI_vec_pack_sfix_trunc and
        OTI_vec_pack_ufix_trunc.
        (vec_unpacks_float_hi_optab): Define new macro.
        (vec_unpacks_float_lo_optab): Ditto.
        (vec_unpacku_float_hi_optab): Ditto.
        (vec_unpacku_float_lo_optab): Ditto.
        (vec_pack_sfix_trunc_optab): Ditto.
        (vec_pack_ufix_trunc_optab): Ditto.
        * genopinit.c (optabs): Implement vec_unpack[s|u]_[hi|lo]_optab
        and vec_pack_[s|u]fix_trunc_optab using
        vec_unpack[s|u]_[hi\lo]_* and vec_pack_[u|s]fix_trunc_* patterns
        * tree-vectorizer.c (supportable_widening_operation): Handle
        FLOAT_EXPR and CONVERT_EXPR. Update comment.
        (supportable_narrowing_operation): New function.
        * tree-vectorizer.h (supportable_narrowing_operation): Prototype.
        * tree-vect-transform.c (vectorizable_conversion): Handle
        (nunits_in == nunits_out / 2) and (nunits_out == nunits_in / 2) cases.
        (vect_gen_widened_results_half): Move before vectorizable_conversion.
        (vectorizable_type_demotion): Call supportable_narrowing_operation()
        to check for target support.
        * optabs.c (optab_for_tree_code) Return vec_unpack[s|u]_float_hi_optab
        for VEC_UNPACK_FLOAT_HI_EXPR, vec_unpack[s|u]_float_lo_optab
        for VEC_UNPACK_FLOAT_LO_EXPR and vec_pack_[u|s]fix_trunc_optab
        for VEC_PACK_FIX_TRUNC_EXPR.
        (expand_binop): Special case mode of the result for
        vec_pack_[u|s]fix_trunc_optab.
        (init_optabs): Initialize vec_unpack[s|u]_[hi|lo]_optab and
        vec_pack_[u|s]fix_trunc_optab.
        * tree.def (VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR,
        VEC_PACK_FIX_TRUNC_EXPR): New tree codes.
        * tree-pretty-print.c (dump_generic_node): Handle
        VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR and
        VEC_PACK_FIX_TRUNC_EXPR.
        (op_prio): Ditto.
        * expr.c (expand_expr_real_1): Ditto.
        * tree-inline.c (estimate_num_insns_1): Ditto.
        * tree-vect-generic.c (expand_vector_operations_1): Ditto.
        * config/i386/sse.md (vec_unpacks_float_hi_v8hi): New expander.
        (vec_unpacks_float_lo_v8hi): Ditto.
        (vec_unpacku_float_hi_v8hi): Ditto.
        (vec_unpacku_float_lo_v8hi): Ditto.
        (vec_unpacks_float_hi_v4si): Ditto.
        (vec_unpacks_float_lo_v4si): Ditto.
        (vec_pack_sfix_trunc_v2df): Ditto.
        * doc/c-tree.texi (Expression trees) [VEC_UNPACK_FLOAT_HI_EXPR]:
        Document.
        [VEC_UNPACK_FLOAT_LO_EXPR]: Ditto.
        [VEC_PACK_FIX_TRUNC_EXPR]: Ditto.
        * doc/md.texi (Standard Names) [vec_pack_sfix_trunc]: Document.
        [vec_pack_ufix_trunc]: Ditto.
        [vec_unpacks_float_hi]: Ditto.
        [vec_unpacks_float_lo]: Ditto.
        [vec_unpacku_float_hi]: Ditto.
        [vec_unpacku_float_lo]: Ditto.

testsuite/ChangeLog:

        PR tree-optimization/24659
        * gcc.dg/vect/vect-floatint-conversion-2.c: New test.
        * gcc.dg/vect/vect-intfloat-conversion-1.c: Require vect_float,
        not vect_int target.
        * gcc.dg/vect/vect-intfloat-conversion-2.c: Require vect_float,
        not vect_int target. Loop is vectorized for vect_intfloat_cvt
        targets.
        * gcc.dg/vect/vect-intfloat-conversion-3.c: New test.
        * gcc.dg/vect/vect-intfloat-conversion-4a.c: New test.
        * gcc.dg/vect/vect-intfloat-conversion-4b.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/doc/c-tree.texi
    trunk/gcc/doc/md.texi
    trunk/gcc/expr.c
    trunk/gcc/genopinit.c
    trunk/gcc/optabs.c
    trunk/gcc/optabs.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
    trunk/gcc/tree-inline.c
    trunk/gcc/tree-pretty-print.c
    trunk/gcc/tree-vect-generic.c
    trunk/gcc/tree-vect-transform.c
    trunk/gcc/tree-vectorizer.c
    trunk/gcc/tree-vectorizer.h
    trunk/gcc/tree.def
Altivec on PPC has an int->float conversion insn and provides signed/unsigned vec_unpack insns for v8hi. It should be trivial to implement short->float and unsigned short->float conversions by providing vec_unpacks_float_lo_v8hi, vec_unpacks_float_hi_v8hi, vec_unpacku_float_lo_v8hi and vec_unpacku_float_hi_v8hi patterns (please look at the i386/sse.md file for examples). With these patterns in place, the loops in gcc.dg/vect/vect-intfloat-conversion-4a.c and gcc.dg/vect/vect-intfloat-conversion-4b.c should be vectorized.
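The two-step composition suggested above (unpack shorts to ints, then convert ints to floats) can be modelled in portable C. This is a sketch, not the machine description itself: the struct types and all function names are hypothetical, only the signed variant is shown (the unsigned one would zero-extend instead), and which half of the input corresponds to the _hi and _lo patterns depends on target endianness, so the halves are simply numbered 0 and 1 here.

```c
/* Hypothetical stand-ins for the V8HI, V4SI and V4SF vector modes.  */
typedef struct { short h[8]; } v8hi;
typedef struct { int s[4]; } v4si;
typedef struct { float f[4]; } v4sf;

/* Step 1: sign-extend four of the eight shorts to ints.
   HALF selects elements 0..3 (half == 0) or 4..7 (half == 1).  */
static v4si
unpack_signed_half (v8hi v, int half)
{
  v4si r;
  int k;

  for (k = 0; k < 4; k++)
    r.s[k] = v.h[4 * half + k];
  return r;
}

/* Step 2: element-wise signed int -> float conversion.  */
static v4sf
int_to_float (v4si v)
{
  v4sf r;
  int k;

  for (k = 0; k < 4; k++)
    r.f[k] = (float) v.s[k];
  return r;
}

/* The composite operation: short -> float for one half of the input,
   built from the two steps above.  */
v4sf
unpack_float_half (v8hi v, int half)
{
  return int_to_float (unpack_signed_half (v, half));
}
```

Every 16-bit integer is exactly representable as a float, so the composition loses no precision compared with a direct short->float conversion.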
Just for the record, the only remaining x86 conversion (sse < 4) is vectorized BUILT_IN_LRINT that uses cvtpd2dq. The problem here is that n_in < n_out, so we probably need to apply narrowing modifier to TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION handling.
(In reply to comment #15)
> Just for the record, the only remaining x86 conversion (sse < 4) is vectorized
> BUILT_IN_LRINT that uses cvtpd2dq. The problem here is that n_in < n_out, so we
> probably need to apply narrowing modifier to
> TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION handling.

The patch to generate cvtpd2dq is at http://gcc.gnu.org/ml/gcc-patches/2007-06/msg02101.html
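To illustrate the n_in < n_out shape mentioned above, here is a small C model of a narrowing conversion: two input vectors of two doubles are consumed to fill one output vector of four ints. All names are hypothetical. To keep the sketch free of the math library, the per-element conversion is passed in as a callback and a truncating cast is supplied as a placeholder; the case discussed in this PR would use an lrint-style conversion (rounding per the current rounding mode) instead.

```c
/* Hypothetical stand-ins for the V2DF and V4SI vector modes.  */
typedef struct { double d[2]; } v2df;
typedef struct { int s[4]; } v4si;

/* Placeholder per-element conversion: truncation toward zero.
   This is NOT lrint; it merely stands in for it in this sketch.  */
static int
cvt_trunc (double x)
{
  return (int) x;
}

/* Narrowing conversion: each input vector holds 2 elements, the
   output holds 4, so two inputs are packed into one result.  */
v4si
pack_convert (v2df first, v2df second, int (*cvt) (double))
{
  v4si r;

  r.s[0] = cvt (first.d[0]);
  r.s[1] = cvt (first.d[1]);
  r.s[2] = cvt (second.d[0]);
  r.s[3] = cvt (second.d[1]);
  return r;
}
```

A caller would invoke it as pack_convert (a, b, cvt_trunc); the vectorizer has to account for the same two-in, one-out relationship when it maps a builtin call onto a vector builtin.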
Subject: Bug 24659

Author: uros
Date: Fri Jun 29 10:30:06 2007
New Revision: 126111

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126111
Log:
        PR tree-optimization/24659
        * tree-vect-transform.c (vectorizable_call): Handle
        (nunits_in == nunits_out / 2) and (nunits_out == nunits_in / 2) cases.

        * config/i386/sse.md (vec_pack_sfix_v2df): New expander.
        * config/i386/i386.c (enum ix86_builtins)
        [IX86_BUILTIN_VEC_PACK_SFIX]: New constant.
        (struct bdesc_2arg) [__builtin_ia32_vec_pack_sfix]: New builtin
        description.
        (ix86_init_mmx_sse_builtins): Define all builtins with 2 arguments as
        const using def_builtin_const.
        (ix86_expand_binop_builtin): Remove bogus assert() that insn wants
        input operands in the same modes as the result.
        (ix86_builtin_vectorized_function): Handle BUILT_IN_LRINT.

testsuite/ChangeLog:

        PR tree-optimization/24659
        * gcc.target/i386/vectorize2.c: New test.
        * gcc.target/i386/sse2-lrint-vec.c: New runtime test.
        * gcc.target/i386/sse2-lrintf-vec.c: Ditto.

Added:
    trunk/gcc/testsuite/gcc.target/i386/sse2-lrint-vec.c
    trunk/gcc/testsuite/gcc.target/i386/sse2-lrintf-vec.c
    trunk/gcc/testsuite/gcc.target/i386/vectorize2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-transform.c
Fully implemented in mainline. (BTW: A PPC maintainer should implement missing patterns for altivec as outlined in Comment #14.)
Testing this patch for Altivec:

Index: config/rs6000/altivec.md
===================================================================
*** config/rs6000/altivec.md    (revision 126053)
--- config/rs6000/altivec.md    (working copy)
***************
*** 147,152 ****
--- 147,156 ----
     (UNSPEC_VPERMHI      321)
     (UNSPEC_INTERHI      322)
     (UNSPEC_INTERLO      323)
+    (UNSPEC_VUPKHS_V4SF  324)
+    (UNSPEC_VUPKLS_V4SF  325)
+    (UNSPEC_VUPKHU_V4SF  326)
+    (UNSPEC_VUPKLU_V4SF  327)
  ])

  (define_constants
***************
*** 2933,2935 ****
--- 2937,2995 ----
    emit_insn (gen_altivec_vmrgl<VI_char> (operands[0], operands[1], operands[2]));
    DONE;
  }")
+
+ (define_expand "vec_unpacks_float_hi_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+        (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                     UNSPEC_VUPKHS_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacks_hi_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfsx (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacks_float_lo_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+        (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                     UNSPEC_VUPKLS_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacks_lo_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfsx (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacku_float_hi_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+        (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                     UNSPEC_VUPKHU_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacku_hi_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacku_float_lo_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+        (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                     UNSPEC_VUPKLU_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacku_lo_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
+   DONE;
+ }")