Bug 24659 - Conversions are not vectorized
Summary: Conversions are not vectorized
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.1.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Uroš Bizjak
URL:
Keywords: missed-optimization
Depends on:
Blocks: 31945
Reported: 2005-11-03 16:21 UTC by Uroš Bizjak
Modified: 2007-06-29 16:46 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-04-21 19:43:07


Attachments
vectorized assembly output from ICC v9.1 (1.34 KB, text/plain)
2007-01-05 18:27 UTC, Stuart Hastings

Description Uroš Bizjak 2005-11-03 16:21:47 UTC
The following code should produce cvtps2pd and cvtpi2pd instructions, which operate on vectors:

void test_fp (float *a, double *b)
{
  int i;

  for (i = 0; i < 4; i++)
    b[i] = (double) a[i];
}

void test_int (int *a, double *b)
{
  int i;

  for (i = 0; i < 4; i++)
    b[i] = (double) a[i];
}

Currently, gcc produces scalar instructions
(gcc -O2 -march=pentium4 -mfpmath=sse -ftree-vectorize):

.L2:
        movss   -4(%ecx,%eax,4), %xmm0
        cvtss2sd        %xmm0, %xmm0
        movsd   %xmm0, -8(%edx,%eax,8)
        addl    $1, %eax
        cmpl    $5, %eax
        jne     .L2

and
.L9:
        cvtsi2sd        -4(%ecx,%eax,4), %xmm0
        movsd   %xmm0, -8(%edx,%eax,8)
        addl    $1, %eax
        cmpl    $5, %eax
        jne     .L9

(BTW: There is also one movss too many in the first example.)
Comment 1 Andrew Pinski 2005-11-03 16:24:59 UTC
Confirmed.
Comment 2 Dorit Naishlos 2005-11-03 17:11:11 UTC
Vectorization of type conversions has recently been added to autovect-branch. It requires modeling the respective unpack and pack optabs in the machine description.

Comment 3 Stuart Hastings 2007-01-05 18:27:28 UTC
Created attachment 12862
vectorized assembly output from ICC v9.1

Generated from the "indefinite loop" variant of the testcase on OS X 10.4.7, using ICC v9.1:
% icc -O2 -S 24659.c
Comment 4 Stuart Hastings 2007-01-05 18:30:43 UTC
I ran the testcase through ICC, and it unrolled the loops without vectorizing them.  However, making the loops indefinite gets us the desired, vectorized result.  Here is the modified, indefinite loop version of the testcase:

void test_fp (float *a, double *b, int count)
{
  int i;

  for (i = 0; i < count; i++)
    b[i] = (double) a[i];
}

void test_int (int *a, double *b, int count)
{
  int i;

  for (i = 0; i < count; i++)
    b[i] = (double) a[i];
}

(Note to Apple: this is Radar 4079267)
Comment 5 Andrew Pinski 2007-01-05 18:39:32 UTC
SPU also has vectorizable conversions.
Comment 6 Uroš Bizjak 2007-01-06 17:47:05 UTC
(In reply to comment #2)
> vectorization of type conversions has recently been added to autovect-branch.
> It requires modeling the respective unpack and pack optabs in the machine
> description.

Hm, there is no infrastructure for int<->float conversions. vectorizable_operation() calls optab_for_tree_code() with the "code" argument set to FLOAT_EXPR and FIX_TRUNC_EXPR, and these always return NULL.

In the DFmode<->SFmode case, vectorizable_operation() calls optab_for_tree_code() with VEC_PACK_MOD_EXPR and VEC_UNPACK_LO/HI_EXPR. At least the latter case is intended for integer modes, as optab_for_tree_code() checks for TYPE_UNSIGNED on the "type" argument.
Comment 7 Tehila Meyzels 2007-01-07 08:03:45 UTC
Right, the vectorizer currently supports conversions only between integral types. Support for type conversions that also involve floating-point types is in the works.
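
To make the distinction concrete, here is a minimal sketch of the three kinds of conversion loop under discussion. The function names are hypothetical and are not taken from the GCC testsuite; whether any of these loops is actually vectorized depends on the GCC revision, the target, and the flags used.

```c
#include <stddef.h>

/* Integral widening: short -> int.  Conversions of this kind stay
   within integral types. */
void widen_short_to_int(const short *in, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (int) in[i];
}

/* Integral narrowing: int -> short (values are assumed to fit). */
void narrow_int_to_short(const int *in, short *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (short) in[i];
}

/* Conversion that involves a floating-point type: int -> float.
   This is the kind of loop that needs the additional support. */
void int_to_float(const int *in, float *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (float) in[i];
}
```

Compiling such a file with -O2 -ftree-vectorize and inspecting the assembly is one way to see which of the loops a given compiler handles.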
Comment 8 Uroš Bizjak 2007-04-21 19:43:07 UTC
Patch for double<->float conversions at http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01346.html.
Comment 9 uros 2007-04-22 19:45:18 UTC
Subject: Bug 24659

Author: uros
Date: Sun Apr 22 19:45:06 2007
New Revision: 124045

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124045
Log:
2007-04-22  Uros Bizjak  <ubizjak@gmail.com>

        PR tree-optimization/24659
        * optabs.h (enum optab_index) [OTI_vec_unpacks_hi,
        OTI_vec_unpacks_lo]: Update comment to mention floating point operands.
        (vec_pack_trunc_optab): Rename from vec_pack_mod_optab.
        * genopinit.c (optabs): Rename vec_pack_mod_optab
        to vec_pack_trunc_optab.
        * tree-vect-transform.c (vectorizable_type_demotion): Do not fail
        early for scalar floating point operands for NOP_EXPR.
        (vectorizable_type_promotion): Ditto.
        * optabs.c (optab_for_tree_code) [VEC_PACK_TRUNC_EXPR]: Return
        vec_pack_trunc_optab.
        (expand_binop): Rename vec_float_trunc_optab to vec_pack_mod_optab.

        * tree.def (VEC_PACK_TRUNC_EXPR): Rename from VEC_PACK_MOD_EXPR.
        * tree-pretty-print.c (dump_generic_node) [VEC_PACK_TRUNC_EXPR]:
        Rename from VEC_PACK_MOD_EXPR.
        (op_prio) [VEC_PACK_TRUNC_EXPR]: Ditto.
        * expr.c (expand_expr_real_1): Ditto.
        * tree-inline.c (estimate_num_insns_1): Ditto.
        * tree-vect-generic.c (expand_vector_operations_1): Ditto.

        * config/i386/sse.md (vec_unpacks_hi_v4sf): New expander.
        (vec_unpacks_lo_v4sf): Ditto.
        (vec_pack_trunc_v2df): Ditto.
        (vec_pack_trunc_v8hi): Rename from vec_pack_mod_v8hi.
        (vec_pack_trunc_v4si): Rename from vec_pack_mod_v4si.
        (vec_pack_trunc_v2di): Rename from vec_pack_mod_v2di.
    
        * config/rs6000/altivec.md (vec_pack_trunc_v8hi): Rename from
        vec_pack_mod_v8hi.
        (vec_pack_trunc_v4si): Rename from vec_pack_mod_v4si.

        * doc/c-tree.texi (Expression trees) [VEC_PACK_TRUNC_EXPR]:
        Rename from VEC_PACK_MOD_EXPR.  This expression also represent
        packing of floating point operands.
        [VEC_UNPACK_HI_EXPR, VEC_UNPACK_LO_EXPR]: These expression also
        represent unpacking of floating point operands.
        * doc/md.texi (Standard Names) [vec_pack_trunc]: Update documentation.
        [vec_unpacks_hi]: Ditto.
        [vec_unpacks_lo]: Ditto.

testsuite/ChangeLog:

2007-04-22  Uros Bizjak  <ubizjak@gmail.com>

    PR tree-optimization/24659
    * gcc.dg/vect/vect-float-extend-1.c: New test.
    * gcc.dg/vect/vect-float-truncate-1.c: New test.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/config/rs6000/altivec.md
    trunk/gcc/doc/c-tree.texi
    trunk/gcc/doc/md.texi
    trunk/gcc/expr.c
    trunk/gcc/genopinit.c
    trunk/gcc/optabs.c
    trunk/gcc/optabs.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-inline.c
    trunk/gcc/tree-pretty-print.c
    trunk/gcc/tree-vect-generic.c
    trunk/gcc/tree-vect-transform.c
    trunk/gcc/tree-vectorizer.c
    trunk/gcc/tree.def

Comment 10 Uroš Bizjak 2007-04-22 20:08:15 UTC
float->double and double->float conversions are now vectorized. For a slightly different test:

--cut here--
void test_fp (float *a, double *b)
{
  int i;

  for (i = 0; i < 4; i++)
    b[i] = (double) a[i];
}

void test_int (int *a, double *b)
{
  int i;

  for (i = 0; i < 4; i++)
    b[i] = (double) a[i];
}
--cut here--

we generate the following loops:

test_fd:

.L2:
        movaps  a(%rax), %xmm0
        movhlps %xmm0, %xmm2
        cvtps2pd        %xmm0, %xmm1
        movapd  %xmm1, c(%rax,%rax)
        cvtps2pd        %xmm2, %xmm0
        movapd  %xmm0, c+16(%rax,%rax)
        addq    $16, %rax
        cmpq    $64, %rax
        jne     .L2

test_df:

.L8:
        cvtpd2ps        c(%rax,%rax), %xmm0
        cvtpd2ps        c+16(%rax,%rax), %xmm1
        movlhps %xmm1, %xmm0
        movaps  %xmm0, a(%rax)
        addq    $16, %rax
        cmpq    $64, %rax
        jne     .L8

test_int (no vectorization):

.L13:
        cvtsi2sd        b(,%rax,4), %xmm0
        movsd   %xmm0, c(,%rax,8)
        addq    $1, %rax
        cmpq    $16, %rax
        jne     .L13

Note that we still don't vectorize double->int and int->double conversions.
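
For readers less familiar with the instruction sequences, the two vectorized loops can be modeled in portable scalar C. This is only a sketch of the data movement, assuming 128-bit vectors (four floats or two doubles) as in the assembly above; the helper names are hypothetical and do not correspond to GCC internals.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 4 };  /* floats per vector; assumes 128-bit vectors */

/* Widen one vector of 4 floats into two vectors of 2 doubles.
   Models cvtps2pd on the low half, then movhlps + cvtps2pd
   on the high half. */
static void widen_vec(const float *v, double *lo, double *hi)
{
  lo[0] = v[0]; lo[1] = v[1];
  hi[0] = v[2]; hi[1] = v[3];
}

/* Narrow two vectors of 2 doubles into one vector of 4 floats.
   Models two cvtpd2ps instructions followed by movlhps. */
static void narrow_vec(const double *lo, const double *hi, float *v)
{
  v[0] = (float) lo[0]; v[1] = (float) lo[1];
  v[2] = (float) hi[0]; v[3] = (float) hi[1];
}

/* Model of the float->double loop: one input vector per iteration,
   two output vectors. */
void model_test_fd(const float *a, double *c, size_t n)
{
  assert(n % VF == 0);  /* this sketch has no epilogue loop */
  for (size_t i = 0; i < n; i += VF)
    widen_vec(a + i, c + i, c + i + 2);
}

/* Model of the double->float loop: two input vectors per iteration,
   one output vector. */
void model_test_df(const double *c, float *a, size_t n)
{
  assert(n % VF == 0);
  for (size_t i = 0; i < n; i += VF)
    narrow_vec(c + i, c + i + 2, a + i);
}
```

The point of the model is that the element counts of the input and output vectors differ by a factor of two, which is why the widening loop needs an unpack of each half and the narrowing loop needs a pack of two results.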
Comment 11 Uroš Bizjak 2007-04-22 20:10:05 UTC
(In reply to comment #10)
> float->double and double->float conversions are now vectorized. For a slightly
> different test:

The test is actually:

--cut here--
float a[16];
int b[16];
double c[16];

void test_fd (void)
{
  int i;

  for (i = 0; i < 16; i++)
    c[i] = (double) a[i];
}

void test_df (void)
{
  int i;

  for (i = 0; i < 16; i++)
    a[i] = (float) c[i];
}

void test_int (void)
{
  int i;

  for (i = 0; i < 16; i++)
    c[i] = (double) b[i];
}
--cut here--
Comment 12 Uroš Bizjak 2007-05-15 15:06:59 UTC
Patch for several other missing conversions:
http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00966.html
Comment 13 uros 2007-05-17 07:31:35 UTC
Subject: Bug 24659

Author: uros
Date: Thu May 17 06:31:05 2007
New Revision: 124784

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124784
Log:
	PR tree-optimization/24659
        * optabs.h (enum optab_index): Add OTI_vec_unpacks_float_hi,
	OTI_vec_unpacks_float_lo, OTI_vec_unpacku_float_hi,
	OTI_vec_unpacku_float_lo, OTI_vec_pack_sfix_trunc and
	OTI_vec_pack_ufix_trunc.
	(vec_unpacks_float_hi_optab): Define new macro.
	(vec_unpacks_float_lo_optab): Ditto.
	(vec_unpacku_float_hi_optab): Ditto.
	(vec_unpacku_float_lo_optab): Ditto.
	(vec_pack_sfix_trunc_optab): Ditto.
	(vec_pack_ufix_trunc_optab): Ditto.
	* genopinit.c (optabs): Implement vec_unpack[s|u]_[hi|lo]_optab
	and vec_pack_[s|u]fix_trunc_optab using
	vec_unpack[s|u]_[hi\lo]_* and vec_pack_[u|s]fix_trunc_* patterns
	* tree-vectorizer.c (supportable_widening_operation): Handle
	FLOAT_EXPR and CONVERT_EXPR.  Update comment.
	(supportable_narrowing_operation): New function.
	* tree-vectorizer.h (supportable_narrowing_operation): Prototype.
	* tree-vect-transform.c (vectorizable_conversion): Handle
	(nunits_in == nunits_out / 2) and (nunits_out == nunits_in / 2) cases.
	(vect_gen_widened_results_half): Move before vectorizable_conversion.
	(vectorizable_type_demotion): Call supportable_narrowing_operation()
	to check for target support.
	* optabs.c (optab_for_tree_code) Return vec_unpack[s|u]_float_hi_optab
	for VEC_UNPACK_FLOAT_HI_EXPR, vec_unpack[s|u]_float_lo_optab
	for VEC_UNPACK_FLOAT_LO_EXPR and vec_pack_[u|s]fix_trunc_optab
	for VEC_PACK_FIX_TRUNC_EXPR.
	(expand_binop): Special case mode of the result for
	vec_pack_[u|s]fix_trunc_optab.
	(init_optabs): Initialize vec_unpack[s|u]_[hi|lo]_optab and
	vec_pack_[u|s]fix_trunc_optab.

	* tree.def (VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR,
	VEC_PACK_FIX_TRUNC_EXPR): New tree codes.
	* tree-pretty-print.c (dump_generic_node): Handle
	VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR and
	VEC_PACK_FIX_TRUNC_EXPR.
	(op_prio): Ditto.
	* expr.c (expand_expr_real_1): Ditto.
	* tree-inline.c (estimate_num_insns_1): Ditto.
	* tree-vect-generic.c (expand_vector_operations_1): Ditto.

	* config/i386/sse.md (vec_unpacks_float_hi_v8hi): New expander.
	(vec_unpacks_float_lo_v8hi): Ditto.
	(vec_unpacku_float_hi_v8hi): Ditto.
	(vec_unpacku_float_lo_v8hi): Ditto.
	(vec_unpacks_float_hi_v4si): Ditto.
	(vec_unpacks_float_lo_v4si): Ditto.
	(vec_pack_sfix_trunc_v2df): Ditto.

	* doc/c-tree.texi (Expression trees) [VEC_UNPACK_FLOAT_HI_EXPR]:
	Document.
	[VEC_UNPACK_FLOAT_LO_EXPR]: Ditto.
	[VEC_PACK_FIX_TRUNC_EXPR]: Ditto.
	* doc/md.texi (Standard Names) [vec_pack_sfix_trunc]: Document.
	[vec_pack_ufix_trunc]: Ditto.
	[vec_unpacks_float_hi]: Ditto.
	[vec_unpacks_float_lo]: Ditto.
	[vec_unpacku_float_hi]: Ditto.
	[vec_unpacku_float_lo]: Ditto.

testsuite/ChangeLog:

	PR tree-optimization/24659
	* gcc.dg/vect/vect-floatint-conversion-2.c: New test.
	* gcc.dg/vect/vect-intfloat-conversion-1.c: Require vect_float,
	not vect_int target.
	* gcc.dg/vect/vect-intfloat-conversion-2.c: Require vect_float,
	not vect_int target.  Loop is vectorized for vect_intfloat_cvt
	targets.
	* gcc.dg/vect/vect-intfloat-conversion-3.c: New test.
	* gcc.dg/vect/vect-intfloat-conversion-4a.c: New test.
	* gcc.dg/vect/vect-intfloat-conversion-4b.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/doc/c-tree.texi
    trunk/gcc/doc/md.texi
    trunk/gcc/expr.c
    trunk/gcc/genopinit.c
    trunk/gcc/optabs.c
    trunk/gcc/optabs.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
    trunk/gcc/tree-inline.c
    trunk/gcc/tree-pretty-print.c
    trunk/gcc/tree-vect-generic.c
    trunk/gcc/tree-vect-transform.c
    trunk/gcc/tree-vectorizer.c
    trunk/gcc/tree-vectorizer.h
    trunk/gcc/tree.def

Comment 14 Uroš Bizjak 2007-05-17 07:45:58 UTC
Altivec PPC has an int->float cvt insn and provides signed/unsigned vec_unpack v8hi insns. It should be trivial to implement short->float and unsigned short->float conversions by providing vec_unpacks_float_lo_v8hi, vec_unpacks_float_hi_v8hi, vec_unpacku_float_lo_v8hi and vec_unpacku_float_hi_v8hi patterns (please see the i386/sse.md file).

By providing these patterns, loops in gcc.dg/vect/vect-intfloat-conversion-4a.c and gcc.dg/vect/vect-intfloat-conversion-4b.c should be vectorized.
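
The two-step composition suggested here (unpack the 16-bit lanes to 32 bits, then convert to float) can be sketched in scalar C as follows. This is an illustrative model only: the function names are hypothetical, and because which half of the vector counts as "hi" or "lo" depends on the target, the sketch simply takes a pointer to the four lanes of one half.

```c
#include <stdint.h>

enum { HALF = 4 };  /* lanes in one half of a v8hi vector */

/* Signed variant: sign-extend four 16-bit lanes to 32 bits,
   then convert each 32-bit integer to float. */
void unpacks_float_half(const int16_t *half, float *out)
{
  int32_t tmp[HALF];

  for (int i = 0; i < HALF; i++)
    tmp[i] = half[i];          /* models the signed unpack */
  for (int i = 0; i < HALF; i++)
    out[i] = (float) tmp[i];   /* models the signed int->float convert */
}

/* Unsigned variant: zero-extend instead of sign-extend, then use
   an unsigned int->float convert. */
void unpacku_float_half(const uint16_t *half, float *out)
{
  uint32_t tmp[HALF];

  for (int i = 0; i < HALF; i++)
    tmp[i] = half[i];          /* models the unsigned unpack */
  for (int i = 0; i < HALF; i++)
    out[i] = (float) tmp[i];   /* models the unsigned int->float convert */
}
```

Every 16-bit integer is exactly representable in single precision, so the intermediate 32-bit step loses no information in either variant.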

Comment 15 Uroš Bizjak 2007-05-17 08:28:49 UTC
Just for the record, the only remaining x86 conversion (sse < 4) is the vectorized BUILT_IN_LRINT, which uses cvtpd2dq. The problem here is that n_in < n_out, so we probably need to apply a narrowing modifier to the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION handling.
Comment 16 Uroš Bizjak 2007-06-29 08:53:48 UTC
(In reply to comment #15)
> Just for the record, the only remaining x86 conversion (sse < 4) is vectorized
> BUILT_IN_LRINT that uses cvtpd2dq. The problem here is that n_in < n_out, so we
> probably need to apply narrowing modifier to
> TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION handling.

The patch to generate cvtpd2dq is at http://gcc.gnu.org/ml/gcc-patches/2007-06/msg02101.html


Comment 17 uros 2007-06-29 10:30:21 UTC
Subject: Bug 24659

Author: uros
Date: Fri Jun 29 10:30:06 2007
New Revision: 126111

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126111
Log:
	PR tree-optimization/24659
	* tree-vect-transform.c (vectorizable_call): Handle
	(nunits_in == nunits_out / 2) and (nunits_out == nunits_in / 2) cases.

	* config/i386/sse.md (vec_pack_sfix_v2df): New expander.
	* config/i386/i386.c (enum ix86_builtins)
	[IX86_BUILTIN_VEC_PACK_SFIX]: New constant.
	(struct bdesc_2arg) [__builtin_ia32_vec_pack_sfix]: New builtin
	description.
	(ix86_init_mmx_sse_builtins): Define all builtins with 2 arguments as
	const using def_builtin_const.
	(ix86_expand_binop_builtin): Remove bogus assert() that insn wants
	input operands in the same modes as the result.
	(ix86_builtin_vectorized_function): Handle BUILT_IN_LRINT.

testsuite/ChangeLog:

	PR tree-optimization/24659
	* gcc.target/i386/vectorize2.c: New test.
	* gcc.target/i386/sse2-lrint-vec.c: New runtime test.
	* gcc.target/i386/sse2-lrintf-vec.c: Ditto.


Added:
    trunk/gcc/testsuite/gcc.target/i386/sse2-lrint-vec.c
    trunk/gcc/testsuite/gcc.target/i386/sse2-lrintf-vec.c
    trunk/gcc/testsuite/gcc.target/i386/vectorize2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-transform.c

Comment 18 Uroš Bizjak 2007-06-29 10:38:06 UTC
Fully implemented in mainline.

(BTW: A PPC maintainer should implement the missing patterns for altivec as outlined in Comment #14.)
Comment 19 Dorit Naishlos 2007-06-29 16:46:04 UTC
Testing this patch for Altivec:

Index: config/rs6000/altivec.md
===================================================================
*** config/rs6000/altivec.md    (revision 126053)
--- config/rs6000/altivec.md    (working copy)
***************
*** 147,152 ****
--- 147,156 ----
     (UNSPEC_VPERMHI    321)
     (UNSPEC_INTERHI      322)
     (UNSPEC_INTERLO      323)
+    (UNSPEC_VUPKHS_V4SF   324)
+    (UNSPEC_VUPKLS_V4SF   325)
+    (UNSPEC_VUPKHU_V4SF   326)
+    (UNSPEC_VUPKLU_V4SF   327)
  ])

  (define_constants
***************
*** 2933,2935 ****
--- 2937,2995 ----
    emit_insn (gen_altivec_vmrgl<VI_char> (operands[0], operands[1], operands[2]));
    DONE;
  }")
+
+ (define_expand "vec_unpacks_float_hi_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+         (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                      UNSPEC_VUPKHS_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacks_hi_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfsx (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacks_float_lo_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+         (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                      UNSPEC_VUPKLS_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacks_lo_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfsx (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacku_float_hi_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+         (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                      UNSPEC_VUPKHU_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacku_hi_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacku_float_lo_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+         (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                      UNSPEC_VUPKLU_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacku_lo_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
+   DONE;
+ }")