[PATCH] [4.3. projects] Vectorize int to float conversions - dorit

Thu Jan 25 15:32:00 GMT 2007

"Richard Guenther" <richard.guenther@gmail.com> wrote on 24/01/2007
17:49:08:

Hi Richard,

> Here's a more thorough review.
....
> +   /* Check types of lhs and rhs.  */
> +   op0 = TREE_OPERAND (operation, 0);
> +   rhs_type = TREE_TYPE (op0);
>
> so you only allow unary operations - please add a helper function here
> that extracts the source type from the operation so we can also allow
> function calls here.
>

I think that vectorization of function calls should be handled by
vectorizable_call (that you had recently added). I think vectorizable_call
doesn't need to really care about the semantics of the called function -
whether it does some math function or a conversion - from the vectorizer's
point of view it's replacing one function call with another. right?

> so this is ok for now.  But the canonical way is to write ??? or FIXME
> instead of
> FORNOW.
>

The convention in the vectorizer files has been to use a FORNOW for
restrictions on vectorization that can (and should) be relaxed, and
FIXME/CHECKME for other things.  It can easily be changed to FIXMEs (do we
really want to do that though?)

> +       new_stmt = build_function_call_expr (builtin_decl, params);
> +
> +       /* Arguments are ready. create the new vector stmt.  */
> +       new_stmt = build2 (GIMPLE_MODIFY_STMT, void_type_node, vec_dest,
> +                        new_stmt);
> +       new_temp = make_ssa_name (vec_dest, new_stmt);
> +       GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
> +       vect_finish_stmt_generation (stmt, new_stmt, bsi);
> +
> +       if (j == 0)
> +       STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> +       else
> +       STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
> +       prev_stmt_info = vinfo_for_stmt (new_stmt);
> +     }
> +   return true;
> + }
>
> Do you have a testcase excercising ncopies != 1?
>

good point, a testcase should be added. Here's an exmaple:

for (i=0; i<n; i++){
  float_arr[i] = (float) int_arr[i];
  char_arr[i] = 0;
}

(the above loop will be vectorized using VF=16 (for vector size 16 bytes),
which means that we have to "unroll" the int->float conversion operation by
4 (i.e. create 4 copies, in order to generate 16 float results in each
vectorized iteration).

By the way - this (supporting the case that ncopies>1) is something that is
also missing in vectorizable_call - I was going to address this a few weeks
ago but never got around to doing that (I totally missed it when I read
over your patch - sorry about that). I'm travelling next week, but could
provide a patch to add the required support in the following week. (What
I'd really want to do is to put together some template for all the
"vectorizable_*" functions so that we won't forget stuff like that in the
future).

> Note that the patch somewhat overlaps with the support for vectorizing
builtin
> functions, so you might want to call the builtin_vectorized_function
> target hook
> for conversions that involve a function call.

which goes back to the point I was trying to make above - vectorizing
function calls (that happen to do a conversion) should be handled by
vectorizable_call. vectorizable_conversion will handle only tree-code that
do conversions. What do you say?

> To ever support nunits_in != nunits_out natively it
> would be nice to also pass the desired operand vector type to the
> target hooks.
>
> So it would look like
>
>    tree (* builtin_vectorized_operation) (unsigned tree_code, tree
> lhs_vec_type, tree rhs_vec_type);
>

it will take more than that - if the size of the arguments is different
than the size of the result then vectorization involves working with a
different number of vector registers coming in to the conversion compared
to the number of vector register coming out of the conversion (i.e. doing
something like we do in vectorize_demotion/promotion, although maybe not
using the same pack/unpack idioms) - e.g. - say you want to vectorize a
short->float conversion, and say we're working with a vector size 16 bytes
and the VF is 8 - your rhs argument is one vector of 8 shorts; your lhs
needs to be 2 vectors of 4 floats each. This is part of why the intention
was to take it one step at a time - solve the same-size-conversion first,
and the more general case later.

dorit