[Bug target/101929] [12 Regression] r12-7319 regress x264_r by 4% on CLX.

crazylht at gmail dot com gcc-bugzilla@gcc.gnu.org
Mon Mar 7 08:22:01 GMT 2022


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #7)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 9188d727e33..7f1f12fb6c6 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -2374,7 +2375,7 @@ fail:
>                 n_vector_builds++;
>             }
>         }
> -      if (all_uniform_p
> +      if ((all_uniform_p && !two_operators)
>           || n_vector_builds > 1
>           || (n_vector_builds == children.length ()
>               && is_a <gphi *> (stmt_info->stmt)))
> 
> 
> will re-enable the vectorization - it evades the vect_construct cost bump
> by instead using scalar_to_vec (aka splat) which has not yet been fixed to
> account for a possible gpr to xmm move (so it would be a temporary "solution"
> at best).
> 
> Another change to mute the effect somewhat (but not fixing x264) that was
> mentioned is
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b2bf90576d5..acf2cc977b4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22595,7 +22595,7 @@ ix86_builtin_vectorization_cost (enum
> vect_cost_for_stmt type_of_cost,
>        case vec_construct:
>         {
>           /* N element inserts into SSE vectors.  */
> -         int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
> +         int cost = (TYPE_VECTOR_SUBPARTS (vectype) - 1) *
> ix86_cost->sse_op;

n - 1 is right for a 128-bit vector, but for a 256-bit vector shouldn't it be
n - 2, since we have a separate cost for vinserti128? And likewise n - 4 for a
512-bit one.
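
To make the arithmetic concrete, here is a minimal sketch of the counting rule suggested above, assuming each 128-bit lane is built separately and the first element of each lane needs no insert. The helper name element_inserts is hypothetical and not part of GCC; the lane-combining cost (vinserti128 and the like) is assumed to be accounted for elsewhere.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code: number of per-element insert
   operations needed to build a vector of NELTS elements that is
   VECTOR_BITS wide, assuming one element per 128-bit lane comes for
   free and lane combining is costed separately.  */
static int
element_inserts (int nelts, int vector_bits)
{
  /* Only whole 128-bit lanes are modelled here.  */
  assert (vector_bits >= 128 && vector_bits % 128 == 0);
  int lanes = vector_bits / 128;
  /* Each lane must hold at least one element.  */
  assert (nelts >= lanes);
  return nelts - lanes;
}
```

With 32-bit elements this gives 4 - 1 = 3 inserts for a 128-bit vector, 8 - 2 = 6 for a 256-bit vector, and 16 - 4 = 12 for a 512-bit vector.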

