[Bug target/103554] -mavx generates worse code on scalar code

Tue Dec 7 08:36:40 GMT 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #8)
> > but the x86 backend chooses to not let the vectorizer compare costs with
> > different vector sizes but instead asks it to pick the first working
> > solution from the vector of modes to consider (and in that order).  We
> > might want to reconsider that (maybe at least for BB vectorization and
> > maybe with some extra special mode?).
> 
> Shouldn't the vectorizer compare costs of different vector factors and
> choose the smallest one, or vectorizer already support the corresponding
> framework, but the x86 backend doesn't implement the corresponding
> target_hook?

This is controlled by the autovectorize_vector_modes hook where the
return value is documented as

The hook returns a bitmask of flags that control how the modes in
@var{modes} are used.  The flags are:
@table @code
@item VECT_COMPARE_COSTS
Tells the loop vectorizer to try all the provided modes and pick the one
with the lowest cost.  By default the vectorizer will choose the first
mode that works.
@end table

The hook does not need to do anything if the vector returned by
@code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is the only one relevant
for autovectorization.  The default implementation adds no modes and
returns 0.
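
To make the difference between the two behaviours concrete, here is a
minimal standalone sketch.  It is not GCC code: the Attempt struct, the
pick_mode function and the compare_costs flag are made up for
illustration and only mimic "first mode that works" versus "lowest cost".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one vectorization attempt (not a GCC type).
struct Attempt {
  std::string mode;  // e.g. "V32QI"
  bool works;        // did vectorization succeed with this mode?
  unsigned cost;     // cost the cost model assigned to the result
};

// Returns the index of the chosen attempt, or attempts.size() if no
// mode works.  With compare_costs == false the first working mode wins;
// with compare_costs == true all modes are tried and the cheapest wins.
std::size_t pick_mode(const std::vector<Attempt> &attempts,
                      bool compare_costs) {
  std::size_t chosen = attempts.size();
  for (std::size_t i = 0; i < attempts.size(); ++i) {
    if (!attempts[i].works)
      continue;
    if (!compare_costs)
      return i;  // default behaviour: stop at the first success
    if (chosen == attempts.size() || attempts[i].cost < attempts[chosen].cost)
      chosen = i;  // remember the cheapest working mode so far
  }
  return chosen;
}
```

With two working modes where the second is cheaper, the default picks
the first one and the cost-comparing variant picks the second.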

IIRC we don't compare costs since we have so many vector modes and iterating
over them is (compile-time) costly.  There is also the question of how well
we can trust our cost model here to make a sound decision.  The hook currently
is not told the kind of vectorization (loop vs. basic-block vectorization) - we
might want to add this info and amend the hook accordingly.  We might also want
to add another mode that says to stop iterating over modes when the
vectorizer runs into a mode with larger cost - like if we have mode/cost
pairs { V64QI, 64 } { V32QI, 56 } { V16QI, 60 } then stop and not try
V8QI and V4QI.
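
A rough sketch of that early-stop idea, as standalone C++ rather than
vectorizer code.  ModeCost, Selection and pick_until_cost_increases are
hypothetical names, and the costs are assumed to be already known per
mode, which in the real vectorizer would require analyzing each mode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mode/cost pair (not a GCC data structure).
struct ModeCost {
  std::string mode;
  unsigned cost;
};

struct Selection {
  std::string mode;   // cheapest mode seen before stopping
  std::size_t tried;  // number of modes that were actually costed
};

// Walk the modes in order, keep the cheapest one, and stop as soon as a
// mode turns out more expensive than the best seen so far.
Selection pick_until_cost_increases(const std::vector<ModeCost> &modes) {
  assert(!modes.empty());
  Selection sel{modes[0].mode, 1};
  unsigned best = modes[0].cost;
  for (std::size_t i = 1; i < modes.size(); ++i) {
    ++sel.tried;
    if (modes[i].cost > best)
      break;  // cost went up: do not try the remaining modes
    best = modes[i].cost;
    sel.mode = modes[i].mode;
  }
  return sel;
}
```

For the pairs above this costs V64QI, V32QI and V16QI, settles on V32QI
and never looks at V8QI or V4QI.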

Note that returning VECT_COMPARE_COSTS has to be done carefully to avoid
changing the semantics of -mprefer-vector-width.  Currently, if the preferred
width can be used we use it, but when comparing costs we can end up
using a smaller vector size if that's deemed better.  -mprefer-vector-width
would then behave more like a -mmax-vector-width.

We could add VECT_FIRST_PREFERRED and make the vectorizer pick the first
mode (which we'd then need to order first) and only if that isn't supported
compare costs.  Alternatively simply only return VECT_COMPARE_COSTS when
no -mprefer-* is given.
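
The first alternative could look roughly like the following standalone
sketch.  VECT_FIRST_PREFERRED does not exist today, and Candidate and
pick_first_preferred are invented names; the sketch only shows the
intended selection order, assuming the preferred mode is placed first.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical candidate mode (not a GCC type).
struct Candidate {
  std::string mode;
  bool works;     // can this mode be used for the code at hand?
  unsigned cost;  // cost assigned by the cost model
};

// candidates[0] is assumed to be the preferred mode.  If it works, take
// it regardless of cost; otherwise compare costs among the remaining
// working modes.  Returns an empty string if nothing works.
std::string pick_first_preferred(const std::vector<Candidate> &candidates) {
  if (candidates.empty())
    return "";
  if (candidates.front().works)
    return candidates.front().mode;
  const Candidate *best = nullptr;
  for (std::size_t i = 1; i < candidates.size(); ++i) {
    if (!candidates[i].works)
      continue;
    if (best == nullptr || candidates[i].cost < best->cost)
      best = &candidates[i];
  }
  return best ? best->mode : "";
}
```

This keeps the current meaning of a preferred width (use it whenever it
can be used) and only falls back to cost comparison when it cannot.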

I was mostly pointing out that the cost modeling for this particular case
would have preferred SSE but we told the vectorizer to pick the first
successful attempt.  Note the very first mode tried is _not_ the first
mode in the array but the one auto-detected from the testcase's
use and the preferred_simd_mode hook.