[Bug target/92658] x86 lacks vector extend / truncate

rsandifo at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu May 14 13:41:14 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658

--- Comment #12 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #11)
> On Thu, 14 May 2020, ubizjak at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658
> > 
> > --- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
> > The patch is ready to be pushed, it is waiting for a decision what to do with
> > failed cases.
> > 
> > Richi, should this patch move forward (eventually XFAILing failed cases), or do
> > you plan to look at the fails from the generic vectorizer POV?
> 
> I think we should go forward with the patch, either XFAILing the testcases
> or splitting out the testcase (and backend patterns that do not get
> exercised due to the issue).
> 
> I've already said in comment#8 that the issue here is optabs working
> with modes and not vector types, so it's a bit hard to use the
> same mechanism to deal with the currently failing cases.  One
> possible route would be to add V4QImode similar to how we now
> do V2SFmode with SSE but of course where do we stop ...
> 
> As said we can try to tackle this incrementally.  Maybe Richard also
> has input here?
Nothing to add really, but: yeah, the idea was that the target
would provide smaller vector modes if it can efficiently load,
store and (at least) add them.  I think it would be good to do
that for aarch64 Advanced SIMD at some point.

There are three points behind using vector modes for this:

1. It gives the target the flexibility to decide how best to
   implement the operation (All at one end, and if so, which end?
   Or spread out evenly across the vector?)  On aarch64, the
   choice would differ between SVE and Advanced SIMD.

2. It means that __attribute__((vector_size)) vectors benefit.

3. The smaller modes can be used in conjunction with
   autovectorize_vector_modes to give the loop vectoriser
   a choice between operating on packed or unpacked vectors.
   This is mostly useful for VECT_COMPARE_COSTS but can help
   with low-trip-count loops even without.

So yeah, defining V4QI sounds like the way to go if the
architecture can process it efficiently.  I get the "where
do we stop" concern, but .md mode iterators should help.


More information about the Gcc-bugs mailing list