Bug 54346 - combine permutations
Summary: combine permutations
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.8.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 102056 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-08-21 14:02 UTC by Marc Glisse
Modified: 2022-10-25 05:53 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-08-14 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Glisse 2012-08-21 14:02:09 UTC
Hello,

when we have two VEC_PERM_EXPR with constant mask, where one is the only user of the result of the other one, it would be good to compose/merge them into a single VEC_PERM_EXPR. However, it is too hard for backends to always generate optimal code for shuffles, so we want to do the optimization only if we know it actually helps. Currently this means when the composed permutation is the identity. In the future, it could mean asking the backend.

See the conversation that started at:
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00676.html

and around this message for cost hooks (which could also help the vectorizer):
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00973.html

Related bug is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147 but that one is about RTL (unless x86 eventually follows ARM and decides to implement _mm_* functions in terms of __builtin_shuffle).
Comment 1 Andrew Pinski 2016-08-14 17:11:17 UTC
Confirmed.
Comment 2 Andrew Pinski 2021-08-25 07:26:10 UTC
*** Bug 102056 has been marked as a duplicate of this bug. ***
Comment 3 Andrew Pinski 2021-08-25 09:02:42 UTC
Here is example code which should produce the same code:
typedef int v4si __attribute__((vector_size (16)));

v4si
foo (v4si a, v4si b)
{
    v4si c = __builtin_shuffle (a, b, __extension__ (v4si) {1, 4, 2, 7});
    v4si d = __builtin_shuffle (c, __extension__ (v4si) { 3, 2, 0, 1 });
    return d;
}

typedef int v4si __attribute__((vector_size (16)));

v4si
foo1 (v4si a, v4si b)
{
    v4si c = __builtin_shuffle (a, b, __extension__ (v4si){ 7, 2, 1, 4 });
    return c;
}
Comment 4 Hongtao.liu 2022-06-07 09:10:44 UTC
Now more x86 shuffle instrinsics are folded into gimple VEC_PERM_EXPR, I guess we need some Gimple-level pattern match to simplify successive vec_perm_expr.
Comment 5 GCC Commits 2022-10-11 06:12:30 UTC
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:b88adba751da635c6f0c353c5bc51bbe2ecf4c89

commit r13-3212-gb88adba751da635c6f0c353c5bc51bbe2ecf4c89
Author: Liwei Xu <liwei.xu@intel.com>
Date:   Fri Sep 23 13:46:02 2022 +0800

    Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
    
            This patch implemented the optimization in PR 54346, which Merges
    
            c = VEC_PERM_EXPR <a, b, VCST0>;
            d = VEC_PERM_EXPR <c, c, VCST1>;
                    to
            d = VEC_PERM_EXPR <a, b, NEW_VCST>;
    
            Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
            tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
            is ok to removed it.
    
    gcc/ChangeLog:
    
            PR tree-optimization/54346
            * match.pd: Merge the index of VCST then generates the new vec_perm.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/pr54346.c: New test.
    
    Co-authored-by: liuhongt <hongtao.liu@intel.com>
Comment 6 Marc Glisse 2022-10-11 06:21:53 UTC
The log says that this breaks tree-ssa/forwprop-19.c, but I don't see any xfail or anything. Does it only fail because gimple-simplify leaves some dead code around, so you could update the test to scan the next DCE pass dump instead of forwprop1? Or are we missing a transformation that just detects a VEC_PERM_EXPR with an identity permutation?
Comment 7 Hongtao.liu 2022-10-11 06:36:01 UTC
(In reply to Marc Glisse from comment #6)
> The log says that this breaks tree-ssa/forwprop-19.c, but I don't see any
> xfail or anything. Does it only fail because gimple-simplify leaves some
> dead code around, so you could update the test to scan the next DCE pass
> dump instead of forwprop1? Or are we missing a transformation that just
> detects a VEC_PERM_EXPR with an identity permutation?

Uoops, I didn't notice this, will add an incremental patch to handle the indentical index (forwporp-19.c) scenario, sorry.
Comment 8 GCC Commits 2022-10-21 07:17:54 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:fa553ff26d96f6fecaa8f1b00649cfdc6cda5f5a

commit r13-3430-gfa553ff26d96f6fecaa8f1b00649cfdc6cda5f5a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Oct 21 09:16:44 2022 +0200

    match.pd: Fix up gcc.dg/pr54346.c on i686-linux [PR54346]
    
    The pr54346.c testcase FAILs on i686-linux (without -msse*) for multiple
    reasons.  One is the trivial missing -Wno-psabi which the following patch
    adds, but that isn't enough.  The thing is that without native vector
    support, we have VEC_PERM_EXPRs in the IL and are actually considering
    the nested VEC_PERM_EXPRs into one VEC_PERM_EXPR optimization, but punt
    because can_vec_perm_const_p (result_mode, op_mode, sel2, false) is false.
    
    Such a test makes sense to prevent "optimizing" two VEC_PERM_EXPRs
    that can be handled by the backend natively into one VEC_PERM_EXPR
    that can't be handled.  But if both of the original VEC_PERM_EXPRs
    can't be handled natively either, having just one VEC_PERM_EXPR that will be
    lowered by generic vec lowering is IMHO still better than 2.
    Or even if we trade just one VEC_PERM_EXPR that can't be handled plus
    one that can to one that can't be handled.
    
    Also, removing the testcase's executable permissions...
    
    2022-10-21  <jakub@redhat.com>
    
            PR tree-optimization/54346
            * match.pd ((vec_perm (vec_perm@0 @1 @2 VECTOR_CST) @0 VECTOR_CST)):
            Optimize nested VEC_PERM_EXPRs even if target can't handle the
            new one provided we don't increase number of VEC_PERM_EXPRs the
            target can't handle.
    
            * gcc.dg/pr54346.c: Add -Wno-psabi to dg-options.
Comment 9 Hongtao.liu 2022-10-25 05:53:24 UTC
(In reply to Hongtao.liu from comment #7)
> (In reply to Marc Glisse from comment #6)
> > The log says that this breaks tree-ssa/forwprop-19.c, but I don't see any
> > xfail or anything. Does it only fail because gimple-simplify leaves some
> > dead code around, so you could update the test to scan the next DCE pass
> > dump instead of forwprop1? Or are we missing a transformation that just
> > detects a VEC_PERM_EXPR with an identity permutation?
> 
> Uoops, I didn't notice this, will add an incremental patch to handle the
> indentical index (forwporp-19.c) scenario, sorry.

It's fixed.