Bug 117379 - Failure to vectorize multi add + multi sub
Summary: Failure to vectorize multi add + multi sub
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 15.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2024-10-31 06:01 UTC by Hu Lin
Modified: 2024-10-31 12:46 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-10-31 00:00:00


Description Hu Lin 2024-10-31 06:01:52 UTC
I found that GCC can't vectorize this function:

typedef unsigned long long u64;

u64 mobility(u64 * a, u64 * b) {

    u64 mobility = 0;
    mobility += a[0];
    mobility += a[1];
    mobility += a[2];
    mobility += a[3];

    mobility -= b[0];
    mobility -= b[1];
    mobility -= b[2];
    mobility -= b[3];

    return mobility;
}

but LLVM can:
https://godbolt.org/z/srWxxKhM9

The failure reason is "missed:  not vectorized: no grouped stores in basic block."

Is there any chance of GCC implementing this vectorization the way LLVM does?
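For illustration, here is a hand-written sketch of one shape the vectorized code could take. It relies on the GCC/Clang vector_size extension, and the names mobility_vec and v4u64 are made up for this example. It is not the code LLVM emits. It only shows that, because u64 arithmetic wraps modulo 2^64, the four adds and four subs can be regrouped into one lane-wise subtraction followed by a horizontal add.

```c
#include <string.h>

typedef unsigned long long u64;

/* GCC/Clang vector extension: four u64 lanes (32 bytes).
   The type name v4u64 is an assumption made for this sketch. */
typedef u64 v4u64 __attribute__((vector_size(32)));

/* Hypothetical hand-vectorized equivalent of mobility().  */
u64 mobility_vec(const u64 *a, const u64 *b) {
    v4u64 va, vb;

    /* memcpy expresses an unaligned load of a[0..3] and b[0..3]. */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

    /* Lane-wise a[i] - b[i]; unsigned arithmetic wraps modulo 2^64. */
    v4u64 d = va - vb;

    /* Horizontal reduction of the four lanes. */
    return d[0] + d[1] + d[2] + d[3];
}
```

The regrouping is only valid because the element type is unsigned (or when wrapping signed arithmetic is otherwise permitted); the result matches the scalar function for all inputs.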
Comment 1 Andrew Pinski 2024-10-31 06:25:10 UTC
Confirmed.
Comment 2 Richard Biener 2024-10-31 12:46:43 UTC
We are correctly detecting the reduction chain in vect_slp_check_for_roots but
currently reject mixed operation chains there:

                  if (chain[i].code != code)
                    {
                      invalid = true;
                      break;
                    }

(gdb) p chain.length ()
$1 = 8
(gdb) p chain
$2 = {<vec<chain_op_t, va_heap, vl_ptr>> = {m_vec = 0x4d957a0 = {{
        code = PLUS_EXPR, dt = vect_internal_def, 
        op = <ssa_name 0x7ffff6e1c000 4>}, {code = PLUS_EXPR, 
        dt = vect_internal_def, op = <ssa_name 0x7ffff700df30 3>}, {
        code = PLUS_EXPR, dt = vect_internal_def, 
        op = <ssa_name 0x7ffff700de10 1>}, {code = PLUS_EXPR, 
        dt = vect_internal_def, op = <ssa_name 0x7ffff700dea0 2>}, {
        code = MINUS_EXPR, dt = vect_internal_def, 
        op = <ssa_name 0x7ffff6e1c1f8 8>}, {code = MINUS_EXPR, 
        dt = vect_internal_def, op = <ssa_name 0x7ffff6e1c168 7>}, {
        code = MINUS_EXPR, dt = vect_internal_def, 
        op = <ssa_name 0x7ffff6e1c0d8 6>}, {code = MINUS_EXPR, 
        dt = vect_internal_def, 
        op = <ssa_name 0x7ffff6e1c048 5>}}}, <No data fields>}

Mixed-operation handling is missing in vectorize_slp_instance_root_stmt,
and meta-data to indicate which lanes to negate is missing as well.
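To make the "which lanes to negate" idea concrete, here is a small standalone model in plain C. It is not GCC code: struct chain_op, enum op_code, chain_is_uniform and reduce_mixed are hypothetical stand-ins that loosely mirror the chain dump above. The first function models a uniformity check like the one that currently rejects the chain; the second shows how per-lane sign information would let a mixed PLUS/MINUS chain be reduced as a plain sum.

```c
#include <stddef.h>

typedef unsigned long long u64;

/* Hypothetical stand-ins for this sketch, not GCC's internal types. */
enum op_code { OP_PLUS, OP_MINUS };

struct chain_op {
    enum op_code code;  /* operation applied to this chain element */
    u64 value;          /* the operand's value (one "lane") */
};

/* Returns 1 if every element uses the same operation as the first,
   i.e. the only kind of chain a uniformity check would accept. */
int chain_is_uniform(const struct chain_op *chain, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (chain[i].code != chain[0].code)
            return 0;
    return 1;
}

/* Reduces a mixed chain as a pure sum: lanes marked OP_MINUS are
   negated first, so a single PLUS reduction gives the right answer. */
u64 reduce_mixed(const struct chain_op *chain, size_t n) {
    u64 sum = 0;
    for (size_t i = 0; i < n; i++) {
        u64 lane = chain[i].value;
        if (chain[i].code == OP_MINUS)
            lane = 0 - lane;  /* unsigned negation wraps modulo 2^64 */
        sum += lane;
    }
    return sum;
}
```

In this model the per-element code field plays the role of the missing meta-data; how such information would actually be recorded on the SLP instance is left open here.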