Bug 102947 - SPEC2006 481.wrf compiler time regression (-Ofast -march=native -flto) between g:1932e1169a236849 and g:9cfb95f9b92326e8
Summary: SPEC2006 481.wrf compiler time regression (-Ofast -march=native -flto) betwee...
Status: RESOLVED DUPLICATE of bug 102943
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 12.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: needs-bisection
Depends on:
Blocks: spec
Reported: 2021-10-26 14:42 UTC by Jan Hubicka
Modified: 2021-10-26 14:57 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-10-26 00:00:00


Description Jan Hubicka 2021-10-26 14:42:36 UTC
This is seen in https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=322.291.8&plot.1=307.291.8&plot.2=343.291.8&plot.3=266.291.8&plot.4=395.291.8&plot.5=412.291.8&plot.6=289.291.8&
but also on other periodic testers.

It seems to me that the most likely suspect is the work related to VRP.

Changes in the range:
commit 9cfb95f9b92326e86e99b50350ebf04fa9cd2477
Author: liuhongt <hongtao.liu@intel.com>
Date:   Fri Sep 10 10:15:58 2021 +0800

    Relax condition of (vec_concat:M(vec_select op0 idx0)(vec_select op0 idx1)) to allow different modes between op0 and M, but have same inner mode.
    
    This will enable optimization for below pattern.
    
    (set (reg:V2DF 87 [ xx ])
        (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
                (parallel [
                        (const_int 2 [0x2])
                    ]))
            (vec_select:DF (reg:V4DF 92)
                (parallel [
                        (const_int 3 [0x3])
                    ]))))
    
    gcc/ChangeLog:
    
            * simplify-rtx.c
            (simplify_context::simplify_binary_operation_1): Relax
            condition of simplifying (vec_concat:M (vec_select op0
            index0)(vec_select op1 index1)) to allow different modes
            between op0 and M, but have same inner mode.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/vect-rebuild.c: Adjust testcases.
            * gcc.target/i386/avx512f-vect-rebuild.c: New test.

commit 3540429be7ad1085af83600483908b621078fb6f
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Sep 27 14:57:38 2021 +0800

    Support 128/256/512-bit vector plus/smin/smax reduction for _Float16.
    
    gcc/ChangeLog:
    
            * config/i386/i386-expand.c (emit_reduc_half): Handle
            V8HF/V16HF/V32HFmode.
            * config/i386/sse.md (REDUC_SSE_PLUS_MODE): Add V8HF.
            (REDUC_SSE_SMINMAX_MODE): Ditto.
            (REDUC_PLUS_MODE): Add V16HF and V32HF.
            (REDUC_SMINMAX_MODE): Ditto.
    
    gcc/testsuite
    
            * gcc.target/i386/avx512fp16-reduce-op-2.c: New test.
            * gcc.target/i386/avx512fp16-reduce-op-3.c: New test.

commit cf966403d91afcf475347f0d06dd2b7215ae3611
Author: GCC Administrator <gccadmin@gcc.gnu.org>
Date:   Tue Sep 28 00:16:21 2021 +0000

    Daily bump.

commit 51018dd1395c72b3681ae5f84eceb94320472922
Author: Patrick Palka <ppalka@redhat.com>
Date:   Mon Sep 27 16:01:10 2021 -0400

    c++: deduction guides and ttp rewriting [PR102479]
    
    The problem here is ultimately that rewrite_tparm_list when rewriting a
    TEMPLATE_TEMPLATE_PARM introduces a tree cycle in the rewritten
    ttp that structural_comptypes can't cope with.  In particular the
    DECL_TEMPLATE_PARMS of a ttp's TEMPLATE_DECL normally captures an empty
    parameter list at its own level (and so the TEMPLATE_DECL doesn't appear
    in its own DECL_TEMPLATE_PARMS), but rewrite_tparm_list ends up giving
    it a complete parameter list.  In the new testcase below, this causes
    infinite recursion from structural_comptypes when comparing Tmpl<char>
    with Tmpl<long> (where both 'Tmpl's are rewritten ttps).
    
    This patch fixes this by making rewrite_template_parm give a rewritten
    template template parm an empty parameter list at its own level, thereby
    avoiding the tree cycle.  Testing the alias CTAD case revealed that
    we're not setting current_template_parms in alias_ctad_tweaks, which
    this patch also fixes.
    
            PR c++/102479
    
    gcc/cp/ChangeLog:
    
            * pt.c (rewrite_template_parm): Handle single-level tsubst_args.
            Avoid a tree cycle when assigning the DECL_TEMPLATE_PARMS for a
            rewritten ttp.
            (alias_ctad_tweaks): Set current_template_parms accordingly.
    
    gcc/testsuite/ChangeLog:
    
            * g++.dg/cpp1z/class-deduction12.C: Also test alias CTAD in the
            same way.
            * g++.dg/cpp1z/class-deduction99.C: New test.

commit 83668368607ac70dcce466a54673bbf88d0ab2da
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Sat Sep 25 09:28:10 2021 +0200

    Minor cleanups to solver.
    
    These are some minor cleanups and renames that surfaced after the
    hybrid_threader work.
    
    gcc/ChangeLog:
    
            * gimple-range-path.cc
            (path_range_query::precompute_ranges_in_block): Rename to...
            (path_range_query::compute_ranges_in_block): ...this.
            (path_range_query::precompute_ranges): Rename to...
            (path_range_query::compute_ranges): ...this.
            (path_range_query::precompute_relations): Rename to...
            (path_range_query::compute_relations): ...this.
            (path_range_query::precompute_phi_relations): Rename to...
            (path_range_query::compute_phi_relations): ...this.
            * gimple-range-path.h: Rename precompute* to compute*.
            * tree-ssa-threadbackward.c
            (back_threader::find_taken_edge_switch): Same.
            (back_threader::find_taken_edge_cond): Same.
            * tree-ssa-threadedge.c
            (hybrid_jt_simplifier::compute_ranges_from_state): Same.
            (hybrid_jt_state::register_equivs_stmt): Inline...
            * tree-ssa-threadedge.h: ...here.

commit 4ef1e524fd87a679f5da06116029c66a84daac80
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Fri Sep 24 18:39:47 2021 +0200

    Remove old VRP jump threader code.
    
    There's a lot of code that melts away without the ASSERT_EXPR based jump
    threader.  Also, I cleaned up the include files as part of the process.
    
    gcc/ChangeLog:
    
            * tree-vrp.c (lhs_of_dominating_assert): Remove.
            (class vrp_jt_state): Remove.
            (class vrp_jt_simplifier): Remove.
            (vrp_jt_simplifier::simplify): Remove.
            (class vrp_jump_threader): Remove.
            (vrp_jump_threader::vrp_jump_threader): Remove.
            (vrp_jump_threader::~vrp_jump_threader): Remove.
            (vrp_jump_threader::before_dom_children): Remove.
            (vrp_jump_threader::after_dom_children): Remove.

commit 0288527f47cec6698b31ccb3210816415506009e
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Tue Sep 21 10:27:53 2021 +0200

    Replace VRP threader with a hybrid forward threader.
    
    This patch implements the new hybrid forward threader and replaces the
    embedded VRP threader with it.
    
    With all the pieces that have gone in, the implementation of the hybrid
    threader is straightforward: convert the current state into
    SSA imports that the solver will understand, and let the path solver
    precompute ranges and relations for the path.  After this setup is done,
    we can use the range_query API to solve gimple statements in the threader.
    The forward threader is now engine agnostic so there are no changes to
    the threader per se.
    
    I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP,
    because they will also be used in the evrp removal of the DOM/threader,
    which is my next task.
    
    Most of the patch, is actually test changes.  I have gone through every
    single one and verified that we're correct.  Most were trivial dump
    file name changes, but others required going through the IL and
    certifying that the different IL was expected.
    
    For example, in pr59597.c, we have one less thread because the
    ASSERT_EXPR was getting in the way, and making it seem like things were
    not crossing loops.  The hybrid threader sees the correct representation
    of the IL, and avoids threading this one case.
    
    The final numbers are a 12.16% improvement in jump threads immediately
    after VRP, and a 0.82% improvement in overall jump threads.  The
    performance drop is 0.6% (plus the 1.43% hit from moving the embedded
    threader into its own pass).  As I've said, I'd prefer to keep the
    threader in its own pass, but if this is an issue, we can address this
    with a shared ranger when VRP is replaced with an evrp instance
    (upcoming).
    
    Note, that these numbers are slightly different than what I originally
    posted.  A few correctness tweaks, plus restricting loop threads, made
    the difference.  That being said, I was aiming for par.  A 12% gain is
    just gravy ;-).  When we merge the threaders, we should see even better
    numbers-- and we'll have the benefit of an entire release stress testing
    the solver.
    
    As I mentioned in my introductory note, paths ending in MEM_REF
    conditional are missing.  In reality, this didn't make a difference, as
    it was so rare.  However, as a follow-up, I will distill a test and add
    a suitable PR to keep us honest.
    
    There is a one-line change to libgomp/team.c silencing a new used
    uninitialized warning.  As my previous work with the threaders has
    shown, warnings flare up after each improvement to jump threading.  I
    expect this to be no different.  I've promised Jakub to investigate
    fully, so I will analyze and add the appropriate PR for the warning
    experts.
    
    Oh yeah, the new pass dump is called vrp-threader[12] to match each
    VRP[12] pass.  However, there's no reason for it to either be named
    vrp-threader, or for it to live in tree-vrp.c.
    
    Tested on x86-64 Linux.
    
    OK?
    
    p.s. "Did I say 5 weeks?  My bad, I meant 5 months."
    
    gcc/ChangeLog:
    
            * passes.def (pass_vrp_threader): New.
            * tree-pass.h (make_pass_vrp_threader): Add make_pass_vrp_threader.
            * tree-ssa-threadedge.c (hybrid_jt_state::register_equivs_stmt): New.
            (hybrid_jt_simplifier::hybrid_jt_simplifier): New.
            (hybrid_jt_simplifier::simplify): New.
            (hybrid_jt_simplifier::compute_ranges_from_state): New.
            * tree-ssa-threadedge.h (class hybrid_jt_state): New.
            (class hybrid_jt_simplifier): New.
            * tree-vrp.c (execute_vrp): Remove ASSERT_EXPR based jump
            threader.
            (class hybrid_threader): New.
            (hybrid_threader::hybrid_threader): New.
            (hybrid_threader::~hybrid_threader): New.
            (hybrid_threader::before_dom_children): New.
            (hybrid_threader::after_dom_children): New.
            (execute_vrp_threader): New.
            (class pass_vrp_threader): New.
            (make_pass_vrp_threader): New.
    
    libgomp/ChangeLog:
    
            * team.c: Initialize start_data.
            * testsuite/libgomp.graphite/force-parallel-4.c: Adjust.
            * testsuite/libgomp.graphite/force-parallel-8.c: Adjust.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/torture/pr55107.c: Adjust.
            * gcc.dg/tree-ssa/phi_on_compare-1.c: Adjust.
            * gcc.dg/tree-ssa/phi_on_compare-2.c: Adjust.
            * gcc.dg/tree-ssa/phi_on_compare-3.c: Adjust.
            * gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
            * gcc.dg/tree-ssa/pr21559.c: Adjust.
            * gcc.dg/tree-ssa/pr59597.c: Adjust.
            * gcc.dg/tree-ssa/pr61839_1.c: Adjust.
            * gcc.dg/tree-ssa/pr61839_3.c: Adjust.
            * gcc.dg/tree-ssa/pr71437.c: Adjust.
            * gcc.dg/tree-ssa/ssa-dom-thread-11.c: Adjust.
            * gcc.dg/tree-ssa/ssa-dom-thread-16.c: Adjust.
            * gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
            * gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Adjust.
            * gcc.dg/tree-ssa/ssa-dom-thread-4.c: Adjust.
            * gcc.dg/tree-ssa/ssa-thread-14.c: Adjust.
            * gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Adjust.
            * gcc.dg/tree-ssa/vrp106.c: Adjust.
            * gcc.dg/tree-ssa/vrp55.c: Adjust.

commit dd11aab6463880c35d942c4a4fd346fdaeeb8e72
Author: Martin Liska <mliska@suse.cz>
Date:   Mon Sep 6 17:40:16 2021 +0200

    Come up with section_flag enum.
    
    gcc/ChangeLog:
    
            * output.h (enum section_flag): New.
            (SECTION_FORGET): Remove.
            (SECTION_ENTSIZE): Make it (1UL << 8) - 1.
            (SECTION_STYLE_MASK): Define it based on other enum
            values.
            * varasm.c (switch_to_section): Remove unused handling of
            SECTION_FORGET.

commit a64697d7a3e0bf9e5b0d79e253f2b7dc3eb2fb00
Author: Martin Liska <mliska@suse.cz>
Date:   Fri Sep 3 10:53:00 2021 +0200

    flag_complex_method: support optimize attribute
    
    gcc/c-family/ChangeLog:
    
            * c-opts.c (c_common_init_options_struct): Set also
              x_flag_default_complex_method.
    
    gcc/ChangeLog:
    
            * common.opt: Add new variable flag_default_complex_method.
            * opts.c (finish_options): Handle flags related to
              x_flag_complex_method.
            * toplev.c (process_options): Remove option handling related
            to flag_complex_method.
    
    gcc/go/ChangeLog:
    
            * go-lang.c (go_langhook_init_options_struct): Set also
              x_flag_default_complex_method.
    
    gcc/lto/ChangeLog:
    
            * lto-lang.c (lto_init_options_struct): Set also
              x_flag_default_complex_method.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.c-torture/compile/attr-complex-method-2.c: New test.
            * gcc.c-torture/compile/attr-complex-method.c: New test.

commit 3e6a511b94fd653d8d03491eae20307bd27b8f8e
Author: Vincent Lefevre <vincent@vinc17.net>
Date:   Mon Sep 27 10:56:14 2021 -0400

    Update pathname for IBM long double description.
    
    include/
            * floatformat.h: Update pathname for IBM long double description.

commit d06dc8a2c73735e9496f434787ba4c93ceee5eea
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Sep 27 13:36:12 2021 +0200

    middle-end/102450 - avoid type_for_size for non-existing modes
    
    This avoids asking type_for_size for types with sizes for which
    no scalar integer mode exists.  Instead the following uses
    int_mode_for_size to get the same result.
    
    2021-09-27  Richard Biener  <rguenther@suse.de>
    
            PR middle-end/102450
            * gimple-fold.c (gimple_fold_builtin_memory_op): Avoid using
            type_for_size, instead use int_mode_for_size.

commit da1f6391b7c255e4e2eea983832120eff4f7d3df
Author: Tobias Burnus <tobias@codesourcery.com>
Date:   Mon Sep 27 14:33:39 2021 +0200

    libgomp.oacc-fortran/privatized-ref-2.f90: Fix dg-note
    
    In my last commit, r12-3897-g00f6de9c69119594f7dad3bd525937c94c8200d0,
    which inlined array-size code, I had to update the expected output.  However,
    in doing so, I accidentally (copy'n'paste) changed dg-note into dg-message.
    
    libgomp/
            * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Change
            dg-message back to dg-note.

commit 00f6de9c69119594f7dad3bd525937c94c8200d0
Author: Tobias Burnus <tobias@codesourcery.com>
Date:   Mon Sep 27 14:04:54 2021 +0200

    Fortran: Fix assumed-size to assumed-rank passing [PR94070]
    
    This code inlines the size0 and size1 libgfortran calls, the former is still
    used by libgfortran itself (and by old code). Besides permitting more
    optimizations, it also permits to handle assumed-rank dummies better: If the
    dummy argument is a nonpointer/nonallocatable, an assumed-size actual arg is
    represented by having ubound == -1 for the last dimension. However, for
    allocatable/pointers, this value can also exist. Hence, the dummy arg attr
    has to be honored.
    
    For that reason, when calling an assumed-rank procedure with nonpointer,
    nonallocatable dummy arguments, the bounds have to be updated to avoid
    the case ubound == -1 for the last dimension.
    
            PR fortran/94070
    
    gcc/fortran/ChangeLog:
    
            * trans-array.c (gfc_tree_array_size): New function to
            find size inline (whole array or one dimension).
            (array_parameter_size): Use it, take stmt_block as arg.
            (gfc_conv_array_parameter): Update call.
            * trans-array.h (gfc_tree_array_size): Add prototype.
            * trans-decl.c (gfor_fndecl_size0, gfor_fndecl_size1): Remove
            these global vars.
            (gfc_build_intrinsic_function_decls): Remove their initialization.
            * trans-expr.c (gfc_conv_procedure_call): Update
            bounds of pointer/allocatable actual args to nonallocatable/nonpointer
            dummies to be one based.
            * trans-intrinsic.c (gfc_conv_intrinsic_shape): Fix case for
            assumed rank with allocatable/pointer dummy.
            (gfc_conv_intrinsic_size): Update to use inline function.
            * trans.h (gfor_fndecl_size0, gfor_fndecl_size1): Remove var decl.
    
    libgfortran/ChangeLog:
    
            * intrinsics/size.c (size0, size1): Comment that now not
            used by newer compiler code.
    
    libgomp/ChangeLog:
    
            * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Update
            expected dg-note output.
    
    gcc/testsuite/ChangeLog:
    
            * gfortran.dg/c-interop/cf-out-descriptor-6.f90: Remove xfail.
            * gfortran.dg/c-interop/size.f90: Remove xfail.
            * gfortran.dg/intrinsic_size_3.f90: Update scan-tree-dump-times.
            * gfortran.dg/transpose_optimization_2.f90: Likewise.
            * gfortran.dg/size_optional_dim_1.f90: Add scan-tree-dump-not.
            * gfortran.dg/assumed_rank_22.f90: New test.
            * gfortran.dg/assumed_rank_22_aux.c: New test.

commit 76773d3fea4daaaf5b0f6d79d9f48ffe6b3c97fd
Author: Andrew Pinski <apinski@marvell.com>
Date:   Sun Sep 26 05:44:58 2021 +0000

    Fix PR c/94726: ICE with __builtin_shuffle and changing of types
    
    The problem here is __builtin_shuffle when called with two arguments
    instead of 1, uses a SAVE_EXPR to put in for the 1st and 2nd operand
    of VEC_PERM_EXPR and when we go and gimplify the SAVE_EXPR, the type
    is now error_mark_node and that fails hard.
    This fixes the problem by adding a simple check for type of operand
    of SAVE_EXPR not to be error_mark_node.
    
    OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
    
    gcc/ChangeLog:
    
            PR c/94726
            * gimplify.c (gimplify_save_expr): Return early
            if the type of val is error_mark_node.
    
    gcc/testsuite/ChangeLog:
    
            PR c/94726
            * gcc.dg/pr94726.c: New test.

commit d5f8abe1d3f718a75cbff0a453c1d961be5939b7
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Mon Sep 27 09:20:56 2021 +0200

    Use on-demand ranges in ssa_name_has_boolean_range before querying nonzero bits.
    
    The function ssa_name_has_boolean_range looks at the nonzero bits stored
    in SSA_NAME_RANGE_INFO.  These are global in nature and are the result
    of a previous evrp/VRP run (technically other passes can also set them).
    
    However, we can do better if we use get_range_query.  Doing so will use
    a ranger if enabled in a pass, or global ranges otherwise.  The call to
    get_nonzero_bits remains, as there are passes that will set them
    independently of the global range info.
    
    Tested on x86-64 Linux with a regstrap as well as in a DOM environment
    using an on-demand ranger instead of evrp.
    
    gcc/ChangeLog:
    
            * tree-ssanames.c (ssa_name_has_boolean_range): Use
            get_range_query.

commit e1d01f4973eee8d229ddc326ff7c3bd5f4cf32c1
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Sat Sep 25 13:02:21 2021 +0200

    Convert some evrp uses in DOM to the range_query API.
    
    DOM is the last remaining user of the evrp engine.  This patch converts
    a few uses of the engine and vr-values into the new API.
    
    There is one subtle change.  The call to vr_value's
    op_with_constant_singleton_value_range can theoretically return
    non-constants, unlike the range_query API which only returns constants.
    In this particular case it doesn't matter because the symbolic stuff will
    have been handled by the const_and_copies/avail_exprs read in the
    SSA_NAME_VALUE copy immediately before.  I have verified this is the case
    by asserting that all calls to op_with_constant_singleton_value_range at
    this point return either NULL or an INTEGER_CST.
    
    Tested on x86-64 Linux with a regstrap, as well as the aforementioned
    assert.
    
    gcc/ChangeLog:
    
            * gimple-ssa-evrp-analyze.h (class evrp_range_analyzer): Remove
            vrp_visit_cond_stmt.
            * tree-ssa-dom.c (cprop_operand): Convert to range_query API.
            (cprop_into_stmt): Same.
            (dom_opt_dom_walker::optimize_stmt): Same.

commit 6390c5047adb75960f86d56582e6322aaa4d9281
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Nov 18 09:36:57 2020 +0100

    Allow different vector types for stmt groups
    
    This allows vectorization (in practice non-loop vectorization) to
    have a stmt participate in different vector type vectorizations.
    It allows us to remove vect_update_shared_vectype and replace it
    by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
    vect_analyze_stmt and vect_transform_stmt.
    
    For data-ref the situation is a bit more complicated since we
    analyze alignment info with a specific vector type in mind which
    doesn't play well when that changes.
    
    So the bulk of the change is passing down the actual vector type
    used for a vectorized access to the various accessors of alignment
    info, first and foremost dr_misalignment but also aligned_access_p,
    known_alignment_for_access_p, vect_known_alignment_in_bytes and
    vect_supportable_dr_alignment.  I took the liberty to replace
    ALL_CAPS macro accessors with the lower-case function invocations.
    
    The actual changes to the behavior are in dr_misalignment which now
    is the place factoring in the negative step adjustment as well as
    handling alignment queries for a vector type with bigger alignment
    requirements than what we can (or have) analyze(d).
    
    vect_slp_analyze_node_alignment makes use of this and upon receiving
    a vector type with a bigger alignment desire re-analyzes the DR
    with respect to it but keeps an older more precise result if possible.
    In this context it might be possible to do the analysis just once
    but instead of analyzing with respect to a specific desired alignment
    look for the biggest alignment we can compute a not unknown alignment.
    
    The ChangeLog includes the functional changes but not the bulk due
    to the alignment accessor API changes - I hope that's something good.
    
    2021-09-17  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/97351
            PR tree-optimization/97352
            PR tree-optimization/82426
            * tree-vectorizer.h (dr_misalignment): Add vector type
            argument.
            (aligned_access_p): Likewise.
            (known_alignment_for_access_p): Likewise.
            (vect_supportable_dr_alignment): Likewise.
            (vect_known_alignment_in_bytes): Likewise.  Refactor.
            (DR_MISALIGNMENT): Remove.
            (vect_update_shared_vectype): Likewise.
            * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
            a vector type with larger alignment requirement and apply
            the negative step adjustment here.
            (vect_calculate_target_alignment): Remove.
            (vect_compute_data_ref_alignment): Get explicit vector type
            argument, do not apply a negative step alignment adjustment
            here.
            (vect_slp_analyze_node_alignment): Re-analyze alignment
            when we re-visit the DR with a bigger desired alignment but
            keep more precise results from smaller alignments.
            * tree-vect-slp.c (vect_update_shared_vectype): Remove.
            (vect_slp_analyze_node_operations_1): Do not update the
            shared vector type on stmts.
            * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
            vector type of an SLP node to the representative stmt-info.
            (vect_transform_stmt): Likewise.
    
            * gcc.target/i386/vect-pr82426.c: New testcase.
            * gcc.target/i386/vect-pr97352.c: Likewise.

commit e7b8d7020052110e5717230104e647f6235dd2c1
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Sep 27 14:57:02 2021 +0800

    Revert "Optimize v4sf reduction.".
    
    This reverts commit 8f323c712ea76cc4506b03895e9b991e4e4b2baf.
    
         PR target/102473
         PR target/101059
Comment 2 Martin Liška 2021-10-26 14:49:47 UTC
Yes, I can confirm that other testers also report slower compilation of WRF:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=2.270.8

I know the benchmark contains a huge config module that slows down the compilation considerably. Moreover, one ltrans partition always runs much longer than the others.
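
To illustrate why a single long-running ltrans unit matters for total compile time, here is a minimal sketch of how the wall-clock time of a parallel stage is bounded below by its slowest job. The scheduling model (each job goes to the least-loaded worker, in order), the worker count, and all timings are assumptions made up for illustration; none of them are measurements from WRF or a description of how GCC schedules ltrans jobs.

```python
import heapq

def parallel_makespan(job_seconds, workers):
    """Wall-clock time when each job is handed to the least-loaded worker in order."""
    loads = [0] * workers          # accumulated time per worker
    heapq.heapify(loads)
    for seconds in job_seconds:
        least_loaded = heapq.heappop(loads)
        heapq.heappush(loads, least_loaded + seconds)
    return max(loads)

# Hypothetical per-partition times in seconds; one partition is an outlier.
balanced = [30, 32, 28, 31, 30, 29, 33, 30]
with_outlier = [30, 32, 28, 31, 300, 29, 33, 30]

balanced_time = parallel_makespan(balanced, workers=4)
outlier_time = parallel_makespan(with_outlier, workers=4)

# The stage can never finish before its slowest job does.
print(balanced_time, outlier_time, max(with_outlier))
```

Under this model, adding more workers does not help once the outlier dominates, so any slowdown in that one partition shows up directly in the total.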
Comment 3 Martin Liška 2021-10-26 14:53:54 UTC
I'm going to bisect that.
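
As an illustration of what bisecting a compile-time regression involves, here is a minimal sketch of the binary search over a revision range. The revision names, timings, and threshold are hypothetical placeholders, not data from this bug; the sketch also assumes that the oldest revision is fast, the newest is slow, and that every revision after the first slow one is slow as well.

```python
from typing import Callable, Sequence

def first_slow_commit(commits: Sequence[str],
                      is_slow: Callable[[str], bool]) -> str:
    """Return the first commit (oldest to newest) for which is_slow() is True.

    Assumes commits[0] is known to be fast and commits[-1] is known to be slow.
    """
    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] fast, commits[hi] slow
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(commits[mid]):
            hi = mid               # regression is at mid or earlier
        else:
            lo = mid               # regression is after mid
    return commits[hi]

# Hypothetical compile times in seconds, keyed by placeholder revision names
# listed oldest first.
compile_seconds = {"r0": 410, "r1": 412, "r2": 409,
                   "r3": 655, "r4": 660, "r5": 658}
history = list(compile_seconds)
threshold = 500                    # assumed cut-off between "fast" and "slow"

culprit = first_slow_commit(history, lambda c: compile_seconds[c] > threshold)
print(culprit)
```

In practice the `is_slow` predicate would build the compiler at the given revision and time the benchmark compilation, which is the expensive part; the search itself only needs about log2(N) such builds for N revisions.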
Comment 4 Andrew Pinski 2021-10-26 14:57:23 UTC
Dup of bug 102943.

*** This bug has been marked as a duplicate of bug 102943 ***