[Bug tree-optimization/108601] [13 Regression] vector peeling ICEs with VLA in gcc_r in SPEC2017 since g:c13223b790bbc5e4a3f5605e057eac59b61b2c85

Tue Jan 31 20:24:32 GMT 2023

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108601

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64*
            Summary|[13 Regression] vector      |[13 Regression] vector
                   |peeling ICEs with PGO + LTO |peeling ICEs with VLA in
                   |+ IPA inlining in gcc_r in  |gcc_r in SPEC2017 since
                   |SPEC2017                    |g:c13223b790bbc5e4a3f5605e0
                   |                            |57eac59b61b2c85

--- Comment #7 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> So here is how I would tackle this:
> Put all the needed .i/.ii files in a response file.
> 
> 
> $CC -c @files @options
> $CC -r -o file.o @fileso @options 
> 
> Since this is only at profile generated stage it is not as hard ...
> Then start by reducing the needed .o files in `fileso` .
> When that is finished. Update `files` to match `fileso`.
> and then run delta (or another automated reducer) over the files in `files`.
> Maybe even change -flto=auto etc.

Thanks! Managed to reduce it to something fairly simple.

Repro:

----

decode_options() {
  int flag = 1;
  for (; flag <= 1 << 21; flag <<= 1)
    ;
}

----

Compile with: gcc -fprofile-generate -mcpu=neoverse-v1 -Ofast opts.i

I also bisected it, and it did indeed land on:

commit c13223b790bbc5e4a3f5605e057eac59b61b2c85
Author: liuhongt <hongtao.liu@intel.com>
Date:   Thu Aug 4 09:04:22 2022 +0800

    Extend vectorizer to handle nonlinear induction for neg, mul/lshift/rshift
    with a constant.

    For neg, the patch create a vec_init as [ a, -a, a, -a, ...  ] and no
    vec_step is needed to update vectorized iv since vf is always multiple
    of 2(negative * negative is positive).

    For shift, the patch create a vec_init as [ a, a >> c, a >> 2*c, ..]
    as vec_step as [ c * nunits, c * nunits, c * nunits, ... ], vectorized iv is
    updated as vec_def = vec_init >>/<< vec_step.

    For mul, the patch create a vec_init as [ a, a * c, a * pow(c, 2), ..]
    as vec_step as [ pow(c,nunits), pow(c,nunits),...] iv is updated as vec_def =
    vec_init * vec_step.

    The patch handles nonlinear iv for
    1. Integer type only, floating point is not handled.
    2. No slp_node.
    3. iv_loop should be same as vector loop, not nested loop.
    4. No UD is created, for mul, use unsigned mult to avoid UD, for
       shift, shift count should be less than type precision.
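
For anyone not familiar with that commit, the init/step scheme its message describes can be modelled lane by lane in plain C. This is only a rough sketch, not GCC code: NUNITS is fixed at 4 here (a VLA target only knows it at run time), and the names iv_op, apply_once, vec_init, vec_update and model_matches_scalar are made up for illustration. It uses unsigned arithmetic throughout, in line with point 4 above, and assumes the caller keeps c * NUNITS below 32 for the shift case.

```c
#include <stdint.h>

enum { NUNITS = 4 };                 /* hypothetical fixed lane count (must be even for neg) */
enum iv_op { IV_SHL, IV_MUL, IV_NEG };

/* One scalar update of the induction variable: x <<= c, x *= c, or x = -x. */
static uint32_t apply_once(enum iv_op op, uint32_t x, uint32_t c)
{
    switch (op) {
    case IV_SHL: return x << c;
    case IV_MUL: return x * c;       /* unsigned multiply wraps, no UB */
    default:     return 0u - x;      /* IV_NEG */
    }
}

/* vec_init: lane i holds the scalar IV after i updates,
   e.g. [ a, a << c, a << 2*c, ... ] or [ a, -a, a, -a ]. */
static void vec_init(enum iv_op op, uint32_t a, uint32_t c, uint32_t v[NUNITS])
{
    v[0] = a;
    for (unsigned lane = 1; lane < NUNITS; lane++)
        v[lane] = apply_once(op, v[lane - 1], c);
}

/* One vector iteration: every lane advances by NUNITS scalar updates.
   shift: step = c * NUNITS, mul: step = pow(c, NUNITS), neg: no step. */
static void vec_update(enum iv_op op, uint32_t c, uint32_t v[NUNITS])
{
    uint32_t step = 1;

    if (op == IV_NEG)
        return;                      /* even lane count: signs already line up */
    if (op == IV_SHL) {
        step = c * NUNITS;           /* caller keeps this below 32 */
    } else {
        for (unsigned i = 0; i < NUNITS; i++)
            step *= c;
    }
    for (unsigned lane = 0; lane < NUNITS; lane++)
        v[lane] = (op == IV_SHL) ? v[lane] << step : v[lane] * step;
}

/* Returns 1 if the lane model agrees with the scalar loop for
   vec_iters vector iterations, 0 otherwise. */
int model_matches_scalar(enum iv_op op, uint32_t a, uint32_t c, unsigned vec_iters)
{
    uint32_t v[NUNITS];

    vec_init(op, a, c, v);
    for (unsigned k = 0; k < vec_iters; k++) {
        for (unsigned lane = 0; lane < NUNITS; lane++) {
            uint32_t x = a;
            for (unsigned i = 0; i < k * NUNITS + lane; i++)
                x = apply_once(op, x, c);
            if (v[lane] != x)
                return 0;
        }
        vec_update(op, c, v);
    }
    return 1;
}
```

The induction variable in the repro above (flag starting at 1, flag <<= 1) corresponds to model_matches_scalar(IV_SHL, 1, 1, 6) in this model, which covers 24 scalar iterations. The model only illustrates the arithmetic; it says nothing about the peeling path where the ICE occurs.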