Bug 80928 - SLP vectorization does not handle induction in outer loop vectorization
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 7.1.0
Importance: P3 normal
Target Milestone: ---
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2017-05-31 13:24 UTC by Richard Biener
Modified: 2020-11-03 12:41 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-05-31 00:00:00


Attachments
32-bit sparc-sun-solaris2.12 slp-13.c.156t.vect (14.93 KB, application/x-bzip)
2017-06-07 08:46 UTC, Rainer Orth
patch for the vectorizer ICE (969 bytes, patch)
2017-06-22 14:28 UTC, Richard Biener
patch that seems to work (1.46 KB, patch)
2017-06-23 11:40 UTC, Richard Biener

Description Richard Biener 2017-05-31 13:24:38 UTC
int a[1024];
void foo (int n)
{
  for (int i = 0; i < 1020; i += 5)
    {
      a[i] = i;
      a[i+1] = i;
      a[i+2] = i;
      a[i+3] = i;
      a[i+4] = i;
    }
}

is not vectorized.

t.c:4:3: note: === vect_analyze_slp ===
t.c:4:3: note: Build SLP for a[i_17] = i_17;
t.c:4:3: note: Build SLP for a[_1] = i_17;
t.c:4:3: note: Build SLP for a[_2] = i_17;
t.c:4:3: note: Build SLP for a[_3] = i_17;
t.c:4:3: note: Build SLP for a[_4] = i_17;
t.c:4:3: note: vect_is_simple_use: operand i_17
t.c:4:3: note: def_stmt: i_17 = PHI <i_13(4), 0(2)>
t.c:4:3: note: type of def: induction
t.c:4:3: note: Build SLP failed: illegal type of def i_17

That's because we do not handle inductions, neither during SLP discovery
nor later during code generation.

      /* Check the types of the definitions.  */
      switch (dt)
        {
        case vect_constant_def:
        case vect_external_def:
        case vect_reduction_def:
          break;

        case vect_internal_def:
          oprnd_info->def_stmts.quick_push (def_stmt);
          break;

        default:
          /* FORNOW: Not supported.  */
          if (dump_enabled_p ())
            {
              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                               "Build SLP failed: illegal type of def ");
              dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, oprnd);
              dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
            }

          return -1;
        }
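
To illustrate what SLP handling of this induction has to produce, here is a
hand-written emulation. It is only a sketch, not the code GCC emits. It
assumes a hypothetical 4-lane int vector; GROUP, STEP, LANES, BLOCK, NVECS and
the function names are made up for this example. With 5 stores per scalar
iteration and 4 lanes, the stored value stream repeats (plus a constant) every
lcm(5, 4) = 20 elements, so 5 vectors of initial values and a uniform step per
vector iteration are enough.

```c
#include <assert.h>

#define N      1024
#define GROUP  5                /* stores per scalar iteration */
#define STEP   5                /* increment of i per scalar iteration */
#define LANES  4                /* assumed vector width in ints (hypothetical) */
#define BLOCK  (GROUP * LANES)  /* 20 = lcm (5, 4), since 5 and 4 are coprime */
#define NVECS  (BLOCK / LANES)  /* 5 vectors per vector iteration */

/* Equivalent to the scalar loop from the report, writing to OUT.  */
void
foo_scalar (int *out)
{
  for (int i = 0; i < 1020; i += STEP)
    for (int k = 0; k < GROUP; k++)
      out[i + k] = i;
}

/* Emulates NVECS induction vectors that are stored and then stepped.  */
void
foo_slp_emulated (int *out)
{
  int iv[NVECS][LANES];

  /* Initial values: stream element k belongs to scalar iteration k / GROUP.  */
  for (int k = 0; k < BLOCK; k++)
    iv[k / LANES][k % LANES] = (k / GROUP) * STEP;

  for (int base = 0; base < 1020; base += BLOCK)
    for (int v = 0; v < NVECS; v++)
      for (int l = 0; l < LANES; l++)
        {
          out[base + v * LANES + l] = iv[v][l];
          /* One vector iteration covers BLOCK / GROUP scalar iterations.  */
          iv[v][l] += (BLOCK / GROUP) * STEP;
        }
}

/* Asserts that the emulation stores the same values as the scalar loop.  */
void
check_emulation (void)
{
  static int x[N], y[N];
  foo_scalar (x);
  foo_slp_emulated (y);
  for (int k = 0; k < N; k++)
    assert (x[k] == y[k]);
}
```

Calling check_emulation () from a small driver compares both variants over the
whole array.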
Comment 1 Richard Biener 2017-05-31 13:28:03 UTC
Mine.
Comment 2 Richard Biener 2017-06-06 07:37:46 UTC
Author: rguenth
Date: Tue Jun  6 07:37:14 2017
New Revision: 248909

URL: https://gcc.gnu.org/viewcvs?rev=248909&root=gcc&view=rev
Log:
2017-06-06  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/80928
	* tree-vect-loop.c (vect_update_vf_for_slp): Amend dumps.
	(vect_analyze_loop_operations): Properly guard analysis for
	pure SLP case.
	(vect_transform_loop): Likewise.
	(vect_analyze_loop_2): Also reset SLP type on PHIs.
	(vect_model_induction_cost): Do not cost for pure SLP.
	(vectorizable_induction): Pass in SLP node, implement SLP vectorization
	of induction in inner loop vectorization.
	* tree-vect-slp.c (vect_create_new_slp_node): Handle PHIs.
	(vect_get_and_check_slp_defs): Handle vect_induction_def.
	(vect_build_slp_tree): Likewise.  Handle PHIs as terminating the
	recursion.
	(vect_analyze_slp_cost_1): Cost induction.
	(vect_detect_hybrid_slp_stmts): Handle PHIs.
	(vect_get_slp_vect_defs): Likewise.
	* tree-vect-stmts.c (vect_analyze_stmt): Handle induction.
	(vect_transform_stmt): Handle SLP reductions.
	* tree-vectorizer.h (vectorizable_induction): Adjust.

	* gcc.dg/vect/pr80928.c: New testcase.
	* gcc.dg/vect/slp-13-big-array.c: Remove XFAILs.
	* gcc.dg/vect/slp-13.c: Likewise.
	* gcc.dg/vect/slp-perm-9.c: Prevent vectorization of check loop.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr80928.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-13.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
    trunk/gcc/tree-vect-loop.c
    trunk/gcc/tree-vect-slp.c
    trunk/gcc/tree-vect-stmts.c
    trunk/gcc/tree-vectorizer.h
Comment 3 Richard Biener 2017-06-06 08:59:00 UTC
So this is mostly fixed now; outer loop vectorization support is still missing.  Adjusting the summary accordingly.
Comment 4 Christophe Lyon 2017-06-06 14:21:53 UTC
This patch (r248909) caused regressions on arm/aarch64:
- PASS now FAIL             [PASS => FAIL]:

Executed from: gcc.dg/vect/vect.exp
  gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 0
  gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0
  gcc.dg/vect/slp-perm-9.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
  gcc.dg/vect/slp-perm-9.c scan-tree-dump-times vect "vectorized 1 loops" 1
Comment 5 Rainer Orth 2017-06-07 08:42:44 UTC
The patch also caused a couple of regressions on i386-pc-solaris2.12:

+FAIL: gcc.dg/vect/slp-perm-8.c (internal compiler error)
+FAIL: gcc.dg/vect/slp-perm-8.c (test for excess errors)
+FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (internal compiler error)
+FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (test for excess errors)
+WARNING: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects compilation failed to produce executable
+WARNING: gcc.dg/vect/slp-perm-8.c compilation failed to produce executable

Excess errors:
during GIMPLE pass: vect
dump file: slp-perm-8.c.156t.vect
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/slp-perm-8.c:25:5: internal compiler error: in operator[], at vec.h:729
0x8b58359 vec<edge_def*, va_gc, vl_embed>::operator[](unsigned int)
        /vol/gcc/src/hg/trunk/local/gcc/vec.h:729
0x8b58359 gimple_phi_arg_edge
        /vol/gcc/src/hg/trunk/local/gcc/gimple.h:4398
0x8b58359 dump_gimple_phi
        /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:2185
0x8b5a668 print_gimple_stmt(__FILE*, gimple*, int, unsigned long long)
        /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:117
0x8a254c5 dump_gimple_stmt(unsigned long long, unsigned long long, gimple*, int)
        /vol/gcc/src/hg/trunk/local/gcc/dumpfile.c:340
0x90750dd vect_schedule_slp_instance
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3680
0x9074f6f vect_schedule_slp_instance
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
0x9074f6f vect_schedule_slp_instance
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
0x9074f6f vect_schedule_slp_instance
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
0x9075861 vect_schedule_slp(vec_info*)
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3834
0x905a2ba vect_transform_loop(_loop_vec_info*)
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7151
0x907b4e8 vectorize_loops()
        /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690

  32 and 64-bit x86

+FAIL: libgomp.fortran/vla1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
+FAIL: libgomp.fortran/vla1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
+WARNING: libgomp.fortran/vla1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to produce executable
+FAIL: libgomp.fortran/vla1.f90   -O3 -g  (internal compiler error)
+FAIL: libgomp.fortran/vla1.f90   -O3 -g  (test for excess errors)
+WARNING: libgomp.fortran/vla1.f90   -O3 -g  compilation failed to produce executable

  and several more

Excess errors:
during GIMPLE pass: vect
/vol/gcc/src/hg/trunk/local/libgomp/testsuite/libgomp.fortran/vla1.f90:40:0: internal compiler error: in vect_free_slp_tree, at tree-vect-slp.c:62
0x90e874f vect_free_slp_tree
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:62
0x90e859d vect_free_slp_tree
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
0x90e859d vect_free_slp_tree
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
0x90e859d vect_free_slp_tree
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
0x90eb870 vect_free_slp_instance(_slp_instance*)
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:80
0x90d5678 vect_transform_loop(_loop_vec_info*)
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7249
0x90f6a48 vectorize_loops()
        /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690

  64-bit x86
Comment 6 ro@CeBiTec.Uni-Bielefeld.DE 2017-06-07 08:44:53 UTC
... and also on sparc-sun-solaris2.12:

+FAIL: gcc.dg/vect/slp-13-big-array.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 3
+FAIL: gcc.dg/vect/slp-13-big-array.c scan-tree-dump-times vect "vectorizing stmts using SLP" 3
+FAIL: gcc.dg/vect/slp-13.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 3
+FAIL: gcc.dg/vect/slp-13.c scan-tree-dump-times vect "vectorizing stmts using SLP" 3

  32 and 64-bit sparc
Comment 7 Rainer Orth 2017-06-07 08:46:30 UTC
Created attachment 41482 [details]
32-bit sparc-sun-solaris2.12 slp-13.c.156t.vect
Comment 8 Richard Biener 2017-06-07 09:10:49 UTC
Author: rguenth
Date: Wed Jun  7 09:10:17 2017
New Revision: 248948

URL: https://gcc.gnu.org/viewcvs?rev=248948&root=gcc&view=rev
Log:
2017-06-07  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/80928
	* gcc.dg/vect/slp-perm-8.c: Avoid vectorizing loop computing
	check_results.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
Comment 9 Richard Biener 2017-06-07 09:38:20 UTC
(In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #6)
> ... and also on sparc-sun-solaris2.12:
> 
> +FAIL: gcc.dg/vect/slp-13-big-array.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 3
> +FAIL: gcc.dg/vect/slp-13-big-array.c scan-tree-dump-times vect "vectorizing stmts using SLP" 3
> +FAIL: gcc.dg/vect/slp-13.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 3
> +FAIL: gcc.dg/vect/slp-13.c scan-tree-dump-times vect "vectorizing stmts using SLP" 3
> 
>   32 and 64-bit sparc

/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/slp-13.c:18:3: note: not vectorized: relevant stmt not supported: _3 = (short unsigned int) i_316;
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/slp-13.c:18:3: note: removing SLP instance operations starting from: out[_1] = _4;

OK, it needs demotion; will fix.
Comment 10 rguenther@suse.de 2017-06-07 09:38:50 UTC
On Wed, 7 Jun 2017, ro at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80928
> 
> Rainer Orth <ro at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |ro at gcc dot gnu.org
> 
> --- Comment #5 from Rainer Orth <ro at gcc dot gnu.org> ---
> The patch also caused a couple of regressions on i386-pc-solaris2.12:
> 
> +FAIL: gcc.dg/vect/slp-perm-8.c (internal compiler error)
> +FAIL: gcc.dg/vect/slp-perm-8.c (test for excess errors)
> +FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (internal compiler error)
> +FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (test for excess errors)
> +WARNING: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects compilation failed to produce executable
> +WARNING: gcc.dg/vect/slp-perm-8.c compilation failed to produce executable

Can't reproduce with a cross compiler.

> Excess errors:
> during GIMPLE pass: vect
> dump file: slp-perm-8.c.156t.vect
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/slp-perm-8.c:25:5:
> internal compiler error: in operator[], at vec.h:729
> 0x8b58359 vec<edge_def*, va_gc, vl_embed>::operator[](unsigned int)
>         /vol/gcc/src/hg/trunk/local/gcc/vec.h:729
> 0x8b58359 gimple_phi_arg_edge
>         /vol/gcc/src/hg/trunk/local/gcc/gimple.h:4398
> 0x8b58359 dump_gimple_phi
>         /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:2185
> 0x8b5a668 print_gimple_stmt(__FILE*, gimple*, int, unsigned long long)
>         /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:117
> 0x8a254c5 dump_gimple_stmt(unsigned long long, unsigned long long, gimple*,
> int)
>         /vol/gcc/src/hg/trunk/local/gcc/dumpfile.c:340
> 0x90750dd vect_schedule_slp_instance
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3680
> 0x9074f6f vect_schedule_slp_instance
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
> 0x9074f6f vect_schedule_slp_instance
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
> 0x9074f6f vect_schedule_slp_instance
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
> 0x9075861 vect_schedule_slp(vec_info*)
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3834
> 0x905a2ba vect_transform_loop(_loop_vec_info*)
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7151
> 0x907b4e8 vectorize_loops()
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690
> 
>   32 and 64-bit x86

That is, i686-linux with -m32 / -m64?  I see no issues on
x86_64 with -m32 or -m64.

> +FAIL: libgomp.fortran/vla1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
> +FAIL: libgomp.fortran/vla1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> +WARNING: libgomp.fortran/vla1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to produce executable
> +FAIL: libgomp.fortran/vla1.f90   -O3 -g  (internal compiler error)
> +FAIL: libgomp.fortran/vla1.f90   -O3 -g  (test for excess errors)
> +WARNING: libgomp.fortran/vla1.f90   -O3 -g  compilation failed to produce executable
> 
>   and several more
> 
> Excess errors:
> during GIMPLE pass: vect
> /vol/gcc/src/hg/trunk/local/libgomp/testsuite/libgomp.fortran/vla1.f90:40:0:
> internal compiler error: in vect_free_slp_tree, at tree-vect-slp.c:62
> 0x90e874f vect_free_slp_tree
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:62
> 0x90e859d vect_free_slp_tree
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
> 0x90e859d vect_free_slp_tree
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
> 0x90e859d vect_free_slp_tree
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
> 0x90eb870 vect_free_slp_instance(_slp_instance*)
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:80
> 0x90d5678 vect_transform_loop(_loop_vec_info*)
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7249
> 0x90f6a48 vectorize_loops()
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690
>
>   64-bit x86
Comment 11 Richard Biener 2017-06-07 09:40:26 UTC
Author: rguenth
Date: Wed Jun  7 09:39:53 2017
New Revision: 248950

URL: https://gcc.gnu.org/viewcvs?rev=248950&root=gcc&view=rev
Log:
2017-06-07  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/80928
	* gcc.dg/vect/slp-13.c: Adjust patterns with vect_pack_trunc.
	* gcc.dg/vect/slp-13-big-array.c: Likewise.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-13.c
Comment 12 ro@CeBiTec.Uni-Bielefeld.DE 2017-06-07 11:54:15 UTC
>> --- Comment #5 from Rainer Orth <ro at gcc dot gnu.org> ---
>> The patch also caused a couple of regressions on i386-pc-solaris2.12:
>> 
>> +FAIL: gcc.dg/vect/slp-perm-8.c (internal compiler error)
>> +FAIL: gcc.dg/vect/slp-perm-8.c (test for excess errors)
>> +FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (internal compiler error)
>> +FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (test for excess errors)
>> +WARNING: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects compilation failed to produce executable
>> +WARNING: gcc.dg/vect/slp-perm-8.c compilation failed to produce executable
>
> Can't reproduce with a cross.

I see the same in an i686-pc-linux-gnu build.

>> Excess errors:
>> during GIMPLE pass: vect
>> dump file: slp-perm-8.c.156t.vect
>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/slp-perm-8.c:25:5:
>> internal compiler error: in operator[], at vec.h:729
>> 0x8b58359 vec<edge_def*, va_gc, vl_embed>::operator[](unsigned int)
>>         /vol/gcc/src/hg/trunk/local/gcc/vec.h:729
>> 0x8b58359 gimple_phi_arg_edge
>>         /vol/gcc/src/hg/trunk/local/gcc/gimple.h:4398
>> 0x8b58359 dump_gimple_phi
>>         /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:2185
>> 0x8b5a668 print_gimple_stmt(__FILE*, gimple*, int, unsigned long long)
>>         /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:117
>> 0x8a254c5 dump_gimple_stmt(unsigned long long, unsigned long long, gimple*,
>> int)
>>         /vol/gcc/src/hg/trunk/local/gcc/dumpfile.c:340
>> 0x90750dd vect_schedule_slp_instance
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3680
>> 0x9074f6f vect_schedule_slp_instance
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
>> 0x9074f6f vect_schedule_slp_instance
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
>> 0x9074f6f vect_schedule_slp_instance
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
>> 0x9075861 vect_schedule_slp(vec_info*)
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3834
>> 0x905a2ba vect_transform_loop(_loop_vec_info*)
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7151
>> 0x907b4e8 vectorize_loops()
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690
>> 
>>   32 and 64-bit x86
>
> That is, i686-linux with -m32 / -m64?  I see no issues on
> x86_64 with -m32 or -m64.

Both i386-pc-solaris2.12 and i686-pc-linux-gnu.  Haven't tried an x86_64
build yet.

>> +FAIL: libgomp.fortran/vla1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
>> +FAIL: libgomp.fortran/vla1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
>> +WARNING: libgomp.fortran/vla1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to produce executable
>> +FAIL: libgomp.fortran/vla1.f90   -O3 -g  (internal compiler error)
>> +FAIL: libgomp.fortran/vla1.f90   -O3 -g  (test for excess errors)
>> +WARNING: libgomp.fortran/vla1.f90   -O3 -g  compilation failed to produce executable
>> 
>>   and several more
>> 
>> Excess errors:
>> during GIMPLE pass: vect
>> /vol/gcc/src/hg/trunk/local/libgomp/testsuite/libgomp.fortran/vla1.f90:40:0:
>> internal compiler error: in vect_free_slp_tree, at tree-vect-slp.c:62
>> 0x90e874f vect_free_slp_tree
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:62
>> 0x90e859d vect_free_slp_tree
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
>> 0x90e859d vect_free_slp_tree
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
>> 0x90e859d vect_free_slp_tree
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
>> 0x90eb870 vect_free_slp_instance(_slp_instance*)
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:80
>> 0x90d5678 vect_transform_loop(_loop_vec_info*)
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7249
>> 0x90f6a48 vectorize_loops()
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690
>>
>>   64-bit x86

Also on both i386-pc-solaris2.12 -m64 and i686-pc-linux-gnu -m64.

	Rainer
Comment 13 Christophe Lyon 2017-06-07 14:43:02 UTC
(In reply to Richard Biener from comment #8)
> Author: rguenth
> Date: Wed Jun  7 09:10:17 2017
> New Revision: 248948
> 
> URL: https://gcc.gnu.org/viewcvs?rev=248948&root=gcc&view=rev
> Log:
> 2017-06-07  Richard Biener  <rguenther@suse.de>
> 
> 	PR tree-optimization/80928
> 	* gcc.dg/vect/slp-perm-8.c: Avoid vectorizing loop computing
> 	check_results.
> 
> Modified:
>     trunk/gcc/testsuite/ChangeLog
>     trunk/gcc/testsuite/gcc.dg/vect/slp-perm-8.c

After, arm and aarch64 regress:
FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect "vectorized 1 loops" 2

but these improve:
PASS: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 0
PASS: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0
Comment 14 Richard Biener 2017-06-08 07:33:25 UTC
Author: rguenth
Date: Thu Jun  8 07:32:52 2017
New Revision: 249004

URL: https://gcc.gnu.org/viewcvs?rev=249004&root=gcc&view=rev
Log:
2017-06-08  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/80928
	* gcc.dg/vect/slp-perm-8.c: Do not expect check loop to be vectorized.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
Comment 15 Rainer Orth 2017-06-21 07:50:37 UTC
Richard,

do you have the i686-pc-linux-gnu/i386-pc-solaris2.* libgomp ICEs with -m64
on the radar?  They still happen as of r249422.

Thanks.
  Rainer
Comment 16 rguenther@suse.de 2017-06-21 08:03:15 UTC
On Wed, 21 Jun 2017, ro at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80928
> 
> --- Comment #15 from Rainer Orth <ro at gcc dot gnu.org> ---
> Richard,
> 
> do you have the i686-pc-linux-gnu/i386-pc-solaris2.* libgomp ICEs with -m64
> on the radar?  They still happen as of r249422.

I tried hard to reproduce but failed, so yes, it is on my radar, but there
is nothing I can do about it :/

If you can direct me to a CF machine that reproduces the issue that
would be nice.
Comment 17 ro@CeBiTec.Uni-Bielefeld.DE 2017-06-21 08:08:18 UTC
> --- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
> I tried hard to reproduce but failed so yes, on my radar but nothing I can 
> do about :/
>
> If you can direct me to a CF machine that reproduces the issue that
> would be nice.

Weird: I have it in my regular i686-pc-linux-gnu builds on Fedora 25
(Xeon X7542), only for -m64.  Doesn't happen for an x86_64-pc-linux-gnu
compiler on the same box, though.

I'll see if I can find a cfarm box where it reproduces.
Comment 18 rguenther@suse.de 2017-06-21 08:21:39 UTC
On Wed, 21 Jun 2017, ro at CeBiTec dot Uni-Bielefeld.DE wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80928
> 
> --- Comment #17 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot Uni-Bielefeld.DE> ---
> > --- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
> > I tried hard to reproduce but failed so yes, on my radar but nothing I can 
> > do about :/
> >
> > If you can direct me to a CF machine that reproduces the issue that
> > would be nice.
> 
> Weird: I have it in my regular i686-pc-linux-gnu builds on Fedora 25
> (Xeon X7542), only for -m64.  Doesn't happen for an x86_64-pc-linux-gnu
> compiler on the same box, though.
> 
> I'll see if I can find a cfarm box where it reproduces.

I have not yet built a native i686 compiler with 64bit support but only
tried an x86_64 -> i686 cross with 64bit support where it doesn't
reproduce.

Native builds for i686 on an x86_64 host are always a bit odd to produce.
Comment 19 ro@CeBiTec.Uni-Bielefeld.DE 2017-06-21 08:31:34 UTC
> --- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
[...]
> I have not yet built a native i686 compiler with 64bit support but only
> tried a x86_64 -> i686 cross with 64bit support where it doesn't
> reproduce.
>
> Native builds for i686 on a x86_64 host are always a bit odd to produce.

Indeed, probably worth documenting the procedure somewhere.  I believe
what you need is to configure with

* CC='gcc -m32' CXX='g++ -m32'

* --enable-targets=all

* --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=i686-pc-linux-gnu

  so configure doesn't conclude it's a cross

* using 32-bit gas and gld helps, but isn't required, I believe (it gets
  a few assembler configure tests wrong where the 32-bit and 64-bit
  results differ)

However, it's easier than having to have a separate machine with a
32-bit kernel around (if such beasts still exist)...

	Rainer
Comment 20 ro@CeBiTec.Uni-Bielefeld.DE 2017-06-21 09:53:35 UTC
> --- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
[...]
> I have not yet built a native i686 compiler with 64bit support but only
> tried a x86_64 -> i686 cross with 64bit support where it doesn't
> reproduce.

Now that you mention it, the ICE occurs with neither the
x86_64-pc-linux-gnu nor the amd64-pc-solaris compilers, but only with the
i?86-*-* ones.

	Rainer
Comment 21 Rainer Orth 2017-06-22 11:19:12 UTC
(In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #19)
> > --- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
> > Native builds for i686 on a x86_64 host are always a bit odd to produce.
> 
> Indeed, probably worth documenting the procedure somewhere.  I believe
> what you need is to configure with
> 
> * CC='gcc -m32' CXX='g++ -m32'
> 
> * --enable-targets=all
> 
> * --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu
> --target=i686-pc-linux-gnu
> 
>   so configure doesn't conclude it's a cross
> 
> * using 32-bit gas and gld helps, but isn't required, I believe (it gets
>   a few assembler configure tests wrong where the 32-bit and 64-bit
>   results differ)

In fact this isn't necessary: on a Fedora 25 system with bundled 64-bit binutils
2.26.1, there's only a single difference in gcc/auto-host.h from a self-compiled
32-bit binutils 2.28:

-#define HAVE_AS_IX86_TLS_GET_ADDR_GOT 1
+#define HAVE_AS_IX86_TLS_GET_ADDR_GOT 0

This is benign since that feature only landed in binutils 2.27.

> However, it's easier than having to have a separate machine with a
> 32-bit kernel around (if such beasts still exist)...

I've now successfully verified the procedure above on Fedora 25 and it worked
fine with one addition (if one stays with the bundled 64-bit binutils):

* One needs to configure with --disable-lto-plugin

  Otherwise, an attempt to load the just compiled 32-bit liblto_plugin.so into
  the 64-bit ld will fail...

In such a build, I could again reproduce the ICE reported earlier.

  Rainer
Comment 22 Richard Biener 2017-06-22 13:44:55 UTC
Ok.  So the ICEs happen because we now have PHIs in the SLP tree, but those
can get re-allocated during the early transform phase.  That phase does
peeling via copy_bbs, which uses duplicate_block, which duplicates successor
edges (thereby allocating new PHI args, which eventually makes PHIs grow over
capacity).  We later throw away the stale entry edges (and the PHI args), but
only from the re-allocated PHI, which is not the one the SLP tree references.

It seems wasteful to do this PHI arg handling in copy_bbs.  A hack is possible
(luckily gimple_cfg_hooks and the like are not const...).

That doesn't work so nicely for the edges to blocks outside of bbs[] though :/

Eventually adding duplicate_block_raw and doing this more manually would work.

Bah.

Sanity checking of vinfo_for_stmt (stmt)->stmt == stmt would have uncovered
this earlier for PHIs.
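
The stale-pointer situation can be sketched outside of GCC as follows. This is
only an analogy under stated assumptions: struct phi, struct info, phi_create,
phi_add_arg and info_is_current are hypothetical names, not GCC APIs, and the
old node is deliberately kept alive so that the comparison stays well defined.
It shows how growing a node past its capacity yields a new object, and how a
back-pointer check in the spirit of vinfo_for_stmt (stmt)->stmt == stmt
notices that recorded info still points at the old one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a PHI node with a fixed-capacity arg array.  */
struct phi
{
  unsigned nargs, capacity;
  int args[];
};

/* Hypothetical stand-in for per-statement info that records its statement.  */
struct info
{
  struct phi *stmt;
};

struct phi *
phi_create (unsigned capacity)
{
  assert (capacity > 0);
  struct phi *p = malloc (sizeof *p + capacity * sizeof p->args[0]);
  assert (p != NULL);
  p->nargs = 0;
  p->capacity = capacity;
  return p;
}

/* Append ARG.  When P is full, allocate a larger node and copy the args.
   The old node is intentionally not freed in this sketch.  Returns the
   node that now holds the args.  */
struct phi *
phi_add_arg (struct phi *p, int arg)
{
  if (p->nargs == p->capacity)
    {
      struct phi *q = phi_create (p->capacity * 2);
      memcpy (q->args, p->args, p->nargs * sizeof p->args[0]);
      q->nargs = p->nargs;
      p = q;
    }
  p->args[p->nargs++] = arg;
  return p;
}

/* Returns nonzero if VI still refers to the live node STMT.  */
int
info_is_current (const struct info *vi, const struct phi *stmt)
{
  return vi->stmt == stmt;
}
```

A caller that keeps only the struct info would have to be updated whenever
phi_add_arg returns a different node; the check makes a missed update visible.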
Comment 23 Richard Biener 2017-06-22 14:28:49 UTC
Created attachment 41612 [details]
patch for the vectorizer ICE

This fixes the slp-perm-8.c ICE but causes libgomp vla1.f90 to ICE with

during RTL pass: loop2_done
/tmp/trunk/libgomp/testsuite/libgomp.fortran/vla1.f90:40:0: internal compiler error: in patch_jump_insn, at cfgrtl.c:1264
0x86029db patch_jump_insn
        /tmp/trunk/gcc/cfgrtl.c:1264
0x8602ad4 redirect_branch_edge
        /tmp/trunk/gcc/cfgrtl.c:1298
0x8609146 cfg_layout_redirect_edge_and_branch
        /tmp/trunk/gcc/cfgrtl.c:4406
0x85efdc7 redirect_edge_and_branch(edge_def*, basic_block_def*)
        /tmp/trunk/gcc/cfghooks.c:356
0x952ff29 try_forward_edges
        /tmp/trunk/gcc/cfgcleanup.c:575
0x95361ed try_optimize_cfg
        /tmp/trunk/gcc/cfgcleanup.c:2992
0x95367a6 cleanup_cfg(int)
        /tmp/trunk/gcc/cfgcleanup.c:3203
0x89477a9 execute
        /tmp/trunk/gcc/loop-init.c:475
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

So something goes wrong with labels when splitting the work :/

I suppose one of the side-effects of redirect_edge_and_branch_force
was to duplicate the label appropriately which means we cannot simply use
unchecked_make_edge.

That is probably the reason for the weird "add edge to old label and
redirect" code :(
Comment 24 Richard Biener 2017-06-23 11:40:34 UTC
Created attachment 41620 [details]
patch that seems to work

Patch in testing/posted.
Comment 25 Richard Biener 2017-06-26 07:20:10 UTC
Author: rguenth
Date: Mon Jun 26 07:19:37 2017
New Revision: 249638

URL: https://gcc.gnu.org/viewcvs?rev=249638&root=gcc&view=rev
Log:
2017-06-26  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/80928
	* cfghooks.c (duplicate_block): Do not copy BB_DUPLICATED flag.
	(copy_bbs): Set BB_DUPLICATED flag early.
	(execute_on_growing_pred): Do not execute for BB_DUPLICATED
	marked blocks.
	(execute_on_shrinking_pred): Likewise.
	* tree-ssa.c (ssa_redirect_edge): Do not look for PHI args in
	BB_DUPLICATED blocks.
	* tree-ssa-phionlycprop.c (eliminate_degenerate_phis_1): Properly
	iterate over all PHIs considering removal of *gsi.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfghooks.c
    trunk/gcc/tree-ssa-phionlycprop.c
    trunk/gcc/tree-ssa.c
Comment 26 Tamar Christina 2017-10-05 15:27:15 UTC
Author: tnfchris
Date: Thu Oct 05 15:17:39 2017
New Revision: 253452

URL: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=253452
Log:
gcc/testsuite/
2017-10-05  Tamar Christina  <tamar.christina@arm.com>

	* gcc.dg/vect/slp-perm-9.c: Use vect_sizes_16B_8B.
	* lib/target-supports.exp (vect_sizes_16B_8B): New.

gcc/
2017-10-05  Tamar Christina  <tamar.christina@arm.com>

	* doc/sourcebuild.texi (vect_sizes_16B_8B, vect_sizes_32B_16B): New.

Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/doc/sourcebuild.texi
	trunk/gcc/testsuite/ChangeLog
	trunk/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
	trunk/gcc/testsuite/lib/target-supports.exp
Comment 27 Arseny Solokha 2020-10-19 10:05:20 UTC
Is it still an issue?
Comment 28 Richard Biener 2020-10-19 12:16:10 UTC
Yes, the original issue is still present.
Comment 29 Richard Biener 2020-10-19 12:39:43 UTC
So a testcase for missed outer loop induction SLP (and nested cycle SLP) is
for example

int a[1024];
void foo (unsigned n)
{
  for (int i = 0; i < 1020; i += 4)
    {
      int suma = a[i];
      int sumb = a[i+1];
      int sumc = a[i+2];
      int sumd = a[i+3];
      for (unsigned j = 0; j < 17; ++j)
        {
          suma = (suma ^ i) + 1;
          sumb = (sumb ^ i) + 2;
          sumc = (sumc ^ i) + 3;
          sumd = (sumd ^ i) + 4;
        }
      a[i] = suma;
      a[i+1] = sumb;
      a[i+2] = sumc;
      a[i+3] = sumd;
    }
}
Comment 30 Richard Biener 2020-10-19 15:37:20 UTC
(In reply to Richard Biener from comment #29)
> So a testcase for missed outer loop induction SLP (and nested cycle SLP) is
> for example
> 
> int a[1024];
> void foo (unsigned n)
> {
>   for (int i = 0; i < 1020; i += 4)
>     {
>       int suma = a[i];
>       int sumb = a[i+1];
>       int sumc = a[i+2];
>       int sumd = a[i+3];
>       for (unsigned j = 0; j < 17; ++j)
>         {
>           suma = (suma ^ i) + 1;
>           sumb = (sumb ^ i) + 2;
>           sumc = (sumc ^ i) + 3;
>           sumd = (sumd ^ i) + 4;
>         }
>       a[i] = suma;
>       a[i+1] = sumb;
>       a[i+2] = sumc;
>       a[i+3] = sumd;
>     }
> }

Actually, this is still not an inner loop induction in outer loop vectorization,
but rather missed nested cycle SLP handling.  I have a patch for this in testing.
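
For checking a vectorized build of the comment 29 testcase at runtime, a
scalar reference along these lines could be used. This is a sketch, not the
testcase that went into the testsuite; lane_reference and check_foo are
hypothetical helpers, the input values 3 * k are arbitrary, and the (void) n
line is only added to silence the unused parameter.

```c
#include <assert.h>

int a[1024];

/* Testcase from comment 29.  */
void
foo (unsigned n)
{
  (void) n;
  for (int i = 0; i < 1020; i += 4)
    {
      int suma = a[i];
      int sumb = a[i+1];
      int sumc = a[i+2];
      int sumd = a[i+3];
      for (unsigned j = 0; j < 17; ++j)
        {
          suma = (suma ^ i) + 1;
          sumb = (sumb ^ i) + 2;
          sumc = (sumc ^ i) + 3;
          sumd = (sumd ^ i) + 4;
        }
      a[i] = suma;
      a[i+1] = sumb;
      a[i+2] = sumc;
      a[i+3] = sumd;
    }
}

/* One lane of the nested cycle, computed on its own.  */
static int
lane_reference (int sum, int i, int addend)
{
  for (unsigned j = 0; j < 17; ++j)
    sum = (sum ^ i) + addend;
  return sum;
}

/* Fills the input, computes a per-lane reference and compares with foo.  */
void
check_foo (void)
{
  int ref[1024];
  for (int k = 0; k < 1024; k++)
    ref[k] = a[k] = 3 * k;      /* arbitrary small input values */
  for (int i = 0; i < 1020; i += 4)
    for (int l = 0; l < 4; l++)
      ref[i + l] = lane_reference (ref[i + l], i, l + 1);
  foo (0);
  for (int k = 0; k < 1024; k++)
    assert (a[k] == ref[k]);
}
```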
Comment 31 Richard Biener 2020-10-27 11:52:41 UTC
The following is a testcase triggering the

      /* FORNOW: outer loop induction with SLP not supported.  */
      if (STMT_SLP_TYPE (stmt_info))
        return false;

test:

double image[40];

void
foo (void)
{
  for (int i = 0; i < 20; i++)
    {
      double suma = 0;
      double sumb = 0;
      for (int j = 0; j < 40; j++)
        {
          suma += j+i;
          sumb += j+i;
        }
      image[2*i] = suma;
      image[2*i+1] = sumb;
    }
}
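
The expected result of this testcase has a closed form, which makes a runtime
check easy: for each i both sums are the sum over j = 0..39 of (j + i), that
is 780 + 40 * i. A minimal sketch of such a check follows; expected and
check_foo are hypothetical helpers, not part of the committed testcases.

```c
#include <assert.h>

double image[40];

/* Testcase from comment 31.  */
void
foo (void)
{
  for (int i = 0; i < 20; i++)
    {
      double suma = 0;
      double sumb = 0;
      for (int j = 0; j < 40; j++)
        {
          suma += j+i;
          sumb += j+i;
        }
      image[2*i] = suma;
      image[2*i+1] = sumb;
    }
}

/* Sum over j = 0..39 of (j + i): 39 * 40 / 2 + 40 * i.  */
static double
expected (int i)
{
  return 780.0 + 40.0 * i;
}

/* All values are small integers, so the double comparison is exact.  */
void
check_foo (void)
{
  foo ();
  for (int i = 0; i < 20; i++)
    {
      assert (image[2*i] == expected (i));
      assert (image[2*i+1] == expected (i));
    }
}
```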
Comment 32 GCC Commits 2020-11-03 12:33:50 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ac6affba97130bcbffb21bd9f8ca53c7aac89551

commit r11-4652-gac6affba97130bcbffb21bd9f8ca53c7aac89551
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Nov 3 11:52:47 2020 +0100

    tree-optimization/80928 - SLP vectorize nested loop induction
    
    This adds SLP vectorization of nested inductions.
    
    2020-11-03  Richard Biener <rguenther@suse.de>
    
            PR tree-optimization/80928
            * tree-vect-loop.c (vectorizable_induction): SLP vectorize
            nested inductions.
    
            * gcc.dg/vect/vect-outer-slp-2.c: New testcase.
            * gcc.dg/vect/vect-outer-slp-3.c: Likewise.
Comment 33 Richard Biener 2020-11-03 12:41:34 UTC
Now fixed.