int a[1024];
void foo (int n)
{
  for (int i = 0; i < 1020; i += 5)
    {
      a[i] = i;
      a[i+1] = i;
      a[i+2] = i;
      a[i+3] = i;
      a[i+4] = i;
    }
}

is not vectorized.

t.c:4:3: note: === vect_analyze_slp ===
t.c:4:3: note: Build SLP for a[i_17] = i_17;
t.c:4:3: note: Build SLP for a[_1] = i_17;
t.c:4:3: note: Build SLP for a[_2] = i_17;
t.c:4:3: note: Build SLP for a[_3] = i_17;
t.c:4:3: note: Build SLP for a[_4] = i_17;
t.c:4:3: note: vect_is_simple_use: operand i_17
t.c:4:3: note: def_stmt: i_17 = PHI <i_13(4), 0(2)>
t.c:4:3: note: type of def: induction
t.c:4:3: note: Build SLP failed: illegal type of def i_17

that's because we do not handle inductions (neither during SLP discovery
nor later during code-gen).

      /* Check the types of the definitions.  */
      switch (dt)
        {
        case vect_constant_def:
        case vect_external_def:
        case vect_reduction_def:
          break;

        case vect_internal_def:
          oprnd_info->def_stmts.quick_push (def_stmt);
          break;

        default:
          /* FORNOW: Not supported.  */
          if (dump_enabled_p ())
            {
              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                               "Build SLP failed: illegal type of def ");
              dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, oprnd);
              dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
            }

          return -1;
        }
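For illustration only (not from the report): a minimal sketch of what SLP vectorization of the induction would have to compute for the loop above. It assumes a hypothetical 4-lane int vector, emulated with plain arrays; the names foo_scalar, foo_slp_emulated, slp_matches_scalar, LANES and GROUP are made up here, and this is not the code GCC generates. With a store group of 5 and 4 lanes, 4 scalar iterations (20 stores) fill exactly 5 vectors, so each lane of each vector carries its own induction value and steps by 20 per vector iteration.

```c
#include <assert.h>
#include <string.h>

#define N 1024
#define LANES 4   /* assumed vector width in ints (hypothetical target) */
#define GROUP 5   /* number of stores per scalar iteration */

/* Scalar reference: the loop from the report, writing into dst.  */
void foo_scalar (int *dst)
{
  for (int i = 0; i < 1020; i += GROUP)
    {
      dst[i] = i;
      dst[i+1] = i;
      dst[i+2] = i;
      dst[i+3] = i;
      dst[i+4] = i;
    }
}

/* Emulated SLP version: GROUP vectors of LANES ints cover LANES scalar
   iterations.  iv[v][l] is the induction value stored by lane l of
   vector v.  */
void foo_slp_emulated (int *dst)
{
  int iv[GROUP][LANES];

  /* The trip count must divide evenly, otherwise an epilogue is needed.  */
  assert (1020 % (GROUP * LANES) == 0);

  /* Element k of the unrolled block belongs to scalar iteration k / GROUP,
     whose induction value is (k / GROUP) * GROUP.  */
  for (int k = 0; k < GROUP * LANES; k++)
    iv[k / LANES][k % LANES] = (k / GROUP) * GROUP;

  for (int i = 0; i < 1020; i += GROUP * LANES)
    for (int v = 0; v < GROUP; v++)
      for (int l = 0; l < LANES; l++)
        {
          dst[i + v * LANES + l] = iv[v][l];  /* one lane of a vector store */
          iv[v][l] += GROUP * LANES;          /* vector induction step */
        }
}

/* Returns 1 if both versions produce identical arrays.  */
int slp_matches_scalar (void)
{
  static int x[N], y[N];
  memset (x, 0, sizeof x);
  memset (y, 0, sizeof y);
  foo_scalar (x);
  foo_slp_emulated (y);
  return memcmp (x, y, sizeof x) == 0;
}
```

Calling slp_matches_scalar () compares the emulated version against the scalar loop over the whole array.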
Mine.
Author: rguenth
Date: Tue Jun 6 07:37:14 2017
New Revision: 248909

URL: https://gcc.gnu.org/viewcvs?rev=248909&root=gcc&view=rev
Log:
2017-06-06 Richard Biener <rguenther@suse.de>

    PR tree-optimization/80928
    * tree-vect-loop.c (vect_update_vf_for_slp): Amend dumps.
    (vect_analyze_loop_operations): Properly guard analysis for
    pure SLP case.
    (vect_transform_loop): Likewise.
    (vect_analyze_loop_2): Also reset SLP type on PHIs.
    (vect_model_induction_cost): Do not cost for pure SLP.
    (vectorizable_induction): Pass in SLP node, implement SLP
    vectorization of induction in inner loop vectorization.
    * tree-vect-slp.c (vect_create_new_slp_node): Handle PHIs.
    (vect_get_and_check_slp_defs): Handle vect_induction_def.
    (vect_build_slp_tree): Likewise.  Handle PHIs as terminating
    the recursion.
    (vect_analyze_slp_cost_1): Cost induction.
    (vect_detect_hybrid_slp_stmts): Handle PHIs.
    (vect_get_slp_vect_defs): Likewise.
    * tree-vect-stmts.c (vect_analyze_stmt): Handle induction.
    (vect_transform_stmt): Handle SLP reductions.
    * tree-vectorizer.h (vectorizable_induction): Adjust.

    * gcc.dg/vect/pr80928.c: New testcase.
    * gcc.dg/vect/slp-13-big-array.c: Remove XFAILs.
    * gcc.dg/vect/slp-13.c: Likewise.
    * gcc.dg/vect/slp-perm-9.c: Prevent vectorization of check loop.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr80928.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-13.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
    trunk/gcc/tree-vect-loop.c
    trunk/gcc/tree-vect-slp.c
    trunk/gcc/tree-vect-stmts.c
    trunk/gcc/tree-vectorizer.h
So mostly fixed now with outer loop vectorization support still missing. Adjusting summary accordingly.
This patch (r248909) caused regressions on arm/aarch64:

  - PASS now FAIL [PASS => FAIL]:

  Executed from: gcc.dg/vect/vect.exp
    gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 0
    gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0
    gcc.dg/vect/slp-perm-9.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1
    gcc.dg/vect/slp-perm-9.c scan-tree-dump-times vect "vectorized 1 loops" 1
The patch also caused a couple of regressions on i386-pc-solaris2.12:

+FAIL: gcc.dg/vect/slp-perm-8.c (internal compiler error)
+FAIL: gcc.dg/vect/slp-perm-8.c (test for excess errors)
+FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (internal compiler error)
+FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (test for excess errors)
+WARNING: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects compilation failed to produce executable
+WARNING: gcc.dg/vect/slp-perm-8.c compilation failed to produce executable

Excess errors:
during GIMPLE pass: vect
dump file: slp-perm-8.c.156t.vect
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/slp-perm-8.c:25:5: internal compiler error: in operator[], at vec.h:729
0x8b58359 vec<edge_def*, va_gc, vl_embed>::operator[](unsigned int)
        /vol/gcc/src/hg/trunk/local/gcc/vec.h:729
0x8b58359 gimple_phi_arg_edge
        /vol/gcc/src/hg/trunk/local/gcc/gimple.h:4398
0x8b58359 dump_gimple_phi
        /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:2185
0x8b5a668 print_gimple_stmt(__FILE*, gimple*, int, unsigned long long)
        /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:117
0x8a254c5 dump_gimple_stmt(unsigned long long, unsigned long long, gimple*, int)
        /vol/gcc/src/hg/trunk/local/gcc/dumpfile.c:340
0x90750dd vect_schedule_slp_instance
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3680
0x9074f6f vect_schedule_slp_instance
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
0x9074f6f vect_schedule_slp_instance
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
0x9074f6f vect_schedule_slp_instance
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
0x9075861 vect_schedule_slp(vec_info*)
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3834
0x905a2ba vect_transform_loop(_loop_vec_info*)
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7151
0x907b4e8 vectorize_loops()
        /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690

32 and 64-bit x86

+FAIL: libgomp.fortran/vla1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error)
+FAIL: libgomp.fortran/vla1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
+WARNING: libgomp.fortran/vla1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions compilation failed to produce executable
+FAIL: libgomp.fortran/vla1.f90 -O3 -g (internal compiler error)
+FAIL: libgomp.fortran/vla1.f90 -O3 -g (test for excess errors)
+WARNING: libgomp.fortran/vla1.f90 -O3 -g compilation failed to produce executable

and several more

Excess errors:
during GIMPLE pass: vect
/vol/gcc/src/hg/trunk/local/libgomp/testsuite/libgomp.fortran/vla1.f90:40:0: internal compiler error: in vect_free_slp_tree, at tree-vect-slp.c:62
0x90e874f vect_free_slp_tree
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:62
0x90e859d vect_free_slp_tree
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
0x90e859d vect_free_slp_tree
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
0x90e859d vect_free_slp_tree
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
0x90eb870 vect_free_slp_instance(_slp_instance*)
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:80
0x90d5678 vect_transform_loop(_loop_vec_info*)
        /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7249
0x90f6a48 vectorize_loops()
        /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690

64-bit x86
... and also on sparc-sun-solaris2.12:

+FAIL: gcc.dg/vect/slp-13-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 3
+FAIL: gcc.dg/vect/slp-13-big-array.c scan-tree-dump-times vect "vectorizing stmts using SLP" 3
+FAIL: gcc.dg/vect/slp-13.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 3
+FAIL: gcc.dg/vect/slp-13.c scan-tree-dump-times vect "vectorizing stmts using SLP" 3

32 and 64-bit sparc
Created attachment 41482
32-bit sparc-sun-solaris2.12 slp-13.c.156t.vect
Author: rguenth
Date: Wed Jun 7 09:10:17 2017
New Revision: 248948

URL: https://gcc.gnu.org/viewcvs?rev=248948&root=gcc&view=rev
Log:
2017-06-07 Richard Biener <rguenther@suse.de>

    PR tree-optimization/80928
    * gcc.dg/vect/slp-perm-8.c: Avoid vectorizing loop computing
    check_results.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
(In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #6)
> ... and also on sparc-sun-solaris2.12:
>
> +FAIL: gcc.dg/vect/slp-13-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 3
> +FAIL: gcc.dg/vect/slp-13-big-array.c scan-tree-dump-times vect "vectorizing stmts using SLP" 3
> +FAIL: gcc.dg/vect/slp-13.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 3
> +FAIL: gcc.dg/vect/slp-13.c scan-tree-dump-times vect "vectorizing stmts using SLP" 3
>
> 32 and 64-bit sparc

/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/slp-13.c:18:3: note: not vectorized: relevant stmt not supported: _3 = (short unsigned int) i_316;
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/slp-13.c:18:3: note: removing SLP instance operations starting from: out[_1] = _4;

ok, it needs demotion, will fix.
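As a rough illustration of the demotion mentioned above (an assumption-laden sketch, not taken from slp-13.c or from GCC): narrowing an int induction to unsigned short in vector form needs a pack-and-truncate step that combines two wider vectors into one narrower vector. The emulation below assumes 4-lane int and 8-lane unsigned short vectors and a 16-bit unsigned short; pack_trunc_emulated and demoted_lane are hypothetical names.

```c
#include <assert.h>

/* Emulates packing two 4-lane int vectors into one 8-lane unsigned short
   vector, truncating each element (well-defined modulo 2^16 for a 16-bit
   unsigned short).  */
void pack_trunc_emulated (const int lo[4], const int hi[4],
                          unsigned short out[8])
{
  for (int l = 0; l < 4; l++)
    {
      out[l] = (unsigned short) lo[l];
      out[l + 4] = (unsigned short) hi[l];
    }
}

/* Builds two int vectors holding the induction values start .. start + 7,
   demotes them, and returns the requested lane of the result.  */
unsigned short demoted_lane (int start, int lane)
{
  int lo[4], hi[4];
  unsigned short out[8];

  assert (lane >= 0 && lane < 8);

  for (int l = 0; l < 4; l++)
    {
      lo[l] = start + l;      /* first wide vector of induction values */
      hi[l] = start + 4 + l;  /* second wide vector of induction values */
    }
  pack_trunc_emulated (lo, hi, out);
  return out[lane];
}
```

A target without such a pack operation cannot vectorize the conversion statement, which is consistent with the failures being target-specific.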
On Wed, 7 Jun 2017, ro at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80928
>
> Rainer Orth <ro at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |ro at gcc dot gnu.org
>
> --- Comment #5 from Rainer Orth <ro at gcc dot gnu.org> ---
> The patch also caused a couple of regressions on i386-pc-solaris2.12:
>
> +FAIL: gcc.dg/vect/slp-perm-8.c (internal compiler error)
> +FAIL: gcc.dg/vect/slp-perm-8.c (test for excess errors)
> +FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (internal compiler error)
> +FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (test for excess errors)
> +WARNING: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects compilation failed to produce executable
> +WARNING: gcc.dg/vect/slp-perm-8.c compilation failed to produce executable

Can't reproduce with a cross.

> Excess errors:
> during GIMPLE pass: vect
> dump file: slp-perm-8.c.156t.vect
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/slp-perm-8.c:25:5: internal compiler error: in operator[], at vec.h:729
> 0x8b58359 vec<edge_def*, va_gc, vl_embed>::operator[](unsigned int)
>         /vol/gcc/src/hg/trunk/local/gcc/vec.h:729
> 0x8b58359 gimple_phi_arg_edge
>         /vol/gcc/src/hg/trunk/local/gcc/gimple.h:4398
> 0x8b58359 dump_gimple_phi
>         /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:2185
> 0x8b5a668 print_gimple_stmt(__FILE*, gimple*, int, unsigned long long)
>         /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:117
> 0x8a254c5 dump_gimple_stmt(unsigned long long, unsigned long long, gimple*, int)
>         /vol/gcc/src/hg/trunk/local/gcc/dumpfile.c:340
> 0x90750dd vect_schedule_slp_instance
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3680
> 0x9074f6f vect_schedule_slp_instance
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
> 0x9074f6f vect_schedule_slp_instance
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
> 0x9074f6f vect_schedule_slp_instance
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
> 0x9075861 vect_schedule_slp(vec_info*)
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3834
> 0x905a2ba vect_transform_loop(_loop_vec_info*)
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7151
> 0x907b4e8 vectorize_loops()
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690
>
> 32 and 64-bit x86

That is, i686-linux with -m32 / -m64? I see no issues on
x86_64 with -m32 or -m64.

> +FAIL: libgomp.fortran/vla1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error)
> +FAIL: libgomp.fortran/vla1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
> +WARNING: libgomp.fortran/vla1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions compilation failed to produce executable
> +FAIL: libgomp.fortran/vla1.f90 -O3 -g (internal compiler error)
> +FAIL: libgomp.fortran/vla1.f90 -O3 -g (test for excess errors)
> +WARNING: libgomp.fortran/vla1.f90 -O3 -g compilation failed to produce executable
>
> and several more
>
> Excess errors:
> during GIMPLE pass: vect
> /vol/gcc/src/hg/trunk/local/libgomp/testsuite/libgomp.fortran/vla1.f90:40:0: internal compiler error: in vect_free_slp_tree, at tree-vect-slp.c:62
> 0x90e874f vect_free_slp_tree
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:62
> 0x90e859d vect_free_slp_tree
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
> 0x90e859d vect_free_slp_tree
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
> 0x90e859d vect_free_slp_tree
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
> 0x90eb870 vect_free_slp_instance(_slp_instance*)
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:80
> 0x90d5678 vect_transform_loop(_loop_vec_info*)
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7249
> 0x90f6a48 vectorize_loops()
>         /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690
>
> 64-bit x86
Author: rguenth
Date: Wed Jun 7 09:39:53 2017
New Revision: 248950

URL: https://gcc.gnu.org/viewcvs?rev=248950&root=gcc&view=rev
Log:
2017-06-07 Richard Biener <rguenther@suse.de>

    PR tree-optimization/80928
    * gcc.dg/vect/slp-13.c: Adjust patterns with vect_pack_trunc.
    * gcc.dg/vect/slp-13-big-array.c: Likewise.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-13.c
>> --- Comment #5 from Rainer Orth <ro at gcc dot gnu.org> ---
>> The patch also caused a couple of regressions on i386-pc-solaris2.12:
>>
>> +FAIL: gcc.dg/vect/slp-perm-8.c (internal compiler error)
>> +FAIL: gcc.dg/vect/slp-perm-8.c (test for excess errors)
>> +FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (internal compiler error)
>> +FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects (test for excess errors)
>> +WARNING: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects compilation failed to produce executable
>> +WARNING: gcc.dg/vect/slp-perm-8.c compilation failed to produce executable
>
> Can't reproduce with a cross.

I see the same in a i686-pc-linux-gnu build.

>> Excess errors:
>> during GIMPLE pass: vect
>> dump file: slp-perm-8.c.156t.vect
>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/slp-perm-8.c:25:5: internal compiler error: in operator[], at vec.h:729
>> 0x8b58359 vec<edge_def*, va_gc, vl_embed>::operator[](unsigned int)
>>         /vol/gcc/src/hg/trunk/local/gcc/vec.h:729
>> 0x8b58359 gimple_phi_arg_edge
>>         /vol/gcc/src/hg/trunk/local/gcc/gimple.h:4398
>> 0x8b58359 dump_gimple_phi
>>         /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:2185
>> 0x8b5a668 print_gimple_stmt(__FILE*, gimple*, int, unsigned long long)
>>         /vol/gcc/src/hg/trunk/local/gcc/gimple-pretty-print.c:117
>> 0x8a254c5 dump_gimple_stmt(unsigned long long, unsigned long long, gimple*, int)
>>         /vol/gcc/src/hg/trunk/local/gcc/dumpfile.c:340
>> 0x90750dd vect_schedule_slp_instance
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3680
>> 0x9074f6f vect_schedule_slp_instance
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
>> 0x9074f6f vect_schedule_slp_instance
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
>> 0x9074f6f vect_schedule_slp_instance
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3641
>> 0x9075861 vect_schedule_slp(vec_info*)
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:3834
>> 0x905a2ba vect_transform_loop(_loop_vec_info*)
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7151
>> 0x907b4e8 vectorize_loops()
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690
>>
>> 32 and 64-bit x86
>
> That is, i686-linux with -m32 / -m64? I see no issues on
> x86_64 with -m32 or -m64.

Both i386-pc-solaris2.12 and i686-pc-linux-gnu. Haven't tried an x86_64
build yet.

>> +FAIL: libgomp.fortran/vla1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error)
>> +FAIL: libgomp.fortran/vla1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
>> +WARNING: libgomp.fortran/vla1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions compilation failed to produce executable
>> +FAIL: libgomp.fortran/vla1.f90 -O3 -g (internal compiler error)
>> +FAIL: libgomp.fortran/vla1.f90 -O3 -g (test for excess errors)
>> +WARNING: libgomp.fortran/vla1.f90 -O3 -g compilation failed to produce executable
>>
>> and several more
>>
>> Excess errors:
>> during GIMPLE pass: vect
>> /vol/gcc/src/hg/trunk/local/libgomp/testsuite/libgomp.fortran/vla1.f90:40:0: internal compiler error: in vect_free_slp_tree, at tree-vect-slp.c:62
>> 0x90e874f vect_free_slp_tree
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:62
>> 0x90e859d vect_free_slp_tree
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
>> 0x90e859d vect_free_slp_tree
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
>> 0x90e859d vect_free_slp_tree
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:55
>> 0x90eb870 vect_free_slp_instance(_slp_instance*)
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-slp.c:80
>> 0x90d5678 vect_transform_loop(_loop_vec_info*)
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:7249
>> 0x90f6a48 vectorize_loops()
>>         /vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:690
>>
>> 64-bit x86

Also on both i386-pc-solaris2.12 -m64 and i686-pc-linux-gnu -m64.

	Rainer
(In reply to Richard Biener from comment #8)
> Author: rguenth
> Date: Wed Jun 7 09:10:17 2017
> New Revision: 248948
>
> URL: https://gcc.gnu.org/viewcvs?rev=248948&root=gcc&view=rev
> Log:
> 2017-06-07 Richard Biener <rguenther@suse.de>
>
>     PR tree-optimization/80928
>     * gcc.dg/vect/slp-perm-8.c: Avoid vectorizing loop computing
>     check_results.
>
> Modified:
>     trunk/gcc/testsuite/ChangeLog
>     trunk/gcc/testsuite/gcc.dg/vect/slp-perm-8.c

After this commit, arm and aarch64 regress:

FAIL: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect "vectorized 1 loops" 2

but these improve:

PASS: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 0
PASS: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0
Author: rguenth
Date: Thu Jun 8 07:32:52 2017
New Revision: 249004

URL: https://gcc.gnu.org/viewcvs?rev=249004&root=gcc&view=rev
Log:
2017-06-08 Richard Biener <rguenther@suse.de>

    PR tree-optimization/80928
    * gcc.dg/vect/slp-perm-8.c: Do not expect check loop to be vectorized.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
Richard,

do you have the i686-pc-linux-gnu/i386-pc-solaris2.* libgomp ICEs with -m64
on the radar? They still happen as of r249422.

Thanks.
	Rainer
On Wed, 21 Jun 2017, ro at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80928
>
> --- Comment #15 from Rainer Orth <ro at gcc dot gnu.org> ---
> Richard,
>
> do you have the i686-pc-linux-gnu/i386-pc-solaris2.* libgomp ICEs with -m64
> on the radar? They still happen as of r249422.

I tried hard to reproduce but failed, so yes, it is on my radar but there is
nothing I can do about it :/

If you can direct me to a CF machine that reproduces the issue that
would be nice.
> --- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
> I tried hard to reproduce but failed so yes, on my radar but nothing I can
> do about :/
>
> If you can direct me to a CF machine that reproduces the issue that
> would be nice.

Weird: I have it in my regular i686-pc-linux-gnu builds on Fedora 25
(Xeon X7542), only for -m64. Doesn't happen for an x86_64-pc-linux-gnu
compiler on the same box, though.

I'll see if I can find a cfarm box where it reproduces.
On Wed, 21 Jun 2017, ro at CeBiTec dot Uni-Bielefeld.DE wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80928
>
> --- Comment #17 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot Uni-Bielefeld.DE> ---
> > --- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
> > I tried hard to reproduce but failed so yes, on my radar but nothing I can
> > do about :/
> >
> > If you can direct me to a CF machine that reproduces the issue that
> > would be nice.
>
> Weird: I have it in my regular i686-pc-linux-gnu builds on Fedora 25
> (Xeon X7542), only for -m64. Doesn't happen for an x86_64-pc-linux-gnu
> compiler on the same box, though.
>
> I'll see if I can find a cfarm box where it reproduces.

I have not yet built a native i686 compiler with 64bit support but only
tried a x86_64 -> i686 cross with 64bit support where it doesn't
reproduce.

Native builds for i686 on a x86_64 host are always a bit odd to produce.
> --- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
[...]
> I have not yet built a native i686 compiler with 64bit support but only
> tried a x86_64 -> i686 cross with 64bit support where it doesn't
> reproduce.
>
> Native builds for i686 on a x86_64 host are always a bit odd to produce.

Indeed, probably worth documenting the procedure somewhere. I believe
what you need is to configure with

* CC='gcc -m32' CXX='g++ -m32'

* --enable-targets=all

* --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu
  --target=i686-pc-linux-gnu

  so configure doesn't conclude it's a cross

* using 32-bit gas and gld helps, but isn't required, I believe (without
  them, configure gets a few assembler tests wrong where the 32-bit and
  64-bit results differ)

However, it's easier than having to have a separate machine with a
32-bit kernel around (if such beasts still exist)...

	Rainer
> --- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
[...]
> I have not yet built a native i686 compiler with 64bit support but only
> tried a x86_64 -> i686 cross with 64bit support where it doesn't
> reproduce.

Now that you mention it, the ICE occurs with neither the
x86_64-pc-linux-gnu nor the amd64-pc-solaris compilers, but only with the
i?86-*-* ones.

	Rainer
(In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #19)
> > --- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
> > Native builds for i686 on a x86_64 host are always a bit odd to produce.
>
> Indeed, probably worth documenting the procedure somewhere. I believe
> what you need is to configure with
>
> * CC='gcc -m32' CXX='g++ -m32'
>
> * --enable-targets=all
>
> * --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu
>   --target=i686-pc-linux-gnu
>
>   so configure doesn't conclude it's a cross
>
> * using 32-bit gas and gld helps, but isn't required, I believe (it gets
>   a few assembler configure tests wrong where the 32-bit and 64-bit
>   results differ)

In fact this isn't necessary: on a Fedora 25 system with bundled 64-bit
binutils 2.26.1, there's only a single difference in gcc/auto-host.h
from a self-compiled 32-bit binutils 2.28:

-#define HAVE_AS_IX86_TLS_GET_ADDR_GOT 1
+#define HAVE_AS_IX86_TLS_GET_ADDR_GOT 0

This is benign since that feature only landed in binutils 2.27.

> However, it's easier than having to have a separate machine with a
> 32-bit kernel around (if such beasts still exist)...

I've now successfully verified the procedure above on Fedora 25 and it
worked fine with one addition (if one stays with the bundled 64-bit
binutils):

* One needs to configure with --disable-lto-plugin

  Otherwise, an attempt to load the just compiled 32-bit
  liblto_plugin.so into the 64-bit ld will fail...

In such a build, I could again reproduce the ICE reported earlier.

	Rainer
Ok. So the ICEs happen because we now have PHIs in the SLP tree, but those
can get re-allocated during the early transform phase that does peeling via
copy_bbs. copy_bbs uses duplicate_block, which duplicates successor edges
(thereby allocating new PHI args, which eventually makes PHIs grow over
capacity). We later throw away the stale entry edges (and the PHI args),
but only from the re-allocated PHI, which is not the one referenced.

It seems wasteful to do this PHI arg handling in copy_bbs. A hack is
possible (luckily gimple_cfg_hooks and the like are not const...). It
doesn't work so nicely for the edges to outside of bbs[] though :/

Eventually adding duplicate_block_raw and doing this more manually would
work. Bah.

Sanity checking of vinfo_for_stmt (stmt)->stmt == stmt would have
uncovered this earlier for PHIs.
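To make the failure mode concrete, here is a toy model (not GCC code; every name in it is hypothetical): a node with fixed argument capacity is replaced by a bigger copy when it grows, while a side table indexed by uid still points at the old node. A check of the same shape as vinfo_for_stmt (stmt)->stmt == stmt then fails for the live node. The old node is deliberately kept allocated until the end so that the pointer comparison stays well-defined in C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a PHI node with a fixed argument capacity.  */
struct toy_phi
{
  int uid;
  int capacity;
  int nargs;
  int args[];   /* flexible array member */
};

/* Toy stand-in for per-statement info holding a back-pointer.  */
struct toy_info
{
  struct toy_phi *stmt;
};

struct toy_phi *toy_phi_create (int uid, int capacity)
{
  struct toy_phi *phi
    = malloc (sizeof *phi + (size_t) capacity * sizeof (int));
  assert (phi != NULL);
  phi->uid = uid;
  phi->capacity = capacity;
  phi->nargs = 0;
  return phi;
}

/* Adds an argument.  Growing over capacity returns a NEW node with the
   same uid; the old node is left allocated on purpose (see lead-in).  */
struct toy_phi *toy_phi_add_arg (struct toy_phi *phi, int arg)
{
  if (phi->nargs == phi->capacity)
    {
      struct toy_phi *bigger = toy_phi_create (phi->uid, phi->capacity * 2);
      memcpy (bigger->args, phi->args, (size_t) phi->nargs * sizeof (int));
      bigger->nargs = phi->nargs;
      phi = bigger;
    }
  phi->args[phi->nargs++] = arg;
  return phi;
}

/* Sanity check: does the info looked up by uid point back at PHI?  */
int toy_info_consistent (const struct toy_info *table,
                         const struct toy_phi *phi)
{
  return table[phi->uid].stmt == phi;
}

/* Returns 1 if the check passes before the node grows and fails after.  */
int stale_phi_demo (void)
{
  struct toy_info table[1];
  struct toy_phi *phi = toy_phi_create (0, 2);
  struct toy_phi *recorded = phi;   /* what a tree node would hold */
  int before, after;

  table[0].stmt = phi;
  phi = toy_phi_add_arg (phi, 10);
  phi = toy_phi_add_arg (phi, 20);
  before = toy_info_consistent (table, phi);

  phi = toy_phi_add_arg (phi, 30);  /* over capacity: node is replaced */
  after = toy_info_consistent (table, phi);

  if (phi != recorded)
    free (recorded);
  free (phi);
  return before == 1 && after == 0;
}
```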
Created attachment 41612
patch for the vectorizer ICE

This fixes the slp-perm-8.c ICE but causes libgomp vla1.f90 to ICE with

during RTL pass: loop2_done
/tmp/trunk/libgomp/testsuite/libgomp.fortran/vla1.f90:40:0: internal compiler error: in patch_jump_insn, at cfgrtl.c:1264
0x86029db patch_jump_insn
        /tmp/trunk/gcc/cfgrtl.c:1264
0x8602ad4 redirect_branch_edge
        /tmp/trunk/gcc/cfgrtl.c:1298
0x8609146 cfg_layout_redirect_edge_and_branch
        /tmp/trunk/gcc/cfgrtl.c:4406
0x85efdc7 redirect_edge_and_branch(edge_def*, basic_block_def*)
        /tmp/trunk/gcc/cfghooks.c:356
0x952ff29 try_forward_edges
        /tmp/trunk/gcc/cfgcleanup.c:575
0x95361ed try_optimize_cfg
        /tmp/trunk/gcc/cfgcleanup.c:2992
0x95367a6 cleanup_cfg(int)
        /tmp/trunk/gcc/cfgcleanup.c:3203
0x89477a9 execute
        /tmp/trunk/gcc/loop-init.c:475
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

so something goes wrong with labels when splitting the work :/ I suppose
one of the side-effects of redirect_edge_and_branch_force was to duplicate
the label appropriately, which means we cannot simply use
unchecked_make_edge. That is probably the reason for the weird code that
adds an edge to the old label and then redirects it :(
Created attachment 41620
patch that seems to work

Patch in testing/posted.
Author: rguenth
Date: Mon Jun 26 07:19:37 2017
New Revision: 249638

URL: https://gcc.gnu.org/viewcvs?rev=249638&root=gcc&view=rev
Log:
2017-06-26 Richard Biener <rguenther@suse.de>

    PR tree-optimization/80928
    * cfghooks.c (duplicate_block): Do not copy BB_DUPLICATED flag.
    (copy_bbs): Set BB_DUPLICATED flag early.
    (execute_on_growing_pred): Do not execute for BB_DUPLICATED
    marked blocks.
    (execute_on_shrinking_pred): Likewise.
    * tree-ssa.c (ssa_redirect_edge): Do not look for PHI args in
    BB_DUPLICATED blocks.
    * tree-ssa-phionlycprop.c (eliminate_degenerate_phis_1): Properly
    iterate over all PHIs considering removal of *gsi.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfghooks.c
    trunk/gcc/tree-ssa-phionlycprop.c
    trunk/gcc/tree-ssa.c
Author: tnfchris
Date: Thu Oct 05 15:17:39 2017
New Revision: 253452

URL: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=253452
Log:
gcc/testsuite/
2017-10-05 Tamar Christina <tamar.christina@arm.com>

    * gcc.dg/vect/slp-perm-9.c: Use vect_sizes_16B_8B.
    * lib/target-supports.exp (vect_sizes_16B_8B): New.

gcc/
2017-10-05 Tamar Christina <tamar.christina@arm.com>

    * doc/sourcebuild.texi (vect_sizes_16B_8B, vect_sizes_32B_16B): New.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/sourcebuild.texi
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
    trunk/gcc/testsuite/lib/target-supports.exp
Is it still an issue?
Yes, the original issue is still present.
So a testcase for missed outer loop induction SLP (and nested cycle SLP) is
for example

int a[1024];
void foo (unsigned n)
{
  for (int i = 0; i < 1020; i += 4)
    {
      int suma = a[i];
      int sumb = a[i+1];
      int sumc = a[i+2];
      int sumd = a[i+3];
      for (unsigned j = 0; j < 17; ++j)
        {
          suma = (suma ^ i) + 1;
          sumb = (sumb ^ i) + 2;
          sumc = (sumc ^ i) + 3;
          sumd = (sumd ^ i) + 4;
        }
      a[i] = suma;
      a[i+1] = sumb;
      a[i+2] = sumc;
      a[i+3] = sumd;
    }
}
(In reply to Richard Biener from comment #29)
> So a testcase for missed outer loop induction SLP (and nested cycle SLP) is
> for example
>
> int a[1024];
> void foo (unsigned n)
> {
>   for (int i = 0; i < 1020; i += 4)
>     {
>       int suma = a[i];
>       int sumb = a[i+1];
>       int sumc = a[i+2];
>       int sumd = a[i+3];
>       for (unsigned j = 0; j < 17; ++j)
>         {
>           suma = (suma ^ i) + 1;
>           sumb = (sumb ^ i) + 2;
>           sumc = (sumc ^ i) + 3;
>           sumd = (sumd ^ i) + 4;
>         }
>       a[i] = suma;
>       a[i+1] = sumb;
>       a[i+2] = sumc;
>       a[i+3] = sumd;
>     }
> }

Actually this is still not an inner loop induction in outer loop
vectorization, but rather missed nested cycle SLP handling. I have a patch
for this in testing.
The following is a testcase triggering the

  /* FORNOW: outer loop induction with SLP not supported.  */
  if (STMT_SLP_TYPE (stmt_info))
    return false;

test:

double image[40];

void
foo (void)
{
  for (int i = 0; i < 20; i++)
    {
      double suma = 0;
      double sumb = 0;
      for (int j = 0; j < 40; j++)
        {
          suma += j+i;
          sumb += j+i;
        }
      image[2*i] = suma;
      image[2*i+1] = sumb;
    }
}
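For anyone wanting to check a vectorized build of that testcase against known-good values, a small sketch follows. It is not one of the testsuite files; expected_sum and check_image are hypothetical helper names. It relies on the closed form sum over j = 0..39 of (j + i) = 780 + 40*i, which is exactly representable in double, so both image[2*i] and image[2*i+1] can be compared exactly.

```c
#include <assert.h>

double image[40];

/* The testcase from the comment above, unchanged.  */
void
foo (void)
{
  for (int i = 0; i < 20; i++)
    {
      double suma = 0;
      double sumb = 0;
      for (int j = 0; j < 40; j++)
        {
          suma += j+i;
          sumb += j+i;
        }
      image[2*i] = suma;
      image[2*i+1] = sumb;
    }
}

/* Closed form of the inner sum: (0 + 1 + ... + 39) + 40 * i.  */
double expected_sum (int i)
{
  assert (i >= 0 && i < 20);
  return 780.0 + 40.0 * i;
}

/* Returns 1 if every element of image matches the closed form.  */
int check_image (void)
{
  foo ();
  for (int i = 0; i < 20; i++)
    if (image[2*i] != expected_sum (i) || image[2*i+1] != expected_sum (i))
      return 0;
  return 1;
}
```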
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ac6affba97130bcbffb21bd9f8ca53c7aac89551

commit r11-4652-gac6affba97130bcbffb21bd9f8ca53c7aac89551
Author: Richard Biener <rguenther@suse.de>
Date: Tue Nov 3 11:52:47 2020 +0100

    tree-optimization/80928 - SLP vectorize nested loop induction

    This adds SLP vectorization of nested inductions.

    2020-11-03 Richard Biener <rguenther@suse.de>

            PR tree-optimization/80928
            * tree-vect-loop.c (vectorizable_induction): SLP vectorize
            nested inductions.

            * gcc.dg/vect/vect-outer-slp-2.c: New testcase.
            * gcc.dg/vect/vect-outer-slp-3.c: Likewise.
Now fixed.