Split out from PR91154 comment#25 Biggest changes when benchmarking -mno-stv (base) against -mstv (peak): 7.28% 37789 twolf_peak.none twolf_peak.none [.] ucxx2 4.21% 25709 twolf_base.none twolf_base.none [.] ucxx2 3.72% 22584 twolf_base.none twolf_base.none [.] new_dbox 2.48% 22364 twolf_peak.none twolf_peak.none [.] new_dbox 1.49% 8270 twolf_base.none twolf_base.none [.] sub_penal 1.12% 7576 twolf_peak.none twolf_peak.none [.] sub_penal 1.36% 9314 twolf_peak.none twolf_peak.none [.] old_assgnto_new2 1.11% 5257 twolf_base.none twolf_base.none [.] old_assgnto_new2 and in ucxx2 I see 0.17 │ mov %eax,(%rsp) 3.55 │ vpmins (%rsp),%xmm0,%xmm1 │ test %eax,%eax 0.22 │ vmovd %xmm1,%r8d 0.80 │ cmovs %esi,%r8d ... Testcase: extern int numBins; extern int binOffst; extern int binWidth; extern int Trybin; void foo (int); void bar (int aleft, int axcenter) { int a1LoBin = (((Trybin=((axcenter + aleft)-binOffst)/binWidth)<0) ? 0 : ((Trybin>numBins) ? numBins : Trybin)); foo (a1LoBin); } where combine eliminates the reg-reg copies STV adds to split live-ranges between GPR and SSE uses (currently one plain move and one set via vec_merge/duplicate). Making STV of SI/DImode chains always happen after combine (in STV2) fixes the testcase above but regresses gcc.target/i386/minmax-6.c which ran into a very similar issue and was reduced from a SPEC CPU 2006 regression observed. In the end the issue is that as soon as the RA decides it needs to spill for a dual-use pseudo it ends up going through the stack because of, as HJ notices, the minimum cost of moves between SSE and integer units is 8: /* Moves between SSE and integer units are expensive. */ if (SSE_CLASS_P (class1) != SSE_CLASS_P (class2)) /* ??? By keeping returned value relatively high, we limit the number of moves between integer and SSE registers for all targets. Additionally, high value prevents problem with x86_modes_tieable_p(), where integer modes in SSE registers are not tieable because of missing QImode and HImode moves to, from or between MMX/SSE registers. */ return MAX (8, SSE_CLASS_P (class1) ? ix86_cost->hard_register.sse_to_integer : ix86_cost->hard_register.integer_to_sse);
Patch that causes gcc.target/i386/minmax-6.c to spill to the stack: https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01283.html This makes SI/DImode chains handled by STV2 only (after combine). We can take it from there if we think that's what we need to do anyways. In the end it's STV/RA interaction, I guess if we can take combine out of the equation it might simplify things.
(In reply to Richard Biener from comment #1) > Patch that causes gcc.target/i386/minmax-6.c to spill to the stack: > https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01283.html > > This makes SI/DImode chains handled by STV2 only (after combine). We can > take it from there if we think that's what we need to do anyways. In the > end it's STV/RA interaction, I guess if we can take combine out of the > equation it might simplify things. Yes, this is the intended way SI/DImode STV should be used.
Author: rguenth Date: Tue Aug 20 08:45:56 2019 New Revision: 274694 URL: https://gcc.gnu.org/viewcvs?rev=274694&root=gcc&view=rev Log: 2019-08-20 Richard Biener <rguenther@suse.de> PR target/91498 * config/i386/i386-features.c (general_scalar_chain::convert_op): Use (vec_merge (vec_duplicate..)) style vector from scalar move. (convert_scalars_to_vector): Add timode_p parameter and use it to guard TImode-only operation. (pass_stv::gate): Adjust so STV runs twice for TARGET_64BIT. (pass_stv::execute): Pass down timode_p. * gcc.target/i386/minmax-7.c: New testcase. Added: trunk/gcc/testsuite/gcc.target/i386/minmax-7.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-features.c trunk/gcc/testsuite/ChangeLog
Following patch --cut here-- diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 49ab50ea41bf..11c75be113e0 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -18601,9 +18601,9 @@ ix86_register_move_cost (machine_mode mode, reg_class_t class1_i, where integer modes in SSE registers are not tieable because of missing QImode and HImode moves to, from or between MMX/SSE registers. */ - return MAX (8, SSE_CLASS_P (class1) - ? ix86_cost->hard_register.sse_to_integer - : ix86_cost->hard_register.integer_to_sse); + return (SSE_CLASS_P (class1) + ? ix86_cost->hard_register.sse_to_integer + : ix86_cost->hard_register.integer_to_sse); if (MAYBE_FLOAT_CLASS_P (class1)) return ix86_cost->hard_register.fp_move; --cut here-- fixes the regression. (BTW: QI/HImodes are not "missing" at all. We can use %k modifier to move QImode and HImode values using MOVD instruction between register sets.)
(In reply to Uroš Bizjak from comment #4) > Following patch HJ, can you please put the patch through some benchmarks? (I have no access to SPEC).
(In reply to Uroš Bizjak from comment #5) > (In reply to Uroš Bizjak from comment #4) > > Following patch > > HJ, can you please put the patch through some benchmarks? (I have no access > to SPEC). Sure. We will measure it. Thanks.
Our SPEC CPU 2017 failed with 39 util.c:205:1: error: invalid rtl sharing found in the insn 40 205 | } 41 | ^ 42 (insn 29 28 3 2 (set (subreg:V2DI (reg:DI 91) 0) 43 (vec_concat:V2DI (mem/c:DI (plus:DI (reg/f:DI 19 frame) 44 (const_int -8 [0xfffffffffffffff8])) [0 S8 A64]) 45 (const_int 0 [0]))) "util.c":134:1 -1 46 (nil)) 47 util.c:205:1: error: shared rtx 48 (mem/c:DI (plus:DI (reg/f:DI 19 frame) 49 (const_int -8 [0xfffffffffffffff8])) [0 S8 A64]) 50 during RTL pass: stv We have an invalid shared rtx.
The new testcase FAILs on both i386-pc-solaris2.11 and x86_64-pc-linux-gnu with -m32: +FAIL: gcc.target/i386/minmax-7.c scan-assembler pminsd
(In reply to H.J. Lu from comment #7) > Our SPEC CPU 2017 failed with > > 39 util.c:205:1: error: invalid rtl sharing found in the insn > 40 205 | } > 41 | ^ > 42 (insn 29 28 3 2 (set (subreg:V2DI (reg:DI 91) 0) > 43 (vec_concat:V2DI (mem/c:DI (plus:DI (reg/f:DI 19 frame) > 44 (const_int -8 [0xfffffffffffffff8])) [0 S8 A64]) > 45 (const_int 0 [0]))) "util.c":134:1 -1 > 46 (nil)) > 47 util.c:205:1: error: shared rtx > 48 (mem/c:DI (plus:DI (reg/f:DI 19 frame) > 49 (const_int -8 [0xfffffffffffffff8])) [0 S8 A64]) > 50 during RTL pass: stv > > We have an invalid shared rtx. And the testcase is as simple as: int ff (int jn) { return jn > 0 ? jn : 0; } % x86_64-unknown-linux-gnu-gcc-10.0.0-alpha20190818 -march=nano-3000 -Os -c f9272pqv.c f9272pqv.c: In function 'ff': f9272pqv.c:5:1: error: invalid rtl sharing found in the insn 5 | } | ^ (insn 18 17 3 2 (set (subreg:V4SI (reg:SI 87) 0) (vec_merge:V4SI (vec_duplicate:V4SI (mem/c:SI (plus:DI (reg/f:DI 19 frame) (const_int -4 [0xfffffffffffffffc])) [0 S4 A32])) (const_vector:V4SI [ (const_int 0 [0]) repeated x4 ]) (const_int 1 [0x1]))) "f9272pqv.c":3:1 -1 (nil)) f9272pqv.c:5:1: error: shared rtx (mem/c:SI (plus:DI (reg/f:DI 19 frame) (const_int -4 [0xfffffffffffffffc])) [0 S4 A32]) during RTL pass: stv f9272pqv.c:5:1: internal compiler error: internal consistency failure 0x9f3eca verify_rtx_sharing /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/gcc/emit-rtl.c:2927 0x9f3dba verify_rtx_sharing /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/gcc/emit-rtl.c:2942 0x9f3dba verify_rtx_sharing /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/gcc/emit-rtl.c:2942 0x9f3dba verify_rtx_sharing /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/gcc/emit-rtl.c:2942 0x9f4178 verify_insn_sharing /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/gcc/emit-rtl.c:3013 0x9f79ed verify_rtl_sharing() /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/gcc/emit-rtl.c:3036 0xc7a8f3 execute_function_todo /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/gcc/passes.c:2004 0xc7b536 execute_todo /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/gcc/passes.c:2037
On Tue, 20 Aug 2019, ro at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91498 > > Rainer Orth <ro at gcc dot gnu.org> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |ro at gcc dot gnu.org > > --- Comment #8 from Rainer Orth <ro at gcc dot gnu.org> --- > The new testcase FAILs on both i386-pc-solaris2.11 and x86_64-pc-linux-gnu with > -m32: > > +FAIL: gcc.target/i386/minmax-7.c scan-assembler pminsd Not for me. I've used -march=haswell in all of these testcases to rule out ISA and cost issues. So I really don't know what's the issue with these FAILs.
On Tue, 20 Aug 2019, asolokha at gmx dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91498 > > Arseny Solokha <asolokha at gmx dot com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |asolokha at gmx dot com > > --- Comment #9 from Arseny Solokha <asolokha at gmx dot com> --- > (In reply to H.J. Lu from comment #7) > > Our SPEC CPU 2017 failed with > > > > 39 util.c:205:1: error: invalid rtl sharing found in the insn > > 40 205 | } > > 41 | ^ > > 42 (insn 29 28 3 2 (set (subreg:V2DI (reg:DI 91) 0) > > 43 (vec_concat:V2DI (mem/c:DI (plus:DI (reg/f:DI 19 frame) > > 44 (const_int -8 [0xfffffffffffffff8])) [0 S8 A64]) > > 45 (const_int 0 [0]))) "util.c":134:1 -1 > > 46 (nil)) > > 47 util.c:205:1: error: shared rtx > > 48 (mem/c:DI (plus:DI (reg/f:DI 19 frame) > > 49 (const_int -8 [0xfffffffffffffff8])) [0 S8 A64]) > > 50 during RTL pass: stv > > > > We have an invalid shared rtx. > > And the testcase is as simple as: > > int > ff (int jn) > { > return jn > 0 ? jn : 0; > } > > % x86_64-unknown-linux-gnu-gcc-10.0.0-alpha20190818 -march=nano-3000 -Os -c > f9272pqv.c > f9272pqv.c: In function 'ff': > f9272pqv.c:5:1: error: invalid rtl sharing found in the insn > 5 | } > | ^ > (insn 18 17 3 2 (set (subreg:V4SI (reg:SI 87) 0) > (vec_merge:V4SI (vec_duplicate:V4SI (mem/c:SI (plus:DI (reg/f:DI 19 > frame) > (const_int -4 [0xfffffffffffffffc])) [0 S4 A32])) > (const_vector:V4SI [ > (const_int 0 [0]) repeated x4 > ]) > (const_int 1 [0x1]))) "f9272pqv.c":3:1 -1 > (nil)) I have a patch.
(In reply to Arseny Solokha from comment #9) > (In reply to H.J. Lu from comment #7) > > Our SPEC CPU 2017 failed with > > > > 39 util.c:205:1: error: invalid rtl sharing found in the insn > > 40 205 | } > > 41 | ^ > > 42 (insn 29 28 3 2 (set (subreg:V2DI (reg:DI 91) 0) > > 43 (vec_concat:V2DI (mem/c:DI (plus:DI (reg/f:DI 19 frame) > > 44 (const_int -8 [0xfffffffffffffff8])) [0 S8 A64]) > > 45 (const_int 0 [0]))) "util.c":134:1 -1 > > 46 (nil)) > > 47 util.c:205:1: error: shared rtx > > 48 (mem/c:DI (plus:DI (reg/f:DI 19 frame) > > 49 (const_int -8 [0xfffffffffffffff8])) [0 S8 A64]) > > 50 during RTL pass: stv > > > > We have an invalid shared rtx. > > And the testcase is as simple as: > > int > ff (int jn) > { > return jn > 0 ? jn : 0; > } > > % x86_64-unknown-linux-gnu-gcc-10.0.0-alpha20190818 -march=nano-3000 -Os -c > f9272pqv.c > f9272pqv.c: In function 'ff': > f9272pqv.c:5:1: error: invalid rtl sharing found in the insn > 5 | } > | ^ > (insn 18 17 3 2 (set (subreg:V4SI (reg:SI 87) 0) > (vec_merge:V4SI (vec_duplicate:V4SI (mem/c:SI (plus:DI (reg/f:DI 19 > frame) > (const_int -4 [0xfffffffffffffffc])) [0 S4 A32])) > (const_vector:V4SI [ > (const_int 0 [0]) repeated x4 > ]) > (const_int 1 [0x1]))) "f9272pqv.c":3:1 -1 > (nil)) > f9272pqv.c:5:1: error: shared rtx > (mem/c:SI (plus:DI (reg/f:DI 19 frame) > (const_int -4 [0xfffffffffffffffc])) [0 S4 A32]) > during RTL pass: stv > f9272pqv.c:5:1: internal compiler error: internal consistency failure > 0x9f3eca verify_rtx_sharing > /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/ > gcc/emit-rtl.c:2927 > 0x9f3dba verify_rtx_sharing > /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/ > gcc/emit-rtl.c:2942 > 0x9f3dba verify_rtx_sharing > /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/ > gcc/emit-rtl.c:2942 > 0x9f3dba verify_rtx_sharing > /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/ > gcc/emit-rtl.c:2942 > 0x9f4178 verify_insn_sharing > /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/ > gcc/emit-rtl.c:3013 > 0x9f79ed verify_rtl_sharing() > /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/ > gcc/emit-rtl.c:3036 > 0xc7a8f3 execute_function_todo > /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/ > gcc/passes.c:2004 > 0xc7b536 execute_todo > /var/tmp/portage/sys-devel/gcc-10.0.0_alpha20190818/work/gcc-10-20190818/ > gcc/passes.c:2037 sounds like PR91503
(In reply to rguenther@suse.de from comment #10) > > +FAIL: gcc.target/i386/minmax-7.c scan-assembler pminsd > > Not for me. I've used -march=haswell in all of these testcases to > rule out ISA and cost issues. So I really don't know what's the issue > with these FAILs. IIRC 32bit Solaris doesn't guarantee 16byte stack alignment, so STV is disabled.
Author: rguenth Date: Wed Aug 21 08:44:59 2019 New Revision: 274792 URL: https://gcc.gnu.org/viewcvs?rev=274792&root=gcc&view=rev Log: 2019-08-21 Richard Biener <rguenther@suse.de> PR target/91498 PR target/91503 * config/i386/i386-features.c (general_scalar_chain::make_vector_copies): Copy stack temporary rtx when using it multiple times. (general_scalar_chain::convert_reg): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-features.c
Created attachment 46737 [details] Patch for 32-bit Solaris/x86 failures
> --- Comment #13 from Uroš Bizjak <ubizjak at gmail dot com> --- > (In reply to rguenther@suse.de from comment #10) >> > +FAIL: gcc.target/i386/minmax-7.c scan-assembler pminsd >> >> Not for me. I've used -march=haswell in all of these testcases to >> rule out ISA and cost issues. So I really don't know what's the issue >> with these FAILs. > > IIRC 32bit Solaris doesn't guarantee 16byte stack alignment, so STV is > disabled. Indeed: adding -mno-stackrealign (as is done in a couple of other STV tests already) fixes this failure and the ones reported in PR rtl-optimization/91154. Tested with the appropriate runtest invocation on i386-pc-solaris2.11 and x86_64-pc-linux-gnu (both multilibs each).
Can the issue be closed as resolved?
(In reply to Martin Liška from comment #17) > Can the issue be closed as resolved? I've fixed a -fno-common issue on the tester and will report back after the next run.
Doesn't seem fixed.
The patch from comment#4 got installed so re-analysis is necessary I think.
GCC 10.1 has been released.
GCC 10.2 is released, adjusting target milestone.
GCC 10.3 is being released, retargeting bugs to GCC 10.4.
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.
GCC 11 branch is being closed.