Created attachment 35268 [details] test case Revision 221867 is the fix to PR65647. But it causes another ICE for thumb1 target as below: $./install/bin/arm-none-eabi-gcc -mthumb -O2 -S regex.i regex.i: In function ‘byte_re_match_2_internal’: regex.i:10121:1: error: unable to find a register to spill } ^ regex.i:10121:1: error: this is the insn: (insn 12390 15093 12392 2 (set (reg:SI 8731 [8730]) (plus:SI (reg:SI 8731 [8730]) (reg:SI 10838))) regex.i:8483 718 {*thumb1_addsi3} (expr_list:REG_DEAD (reg:SI 10838) (nil))) regex.i:10121:1: internal compiler error: in assign_by_spills, at lra-assigns.c:1419 0xa36eea _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) ../../src/gcc/gcc/rtl-error.c:110 0x9521e7 assign_by_spills ../../src/gcc/gcc/lra-assigns.c:1419 0x952bd3 lra_assign() ../../src/gcc/gcc/lra-assigns.c:1594 0x94e919 lra(_IO_FILE*) ../../src/gcc/gcc/lra.c:2360 0x90cc09 do_reload ../../src/gcc/gcc/ira.c:5418 0x90cc09 execute ../../src/gcc/gcc/ira.c:5589 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions. This issue still happens to the latest trunk.
Needs confirming/reducing.
(In reply to Richard Biener from comment #1) > Needs confirming/reducing. Working on reducing.
Author: vmakarov Date: Thu Apr 9 19:40:09 2015 New Revision: 221956 URL: https://gcc.gnu.org/viewcvs?rev=221956&root=gcc&view=rev Log: 2015-04-09 Vladimir Makarov <vmakarov@redhat.com> PR target/65710 * lra-int.h (lra_bad_spill_regno_start): New. * lra.c (lra_bad_spill_regno_start): New. (lra): Set up lra_bad_spill_regno_start. Set up lra_constraint_new_regno_start unconditionally. * lra-assigns.c (spill_for): Use lra_bad_spill_regno_start for spill preferences. Modified: trunk/gcc/ChangeLog trunk/gcc/lra-assigns.c trunk/gcc/lra-int.h trunk/gcc/lra.c
Author: vmakarov Date: Thu Apr 9 19:42:24 2015 New Revision: 221957 URL: https://gcc.gnu.org/viewcvs?rev=221957&root=gcc&view=rev Log: 2015-04-09 Vladimir Makarov <vmakarov@redhat.com> PR target/65710 * lra-int.h (lra_bad_spill_regno_start): New. * lra.c (lra_bad_spill_regno_start): New. (lra): Set up lra_bad_spill_regno_start. Set up lra_constraint_new_regno_start unconditionally. * lra-assigns.c (spill_for): Use lra_bad_spill_regno_start for spill preferences. Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/lra-assigns.c branches/gcc-4_9-branch/gcc/lra-int.h branches/gcc-4_9-branch/gcc/lra.c
Fixed, thanks.
Created attachment 35285 [details] A late reduced test case
The testcase is ok for trunk with proper ChangeLog and if placed into gcc.target/arm.
The patch most likely (currently bisecting to get exact commit) caused 15% regression on spec2000 164.gzip test compiled with "-Ofast -funroll-loops -flto -march=slm -m32 -fPIE -pie". (reproducible on Sandybridge).
I re-open the bug, since in the 4.9-branch the gcc.target/arm/pr65647-2.c test now times out after this patch, on arm* targets.
LRA is also stuck in a loop when building gcc.target/arm/pr65647-2.c with trunk.
Re-introducing the condition that stop updating lra_constraint_new_regno_start after switching off inheritance and remat, seems to fix the problem: gcc.target/arm/pr65647-2.c is ok and the reduced test case from Terry builds as well, at least with my quick test on 4.9 branch. I'll do it on trunk and pass a full validation. Is it the correct fix for you Vlad ?
(In reply to Stupachenko Evgeny from comment #8) > The patch most likely (currently bisecting to get exact commit) caused 15% > regression on spec2000 164.gzip test compiled with "-Ofast -funroll-loops > -flto -march=slm -m32 -fPIE -pie". > (reproducible on Sandybridge). Confirmed. The commit caused 15% regression on 164.gzip
The issue reproduced with -march=corei7 as well. Currently the hottest loop looks as following: L1. mov (%esp),%ebp /* potentially redundant. */ and $0x7fff,%ebp movzwl 0x400(%esi,%ebp,2),%ecx mov %ecx,(%esp) cmp %ecx,0x4(%esp) /* Why not reuse ebp? */ jae 380 subl $0x1,0x8(%esp) je 380 mov (%esp),%ebp /* ebp could contain correct value (if reused above). */ add %edi,%ebp cmp 0x0(%ebp,%ebx,1),%al jne L1 movzbl 0xf(%esp),%edx cmp -0x1(%ebp,%ebx,1),%dl jne L1 Unnecessary fills and spills of ebp added. Before the patch the loop was 2 instruction shorter. L1. and $0x7fff,%edx movzwl 0x400(%esi,%edx,2),%edx cmp %edx,0x4(%esp) jae 3c0 sub $0x1,%ebp je 3c0 lea (%edi,%edx,1),%ecx movzbl (%esp),%eax cmp (%ecx,%ebx,1),%al jne L1 movzbl 0xb(%esp),%eax cmp -0x1(%ecx,%ebx,1),%al jne L1. Please let me know if you need dumps.
(In reply to Stupachenko Evgeny from comment #13) > The issue reproduced with -march=corei7 as well. > > Currently the hottest loop looks as following: > L1. > mov (%esp),%ebp /* potentially redundant. */ > and $0x7fff,%ebp > movzwl 0x400(%esi,%ebp,2),%ecx > mov %ecx,(%esp) > cmp %ecx,0x4(%esp) /* Why not reuse ebp? */ > jae 380 > subl $0x1,0x8(%esp) > je 380 > mov (%esp),%ebp /* ebp could contain correct value (if reused above). */ > add %edi,%ebp > cmp 0x0(%ebp,%ebx,1),%al > jne L1 > movzbl 0xf(%esp),%edx > cmp -0x1(%ebp,%ebx,1),%dl > jne L1 > > Unnecessary fills and spills of ebp added. > Before the patch the loop was 2 instruction shorter. > > L1. > and $0x7fff,%edx > movzwl 0x400(%esi,%edx,2),%edx > cmp %edx,0x4(%esp) > jae 3c0 > sub $0x1,%ebp > je 3c0 > lea (%edi,%edx,1),%ecx > movzbl (%esp),%eax > cmp (%ecx,%ebx,1),%al > jne L1 > movzbl 0xb(%esp),%eax > cmp -0x1(%ecx,%ebx,1),%al > jne L1. > > Please let me know if you need dumps. Could you add a pre-processed file with the hot loop.
To be clear, I think we can postpone fixing the performance issue till 5.2/6. The reason this is a P1 is #c9, #c10, #c11. Vlad, can you please have a look at that?
I can't attach spec2000 benchmarks sources. The loop is in "longest_match" function in 164.gzip. Options to reproduce: "-Ofast -funroll-loops -flto -march=corei7 -m32 -fPIE -pie" (-march=slm reproduce the issue as well). I'm checking if the issue reproduced without "-flto".
(In reply to Stupachenko Evgeny from comment #16) > I can't attach spec2000 benchmarks sources. > The loop is in "longest_match" function in 164.gzip. > > Options to reproduce: "-Ofast -funroll-loops -flto -march=corei7 -m32 -fPIE > -pie" (-march=slm reproduce the issue as well). > > I'm checking if the issue reproduced without "-flto". But gzip is free software, so just attach preprocessed source of some free version from around the same time after checking it generates the same code for the loop?
(In reply to Jakub Jelinek from comment #15) > To be clear, I think we can postpone fixing the performance issue till 5.2/6. > The reason this is a P1 is #c9, #c10, #c11. Vlad, can you please have a > look at that? Ok. I'll be working on it for gcc-4.9. Thanks for the clarification.
(In reply to Vladimir Makarov from comment #18) > (In reply to Jakub Jelinek from comment #15) > > To be clear, I think we can postpone fixing the performance issue till 5.2/6. > > The reason this is a P1 is #c9, #c10, #c11. Vlad, can you please have a > > look at that? > > Ok. I'll be working on it for gcc-4.9. > > Thanks for the clarification. #c10 says it happens on the trunk too.
(In reply to Stupachenko Evgeny from comment #16) > I'm checking if the issue reproduced without "-flto". Unfortunately, "-flto" required to reproduce the issue.
(In reply to Jakub Jelinek from comment #19) > (In reply to Vladimir Makarov from comment #18) > > (In reply to Jakub Jelinek from comment #15) > > > To be clear, I think we can postpone fixing the performance issue till 5.2/6. > > > The reason this is a P1 is #c9, #c10, #c11. Vlad, can you please have a > > > look at that? > > > > Ok. I'll be working on it for gcc-4.9. > > > > Thanks for the clarification. > > #c10 says it happens on the trunk too. I can't check full validation results right now, but the suggested fix in #c11 seems to fix the issue
(In reply to Yvan Roux from comment #21) > (In reply to Jakub Jelinek from comment #19) > > (In reply to Vladimir Makarov from comment #18) > > > (In reply to Jakub Jelinek from comment #15) > > > > To be clear, I think we can postpone fixing the performance issue till 5.2/6. > > > > The reason this is a P1 is #c9, #c10, #c11. Vlad, can you please have a > > > > look at that? > > > > > > Ok. I'll be working on it for gcc-4.9. > > > > > > Thanks for the clarification. > > > > #c10 says it happens on the trunk too. > > I can't check full validation results right now, but the suggested fix in > #c11 seems to fix the issue Why is pr65648-2.c testcase missing from the trunk? I see it only on the 4.9 branch...
(In reply to Jakub Jelinek from comment #22) > Why is pr65648-2.c testcase missing from the trunk? I see it only on the > 4.9 branch... Yes my fault, I was dealing with this 4.9 regression when PR65648 was fixed on trunk, so I proposed the backporting fix for 4.9 branch with the testcase pr65648-2.c which was failing on 4.9 and not on trunk. I should have committed it on trunk as well.
(In reply to Yvan Roux from comment #23) > I should have committed it on trunk as well. Please do so now, thanks.
Author: vmakarov Date: Fri Apr 10 19:38:55 2015 New Revision: 221983 URL: https://gcc.gnu.org/viewcvs?rev=221983&root=gcc&view=rev Log: 2015-04-10 Vladimir Makarov <vmakarov@redhat.com> PR target/65710 * lra-assigns.c (spill_for): Update smallest_bad_spills_num. Print bad_spills_num and insn_pseudos_num. Modified: trunk/gcc/ChangeLog trunk/gcc/lra-assigns.c
Author: vmakarov Date: Fri Apr 10 19:43:28 2015 New Revision: 221984 URL: https://gcc.gnu.org/viewcvs?rev=221984&root=gcc&view=rev Log: 2015-04-10 Vladimir Makarov <vmakarov@redhat.com> PR target/65710 * lra-assigns.c (spill_for): Update smallest_bad_spills_num. Print bad_spills_num and insn_pseudos_num. Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/lra-assigns.c
(In reply to Stupachenko Evgeny from comment #16) > I can't attach spec2000 benchmarks sources. > The loop is in "longest_match" function in 164.gzip. > > Options to reproduce: "-Ofast -funroll-loops -flto -march=corei7 -m32 -fPIE > -pie" (-march=slm reproduce the issue as well). > > I'm checking if the issue reproduced without "-flto". Evgeny, could you check the effect of my latest patch (c25) on gzip.
(In reply to Vladimir Makarov from comment #25) > Author: vmakarov > Date: Fri Apr 10 19:38:55 2015 > New Revision: 221983 > > URL: https://gcc.gnu.org/viewcvs?rev=221983&root=gcc&view=rev > Log: > 2015-04-10 Vladimir Makarov <vmakarov@redhat.com> > > PR target/65710 > * lra-assigns.c (spill_for): Update smallest_bad_spills_num. > Print bad_spills_num and insn_pseudos_num. That indeed makes more sense than my workaround, I'm progressing in LRA but there is still way to go ! ;) Thanks Vlad
>Evgeny, could you check the effect of my latest patch (c25) on gzip. The performance is back. Thanks.
So hopefully fixed now.
Author: xguo Date: Mon Apr 13 05:22:09 2015 New Revision: 222037 URL: https://gcc.gnu.org/viewcvs?rev=222037&root=gcc&view=rev Log: Add missing test case 2015-04-13 Terry Guo <terry.guo@arm.com> PR target/65710 * gcc.target/arm/pr65710.c: New. Added: trunk/gcc/testsuite/gcc.target/arm/pr65710.c Modified: trunk/gcc/testsuite/ChangeLog
> 2015-04-13 Terry Guo <terry.guo@arm.com> > > PR target/65710 > * gcc.target/arm/pr65710.c: New. > Terry, any particular reason you use -march=armv6-m instead of -march=armv6 ? Some of my test configurations add -marm to RUNTESTFLAGS, and they fail because: error: target CPU does not support ARM mode Can we switch to armv6, or do we need a few additional guards to avoid running this test in unsupported configurations?
(In reply to clyon from comment #32) > > 2015-04-13 Terry Guo <terry.guo@arm.com> > > > > PR target/65710 > > * gcc.target/arm/pr65710.c: New. > > > > Terry, any particular reason you use -march=armv6-m instead of -march=armv6 ? > > Some of my test configurations add -marm to RUNTESTFLAGS, and they fail > because: > error: target CPU does not support ARM mode > > Can we switch to armv6, or do we need a few additional guards to avoid > running this test in unsupported configurations? Thanks for reminding. I just made a stupid copy-paste error here. The correct options should be "-mthumb -O2 -mfloat-abi=soft" as shown in comment#1. I reused some test case template and forgot to update the options. I will fix this soon.