65710 – [4.9/5 Regression] Thumb1 ICE caused by no register to spill

Bug 65710 - [4.9/5 Regression] Thumb1 ICE caused by no register to spill

Summary: [4.9/5 Regression] Thumb1 ICE caused by no register to spill

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	5.0

Importance:	P1 normal
Target Milestone:	4.9.3
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2015-04-09 07:03 UTC by Terry Guo
Modified:	2018-10-15 02:16 UTC (History)
CC List:	6 users (show)

See Also:	78911
Host:
Target:	arm-none-eabi
Build:
Known to work:
Known to fail:
Last reconfirmed:	2015-04-10 00:00:00

Attachments
test case (25.75 KB, text/plain) 2015-04-09 07:03 UTC, Terry Guo	Details
A late reduced test case (964 bytes, text/x-csrc) 2015-04-10 05:59 UTC, Terry Guo	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Terry Guo 2015-04-09 07:03:26 UTC

Created attachment 35268 [details]
test case

Revision 221867 is the fix to PR65647. But it causes another ICE for thumb1 target as below:

$./install/bin/arm-none-eabi-gcc -mthumb -O2 -S regex.i 
regex.i: In function ‘byte_re_match_2_internal’:
regex.i:10121:1: error: unable to find a register to spill
 }
 ^
regex.i:10121:1: error: this is the insn:
(insn 12390 15093 12392 2 (set (reg:SI 8731 [8730])
        (plus:SI (reg:SI 8731 [8730])
            (reg:SI 10838))) regex.i:8483 718 {*thumb1_addsi3}
     (expr_list:REG_DEAD (reg:SI 10838)
        (nil)))
regex.i:10121:1: internal compiler error: in assign_by_spills, at lra-assigns.c:1419
0xa36eea _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
	../../src/gcc/gcc/rtl-error.c:110
0x9521e7 assign_by_spills
	../../src/gcc/gcc/lra-assigns.c:1419
0x952bd3 lra_assign()
	../../src/gcc/gcc/lra-assigns.c:1594
0x94e919 lra(_IO_FILE*)
	../../src/gcc/gcc/lra.c:2360
0x90cc09 do_reload
	../../src/gcc/gcc/ira.c:5418
0x90cc09 execute
	../../src/gcc/gcc/ira.c:5589
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

This issue still happens to the latest trunk.

Comment 1 Richard Biener 2015-04-09 09:18:30 UTC

Needs confirming/reducing.

Comment 2 Terry Guo 2015-04-09 09:30:04 UTC

(In reply to Richard Biener from comment #1)
> Needs confirming/reducing.

Working on reducing.

Comment 3 Vladimir Makarov 2015-04-09 19:40:41 UTC

Author: vmakarov
Date: Thu Apr  9 19:40:09 2015
New Revision: 221956

URL: https://gcc.gnu.org/viewcvs?rev=221956&root=gcc&view=rev
Log:
2015-04-09  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/65710
	* lra-int.h (lra_bad_spill_regno_start): New.
	* lra.c (lra_bad_spill_regno_start): New.
	(lra): Set up lra_bad_spill_regno_start.  Set up
	lra_constraint_new_regno_start unconditionally.
	* lra-assigns.c (spill_for): Use lra_bad_spill_regno_start for
	spill preferences.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-assigns.c
    trunk/gcc/lra-int.h
    trunk/gcc/lra.c

Comment 4 Vladimir Makarov 2015-04-09 19:42:55 UTC

Author: vmakarov
Date: Thu Apr  9 19:42:24 2015
New Revision: 221957

URL: https://gcc.gnu.org/viewcvs?rev=221957&root=gcc&view=rev
Log:
2015-04-09  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/65710
	* lra-int.h (lra_bad_spill_regno_start): New.
	* lra.c (lra_bad_spill_regno_start): New.
	(lra): Set up lra_bad_spill_regno_start.  Set up
	lra_constraint_new_regno_start unconditionally.
	* lra-assigns.c (spill_for): Use lra_bad_spill_regno_start for
	spill preferences.


Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/lra-assigns.c
    branches/gcc-4_9-branch/gcc/lra-int.h
    branches/gcc-4_9-branch/gcc/lra.c

Comment 5 Jakub Jelinek 2015-04-09 19:47:52 UTC

Fixed, thanks.

Comment 6 Terry Guo 2015-04-10 05:59:49 UTC

Created attachment 35285 [details]
A late reduced test case

Comment 7 Jakub Jelinek 2015-04-10 06:29:48 UTC

The testcase is ok for trunk with proper ChangeLog and if placed into gcc.target/arm.

Comment 8 Stupachenko Evgeny 2015-04-10 11:30:33 UTC

The patch most likely (currently bisecting to get exact commit) caused 15% regression on spec2000 164.gzip test compiled with "-Ofast -funroll-loops -flto -march=slm -m32 -fPIE -pie".
(reproducible on Sandybridge).

Comment 9 Christophe Lyon 2015-04-10 11:44:41 UTC

I re-open the bug, since in the 4.9-branch the gcc.target/arm/pr65647-2.c test now times out after this patch, on arm* targets.

Comment 10 Yvan Roux 2015-04-10 13:05:29 UTC

LRA is also stuck in a loop when building gcc.target/arm/pr65647-2.c with trunk.

Comment 11 Yvan Roux 2015-04-10 13:16:54 UTC

Re-introducing the condition that stop updating lra_constraint_new_regno_start after switching off inheritance and remat, seems to fix the problem: gcc.target/arm/pr65647-2.c is ok and the reduced test case from Terry builds as well, at least with my quick test on 4.9 branch.  I'll do it on trunk and pass a full validation.  

Is it the correct fix for you Vlad ?

Comment 12 Stupachenko Evgeny 2015-04-10 14:05:36 UTC

(In reply to Stupachenko Evgeny from comment #8)
> The patch most likely (currently bisecting to get exact commit) caused 15%
> regression on spec2000 164.gzip test compiled with "-Ofast -funroll-loops
> -flto -march=slm -m32 -fPIE -pie".
> (reproducible on Sandybridge).

Confirmed. The commit caused 15% regression on 164.gzip

Comment 13 Stupachenko Evgeny 2015-04-10 15:25:21 UTC

The issue reproduced with -march=corei7 as well.

Currently the hottest loop looks as following:
L1.
mov    (%esp),%ebp /* potentially redundant.  */
and    $0x7fff,%ebp                                          
movzwl 0x400(%esi,%ebp,2),%ecx                               
mov    %ecx,(%esp) 
cmp    %ecx,0x4(%esp) /*  Why not reuse ebp?  */        
jae    380                                                   
subl   $0x1,0x8(%esp)                                        
je     380                                                   
mov    (%esp),%ebp /* ebp could contain correct value (if reused above).  */
add    %edi,%ebp                                             
cmp    0x0(%ebp,%ebx,1),%al                                  
jne    L1
movzbl 0xf(%esp),%edx                                       
cmp    -0x1(%ebp,%ebx,1),%dl                               
jne    L1

Unnecessary fills and spills of ebp added.
Before the patch the loop was 2 instruction shorter.

L1.
and    $0x7fff,%edx                                             
movzwl 0x400(%esi,%edx,2),%edx                                  
cmp    %edx,0x4(%esp)                                           
jae    3c0                                                      
sub    $0x1,%ebp                                                
je     3c0                                                      
lea    (%edi,%edx,1),%ecx                                       
movzbl (%esp),%eax                                              
cmp    (%ecx,%ebx,1),%al                                        
jne    L1
movzbl 0xb(%esp),%eax                                           
cmp    -0x1(%ecx,%ebx,1),%al                                    
jne    L1.

Please let me know if you need dumps.

Comment 14 Vladimir Makarov 2015-04-10 16:19:50 UTC

(In reply to Stupachenko Evgeny from comment #13)
> The issue reproduced with -march=corei7 as well.
> 
> Currently the hottest loop looks as following:
> L1.
> mov    (%esp),%ebp /* potentially redundant.  */
> and    $0x7fff,%ebp                                          
> movzwl 0x400(%esi,%ebp,2),%ecx                               
> mov    %ecx,(%esp) 
> cmp    %ecx,0x4(%esp) /*  Why not reuse ebp?  */        
> jae    380                                                   
> subl   $0x1,0x8(%esp)                                        
> je     380                                                   
> mov    (%esp),%ebp /* ebp could contain correct value (if reused above).  */
> add    %edi,%ebp                                             
> cmp    0x0(%ebp,%ebx,1),%al                                  
> jne    L1
> movzbl 0xf(%esp),%edx                                       
> cmp    -0x1(%ebp,%ebx,1),%dl                               
> jne    L1
> 
> Unnecessary fills and spills of ebp added.
> Before the patch the loop was 2 instruction shorter.
> 
> L1.
> and    $0x7fff,%edx                                             
> movzwl 0x400(%esi,%edx,2),%edx                                  
> cmp    %edx,0x4(%esp)                                           
> jae    3c0                                                      
> sub    $0x1,%ebp                                                
> je     3c0                                                      
> lea    (%edi,%edx,1),%ecx                                       
> movzbl (%esp),%eax                                              
> cmp    (%ecx,%ebx,1),%al                                        
> jne    L1
> movzbl 0xb(%esp),%eax                                           
> cmp    -0x1(%ecx,%ebx,1),%al                                    
> jne    L1.
> 
> Please let me know if you need dumps.

Could you add a pre-processed file with the hot loop.

Comment 15 Jakub Jelinek 2015-04-10 16:34:57 UTC

To be clear, I think we can postpone fixing the performance issue till 5.2/6.
The reason this is a P1 is #c9, #c10, #c11.  Vlad, can you please have a look at that?

Comment 16 Stupachenko Evgeny 2015-04-10 16:38:54 UTC

I can't attach spec2000 benchmarks sources.
The loop is in "longest_match" function in 164.gzip.

Options to reproduce: "-Ofast -funroll-loops -flto -march=corei7 -m32 -fPIE -pie" (-march=slm reproduce the issue as well).

I'm checking if the issue reproduced without "-flto".

Comment 17 Jakub Jelinek 2015-04-10 16:41:51 UTC

(In reply to Stupachenko Evgeny from comment #16)
> I can't attach spec2000 benchmarks sources.
> The loop is in "longest_match" function in 164.gzip.
> 
> Options to reproduce: "-Ofast -funroll-loops -flto -march=corei7 -m32 -fPIE
> -pie" (-march=slm reproduce the issue as well).
> 
> I'm checking if the issue reproduced without "-flto".

But gzip is free software, so just attach preprocessed source of some free version from around the same time after checking it generates the same code for the loop?

Comment 18 Vladimir Makarov 2015-04-10 16:42:45 UTC

(In reply to Jakub Jelinek from comment #15)
> To be clear, I think we can postpone fixing the performance issue till 5.2/6.
> The reason this is a P1 is #c9, #c10, #c11.  Vlad, can you please have a
> look at that?

Ok.  I'll be working on it for gcc-4.9.

Thanks for the clarification.

Comment 19 Jakub Jelinek 2015-04-10 16:45:23 UTC

(In reply to Vladimir Makarov from comment #18)
> (In reply to Jakub Jelinek from comment #15)
> > To be clear, I think we can postpone fixing the performance issue till 5.2/6.
> > The reason this is a P1 is #c9, #c10, #c11.  Vlad, can you please have a
> > look at that?
> 
> Ok.  I'll be working on it for gcc-4.9.
> 
> Thanks for the clarification.

#c10 says it happens on the trunk too.

Comment 20 Stupachenko Evgeny 2015-04-10 16:47:09 UTC

(In reply to Stupachenko Evgeny from comment #16)

> I'm checking if the issue reproduced without "-flto".

Unfortunately, "-flto" required to reproduce the issue.

Comment 21 Yvan Roux 2015-04-10 16:55:29 UTC

(In reply to Jakub Jelinek from comment #19)
> (In reply to Vladimir Makarov from comment #18)
> > (In reply to Jakub Jelinek from comment #15)
> > > To be clear, I think we can postpone fixing the performance issue till 5.2/6.
> > > The reason this is a P1 is #c9, #c10, #c11.  Vlad, can you please have a
> > > look at that?
> > 
> > Ok.  I'll be working on it for gcc-4.9.
> > 
> > Thanks for the clarification.
> 
> #c10 says it happens on the trunk too.

I can't check full validation results right now, but the suggested fix in #c11 seems to fix the issue

Comment 22 Jakub Jelinek 2015-04-10 17:08:51 UTC

(In reply to Yvan Roux from comment #21)
> (In reply to Jakub Jelinek from comment #19)
> > (In reply to Vladimir Makarov from comment #18)
> > > (In reply to Jakub Jelinek from comment #15)
> > > > To be clear, I think we can postpone fixing the performance issue till 5.2/6.
> > > > The reason this is a P1 is #c9, #c10, #c11.  Vlad, can you please have a
> > > > look at that?
> > > 
> > > Ok.  I'll be working on it for gcc-4.9.
> > > 
> > > Thanks for the clarification.
> > 
> > #c10 says it happens on the trunk too.
> 
> I can't check full validation results right now, but the suggested fix in
> #c11 seems to fix the issue

Why is pr65648-2.c testcase missing from the trunk?  I see it only on the 4.9 branch...

Comment 23 Yvan Roux 2015-04-10 18:23:31 UTC

(In reply to Jakub Jelinek from comment #22)

> Why is pr65648-2.c testcase missing from the trunk?  I see it only on the
> 4.9 branch...

Yes my fault, I was dealing with this 4.9 regression when PR65648 was fixed on trunk, so I proposed the backporting fix for 4.9 branch with the testcase pr65648-2.c which was failing on 4.9 and not on trunk.  I should have committed it on trunk as well.

Comment 24 Jakub Jelinek 2015-04-10 18:53:15 UTC

(In reply to Yvan Roux from comment #23)
> I should have committed it on trunk as well.

Please do so now, thanks.

Comment 25 Vladimir Makarov 2015-04-10 19:39:26 UTC

Author: vmakarov
Date: Fri Apr 10 19:38:55 2015
New Revision: 221983

URL: https://gcc.gnu.org/viewcvs?rev=221983&root=gcc&view=rev
Log:
2015-04-10  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/65710
	* lra-assigns.c (spill_for): Update smallest_bad_spills_num.
	Print bad_spills_num and insn_pseudos_num.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-assigns.c

Comment 26 Vladimir Makarov 2015-04-10 19:43:59 UTC

Author: vmakarov
Date: Fri Apr 10 19:43:28 2015
New Revision: 221984

URL: https://gcc.gnu.org/viewcvs?rev=221984&root=gcc&view=rev
Log:
2015-04-10  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/65710
	* lra-assigns.c (spill_for): Update smallest_bad_spills_num.
	Print bad_spills_num and insn_pseudos_num.


Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/lra-assigns.c

Comment 27 Vladimir Makarov 2015-04-10 19:46:29 UTC

(In reply to Stupachenko Evgeny from comment #16)
> I can't attach spec2000 benchmarks sources.
> The loop is in "longest_match" function in 164.gzip.
> 
> Options to reproduce: "-Ofast -funroll-loops -flto -march=corei7 -m32 -fPIE
> -pie" (-march=slm reproduce the issue as well).
> 
> I'm checking if the issue reproduced without "-flto".

Evgeny, could you check the effect of my latest patch (c25) on gzip.

Comment 28 Yvan Roux 2015-04-10 20:44:45 UTC

(In reply to Vladimir Makarov from comment #25)
> Author: vmakarov
> Date: Fri Apr 10 19:38:55 2015
> New Revision: 221983
> 
> URL: https://gcc.gnu.org/viewcvs?rev=221983&root=gcc&view=rev
> Log:
> 2015-04-10  Vladimir Makarov  <vmakarov@redhat.com>
> 
> 	PR target/65710
> 	* lra-assigns.c (spill_for): Update smallest_bad_spills_num.
> 	Print bad_spills_num and insn_pseudos_num.

That indeed makes more sense than my workaround, I'm progressing in LRA but there is still way to go ! ;) Thanks Vlad

Comment 29 Stupachenko Evgeny 2015-04-10 20:49:51 UTC

>Evgeny, could you check the effect of my latest patch (c25) on gzip.

The performance is back. Thanks.

Comment 30 Jakub Jelinek 2015-04-11 07:31:59 UTC

So hopefully fixed now.

Comment 31 xuepeng guo 2015-04-13 05:22:41 UTC

Author: xguo
Date: Mon Apr 13 05:22:09 2015
New Revision: 222037

URL: https://gcc.gnu.org/viewcvs?rev=222037&root=gcc&view=rev
Log:
Add missing test case

2015-04-13  Terry Guo  <terry.guo@arm.com>

        PR target/65710
        * gcc.target/arm/pr65710.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/arm/pr65710.c
Modified:
    trunk/gcc/testsuite/ChangeLog

Comment 32 Christophe Lyon 2015-04-13 12:15:52 UTC

> 2015-04-13  Terry Guo  <terry.guo@arm.com>
> 
>         PR target/65710
>         * gcc.target/arm/pr65710.c: New.
> 

Terry, any particular reason you use -march=armv6-m instead of -march=armv6 ?

Some of my test configurations add -marm to RUNTESTFLAGS, and they fail because:
error: target CPU does not support ARM mode

Can we switch to armv6, or do we need a few additional guards to avoid running this test in unsupported configurations?

Comment 33 Terry Guo 2015-04-14 03:15:12 UTC

(In reply to clyon from comment #32)
> > 2015-04-13  Terry Guo  <terry.guo@arm.com>
> > 
> >         PR target/65710
> >         * gcc.target/arm/pr65710.c: New.
> > 
> 
> Terry, any particular reason you use -march=armv6-m instead of -march=armv6 ?
> 
> Some of my test configurations add -marm to RUNTESTFLAGS, and they fail
> because:
> error: target CPU does not support ARM mode
> 
> Can we switch to armv6, or do we need a few additional guards to avoid
> running this test in unsupported configurations?

Thanks for reminding. I just made a stupid copy-paste error here. The correct options should be "-mthumb -O2 -mfloat-abi=soft" as shown in comment#1. I reused some test case template and forgot to update the options. I will fix this soon.