Bug 48678 - unable to find a register to spill in class ‘GENERAL_REGS’
Summary: unable to find a register to spill in class ‘GENERAL_REGS’
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: 4.7.0
Assignee: Uroš Bizjak
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-19 06:56 UTC by Matthias Kretz
Modified: 2011-04-20 20:01 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-04-19 07:28:19


Attachments
testcase (934 bytes, text/x-c++src)
2011-04-19 06:56 UTC, Matthias Kretz
Details
gcc46-pr48678.patch (699 bytes, patch)
2011-04-19 12:13 UTC, Jakub Jelinek
Details | Diff
Expand movstrict pattern as pinsr for vector subregs. (1.44 KB, patch)
2011-04-20 13:29 UTC, Uroš Bizjak
Details | Diff
Expand movstrict pattern as pinsr for vector subregs, v2. (1.36 KB, patch)
2011-04-20 17:41 UTC, Uroš Bizjak
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Matthias Kretz 2011-04-19 06:56:50 UTC
Created attachment 24040 [details]
testcase

compile the attached testcase with
g++ -msse4 -O1 no_register_error.cpp -c

to get the problem:
no_register_error.cpp: In function ‘int main()’:
no_register_error.cpp:77:1: error: unable to find a register to spill in class ‘GENERAL_REGS’
no_register_error.cpp:77:1: error: this is the insn:
(insn 63 62 64 4 (set (strict_low_part (subreg:HI (reg/v:V2DI 128 [ b ]) 0))
        (mem/s/j:HI (plus:DI (plus:DI (mult:DI (reg:DI 38 r9 [orig:159 indexes ] [159])
                        (const_int 2 [0x2]))
                    (reg/f:DI 20 frame))
                (const_int -64 [0xffffffffffffffc0])) [0 array S2 A16])) no_register_error.cpp:45 82 {*movstricthi_1}
     (expr_list:REG_DEAD (reg:DI 38 r9 [orig:159 indexes ] [159])
        (nil)))
no_register_error.cpp:77: confused by earlier errors, bailing out

The same code worked fine with previous version. I also tested the latest GCC 4.6.1 snapshot and the problem remains.
Comment 1 Jakub Jelinek 2011-04-19 07:28:19 UTC
ICE started with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=161655
Comment 2 Jakub Jelinek 2011-04-19 07:57:37 UTC
Having (strict_low_part (subreg:HI (reg:V2DI ...) 0)) on the LHS doesn't sound like a good idea to me (and this is created already during expansion).  Maybe reload should be supposed to handle that (it could copy the V2DI reg into a general reg, then do the movstrict on the general reg and then copy it back), but IMHO it is never going to lead to efficient code.  The following patch fixes the ICE for me, but dunno if that is how we want to fix it.
Before the r161655 change arbitrary VCEs (in form of MEM_EXPR) weren't allowed on
the LHS, so this wasn't a problem.

--- gcc/config/i386/i386.md.jj	2011-04-08 13:40:40.000000000 +0200
+++ gcc/config/i386/i386.md	2011-04-19 09:43:39.000000000 +0200
@@ -2408,6 +2408,9 @@ (define_expand "movstrict<mode>"
 {
   if (TARGET_PARTIAL_REG_STALL && optimize_function_for_speed_p (cfun))
     FAIL;
+  if (GET_CODE (operands[0]) == SUBREG
+      && GET_MODE_CLASS (GET_MODE (SUBREG_REG (operands[0]))) != MODE_INT)
+    FAIL;
   /* Don't generate memory->memory moves, go through a register */
   if (MEM_P (operands[0]) && MEM_P (operands[1]))
     operands[1] = force_reg (<MODE>mode, operands[1]);
Comment 3 Uroš Bizjak 2011-04-19 08:33:49 UTC
(In reply to comment #2)
> Having (strict_low_part (subreg:HI (reg:V2DI ...) 0)) on the LHS doesn't sound
> like a good idea to me (and this is created already during expansion).  Maybe
> reload should be supposed to handle that (it could copy the V2DI reg into a
> general reg, then do the movstrict on the general reg and then copy it back),
> but IMHO it is never going to lead to efficient code.  The following patch
> fixes the ICE for me, but dunno if that is how we want to fix it.
> Before the r161655 change arbitrary VCEs (in form of MEM_EXPR) weren't allowed
> on the LHS, so this wasn't a problem.

We can perhaps implement S_L_P inserts with pinsr instructions (pinsrw in this case). Since SSE2 implements only pinsrw (other widths are implemented in SSE4_1) this would slightly complicate movstrict expander, but IMO, it is worth a try.
Comment 4 Jakub Jelinek 2011-04-19 09:22:37 UTC
Might be a good idea, though IMHO just for 4.7, not for 4.6.
Would you be ok with this patch (plus testcase obviously) for 4.6?  If yes, is it ok temporarily for 4.7 too and you'd then go ahead and adjust it for pinsr*?
The movstrict{hi,qi}_1 pattern then would probably need to have variant with x and either have # for that alternative or something similar.  The expander would still need to reject 256-bit vector modes and XFmode, other floating point modes for non-SSE math, etc.
Comment 5 Uroš Bizjak 2011-04-19 09:59:59 UTC
(In reply to comment #4)
> Might be a good idea, though IMHO just for 4.7, not for 4.6.
> Would you be ok with this patch (plus testcase obviously) for 4.6?  If yes, is
> it ok temporarily for 4.7 too and you'd then go ahead and adjust it for pinsr*?
> The movstrict{hi,qi}_1 pattern then would probably need to have variant with x
> and either have # for that alternative or something similar.  The expander
> would still need to reject 256-bit vector modes and XFmode, other floating
> point modes for non-SSE math, etc.

Yes, IMO this is OK ATM.

BTW: While experimenting with pinsr pattern, I got following expansion:

(insn 117 116 0 (set (strict_low_part (subreg:HI (reg/v:V2DI 129 [ b ]) 8))
        (mem/s/j:HI (plus:DI (plus:DI (mult:DI (reg:DI 189)
                        (const_int 2 [0x2]))
                    (reg/f:DI 54 virtual-stack-vars))
                (const_int -64 [0xffffffffffffffc0])) [0 array S2 A16])) no_register_error.cpp:49 -1
     (nil))

Not exactly a "strict_low_part" with 8-byte offset...
Comment 6 Jakub Jelinek 2011-04-19 12:13:17 UTC
Created attachment 24041 [details]
gcc46-pr48678.patch

Patch with reduced testcase I'm going to bootstrap/regtest.
Comment 7 Jakub Jelinek 2011-04-19 16:47:09 UTC
Author: jakub
Date: Tue Apr 19 16:47:06 2011
New Revision: 172721

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172721
Log:
	PR target/48678
	* config/i386/i386.md (movstrict<mode>): FAIL if operands[0]
	is a SUBREG with non-MODE_INT mode inside of it.

	* gcc.target/i386/pr48678.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr48678.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog
Comment 8 Jakub Jelinek 2011-04-19 16:49:22 UTC
Author: jakub
Date: Tue Apr 19 16:49:19 2011
New Revision: 172723

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172723
Log:
	PR target/48678
	* config/i386/i386.md (movstrict<mode>): FAIL if operands[0]
	is a SUBREG with non-MODE_INT mode inside of it.

	* gcc.target/i386/pr48678.c: New test.

Added:
    branches/gcc-4_6-branch/gcc/testsuite/gcc.target/i386/pr48678.c
Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/config/i386/i386.md
    branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
Comment 9 Jakub Jelinek 2011-04-19 16:51:28 UTC
Fixed.
Comment 10 Uroš Bizjak 2011-04-20 13:29:58 UTC
Created attachment 24061 [details]
Expand movstrict pattern as pinsr for vector subregs.

For the testcase, attached patch generates:

	movdqa	(%rsi), %xmm0
	pinsrw	$0, (%rdi), %xmm0
	pcmpeqw	(%rdx), %xmm0
	ret
Comment 11 Jakub Jelinek 2011-04-20 13:32:01 UTC
Removing regression flag, as it is no longer a regression, just an enhancement.
Comment 12 Uroš Bizjak 2011-04-20 14:27:30 UTC
Hm, if line 14 in the testcase is changed to:

-  ((T *) &s.d)[0] = *x;
+  ((T *) &s.d)[1] = *x;

then gcc does not touch movstrict pattern at all and generates following code:

	movdqa	(%rsi), %xmm0
	movabsq	$-4294901761, %rsi
	movzwl	(%rdi), %eax
	movdqa	%xmm0, -24(%rsp)
	movq	-24(%rsp), %rcx
	salq	$16, %rax
	andq	%rsi, %rcx
	orq	%rax, %rcx
	movq	%rcx, -24(%rsp)
	movdqa	-24(%rsp), %xmm0
	pcmpeqw	(%rdx), %xmm0
	ret

However, when byte offset reaches sizeof (*void), i.e. 8 bytes on 64bit target as in:

-  ((T *) &s.d)[0] = *x;
+  ((T *) &s.d)[4] = *x;

then we again get:

	movdqa	(%rsi), %xmm0
	pinsrw	$4, (%rdi), %xmm0
	pcmpeqw	(%rdx), %xmm0
	ret

I didn't investigate this in detail, but perhaps someone can shed some light here?
Comment 13 Uroš Bizjak 2011-04-20 17:41:59 UTC
Created attachment 24064 [details]
Expand movstrict pattern as pinsr for vector subregs, v2.
Comment 14 Uroš Bizjak 2011-04-20 17:42:45 UTC
(In reply to comment #12)
> Hm, if line 14 in the testcase is changed to:
> 
> -  ((T *) &s.d)[0] = *x;
> +  ((T *) &s.d)[1] = *x;

We should go through insv pattern.  Patch v2 attached above.
Comment 15 uros 2011-04-20 19:58:27 UTC
Author: uros
Date: Wed Apr 20 19:58:23 2011
New Revision: 172792

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172792
Log:
	PR target/48678
	* config/i386/i386.md (insv): Change operand 0 constraint to
	"register_operand".  Change operand 1 and 2 constraint to
	"const_int_operand".  Expand to pinsr{b,w,d,q} * when appropriate.
	* config/i386/sse.md (sse4_1_pinsrb): Export.
	(sse2_pinsrw): Ditto.
	(sse4_1_pinsrd): Ditto.
	(sse4_1_pinsrq): Ditto.
	* config/i386/i386-protos.h (ix86_expand_pinsr): Add prototype.
	* config/i386/i386.c (ix86_expand_pinsr): New.

testsuite/ChangeLog:

	PR target/48678
	* gcc.target/i386/sse2-pinsrw.c: New test.
	* gcc.target/i386/avx-vpinsrw.c: Ditto.
	* gcc.target/i386/sse4_1-insvqi.c: Ditto.
	* gcc.target/i386/sse2-insvhi.c: Ditto.
	* gcc.target/i386/sse4_1-insvsi.c: Ditto.
	* gcc.target/i386/sse4_1-insvdi.c: Ditto.


Added:
    trunk/gcc/testsuite/gcc.target/i386/avx-vpinsrw-1.c
    trunk/gcc/testsuite/gcc.target/i386/sse2-insvhi.c
    trunk/gcc/testsuite/gcc.target/i386/sse2-pinsrw.c
    trunk/gcc/testsuite/gcc.target/i386/sse4_1-insvdi.c
    trunk/gcc/testsuite/gcc.target/i386/sse4_1-insvqi.c
    trunk/gcc/testsuite/gcc.target/i386/sse4_1-insvsi.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386-protos.h
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
Comment 16 Uroš Bizjak 2011-04-20 20:01:26 UTC
Done.