Bug 35160 - local-alloc introduces sharing of the same pseudo/hard reg between different output regs in inline asm
Summary: local-alloc introduces sharing of the same pseudo/hard reg between different ...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: inline-asm (show other bugs)
Version: 4.3.0
: P3 normal
Target Milestone: 4.4.0
Assignee: Not yet assigned to anyone
URL:
Keywords: ice-on-valid-code, wrong-code
Depends on:
Blocks:
 
Reported: 2008-02-11 08:30 UTC by Jakub Jelinek
Modified: 2018-04-28 15:02 UTC (History)
10 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.4.0
Known to fail: 4.3.2
Last reconfirmed: 2008-02-15 12:20:23


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2008-02-11 08:30:23 UTC
On x86_64 with -O1 and higher, 4.3 miscompiles:
extern void abort (void);

void
__attribute__((noinline))
foo (unsigned long *y)
{
   unsigned long b[3], c0, c1, c2, d0, d1, d2;
   d0 = 0; d1 = 0; d2 = 0; c0 = c1 = c2 = 0;

   __asm__ ("movl $7, %k0; movl $8, %k1; movl $9, %k2"
            : "+r" (d0), "+r" (d1), "+r" (d2));
   __asm__ ("movq %3, %0; movq %4, %1; movq %5, %2"
            : "+r" (c0), "+r" (c1), "+r" (c2), "+r" (d0), "+r" (d1), "+r" (d2));
   y[0] = c0;
   y[1] = c1;
   y[2] = c2;
}

int
main (void)
{
  unsigned long y[3];
  foo (y);
  if (y[0] != 7 || y[1] != 8 || y[2] != 9)
    abort ();
  return 0;
}
First CSE pass assigns the same pseudo to different output regs.  It would be fine if those were input only registers.
Happens both with "+r" and "=r" ... "0" syntax.
Comment 1 Jakub Jelinek 2008-02-11 08:33:15 UTC
Silent miscompilation of nss.
Comment 2 Andrew Pinski 2008-02-11 11:53:40 UTC
Hmm, I think the inline-asm is missing an early clobber too.
Comment 3 Jakub Jelinek 2008-02-11 12:49:21 UTC
Why?  When there are multiple output register constraints, that says they must be allocated in different output registers.  The first asm is a silly way to
write the same as "=r" (d0), "=r" (d1), "=r" (d2), as the original values are
never referenced in the asm.
Comment 4 Jakub Jelinek 2008-02-11 13:26:34 UTC
BTW, it fails even when all "+r"'s are replaced with "+&r"'s, though that doesn't
make the testcase more valid.
Comment 5 Jakub Jelinek 2008-02-11 14:16:22 UTC
Regression since http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128864
Comment 6 Andreas Schwab 2008-02-11 15:46:44 UTC
The same change is causing an ICE on ia64.

internal compiler error: in rws_insn_set, at config/ia64/ia64.c:5336
Comment 7 Jakub Jelinek 2008-02-12 09:46:31 UTC
I suspect a reload bug, what asmcons is just unnecessary, not wrong.
Reload shouldn't ever reload different asm output registers into the same
hard registers.
That said, following patch serves as a workaround as well as improvement in asmcons:

--- function.c.jj2      2008-02-11 14:48:12.000000000 +0100
+++ function.c  2008-02-12 10:16:10.000000000 +0100
@@ -1,6 +1,6 @@
 /* Expands front end tree to back end RTL for GCC.
    Copyright (C) 1987, 1988, 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
-   1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007
+   1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008
    Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -5683,8 +5683,10 @@ match_asm_constraints_1 (rtx insn, rtx *
   rtx op = SET_SRC (p_sets[0]);
   int ninputs = ASM_OPERANDS_INPUT_LENGTH (op);
   rtvec inputs = ASM_OPERANDS_INPUT_VEC (op);
+  bool *output_matched = alloca (noutputs * sizeof (bool));
 
-  for (i = 0; i < ninputs; i++)
+  memset (output_matched, 0, noutputs * sizeof (bool));
+  for (i = ninputs - 1; i >= 0; i--)
     {
       rtx input, output, insns;
       const char *constraint = ASM_OPERANDS_INPUT_CONSTRAINT (op, i);
@@ -5713,6 +5715,20 @@ match_asm_constraints_1 (rtx insn, rtx *
       if (j != ninputs)
        continue;
 
+      /* Avoid changing the same input several times.  For
+        asm ("" : "=mr" (out1), "=mr" (out2) : "0" (in), "1" (in));
+        only change in once (to out1), rather than changing it
+        first to out1 and afterwards to out2.  */
+      if (i < ninputs - 1)
+       {
+         for (j = 0; j < noutputs; j++)
+           if (output_matched[j] && input == SET_DEST (p_sets[j]))
+             break;
+         if (j != noutputs)
+           continue;
+       }
+      output_matched[match] = true;
+
       start_sequence ();
       emit_move_insn (output, input);
       insns = get_insns ();

If the same pseudo is used in multiple inputs, where they have different matching output registers, asmcons replaces first time the original input pseudo to first matching output, then first matching output to second matching output, etc., each time adding new SET in front of the ASM_OPERANDS.  But the second and following replacements don't buy us anything.

For testing I've also modified the patched match_asm_constraints_1 to iterate from ninputs - 1 down to 0 rather than from 0 up to ninputs - 1 to make the replacements on this testcase more similar to what was produced without the patch.

*.asmcons difference between vanilla gcc and gcc patched with above patch plus iterating downwards is:
@@ -45,16 +41,10 @@ Dataflow summary:
 (insn:HI 6 3 25 2 M.i:10 (set (reg:DI 65)
         (const_int 0 [0x0])) 89 {*movdi_1_rex64} (nil))
 
-(insn 25 6 26 2 M.i:10 (set (reg/v:DI 60 [ d0 ])
+(insn 25 6 9 2 M.i:10 (set (reg/v:DI 58 [ d2 ])
         (reg:DI 65)) -1 (nil))
 
-(insn 26 25 27 2 M.i:10 (set (reg/v:DI 59 [ d1 ])
-        (reg/v:DI 60 [ d0 ])) -1 (nil))
-
-(insn 27 26 9 2 M.i:10 (set (reg/v:DI 58 [ d2 ])
-        (reg/v:DI 59 [ d1 ])) -1 (nil))
-
-(insn:HI 9 27 28 2 M.i:10 (parallel [
+(insn:HI 9 25 26 2 M.i:10 (parallel [
             (set (reg/v:DI 60 [ d0 ])
                 (asm_operands:DI ("movl $7, %k0; movl $8, %k1; movl $9, %k2") ("=r") 0 [
                         (reg/v:DI 58 [ d2 ])
@@ -94,16 +84,10 @@ Dataflow summary:
         (expr_list:REG_UNUSED (reg:QI 17 flags)
             (nil))))
 
-(insn 28 9 29 2 M.i:12 (set (reg/v:DI 63 [ c0 ])
+(insn 26 9 13 2 M.i:12 (set (reg/v:DI 61 [ c2 ])
         (reg:DI 65)) -1 (nil))
 
-(insn 29 28 30 2 M.i:12 (set (reg/v:DI 62 [ c1 ])
-        (reg/v:DI 63 [ c0 ])) -1 (nil))
-
-(insn 30 29 13 2 M.i:12 (set (reg/v:DI 61 [ c2 ])
-        (reg/v:DI 62 [ c1 ])) -1 (nil))
-
-(insn:HI 13 30 14 2 M.i:12 (parallel [
+(insn:HI 13 26 14 2 M.i:12 (parallel [
             (set (reg/v:DI 63 [ c0 ])
                 (asm_operands:DI ("movq %3, %0; movq %4, %1; movq %5, %2") ("=r") 0 [
                         (reg/v:DI 61 [ c2 ])

i.e. just the extra 2 moves in front of each of the asms.  With vanilla gcc reload uses the same register for d0/d1, but without the extra 4 moves it works properly.
Comment 8 Jakub Jelinek 2008-02-12 18:32:39 UTC
Subject: Bug 35160

Author: jakub
Date: Tue Feb 12 18:31:53 2008
New Revision: 132263

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=132263
Log:
	PR inline-asm/35160
	* function.c (match_asm_constraints_1): Don't replace the same input
	multiple times.

	* gcc.target/i386/pr35160.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr35160.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/function.c
    trunk/gcc/testsuite/ChangeLog

Comment 9 Jakub Jelinek 2008-02-12 18:38:11 UTC
Workaround applied, though there is still some, pressumably local-alloc.c, bug lurking around.  Downgrading to P2, as this is no longer critical.
To reproduce revert the r132263 commit.
Comment 10 Paolo Bonzini 2008-02-13 08:44:58 UTC
I doubt it is a 4.3 regression.
Comment 11 Dominique d'Humieres 2008-02-13 09:33:02 UTC
At rev. 132272 gcc.target/i386/pr35160.c fails on i686-apple-darwin9 with:

[ibook-dhum] dominiq/Desktop% /opt/gcc/gcc4.3w/bin/gcc /opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c
/opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c: In function 'foo':
/opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c:16: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
/opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c:16: error: 'asm' operand has impossible constraints

or

[ibook-dhum] dominiq/Desktop% /opt/gcc/gcc4.3w/bin/gcc -O2 /opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c
/opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c: In function 'foo':
/opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c:16: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
/opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c:14: error: 'asm' operand has impossible constraints
/opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c:16: error: 'asm' operand has impossible constraints

Comment 12 Uroš Bizjak 2008-02-13 16:42:31 UTC
(In reply to comment #11)
> At rev. 132272 gcc.target/i386/pr35160.c fails on i686-apple-darwin9 with:

i686-apple-darwin-9 implies -fpic, so one register less is available.
This is a testsuite failure, following patch fixes it:

Index: pr35160.c
===================================================================
--- pr35160.c   (revision 132285)
+++ pr35160.c   (working copy)
@@ -1,5 +1,6 @@
 /* PR inline-asm/35160 */
 /* { dg-do run } */
+/* { dg-skip-if "" { ilp32 && { ! nonpic } } { "*" } { "" } } */
 /* { dg-options "-O2" } */
Comment 13 Jakub Jelinek 2008-02-13 22:25:45 UTC
Can you please submit it (or even commit as obvious)?
Sorry for not thinking about this asm needing 6 regs on i?86, where it is sometimes too much.
Comment 14 Uroš Bizjak 2008-02-15 12:20:23 UTC
Testcase is fixed by http://gcc.gnu.org/ml/gcc-cvs/2008-02/msg00353.html:

Author: uros
Date: Fri Feb 15 12:19:00 2008
New Revision: 132339

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=132339
Log:
	* gcc.target/i386/pr35160.c: Skip if !nonpic for 32bit x86 targets.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/pr35160.c
Comment 15 Richard Biener 2008-03-14 17:10:24 UTC
Hm, did this bug mutate into a non-regression one?
Comment 16 Paolo Bonzini 2008-03-14 17:32:48 UTC
yes, there's actually no testcase.
Comment 17 Nelson Bolyard (NSS Developer) 2008-03-14 18:59:16 UTC
Would an attached copy of the relevant NSS source file suffice for a test case?
Comment 18 Paolo Bonzini 2008-03-14 20:05:37 UTC
The NSS testcase has been fixed, but we found that the committed fix just made the bug latent.  We don't have a failing testcase.
Comment 19 Richard Biener 2008-06-06 14:58:43 UTC
4.3.1 is being released, adjusting target milestone.
Comment 20 Joseph S. Myers 2008-07-04 19:45:27 UTC
Not currently marked as regression, removing milestone.
Comment 21 Andrew Pinski 2008-12-29 07:30:03 UTC
local-alloc will likely be removed for 4.4, do you know if IRA has the same issue?
Comment 22 Jakub Jelinek 2008-12-29 08:26:28 UTC
IRA gets it right, after backing out that match_asm_constraints_1 change the registers are still allocated correctly and the testcase doesn't abort.
So fixed in 4.4.