On x86_64 with -O1 and higher, 4.3 miscompiles: extern void abort (void); void __attribute__((noinline)) foo (unsigned long *y) { unsigned long b[3], c0, c1, c2, d0, d1, d2; d0 = 0; d1 = 0; d2 = 0; c0 = c1 = c2 = 0; __asm__ ("movl $7, %k0; movl $8, %k1; movl $9, %k2" : "+r" (d0), "+r" (d1), "+r" (d2)); __asm__ ("movq %3, %0; movq %4, %1; movq %5, %2" : "+r" (c0), "+r" (c1), "+r" (c2), "+r" (d0), "+r" (d1), "+r" (d2)); y[0] = c0; y[1] = c1; y[2] = c2; } int main (void) { unsigned long y[3]; foo (y); if (y[0] != 7 || y[1] != 8 || y[2] != 9) abort (); return 0; } First CSE pass assigns the same pseudo to different output regs. It would be fine if those were input only registers. Happens both with "+r" and "=r" ... "0" syntax.
Silent miscompilation of nss.
Hmm, I think the inline-asm is missing an early clobber too.
Why? When there are multiple output register constraints, that says they must be allocated in different output registers. The first asm is a silly way to write the same as "=r" (d0), "=r" (d1), "=r" (d2), as the original values are never referenced in the asm.
BTW, it fails even when all "+r"'s are replaced with "+&r"'s, though that doesn't make the testcase more valid.
Regression since http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128864
The same change is causing an ICE on ia64. internal compiler error: in rws_insn_set, at config/ia64/ia64.c:5336
I suspect a reload bug, what asmcons is just unnecessary, not wrong. Reload shouldn't ever reload different asm output registers into the same hard registers. That said, following patch serves as a workaround as well as improvement in asmcons: --- function.c.jj2 2008-02-11 14:48:12.000000000 +0100 +++ function.c 2008-02-12 10:16:10.000000000 +0100 @@ -1,6 +1,6 @@ /* Expands front end tree to back end RTL for GCC. Copyright (C) 1987, 1988, 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, - 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 + 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software Foundation, Inc. This file is part of GCC. @@ -5683,8 +5683,10 @@ match_asm_constraints_1 (rtx insn, rtx * rtx op = SET_SRC (p_sets[0]); int ninputs = ASM_OPERANDS_INPUT_LENGTH (op); rtvec inputs = ASM_OPERANDS_INPUT_VEC (op); + bool *output_matched = alloca (noutputs * sizeof (bool)); - for (i = 0; i < ninputs; i++) + memset (output_matched, 0, noutputs * sizeof (bool)); + for (i = ninputs - 1; i >= 0; i--) { rtx input, output, insns; const char *constraint = ASM_OPERANDS_INPUT_CONSTRAINT (op, i); @@ -5713,6 +5715,20 @@ match_asm_constraints_1 (rtx insn, rtx * if (j != ninputs) continue; + /* Avoid changing the same input several times. For + asm ("" : "=mr" (out1), "=mr" (out2) : "0" (in), "1" (in)); + only change in once (to out1), rather than changing it + first to out1 and afterwards to out2. */ + if (i < ninputs - 1) + { + for (j = 0; j < noutputs; j++) + if (output_matched[j] && input == SET_DEST (p_sets[j])) + break; + if (j != noutputs) + continue; + } + output_matched[match] = true; + start_sequence (); emit_move_insn (output, input); insns = get_insns (); If the same pseudo is used in multiple inputs, where they have different matching output registers, asmcons replaces first time the original input pseudo to first matching output, then first matching output to second matching output, etc., each time adding new SET in front of the ASM_OPERANDS. But the second and following replacements don't buy us anything. For testing I've also modified the patched match_asm_constraints_1 to iterate from ninputs - 1 down to 0 rather than from 0 up to ninputs - 1 to make the replacements on this testcase more similar to what was produced without the patch. *.asmcons difference between vanilla gcc and gcc patched with above patch plus iterating downwards is: @@ -45,16 +41,10 @@ Dataflow summary: (insn:HI 6 3 25 2 M.i:10 (set (reg:DI 65) (const_int 0 [0x0])) 89 {*movdi_1_rex64} (nil)) -(insn 25 6 26 2 M.i:10 (set (reg/v:DI 60 [ d0 ]) +(insn 25 6 9 2 M.i:10 (set (reg/v:DI 58 [ d2 ]) (reg:DI 65)) -1 (nil)) -(insn 26 25 27 2 M.i:10 (set (reg/v:DI 59 [ d1 ]) - (reg/v:DI 60 [ d0 ])) -1 (nil)) - -(insn 27 26 9 2 M.i:10 (set (reg/v:DI 58 [ d2 ]) - (reg/v:DI 59 [ d1 ])) -1 (nil)) - -(insn:HI 9 27 28 2 M.i:10 (parallel [ +(insn:HI 9 25 26 2 M.i:10 (parallel [ (set (reg/v:DI 60 [ d0 ]) (asm_operands:DI ("movl $7, %k0; movl $8, %k1; movl $9, %k2") ("=r") 0 [ (reg/v:DI 58 [ d2 ]) @@ -94,16 +84,10 @@ Dataflow summary: (expr_list:REG_UNUSED (reg:QI 17 flags) (nil)))) -(insn 28 9 29 2 M.i:12 (set (reg/v:DI 63 [ c0 ]) +(insn 26 9 13 2 M.i:12 (set (reg/v:DI 61 [ c2 ]) (reg:DI 65)) -1 (nil)) -(insn 29 28 30 2 M.i:12 (set (reg/v:DI 62 [ c1 ]) - (reg/v:DI 63 [ c0 ])) -1 (nil)) - -(insn 30 29 13 2 M.i:12 (set (reg/v:DI 61 [ c2 ]) - (reg/v:DI 62 [ c1 ])) -1 (nil)) - -(insn:HI 13 30 14 2 M.i:12 (parallel [ +(insn:HI 13 26 14 2 M.i:12 (parallel [ (set (reg/v:DI 63 [ c0 ]) (asm_operands:DI ("movq %3, %0; movq %4, %1; movq %5, %2") ("=r") 0 [ (reg/v:DI 61 [ c2 ]) i.e. just the extra 2 moves in front of each of the asms. With vanilla gcc reload uses the same register for d0/d1, but without the extra 4 moves it works properly.
Subject: Bug 35160 Author: jakub Date: Tue Feb 12 18:31:53 2008 New Revision: 132263 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=132263 Log: PR inline-asm/35160 * function.c (match_asm_constraints_1): Don't replace the same input multiple times. * gcc.target/i386/pr35160.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr35160.c Modified: trunk/gcc/ChangeLog trunk/gcc/function.c trunk/gcc/testsuite/ChangeLog
Workaround applied, though there is still some, pressumably local-alloc.c, bug lurking around. Downgrading to P2, as this is no longer critical. To reproduce revert the r132263 commit.
I doubt it is a 4.3 regression.
At rev. 132272 gcc.target/i386/pr35160.c fails on i686-apple-darwin9 with: [ibook-dhum] dominiq/Desktop% /opt/gcc/gcc4.3w/bin/gcc /opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c /opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c: In function 'foo': /opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c:16: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm' /opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c:16: error: 'asm' operand has impossible constraints or [ibook-dhum] dominiq/Desktop% /opt/gcc/gcc4.3w/bin/gcc -O2 /opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c /opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c: In function 'foo': /opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c:16: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm' /opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c:14: error: 'asm' operand has impossible constraints /opt/gcc/_gcc_clean/gcc/testsuite/gcc.target/i386/pr35160.c:16: error: 'asm' operand has impossible constraints
(In reply to comment #11) > At rev. 132272 gcc.target/i386/pr35160.c fails on i686-apple-darwin9 with: i686-apple-darwin-9 implies -fpic, so one register less is available. This is a testsuite failure, following patch fixes it: Index: pr35160.c =================================================================== --- pr35160.c (revision 132285) +++ pr35160.c (working copy) @@ -1,5 +1,6 @@ /* PR inline-asm/35160 */ /* { dg-do run } */ +/* { dg-skip-if "" { ilp32 && { ! nonpic } } { "*" } { "" } } */ /* { dg-options "-O2" } */
Can you please submit it (or even commit as obvious)? Sorry for not thinking about this asm needing 6 regs on i?86, where it is sometimes too much.
Testcase is fixed by http://gcc.gnu.org/ml/gcc-cvs/2008-02/msg00353.html: Author: uros Date: Fri Feb 15 12:19:00 2008 New Revision: 132339 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=132339 Log: * gcc.target/i386/pr35160.c: Skip if !nonpic for 32bit x86 targets. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/pr35160.c
Hm, did this bug mutate into a non-regression one?
yes, there's actually no testcase.
Would an attached copy of the relevant NSS source file suffice for a test case?
The NSS testcase has been fixed, but we found that the committed fix just made the bug latent. We don't have a failing testcase.
4.3.1 is being released, adjusting target milestone.
Not currently marked as regression, removing milestone.
local-alloc will likely be removed for 4.4, do you know if IRA has the same issue?
IRA gets it right, after backing out that match_asm_constraints_1 change the registers are still allocated correctly and the testcase doesn't abort. So fixed in 4.4.