This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug inline-asm/35160] [4.3 regression] CSE introduces sharing of the same pseudo/hard reg between different output regs in inline asm



------- Comment #7 from jakub at gcc dot gnu dot org  2008-02-12 09:46 -------
I suspect a reload bug, what asmcons is just unnecessary, not wrong.
Reload shouldn't ever reload different asm output registers into the same
hard registers.
That said, following patch serves as a workaround as well as improvement in
asmcons:

--- function.c.jj2      2008-02-11 14:48:12.000000000 +0100
+++ function.c  2008-02-12 10:16:10.000000000 +0100
@@ -1,6 +1,6 @@
 /* Expands front end tree to back end RTL for GCC.
    Copyright (C) 1987, 1988, 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
-   1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007
+   1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008
    Free Software Foundation, Inc.

 This file is part of GCC.
@@ -5683,8 +5683,10 @@ match_asm_constraints_1 (rtx insn, rtx *
   rtx op = SET_SRC (p_sets[0]);
   int ninputs = ASM_OPERANDS_INPUT_LENGTH (op);
   rtvec inputs = ASM_OPERANDS_INPUT_VEC (op);
+  bool *output_matched = alloca (noutputs * sizeof (bool));

-  for (i = 0; i < ninputs; i++)
+  memset (output_matched, 0, noutputs * sizeof (bool));
+  for (i = ninputs - 1; i >= 0; i--)
     {
       rtx input, output, insns;
       const char *constraint = ASM_OPERANDS_INPUT_CONSTRAINT (op, i);
@@ -5713,6 +5715,20 @@ match_asm_constraints_1 (rtx insn, rtx *
       if (j != ninputs)
        continue;

+      /* Avoid changing the same input several times.  For
+        asm ("" : "=mr" (out1), "=mr" (out2) : "0" (in), "1" (in));
+        only change in once (to out1), rather than changing it
+        first to out1 and afterwards to out2.  */
+      if (i < ninputs - 1)
+       {
+         for (j = 0; j < noutputs; j++)
+           if (output_matched[j] && input == SET_DEST (p_sets[j]))
+             break;
+         if (j != noutputs)
+           continue;
+       }
+      output_matched[match] = true;
+
       start_sequence ();
       emit_move_insn (output, input);
       insns = get_insns ();

If the same pseudo is used in multiple inputs, where they have different
matching output registers, asmcons replaces first time the original input
pseudo to first matching output, then first matching output to second matching
output, etc., each time adding new SET in front of the ASM_OPERANDS.  But the
second and following replacements don't buy us anything.

For testing I've also modified the patched match_asm_constraints_1 to iterate
from ninputs - 1 down to 0 rather than from 0 up to ninputs - 1 to make the
replacements on this testcase more similar to what was produced without the
patch.

*.asmcons difference between vanilla gcc and gcc patched with above patch plus
iterating downwards is:
@@ -45,16 +41,10 @@ Dataflow summary:
 (insn:HI 6 3 25 2 M.i:10 (set (reg:DI 65)
         (const_int 0 [0x0])) 89 {*movdi_1_rex64} (nil))

-(insn 25 6 26 2 M.i:10 (set (reg/v:DI 60 [ d0 ])
+(insn 25 6 9 2 M.i:10 (set (reg/v:DI 58 [ d2 ])
         (reg:DI 65)) -1 (nil))

-(insn 26 25 27 2 M.i:10 (set (reg/v:DI 59 [ d1 ])
-        (reg/v:DI 60 [ d0 ])) -1 (nil))
-
-(insn 27 26 9 2 M.i:10 (set (reg/v:DI 58 [ d2 ])
-        (reg/v:DI 59 [ d1 ])) -1 (nil))
-
-(insn:HI 9 27 28 2 M.i:10 (parallel [
+(insn:HI 9 25 26 2 M.i:10 (parallel [
             (set (reg/v:DI 60 [ d0 ])
                 (asm_operands:DI ("movl $7, %k0; movl $8, %k1; movl $9, %k2")
("=r") 0 [
                         (reg/v:DI 58 [ d2 ])
@@ -94,16 +84,10 @@ Dataflow summary:
         (expr_list:REG_UNUSED (reg:QI 17 flags)
             (nil))))

-(insn 28 9 29 2 M.i:12 (set (reg/v:DI 63 [ c0 ])
+(insn 26 9 13 2 M.i:12 (set (reg/v:DI 61 [ c2 ])
         (reg:DI 65)) -1 (nil))

-(insn 29 28 30 2 M.i:12 (set (reg/v:DI 62 [ c1 ])
-        (reg/v:DI 63 [ c0 ])) -1 (nil))
-
-(insn 30 29 13 2 M.i:12 (set (reg/v:DI 61 [ c2 ])
-        (reg/v:DI 62 [ c1 ])) -1 (nil))
-
-(insn:HI 13 30 14 2 M.i:12 (parallel [
+(insn:HI 13 26 14 2 M.i:12 (parallel [
             (set (reg/v:DI 63 [ c0 ])
                 (asm_operands:DI ("movq %3, %0; movq %4, %1; movq %5, %2")
("=r") 0 [
                         (reg/v:DI 61 [ c2 ])

i.e. just the extra 2 moves in front of each of the asms.  With vanilla gcc
reload uses the same register for d0/d1, but without the extra 4 moves it works
properly.


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uweigand at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35160


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]