This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFA:] Regmove replaces asm-declared register in asm. Syscall breakage. 3.2, 3.3, 3.4


Consider this test-case (to be committed as gcc.dg/asmreg-1.c), a
getdents64 Linux syscall, cut down for the test-case:

-------------------------------------------------
/* { dg-do compile { target cris-*-* } } */
/* { dg-options "-O2" } */
/* { dg-final { scan-assembler "\\\.ifnc \\\$r9-\\\$r10-\\\$r11-\\\$r12" } } */

/* Sanity check for asm register operands in syscall failed for
   cris-axis-linux-gnu due to regmove bug.
   Hans-Peter Nilsson <hp at axis dot com>.  */

extern void lseek64 (int, long long, int);
extern int *__errno_location (void);
struct dirent64
{
  long long d_off;
  unsigned short int d_reclen;
  char d_name[256];
};
struct kernel_dirent64
{   
  long long d_off;
  unsigned short d_reclen;
  char d_name[256];
};

static inline int __attribute__ ((__always_inline__))
__syscall_getdents64 (int fd, unsigned char * dirp, unsigned count)
{
  register unsigned long __sys_res asm ("r10");
  register unsigned long __r10 __asm__ ("r10") = (unsigned long) fd;
  register unsigned long __r11 __asm__ ("r11") = (unsigned long) dirp;
  register unsigned long __r12 __asm__ ("r12") = (unsigned long) count;
  register unsigned long __callno asm ("r9") = (220);
  asm volatile (".ifnc %1-%0-%3-%4,$r9-$r10-$r11-$r12\n\t"
		".err\n\t"
		".endif\n\t"
		"break 13"
		: "=r" (__sys_res)
		: "r" (__callno), "0" (__r10), "r" (__r11), "r" (__r12)
		: "memory");
  if (__sys_res >= (unsigned long) -4096)
    {
      (*__errno_location ()) = - __sys_res;
      __sys_res = -1;
    }
  return __sys_res;
}

int
__getdents64 (int fd, char *buf, unsigned nbytes)
{   
  struct dirent64 *dp;
  long long last_offset = -1;
  int retval;
  struct kernel_dirent64 *skdp, *kdp;
  dp = (struct dirent64 *) buf;
  skdp = kdp = __builtin_alloca (nbytes);
  retval = __syscall_getdents64(fd, (char *)kdp, nbytes);
  if (retval == -1)
    return -1;
  while ((char *) kdp < (char *) skdp + retval)
    {
      if ((char *) dp > buf + nbytes)
	{
	  lseek64(fd, last_offset, 0);
	  break;
	}
      last_offset = kdp->d_off;
      __builtin_memcpy (dp->d_name, kdp->d_name, kdp->d_reclen - 10);
      dp = (struct dirent64 *) ((char *) dp + sizeof (*dp));
      kdp = (struct kernel_dirent64 *) (((char *) kdp) + kdp->d_reclen);
    }

  return (char *) dp - buf;
}
-------------------------------------------------

Note the non-SMALL_REGISTER_CLASSES syscall construct in which
asm-declared register variables carry the syscall parameters.
For the doubtful, the asm-reg-into-asm construct is blessed in
gcc documentation; it promises that variables as appearing in
the asm parameters arrive in the asm-declared registers.
Nevertheless, there's a paranoid (ahem) sanity check in the asm,
to make sure that parameters *do* arrive in the right registers.
Refer to the GAS documentation for that construct.  Compare to
the usual SMALL_REGISTER_CLASSES (like i386) construct, where
asm operand constraint letters for a single-register class are
usually used to get the right parameter in a single right
register.  This isn't a definite truth; the reason to chose
either construct is (I think) guided by existing GCC register
class constraint letters rather than whether the port is
SMALL_REGISTER_CLASSES.  GCC has more register allocation
freedom by using constraint letters in the asm, but it shouldn't
matter in practice unless the machine has really few and
special-purpose visible registers (like i386) or really large
syscall constructs (code distance between the register
declarations and the asm).  For CRIS, I'd rather not codify the
Linux ABI in register constraint letters; asm-reg-into-asm
should work nicely.  (But I do recommend that ABI for *any* OS
on CRIS. :-)

For the test-case, you will see this rtl in 22.ce2 (the pass
before regmove).
ce2:

...

(insn 5 4 6 0 (nil) (set (reg/v:SI 26 [ nbytes ])
        (reg:SI 12 r12)) 32 {*movsi_internal} (nil)
    (nil))

...

(insn 21 19 22 0 0x4012b3c8 (set (reg:SI 32)
        (plus:SI (reg/v:SI 26 [ nbytes ])
            (const_int 1 [0x1]))) 66 {addsi3} (insn_list 5 (nil))
    (nil))

...

(insn 47 45 49 0 0x4012b44c (parallel [
            (set (reg/v:SI 10 r10 [ __sys_res ])
                (asm_operands/v:SI (".ifnc %1-%0-%3-%4,$r9-$r10-$r11-$r12
	.err
	.endif
	break 13") ("=r") 0 [
                        (reg/v:SI 9 r9 [ __callno ])
                        (reg/v:SI 10 r10 [ __r10 ])
                        (reg/v:SI 11 r11 [ __r11 ])
                        (reg/v:SI 12 r12 [ __r12 ])
                    ]
                     [
                        (asm_input:SI ("r"))
                        (asm_input:SI ("0"))
                        (asm_input:SI ("r"))
                        (asm_input:SI ("r"))
                    ] ("/home/hp/cvs_areas/combined/cvs_write/gcc/testsuite/gcc.dg/asmreg-1.c") 32))
            (clobber (mem:BLK (scratch) [0 A8]))
        ]) -1 (nil)
    (expr_list:REG_DEAD (reg/v:SI 12 r12 [ __r12 ])
        (expr_list:REG_DEAD (reg/v:SI 11 r11 [ __r11 ])
            (expr_list:REG_DEAD (reg/v:SI 9 r9 [ __callno ])
                (nil)))))

That is, the last parameter-register for incoming count and
nbytes (r12) is copied to a pseudo (26), which is used in other
rtl.  Then r12 is used in the asm above.  Alas, in 23.regmove,
you'll notice that r12 and pseudo 26 are "optimized"; r12 is
replaced by pseudo 26.  Too bad that pseudo 26 is later
allocated r0, not r12, so the asm sanity check fails.  If it
wasn't there, the syscall would still fail in perhaps unobvious
ways, because r12 no longer contains nbytes, but has been
overwritten with the value -1 (not shown).

regmove:

(insn 5 4 6 0 (nil) (set (reg/v:SI 26 [ nbytes ])
        (reg:SI 12 r12)) 32 {*movsi_internal} (nil)
    (expr_list:REG_DEAD (reg/v:SI 12 r12 [ __r12 ])
        (nil)))
...

(insn 21 19 22 0 0x4012b3c8 (set (reg:SI 32)
        (plus:SI (reg/v:SI 26 [ nbytes ])
            (const_int 1 [0x1]))) 66 {addsi3} (insn_list 5 (nil))
    (nil))

...

(insn 47 45 49 0 0x4012b44c (parallel [
            (set (reg/v:SI 10 r10 [ __sys_res ])
                (asm_operands/v:SI (".ifnc %1-%0-%3-%4,$r9-$r10-$r11-$r12
	.err
	.endif
	break 13") ("=r") 0 [
                        (reg/v:SI 9 r9 [ __callno ])
                        (reg/v:SI 10 r10 [ __r10 ])
                        (reg/v:SI 11 r11 [ __r11 ])
                        (reg/v:SI 26 [ nbytes ])
                    ]
                     [
                        (asm_input:SI ("r"))
                        (asm_input:SI ("0"))
                        (asm_input:SI ("r"))
                        (asm_input:SI ("r"))
                    ] ("/home/hp/cvs_areas/combined/cvs_write/gcc/testsuite/gcc.dg/asmreg-1.c") 32))
            (clobber (mem:BLK (scratch) [0 A8]))
        ]) -1 (nil)
    (expr_list:REG_DEAD (reg/v:SI 11 r11 [ __r11 ])
        (expr_list:REG_DEAD (reg/v:SI 9 r9 [ __callno ])
            (nil))))

A curious thing is that the original code did not trigger for
the main trunk, but the cut-down test-case does.  (*Whew!* :-)

Observation: -fno-regmove does not turn off all of regmove,
particularly it does not turn off the failing code.  I think
that's a bug too.

Note that, as the code says, the recently introduced REG_EXPR
field wasn't set for src, so there was no apparent way to check
the register declaration (I checked); seemingly because r12
contains nbytes *before* the register declaration, it being the
(incoming) parameter register.  Regarding checking for the
actual asm register declaration, I don't think the missing
optimization opportunities are important enough to scavenge the
declaration with more complicated code; checking for a hard reg
is stricter, but not that much stricter.  The stack pointer is
excluded from this optimization higher up in
optimize_reg_copy_1, and for SMALL_REGISTER_CLASSES ports, hard
registers are similarly excluded from this optimization.  (Which
in turn means the i686-pc-linux-gnu is of limited value.)

If you know of optimizations suspected of similar breakage,
please tell.

Bootstrapped and checked on main trunk for i686-pc-linux-gnu.
Tested with no regressions on cris-axis-elf (cris-axis-linux-gnu
fails for sysroot-related reasons already reported),
mmix-knuth-mmixware and h8300-hitachi-hms.

Ok for 3.2?  (Pretty please?  No regression as such; 3.1/3.2 is the first.)
Ok for 3.3?
Ok for main trunk?

ChangeLog:

	* regmove.c (optimize_reg_copy_1): Do not replace a hard register
	in an asm.

Index: regmove.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/regmove.c,v
retrieving revision 1.140
diff -p -c -r1.140 regmove.c
*** regmove.c	17 Jan 2003 03:28:09 -0000	1.140
--- regmove.c	21 Feb 2003 10:29:20 -0000
*************** optimize_reg_copy_1 (insn, dest, src)
*** 433,438 ****
--- 433,448 ----
  	continue;
  
        if (reg_set_p (src, p) || reg_set_p (dest, p)
+ 	  /* If SRC is an asm-declared register, it must not be replaced
+ 	     in any asm.  Unfortunately, the REG_EXPR tree for the asm
+ 	     variable may be absent in the SRC rtx, so we can't check the
+ 	     actual register declaration easily (the asm operand will have
+ 	     it, though).  To avoid complicating the test for a rare case,
+ 	     we just don't perform register replacement for a hard reg
+ 	     mentioned in an asm.  */
+ 	  || (sregno < FIRST_PSEUDO_REGISTER
+ 	      && asm_noperands (PATTERN (p)) >= 0
+ 	      && reg_overlap_mentioned_p (src, PATTERN (p)))
  	  /* Don't change a USE of a register.  */
  	  || (GET_CODE (PATTERN (p)) == USE
  	      && reg_overlap_mentioned_p (src, XEXP (PATTERN (p), 0))))

brgds, H-P
PS.  Still almost a month behind on GCC mailing lists.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]