[PATCH]: Add special code generation for SGI IP28/IP32-R10k.

Peter Fuerst <pf@net.alphadv.de>
Mon Apr 3 17:28:00 GMT 2006



Hello!

Attached is the updated IP28 cache-barrier patch (the first version was
submitted on Thu, 2 Mar 2006 01:16:08).

Briefly summarized, this code aims at the following:
- it MUST detect the (first) critical store in each basic block and
  protect it; when in doubt, emit a cache-barrier (see the sketch below).
- it should avoid emitting too many unnecessary cache-barriers.
- it MUST insert the cache-barriers in the right place: not into a
  delay-slot, and not between insns which subsequent passes expect to
  be adjacent. ...
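
For illustration, here is a hypothetical example (mine, not part of the
patch): a store through an incoming pointer may hit a DMA target and is
therefore critical, while $sp-based accesses are exempt.  The comments
sketch where the pass would place a barrier; the exact output naturally
depends on optimization and scheduling.

/* Illustration only: PTR may point into a DMA buffer, so the store
   through it is "critical"; the stack slot addressed via $sp is not.  */
long
example (long *ptr, long val, int cond)
{
  long tmp = val;	/* $sp-based store: always safe, no barrier.  */
  if (cond)
    /* First critical store in this basic block; the pass inserts
       "cache 0x14,0($sp)" at the start of the block, i.e. right
       after the branch that guards it.  */
    *ptr = tmp;
  return tmp;
}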

The diffs are against gcc-4.2-20060225.

UNSPEC_CACHE in mips.md probably needs a different value.


***
Would some maintainer please send me the relevant form(s) for copyright
assignment, as described in contribute.html and maintain.*?
***

Thank you for reviewing this patch.

pf


------------------------------------------------------------------------

As an addendum to the description above, I'd like to take up the discussion
where we left off:

On Tue, 7 Mar 2006, Richard Sandiford wrote:

> Date: Tue, 07 Mar 2006 11:09:47 +0000
> From: Richard Sandiford <richard@codesourcery.com>
> To: pf@net.alphadv.de
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH]: Add special code generation for SGI IP28. (fwd)
>
> ...
> OK, just thought I'd check.  Could you clarify how you've arrived at
> those conditions then?  If I've understood the problem correctly,
> a load or store is potentially problematic if you could be DMAing
> to the same address.  Is that right?

Yes, corrupting DMA areas is the main problem.  Invalid, but legal,
addresses are also problematic.  The documentation mentioned in my
first e-mail explains the details.

>                                       And the assumption is that
> you would never DMA to stack addresses, so no insns are needed there?

Yes.  Moreover, $sp never provides an invalid address, even when taken
(mis-)speculatively.

> If so, why are constant addresses exempted too?
>
The DMA buffers are allocated at runtime.
However, with ".nomacro" these addresses are not recognized as constant
(this seems to be the main contribution to the increase from 2.32% to 2.68%
cache-barriers in kernel code when switching from 3.4.2 to 4.x), so the
check for them may be omitted to reduce code size.
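
To illustrate the constant-address exemption (my sketch, not from the
patch): a store to a link-time-constant address is exempt, but only as
long as the address is still visible as a constant.  With explicit relocs
(the ".set nomacro" output of 4.x), the address is split into a
HIGH/LO_SUM pair early, so by the time this pass runs the operand is
register-based and CONSTANT_P no longer matches it:

extern long fixed_buf[4];	/* link-time-constant address */

void
stores (long *dma_buf, long v)
{
  /* Constant address: always valid, and assumed never to be a DMA
     target, hence exempt.  Under .set nomacro it is expanded early,
     roughly as
	 lui	$at,%hi(fixed_buf)
	 sw	$5,%lo(fixed_buf)($at)
     so the pass sees only a register-based address and emits a
     barrier anyway.  */
  fixed_buf[0] = v;

  /* Runtime address: potentially a DMA buffer, always protected.  */
  dma_buf[0] = v;
}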

(Other candidates for code reduction are the two checks for already existing
cache-ops in check_insn_for_store().  These only avoid some rare cases where
duplicate cache-ops would be created, and thus are not absolutely necessary.)


> ...
> >> > +                                                     If we could know in
> >> > +             advance that this jump is in `.reorder' mode, where gas will
> >> > +             insert a `nop' into the delay-slot, we could skip this test.
> >> > + ...
> >> I don't follow this comment.  If the call or jump is in "reorder" mode,
> >> it won't have any delay slots.
> >
> > That's, what this sentence (maybe superfluously) says.
>
> Not to me it doesn't.  This comment is in a block that handles
> SEQUENCEs.  If "this jump is .reorder mode", it won't _be_ a SEQUENCE,
> so the code will never be reached.
>
> Maybe we're in violent agreement about what the code should do.  But
> this comment implies to me that there's something to fix or clean up,
> whereas I don't think there is.
>

This comment is now gone, together with the check-loop that it preceded
(which was not in a block that handles sequences).  The reasons for dropping
this check-loop are the same as those stated in the comment in
r10k_insert_cache_barriers() before the call to emit_store_cache_barrier().

> ...
>> just walk from get_insns () to NULL and check for labels, much
>> like the mips_avoid_hazards and vr4130_align_insns functions do.
> ...
> Um, well, you don't seem to have really responded to my last paragraph
> above, where I tried to explain the problem with using basic block info
> this late.  So I'm a little unsure what you're going to do.  Are you
> going to follow the suggestion I made, or do something else?
>
> Richard
>

Indeed, a simple traversal of the list of all instructions results in much
clearer and shorter code than trying to utilize the basic block info.
The only prerequisite is that we can rely on the assumptions stated in the
comment in r10k_insert_cache_barriers() when looking for the right place
to insert a cache-barrier.


------------------------------------------------------------------------

2006-03-26  Peter Fuerst  <pf@net.alphadv.de>

	* doc/invoke.texi: Document -mr10k-cache-barrier.
	* config/mips/mips.opt: Add -mr10k-cache-barrier.
	* config/mips/mips.md (cacheop): New insn for r4k-style
	  cache operations.
	* config/mips/mips.c (mips_reorg): Call r10k_insert_cache_barriers.
	  (r10k_insert_cache_barriers): New function, with auxiliary
	  functions.
	* testsuite/gcc.target/mips/r10kcb.c, r10kcb-[012].c: New files.
	  Simple testcase.

------------------------------------------------------------------------

--- orig/gcc/doc/invoke.texi	2006-02-24 23:56:57.000000000 +0100
+++ ip28/gcc/doc/invoke.texi	2006-03-26 19:06:51.000000000 +0200
@@ -608,7 +608,8 @@
 -mflush-func=@var{func}  -mno-flush-func @gol
 -mbranch-likely  -mno-branch-likely @gol
 -mfp-exceptions -mno-fp-exceptions @gol
--mvr4130-align -mno-vr4130-align}
+-mvr4130-align -mno-vr4130-align @gol
+-mr10k-cache-barrier[=@var{level}]}

 @emph{MMIX Options}
 @gccoptlist{-mlibfuncs  -mno-libfuncs  -mepsilon  -mno-epsilon  -mabi=gnu @gol
@@ -10690,6 +10691,15 @@
 This option only has an effect when optimizing for the VR4130.
 It normally makes code faster, but at the expense of making it bigger.
 It is enabled by default at optimization level @option{-O3}.
+
+@item -mr10k-cache-barrier[=@var{level}]
+@opindex mr10k-cache-barrier
+Generate special cache barriers to avoid dangerous side-effects of speculative
+execution in kernel code to be run on SGI's Indigo2 or O2 with R10000 (IP28 or
+IP32/R10k).
+@var{level} @option{1} enables cache-barriers before critical stores only;
+@option{2} additionally enables cache-barriers before loads.
+If no @var{level} is specified, @option{1} is assumed.
 @end table

 @node MMIX Options
--- orig/gcc/config/mips/mips.opt	2005-07-23 10:36:54.000000000 +0200
+++ ip28/gcc/config/mips/mips.opt	2006-03-26 21:44:35.000000000 +0200
@@ -216,3 +216,10 @@
 mxgot
 Target Report Var(TARGET_XGOT)
 Lift restrictions on GOT size
+
+mr10k-cache-barrier=
+Target Report Joined UInteger Var(TARGET_R10K_SPECEX)
+-mr10k-cache-barrier[=1|2]	Generate cache barriers for SGI Indigo2/O2 R10k
+
+mr10k-cache-barrier
+Target Undocumented Var(TARGET_R10K_SPECEX) VarExists
--- orig/gcc/config/mips/mips.md	2006-01-29 04:08:38.000000000 +0100
+++ ip28/gcc/config/mips/mips.md	2006-03-26 21:42:01.000000000 +0200
@@ -49,7 +49,7 @@
    (UNSPEC_TLS_GET_TP		28)

    (UNSPEC_ADDRESS_FIRST	100)
-
+   (UNSPEC_CACHE		77)
    (FAKE_CALL_REGNO		79)

    ;; For MIPS Paired-Singled Floating Point Instructions.
@@ -5453,3 +5453,12 @@
 ; The MIPS DSP Instructions.

 (include "mips-dsp.md")
+
+; Cache operations for R4000-style caches (R4x00, R5000, R1x000, ...).
+(define_insn "cacheop"
+  [(unspec_volatile [(match_operand 0 "const_int_operand" "")
+		     (match_operand 1 "memory_operand" "m")]
+		    UNSPEC_CACHE)]
+  ""
+  "cache\t%x0,%1"
+  [(set (attr "length") (const_int 4))])
--- orig/gcc/config/mips/mips.c	2006-02-06 19:20:47.000000000 +0100
+++ ip28/gcc/config/mips/mips.c	2006-03-26 23:54:51.000000000 +0200
@@ -409,6 +409,7 @@ static rtx mips_expand_builtin_compare (
 static rtx mips_expand_builtin_bposge (enum mips_builtin_type, rtx);
 static void mips_encode_section_info (tree, rtx, int);
 static void mips_extra_live_on_entry (bitmap);
+static void r10k_insert_cache_barriers (void);

 /* Structure to be filled in by compute_frame_size with register
    save masks, and offsets for the current function.  */
@@ -8927,6 +8928,8 @@ mips_reorg (void)
       if (TUNE_MIPS4130 && TARGET_VR4130_ALIGN)
 	vr4130_align_insns ();
     }
+  if (TARGET_R10K_SPECEX)
+    r10k_insert_cache_barriers ();
 }

 /* This function does three things:
@@ -10782,6 +10785,210 @@ mips_extra_live_on_entry (bitmap regs)
   if (!TARGET_ABICALLS)
     bitmap_set_bit (regs, PIC_FUNCTION_ADDR_REGNUM);
 }
+
+/* Subroutines used for MIPS code generation: generate cache-barriers
+   for SiliconGraphics IP28 and IP32/R10000 kernel code.  */
+
+/* The following predicates check whether an instruction is a possibly
+   harmful store, i.e. a store which might cause damage if executed
+   speculatively.  */
+
+/* Callback for for_each_rtx: return nonzero if *X is the stack pointer.  */
+
+static int
+is_stack_pointer (rtx *x, void *data)
+{
+  return (*x == stack_pointer_rtx);
+}
+
+/* Callback for for_each_rtx: return nonzero if the expression *MEMX
+   instantiates (mem:M (not (stackpointer_address or constant))).  */
+
+static int
+check_p_mem_expr (rtx *memx, void *data)
+{
+  if (!MEM_P (*memx) || for_each_rtx (memx, is_stack_pointer, 0))
+    return 0;
+
+  /* Stores/Loads to/from constant addresses can be considered
+     harmless, since:
+     1)  the address is always valid, even when taken speculatively.
+     2a) the location is (hopefully) never used as a dma-target, thus
+         there is no danger of cache-inconsistency.
+     2b) uncached loads/stores are guaranteed to be non-speculative. */
+  if (CONSTANT_P (XEXP (*memx, 0)))
+    return 0;
+
+  return 1;
+}
+
+/* Return truth value whether we find (set (mem:M (non_stackpointer_address)
+   ...)) in instruction-pattern BODY.
+   Here we assume that addressing via the stack pointer accesses neither
+   uncached-aliased nor invalid memory.
+   (Maybe this applies to the global pointer and frame pointer as well,
+   but it is safer not to assume it, and probably not worthwhile to
+   treat these registers specially.)
+
+   Speculative loads from invalid addresses cause bus errors as well...
+   So check for (set (reg:M ...) (mem:M (non_stackpointer_address)))
+   too, unless there is an enhanced bus-error handler. */
+
+static int
+check_p_pattern_for_store (rtx *body, void *data)
+{
+  if (*body && GET_CODE (*body) == SET)
+    {
+      /* Cache-barriers for SET_SRC may be requested as well. */
+      if (!(TARGET_R10K_SPECEX & 2))
+        body = &SET_DEST(*body);
+
+      if (for_each_rtx (body, check_p_mem_expr, 0))
+        return 1;
+
+      /* Don't traverse sub-expressions again. */
+      return -1;
+    }
+  return 0;
+}
+
+/* Check for (insn (set (mem:M (dangerous_address)) ...)) or for the end
+   of the current basic block in instruction INSN.
+   The criteria to recognize the end of the current (or start of the
+   next) basic block are duplicated here from final_scan_insn.
+   Return >0: INSN is critical.
+   Return <0: stop the scan at INSN (usually the end of the current
+              basic block).
+   Return 0:  INSN can be ignored. */
+
+static int
+check_insn_for_store (rtx insn)
+{
+  if (INSN_DELETED_P (insn))
+    return 0;
+
+  if (LABEL_P (insn))
+    return -1;
+
+  if (CALL_P (insn) || JUMP_P (insn) || NONJUMP_INSN_P (insn))
+    {
+      rtx body = PATTERN (insn);
+      if (GET_CODE (body) == SEQUENCE)
+        {
+          /* A delayed-branch sequence. */
+          rtx insq;
+          FOR_EACH_SUBINSN (insq, insn)
+            if (! INSN_DELETED_P (insq))
+              {
+                /* Return 1: a critical store in the delay-slot is
+                   completely contained in the sequence, so it is
+                   reported for the sequence as a whole. */
+                if (check_insn_for_store (insq) > 0)
+                  return 1;
+              }
+          insq = SEQ_BEGIN (insn);
+          /* Following a (conditional) branch sequence, we have a new
+             basic block.  */
+          if (JUMP_P (insq))
+            return -1;
+          /* Handle a call sequence like a conditional branch sequence. */
+          if (CALL_P (insq))
+            return -1;
+        }
+      if (GET_CODE (body) == PARALLEL)
+        if (for_each_rtx (&body, check_p_pattern_for_store, 0))
+          return 1;
+
+      /* Now, only a `simple' INSN or JUMP_INSN remains to be checked. */
+      if (NONJUMP_INSN_P (insn))
+        {
+          /* If there's already a cache-op, nothing is left to do. */
+          if (INSN_CODE (insn) == CODE_FOR_cacheop)
+            return -1;
+
+          /* Since we don't know what's inside, we must take inline
+             assembly to be dangerous. */
+          if (GET_CODE (body) == ASM_INPUT)
+            {
+              const char *t = XSTR (body, 0);
+              /* But if it starts with a cache-op, nothing is left to do. */
+              if (t && !strncmp (t, "cache", 5))
+                return -1;
+              return 1;
+            }
+
+          if (check_p_pattern_for_store (&body, 0) > 0)
+            return 1;
+        }
+      /* Following a (conditional) branch, we have a new basic block.
+         Handle a CALL_INSN instruction like a conditional branch. */
+      if (JUMP_P (insn) || CALL_P (insn))
+        return -1;
+    }
+  return 0;
+}
+
+
+/* Scan a basic block, starting after HEAD, for a possibly harmful store
+   instruction.  If found, output a cache barrier at the start of this
+   block.  */
+
+static rtx
+emit_store_cache_barrier (rtx head)
+{
+  rtx insn;
+
+  for (insn = NEXT_INSN (head); insn; insn = NEXT_INSN (insn))
+    {
+      int found;
+
+      found = check_insn_for_store (insn);
+      if (found < 0)
+        break;
+      if (found > 0)
+        {
+          /* Found critical store instruction. */
+
+          /* Skip to the (last) note, if any, adjacent to head. */
+          for (insn = NEXT_INSN (head); insn && NOTE_P (insn);
+               insn = NEXT_INSN (insn))
+            head = insn;
+
+          insn = gen_cacheop (gen_rtx_CONST_INT (VOIDmode, 0x14),
+                              gen_rtx_MEM (Pmode, stack_pointer_rtx));
+          return emit_insn_after (insn, head);
+        }
+    }
+  return 0;
+}
+
+/* Scan the list of instructions for the beginning of each "basic block"
+   (handling any CALL_INSN like a conditional branch here), and scan each
+   of these basic blocks for a possibly harmful store instruction.
+   If one is found, insert a cache barrier at the block's start. */
+
+static void
+r10k_insert_cache_barriers (void)
+{
+  rtx head = get_insns ();
+  gcc_assert (head);
+
+  /* Start scan from function entry. */
+  emit_store_cache_barrier (head);
+
+  for (head = NEXT_INSN (head); head; head = NEXT_INSN (head))
+    {
+      rtx insn;
+
+      if (INSN_DELETED_P (head))
+        continue;
+
+      insn = SEQ_BEGIN (head);
+
+      if (JUMP_P (insn) || CALL_P (insn) || LABEL_P (insn))
+        /* We can (and should) emit the cache-barrier immediately
+           after the CALL/JUMP.  There is no danger of inserting it
+           into the delay-slot, since the delay-slot is always
+           implicit here: where the CALL/JUMP occurs outside a
+           SEQUENCE, either the output-routine (.noreorder) or the
+           assembler (.reorder) will emit an adjacent NOP. */
+        emit_store_cache_barrier (head);
+    }
+}


 #include "gt-mips.h"
--- null/gcc/testsuite/gcc.target/mips/r10kcb.c	1970-01-01 00:00:00.000000000 +0100
+++ ip28/gcc/testsuite/gcc.target/mips/r10kcb.c	2006-03-29 16:59:46.000000000 +0200
@@ -0,0 +1,27 @@
+extern long *func ( long, long );
+
+long fnoreorder ( long *x, void *y )
+{
+	long  a, b;
+	if (x+1 != y)
+		*x = b;
+	a = *func (a, b);
+	if (x+2 != y)
+		a = *x;
+	*func (a, b) = a;
+	return a;
+}
+
+long freorder ( long *x, void *y )
+{
+	long a, b;
+	if (x+1 != y)
+		*x = b;
+	a = *func (a, b);
+	if (x+2 != y)
+		a = *x;
+	*func (a, b) = a;
+	if (x+3 != y)
+		__asm__ __volatile__("# dummy");
+	return a;
+}
--- null/gcc/testsuite/gcc.target/mips/r10kcb-0.c	1970-01-01 00:00:00.000000000 +0100
+++ ip28/gcc/testsuite/gcc.target/mips/r10kcb-0.c	2006-03-29 16:59:46.000000000 +0200
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-mips-options "-O2" } */
+/* { dg-final { scan-assembler-not "cache\[ \t\]0x14" } } */
+
+#include "r10kcb.c"
--- null/gcc/testsuite/gcc.target/mips/r10kcb-1.c	1970-01-01 00:00:00.000000000 +0100
+++ ip28/gcc/testsuite/gcc.target/mips/r10kcb-1.c	2006-03-29 16:59:46.000000000 +0200
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-mips-options "-mr10k-cache-barrier -O2" } */
+/* { dg-final { scan-assembler-times "cache\[ \t\]0x14" 5 } } */
+
+#include "r10kcb.c"
--- null/gcc/testsuite/gcc.target/mips/r10kcb-2.c	1970-01-01 00:00:00.000000000 +0100
+++ ip28/gcc/testsuite/gcc.target/mips/r10kcb-2.c	2006-03-29 16:59:46.000000000 +0200
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-mips-options "-mr10k-cache-barrier=2 -O2 -mno-abicalls" } */
+/* -mno-abicalls: avoid excess loads (resulting in 4 cache-barriers more). */
+/* { dg-final { scan-assembler-times "cache\[ \t\]0x14" 9 } } */
+
+#include "r10kcb.c"


