This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH]: Add special code generation for SGI IP28/IP32-R10k.


Hello !

On Sat, 8 Apr 2006, Richard Sandiford wrote:

> Date: Sat, 08 Apr 2006 14:53:56 +0100
> From: Richard Sandiford <richard@codesourcery.com>
> To: pf@net.alphadv.de
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH]: Add special code generation for SGI IP28/IP32-R10k.
>
> > ...
>
> Unless I'm missing something, I don't think it did.  It listed a set
> of conditions, but it didn't really seem to explain why those conditions
> were the right ones.

Assuming, that
MIPS R10000 Microprocessor User's Manual, Version 2.0,
"Side Effects of Speculative Execution"
and
http://mail-index.netbsd.org/port-sgimips/2000/06/29/0006.html
are already carefully read, i'll use my own words trying to explain the
issue shortly:

A memory-access, like "sd $M,n($N)" will affect the cacheline, corresponding
to n($N): it may be reloaded and in the store-case will be marked dirty, with
the effect of an unpredictable write-back.
Now, this also is done, when the instruction is executed speculatively. And
there is no rollback, even, when the speculation turns out wrong!
Arriving on a misspeculated path, $N may contain a pointer into a area, where
a DMA-read is initiated, so at some time the cache and corresponding memory
contents may be different.
A read-cacheline can be cured by invalidating it, when DMA is done, but
the write-back must be avoided in the first place. This is done by a cache-
barrier (providing the cache-ops' side-effect of cutting off speculation
paths without touching the cache) at the beginning of the edge leading to
this instruction.
What cases of $N can be exempted from this measure?
- Stack-addresses and constant (static) addresses ("sd $M,symbol+n") will not
  be used for DMA, since DMA-buffers are allocated at runtime.
- Uncached accesses will not be done speculatively, but they fall under the
  "constant"-case already or will not be recognized at compile-time.

Besides the DMA-problem, depending on the mis-speculation path (up to four
branches deep), one of the frequently reused multi-purpose registers $N
will contain some "random" value, which may be a legal but invalid kernel-
address (say a800000061234567), reaching the memory-controller...
However, there are cases where a register $N's content is well defined, no
matter what (mis-)speculation path took us to this instruction:
- The stack-pointer points to the stack from kernel-initializtion on.
- Constant addresses ("symbol+n") are well defined "per se".
(Luckily, legal-but-invalid doesn't occur in user mode, where no cache-
barriers can be used. There we get either an address-error or a TLB-miss,
leaving memory/bus untouched.)

>
> For example, the compiler was no intrinsic knowledge of what is and
> isn't a DMA area.  You seem to be assuming that certain addresses
> aren't DMA areas based on properties of a particular package (i.e.,
> the linux kernel).  That's certainly not a problem, but I think the
> documentation should say what these assumptions are.
>
> > ...
>
> You're talking about hardware rather than software speculation here
> right?  Playing language lawyer for a second, my understanding is that:
>
>      void foo (int x)
>      {
>        int array[1];
>        if (x)
>          bar (array[0x3fff]);
>      }
>
> is valid code if x is never true, so you're assuming either (a) the
> kernel doesn't use code like that or (b) that addresses in the range
> [$sp, $sp + 0x7fff] are always valid.
>
> (Obviously the chances of seeing code written exactly like that are
> rare, at least outside of compiler conformance testsuites.  But it's
> more likely when the size of "array" and the condition in the "if"
> involve configuration macros.)
>
> I'm not trying to be awkward here.  I'm just trying to pin down
> something along the lines of "this option is designed for code
> which follows these guidelines: ...".  This should be included
> in the documentation of the option, since it requires the user
> to write code in a certain way, or make certain guarantees.

"since it requires the user to write code in a certain way, or make certain
guarantees." sounds a bit misleading, like: do not use otherwise!
I guess you wanted to say "To guarantee full efficacy of this option, the user
has to write..."
Btw. "option" is a bit of a misnomer, "prerequisite" is more accurate.

The option is not meant to fix questionable code, this is completely
out of its scope. It just addresses the situation in correct (Linux-,
*BSD-, whatever...) kernel code as is.

Of course it's possible to deliberately subvert the checks.

More typical candidates are:

  void foo (int x)
  {
    int array[1];
    if (y)
      bar (array[x]);
  }
or
  void foo (int array[N])
  {
    int i;
    for (i = 0; i < N; ++i)
      array[i] = i;
  }
where array[N] (really the corresponding cache-line) might (likely will)
be touched speculatively, with array+N pointing into a DMA-buffer or
beyond RAM ...

> > ...
>
> Right.  I think I mentioned this earlier.
>
> Anyway, the patch is looking much better thanks, but there are still
> a few things I'd like to see changed.
>
> > +@item -mr10k-cache-barrier[=@var{level}]
> > +@opindex mr10k-cache-barrier
> > +Generate special cache barriers to avoid dangerous side-effects of speculative
> > +execution in kernel code to be run on SGI's Indigo2 or O2 with R10000 (IP28 or
> > +IP32/R10k).
> > +@var{level} @option{1} enables cache-barriers before critical stores only,
> > +@option{2} additionally enables cache-barriers before loads.
> > +If no @var{level} is specified, @option{1} is assumed.
> >  @end table
>
> As I mentioned above, this needs to be a bit more descriptive.

A short description, which stores are considered critical/non-critical,
and references to the documentation could be helpful indeed, i think.

>
> > +static int
> > +check_p_mem_expr (rtx *memx, void *data)
> > +{
> > +  if (!MEM_P (*memx) || for_each_rtx (memx, is_stack_pointer, 0))
> > +    return 0;
> > +
> > +  /* Stores/Loads to/from constant addresses can be considered
> > +     harmless, since:
> > +     1)  the address is always valid, even when taken speculatively.
> > +     2a) the location is (hopefully) never used as a dma-target, thus
> > +         there is no danger of cache-inconsistency.
> > +     2b) uncached loads/stores are guaranteed to be non-speculative. */
> > +  if ( CONSTANT_P(XEXP (*memx, 0)) )
> > +    return 0;
>
> As I think I said before, this should really use mips_classify_address.

I guess, you prefer to see something like this:

check_p_mem_expr (rtx *memx, void *data)
{
  struct mips_address_info info;

  if (!MEM_P (*memx))
    return 0;

  if (mips_classify_address (&info, *memx, GET_MODE (*memx), true))
    switch (info.type)
      {
      case ADDRESS_REG:
        if (info.reg == stack_pointer_rtx)
          return 0;
        break;

      case ADDRESS_SYMBOLIC:
      case ADDRESS_CONST_INT:
        /* Stores/Loads to/from constant addresses can be considered
           harmless, since:
           1)  the address is always valid, even when taken speculatively.
           2a) the location is (hopefully) never used as a dma-target, thus
               there is no danger of cache-inconsistency.
           2b) uncached loads/stores are guaranteed to be non-speculative. */
        return 0;

      case ADDRESS_LO_SUM:
        ;/* unsafe... !? */
      }
  return 1;
}

(besides, here "gcc_assert (mips_classify...);" would be more logical
 than "if (mips_classify...)")

> Would you consider variable indexes from symbolic addresses to be
> OK too?  Things like global[X], I mean?  If so, you could extend
> the constant check to LO_SUM addresses and avoid the nomacro problem
> you describe above.

Things like global[X] (or stack[X]) are OK only for constant X (see "invalid
address").
Although some simple examples suggest, that at most constant subscripts would
generate (set(mem(lo_sum))), others (set(reg(lo_sum)) (set(mem(reg))) instead,
i'm not quite sure, if this is reliable and we can accept LO_SUM without doing
ex[pt]ensive backward analysis.  (accepting 2.68% instead of 2.32%
cache-barriers seems easier)

>
> > +/* Return truth value whether we find (set (mem:M (non_stackpointer_address)
> > +   ...)) in instruction-pattern BODY.
>
> The comment isn't quite accurate: you exempt more than stack pointer
> addresses.  It's OK to reference check_p_mem_expr, etc.
>
> > +static int
> > +check_p_pattern_for_store (rtx *body, void *data)
> > +{
> > +  if (*body && GET_CODE (*body) == SET)
> > +    {
> > +      /* Cache-barriers for SET_SRC may be requested as well. */
> > +      if (!(TARGET_R10K_SPECEX & 2))
> > +        body = &SET_DEST(*body);
> > +
> > +      if (for_each_rtx (body, check_p_mem_expr, 0))
> > +        return 1;
> > +
> > +      /* Don't traverse sub-expressions again. */
> > +      return -1;
> > +    }
> > +  return 0;
> > +}
>
> I think we've probably gone round something similar before,
> but I think you should:
>
>   (1) use for_each_rtx (..., check_p_mem_expr, 0) on the whole
>       body of an insn if !(TARGET_R10K_SPECEX & 2)
>
>   (2) use a note_stores callback otherwise.  The note_stores callback
>       could itself use for_each_rtx (..., check_p_mem_expr, 0).
>
> This is more robust because things like unspec_volatile can appear
> outside a SET, and yet still read memory.

Good point (although currently there seems to be no such stuff in *.md).
Another possibility would be to always take unspec_volatile as dangerous,
like inline assembly.

I guess, you prefer to see something like this:

static void
cb_check_pattern_for_store (rtx dest, rtx insn, void *critical)
{
  if (check_p_mem_expr (&dest, 0))
    *(int*) critical |= 1;
}

static int
check_p_pattern_for_store (rtx *body, void *data)
{
  if (*body)
    {
      /* Cache-barriers for SET_SRC may be requested as well. */
      if (TARGET_R10K_SPECEX & 2)
        {
          if (for_each_rtx (body, check_p_mem_expr, 0))
            return 1;
        }
      else
        {
          int critical = 0;
          note_stores (*body, cb_check_pattern_for_store, &critical);
          if (critical)
            return 1;
        }
      /* Don't traverse sub-expressions again. */
      if (GET_CODE (*body) == SET)
        return -1;
    }
  return 0;
}

but i still consider the following being more transparent:

static int
check_p_pattern_for_store (rtx *body, void *data)
{
  if (*body)
    {
      /* Cache-barriers for SET_SRC may be requested as well. */
      if (TARGET_R10K_SPECEX & 2)
        {
          if (for_each_rtx (body, check_p_mem_expr, 0))
            return 1;
        }
      else if (GET_CODE (*body) == SET
               && check_p_mem_expr (&SET_DEST(*body), 0))
        return 1;

      /* Don't traverse sub-expressions again. */
      if (GET_CODE (*body) == SET)
        return -1;
    }
  return 0;
}

>
> > ...
>
> I suggested earlier that the kernel use a builtin function that
> expands to the cacheop instruction.  Is that completely out of
> the question?  I really don't like the idea of matching text
> in inline asms.

Of course it is not out of question, but there's the problem of the hen
and the egg. The kernel code can not use the built-in functions before
they are available, and usually the compiler-version used is not the
latest one.  Thus these changes will be done at the earliest a fair amount
of time later.  I don't *like* matching text either...

As i (believe) did hint, we could just take inline assembly dangerous,
disregarding, what it contains, since it is rare and a few excessive cache
barriers (maybe ironically triggered by another cache op)  won't hurt
performance noticably.

>
> > ...
>
> This function could still be made a lot tidier.  Perhaps rather than all

The simplest way would be to avoid the (positive and negative) checks
alltogether and to slap a cache-barrier after each label, after each
branch. This works of course, it has the only disadvantage, that one out
of eight instructions would be a cache-barrier, flushing the pipeline
every two cycles...

> this backseat driving on my part -- trying to get the code to be written
> in a certain way -- I should take the patch and send back a revision.
> I can do that once the copyright stuff is done.
>
> Do you have any evidence that all this INSN_DELETED_P stuff is actually
> needed in current sources?  It shouldn't be.
>

Indeed, the only INSN_DELETED_P should be the one on function entry
- +          FOR_EACH_SUBINSN(insq, insn)
- +            if (! INSN_DELETED_P (insq))
- +              {
- +                if (check_insn_for_store (insq) > 0)
- +                  return 1;
- +              }
- +          insq = SEQ_BEGIN(insn);
+ +          FOR_EACH_SUBINSN(insq, insn)
+ +            if (check_insn_for_store (insq) > 0)
+ +              return 1;
+ +          insq = SEQ_BEGIN(insn);

> Richard
>

kind regards

peter


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]