This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH]: Add special code generation for SGI IP28/IP32-R10k.


peter fuerst <pf@net.alphadv.de> writes:
> ***
> Would some maintainer please send me the relevant form(s) for copyright-
> assignment, as described in contribute.html, maintain.* ?
> ***

I think the procedure described here:

    http://gcc.gnu.org/ml/gcc/2003-06/msg02298.html

is still the right one.  Hopefully someone will correct me if I'm wrong.

> On Tue, 7 Mar 2006, Richard Sandiford wrote:
>> OK, just thought I'd check.  Could you clarify how you've arrived at
>> those conditions then?  If I've understood the problem correctly,
>> a load or store is potentially problematic if you could be DMAing
>> to the same address.  Is that right?
>
> Yes, corrupting DMA-areas is the main problem.  Invalid, but legal,
> addresses are also problematic.  The documentation, mentioned in my
> first e-mail, is appropriate for explaining details.

Unless I'm missing something, I don't think it did.  It listed a set
of conditions, but it didn't really seem to explain why those conditions
were the right ones.

For example, the compiler has no intrinsic knowledge of what is and
isn't a DMA area.  You seem to be assuming that certain addresses
aren't DMA areas based on properties of a particular package (i.e.,
the Linux kernel).  That's certainly not a problem, but I think the
documentation should say what these assumptions are.

>>                                       And the assumption is that
>> you would never DMA to stack addresses, so no insns are needed there?
>
> Yes.  Moreover $sp does not provide any invalid address, even if taken
> (mis-)speculatively.

You're talking about hardware rather than software speculation here,
right?  Playing language lawyer for a second, my understanding is that:

     void foo (int x)
     {
       int array[1];
       if (x)
         bar (array[0x3fff]);
     }

is valid code if x is never true, so you're assuming either (a) the
kernel doesn't use code like that or (b) that addresses in the range
[$sp, $sp + 0x7fff] are always valid.

(Obviously the chances of seeing code written exactly like that are
slim, at least outside of compiler conformance testsuites.  But it's
more likely when the size of "array" and the condition in the "if"
involve configuration macros.)

I'm not trying to be awkward here.  I'm just trying to pin down
something along the lines of "this option is designed for code
which follows these guidelines: ...".  This should be included
in the documentation of the option, since it requires the user
to write code in a certain way, or make certain guarantees.

>> If so, why are constant addresses exempted too?
>>
> The DMA-buffers are allocated at runtime.
> However, with ".nomacro" these addresses are not recognized as constant

Right.  I think I mentioned this earlier.

Anyway, the patch is looking much better thanks, but there are still
a few things I'd like to see changed.

> +@item -mr10k-cache-barrier[=@var{level}]
> +@opindex mr10k-cache-barrier
> +Generate special cache barriers to avoid dangerous side-effects of speculative
> +execution in kernel code to be run on SGI's Indigo2 or O2 with R10000 (IP28 or
> +IP32/R10k).
> +@var{level} @option{1} enables cache-barriers before critical stores only,
> +@option{2} additionally enables cache-barriers before loads.
> +If no @var{level} is specified, @option{1} is assumed.
>  @end table

As I mentioned above, this needs to be a bit more descriptive.

> +static int
> +check_p_mem_expr (rtx *memx, void *data)
> +{
> +  if (!MEM_P (*memx) || for_each_rtx (memx, is_stack_pointer, 0))
> +    return 0;
> +
> +  /* Stores/Loads to/from constant addresses can be considered
> +     harmless, since:
> +     1)  the address is always valid, even when taken speculatively.
> +     2a) the location is (hopefully) never used as a dma-target, thus
> +         there is no danger of cache-inconsistency.
> +     2b) uncached loads/stores are guaranteed to be non-speculative. */
> +  if ( CONSTANT_P(XEXP (*memx, 0)) )
> +    return 0;

As I think I said before, this should really use mips_classify_address.

Would you consider variable indexes from symbolic addresses to be
OK too?  Things like global[X], I mean?  If so, you could extend
the constant check to LO_SUM addresses and avoid the nomacro problem
you describe above.

> +/* Return truth value whether we find (set (mem:M (non_stackpointer_address)
> +   ...)) in instruction-pattern BODY.

The comment isn't quite accurate: you exempt more than stack pointer
addresses.  It's OK to reference check_p_mem_expr, etc.

> +static int
> +check_p_pattern_for_store (rtx *body, void *data)
> +{
> +  if (*body && GET_CODE (*body) == SET)
> +    {
> +      /* Cache-barriers for SET_SRC may be requested as well. */
> +      if (!(TARGET_R10K_SPECEX & 2))
> +        body = &SET_DEST(*body);
> +
> +      if (for_each_rtx (body, check_p_mem_expr, 0))
> +        return 1;
> +
> +      /* Don't traverse sub-expressions again. */
> +      return -1;
> +    }
> +  return 0;
> +}

I think we've probably gone round something similar before,
but I think you should:

  (1) use for_each_rtx (..., check_p_mem_expr, 0) on the whole
      body of an insn if (TARGET_R10K_SPECEX & 2)

  (2) use a note_stores callback otherwise.  The note_stores callback
      could itself use for_each_rtx (..., check_p_mem_expr, 0).

This is more robust because things like unspec_volatile can appear
outside a SET, and yet still read memory.
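Something along these lines, say (an untested sketch against the
GCC-internals API, reusing the names from the patch, and assuming the
patch's encoding in which bit 2 of TARGET_R10K_SPECEX also requests
barriers before loads):

```c
/* Sketch only; not compiled.  note_stores hands each store destination
   (including those buried in PARALLELs) to the callback.  */

static void
r10k_note_store (rtx dest, rtx setter ATTRIBUTE_UNUSED, void *data)
{
  if (for_each_rtx (&dest, check_p_mem_expr, 0))
    *(int *) data = 1;
}

static int
insn_needs_cache_barrier_p (rtx insn)
{
  int found = 0;

  /* Barriers before loads as well: scan the whole body.  */
  if (TARGET_R10K_SPECEX & 2)
    return for_each_rtx (&PATTERN (insn), check_p_mem_expr, 0) != 0;

  /* Stores only: let note_stores enumerate every destination.  */
  note_stores (PATTERN (insn), r10k_note_store, &found);
  return found;
}
```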

> +/* Check for (ins (set (mem:M (dangerous_address)) ...)) or end of the
> +   current basic block in instruction INSN.
> +   Criteria to recognize end-of/next basic-block are reduplicated here
> +   from final_scan_insn.
> +   return >0: INSN is critical.
> +   return <0: stop scan at INSN (usually end of current basic-block).
> +   return 0:  INSN can be ignored. */
> +
> +static int
> +check_insn_for_store (rtx insn)
> +{
> +  if (INSN_DELETED_P (insn))
> +    return 0;
> +
> +  if (LABEL_P (insn))
> +    return -1;
> +
> +  if (CALL_P (insn) || JUMP_P (insn) || NONJUMP_INSN_P (insn))
> +    {
> +      rtx body = PATTERN (insn);
> +      if (GET_CODE (body) == SEQUENCE)
> +        {
> +          /* A delayed-branch sequence. */
> +          rtx insq;
> +          FOR_EACH_SUBINSN(insq, insn)
> +            if (! INSN_DELETED_P (insq))
> +              {
> +                /* |1: delay-slot completely contained in sequence. */
> +                if (check_insn_for_store (insq) > 0)
> +                  return 1;
> +              }
> +          insq = SEQ_BEGIN(insn);
> +          /* Following a (conditional) branch sequence, we have a new
> +             basic block.  */
> +          if (JUMP_P (insq))
> +            return -1;
> +          /* Handle a call sequence like a conditional branch sequence. */
> +          if (CALL_P (insq))
> +            return -1;
> +        }
> +      if (GET_CODE (body) == PARALLEL)
> +        if (for_each_rtx (&body, check_p_pattern_for_store, 0))
> +          return 1;
> +
> +      /* Now, only a `simple' INSN or JUMP_INSN remains to be checked. */
> +      if (NONJUMP_INSN_P (insn))
> +        {
> +          /* If there's already a cache-op, nothing is left to do. */
> +          if (INSN_CODE (insn) == CODE_FOR_cacheop)
> +            return -1;
> +
> +          /* Since we don't know what's inside, we must take inline
> +             assembly to be dangerous. */
> +          if (GET_CODE (body) == ASM_INPUT)
> +            {
> +              const char *t = XSTR (body, 0);
> +              /* But if it starts with a cache-op, nothing is left to do. */
> +              if (t && !strncmp(t, "cache", 5))
> +                return -1;
> +              return 1;
> +            }

I suggested earlier that the kernel use a builtin function that
expands to the cacheop instruction.  Is that completely out of
the question?  I really don't like the idea of matching text
in inline asms.
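In other words, instead of the compiler pattern-matching the text of
something like the kernel's cache-op asm, the kernel would call a builtin
that the compiler can recognize by INSN_CODE.  Roughly (the asm macro below
only approximates what the kernel does, and the builtin name is purely
illustrative -- no such builtin exists today):

```c
/* What the kernel does today, which the patch has to recognize by
   matching the leading "cache" text:  */
#define cache_op(op, addr)					\
	__asm__ __volatile__("cache %0, (%1)"			\
			     : : "i" (op), "r" (addr))

/* What I'm suggesting instead: a builtin that expands directly to the
   cacheop pattern, e.g.

	__builtin_mips_cacheop (op, addr);

   so check_insn_for_store can test INSN_CODE rather than asm text.  */
```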

> +          if (check_p_pattern_for_store (&body, 0) > 0)
> +            return 1;
> +        }
> +      /* Following a (conditional) branch, we have a new basic block.
> +         Handle a CALL_INSN instruction like a conditional branch. */
> +      if (JUMP_P (insn) || CALL_P (insn))
> +        return -1;
> +    }
> +  return 0;
> +}

This function could still be made a lot tidier.  Perhaps rather than all
this backseat driving on my part -- trying to get the code to be written
in a certain way -- I should take the patch and send back a revision.
I can do that once the copyright stuff is done.

Do you have any evidence that all this INSN_DELETED_P stuff is actually
needed in current sources?  It shouldn't be.

Richard

