This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH]: Add special code generation for SGI IP28/IP32-R10k.
- From: peter fuerst <pf at net dot alphadv dot de>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Richard Sandiford <richard at codesourcery dot com>
- Date: Sun, 28 May 2006 23:41:44 +0200 (CEST)
- Subject: Re: [PATCH]: Add special code generation for SGI IP28/IP32-R10k.
- References: <Pine.LNX.4.58.0604022356250.271@Indigo2.Peter> <873bgob2bv.fsf@talisman.home>
- Reply-to: pf at net dot alphadv dot de
Hello !
On Sat, 8 Apr 2006, Richard Sandiford wrote:
> Date: Sat, 08 Apr 2006 14:53:56 +0100
> From: Richard Sandiford <richard@codesourcery.com>
> To: pf@net.alphadv.de
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH]: Add special code generation for SGI IP28/IP32-R10k.
>
> > ...
>
> Unless I'm missing something, I don't think it did. It listed a set
> of conditions, but it didn't really seem to explain why those conditions
> were the right ones.
Assuming, that
MIPS R10000 Microprocessor User's Manual, Version 2.0,
"Side Effects of Speculative Execution"
and
http://mail-index.netbsd.org/port-sgimips/2000/06/29/0006.html
are already carefully read, i'll use my own words trying to explain the
issue shortly:
A memory-access, like "sd $M,n($N)" will affect the cacheline, corresponding
to n($N): it may be reloaded and in the store-case will be marked dirty, with
the effect of an unpredictable write-back.
Now, this also is done, when the instruction is executed speculatively. And
there is no rollback, even, when the speculation turns out wrong!
Arriving on a misspeculated path, $N may contain a pointer into a area, where
a DMA-read is initiated, so at some time the cache and corresponding memory
contents may be different.
A read-cacheline can be cured by invalidating it, when DMA is done, but
the write-back must be avoided in the first place. This is done by a cache-
barrier (providing the cache-ops' side-effect of cutting off speculation
paths without touching the cache) at the beginning of the edge leading to
this instruction.
What cases of $N can be exempted from this measure?
- Stack-addresses and constant (static) addresses ("sd $M,symbol+n") will not
be used for DMA, since DMA-buffers are allocated at runtime.
- Uncached accesses will not be done speculatively, but they fall under the
"constant"-case already or will not be recognized at compile-time.
Besides the DMA-problem, depending on the mis-speculation path (up to four
branches deep), one of the frequently reused multi-purpose registers $N
will contain some "random" value, which may be a legal but invalid kernel-
address (say a800000061234567), reaching the memory-controller...
However, there are cases where a register $N's content is well defined, no
matter what (mis-)speculation path took us to this instruction:
- The stack-pointer points to the stack from kernel-initializtion on.
- Constant addresses ("symbol+n") are well defined "per se".
(Luckily, legal-but-invalid doesn't occur in user mode, where no cache-
barriers can be used. There we get either an address-error or a TLB-miss,
leaving memory/bus untouched.)
>
> For example, the compiler was no intrinsic knowledge of what is and
> isn't a DMA area. You seem to be assuming that certain addresses
> aren't DMA areas based on properties of a particular package (i.e.,
> the linux kernel). That's certainly not a problem, but I think the
> documentation should say what these assumptions are.
>
> > ...
>
> You're talking about hardware rather than software speculation here
> right? Playing language lawyer for a second, my understanding is that:
>
> void foo (int x)
> {
> int array[1];
> if (x)
> bar (array[0x3fff]);
> }
>
> is valid code if x is never true, so you're assuming either (a) the
> kernel doesn't use code like that or (b) that addresses in the range
> [$sp, $sp + 0x7fff] are always valid.
>
> (Obviously the chances of seeing code written exactly like that are
> rare, at least outside of compiler conformance testsuites. But it's
> more likely when the size of "array" and the condition in the "if"
> involve configuration macros.)
>
> I'm not trying to be awkward here. I'm just trying to pin down
> something along the lines of "this option is designed for code
> which follows these guidelines: ...". This should be included
> in the documentation of the option, since it requires the user
> to write code in a certain way, or make certain guarantees.
"since it requires the user to write code in a certain way, or make certain
guarantees." sounds a bit misleading, like: do not use otherwise!
I guess you wanted to say "To guarantee full efficacy of this option, the user
has to write..."
Btw. "option" is a bit of a misnomer, "prerequisite" is more accurate.
The option is not meant to fix questionable code, this is completely
out of its scope. It just addresses the situation in correct (Linux-,
*BSD-, whatever...) kernel code as is.
Of course it's possible to deliberately subvert the checks.
More typical candidates are:
void foo (int x)
{
int array[1];
if (y)
bar (array[x]);
}
or
void foo (int array[N])
{
int i;
for (i = 0; i < N; ++i)
array[i] = i;
}
where array[N] (really the corresponding cache-line) might (likely will)
be touched speculatively, with array+N pointing into a DMA-buffer or
beyond RAM ...
> > ...
>
> Right. I think I mentioned this earlier.
>
> Anyway, the patch is looking much better thanks, but there are still
> a few things I'd like to see changed.
>
> > +@item -mr10k-cache-barrier[=@var{level}]
> > +@opindex mr10k-cache-barrier
> > +Generate special cache barriers to avoid dangerous side-effects of speculative
> > +execution in kernel code to be run on SGI's Indigo2 or O2 with R10000 (IP28 or
> > +IP32/R10k).
> > +@var{level} @option{1} enables cache-barriers before critical stores only,
> > +@option{2} additionally enables cache-barriers before loads.
> > +If no @var{level} is specified, @option{1} is assumed.
> > @end table
>
> As I mentioned above, this needs to be a bit more descriptive.
A short description, which stores are considered critical/non-critical,
and references to the documentation could be helpful indeed, i think.
>
> > +static int
> > +check_p_mem_expr (rtx *memx, void *data)
> > +{
> > + if (!MEM_P (*memx) || for_each_rtx (memx, is_stack_pointer, 0))
> > + return 0;
> > +
> > + /* Stores/Loads to/from constant addresses can be considered
> > + harmless, since:
> > + 1) the address is always valid, even when taken speculatively.
> > + 2a) the location is (hopefully) never used as a dma-target, thus
> > + there is no danger of cache-inconsistency.
> > + 2b) uncached loads/stores are guaranteed to be non-speculative. */
> > + if ( CONSTANT_P(XEXP (*memx, 0)) )
> > + return 0;
>
> As I think I said before, this should really use mips_classify_address.
I guess, you prefer to see something like this:
check_p_mem_expr (rtx *memx, void *data)
{
struct mips_address_info info;
if (!MEM_P (*memx))
return 0;
if (mips_classify_address (&info, *memx, GET_MODE (*memx), true))
switch (info.type)
{
case ADDRESS_REG:
if (info.reg == stack_pointer_rtx)
return 0;
break;
case ADDRESS_SYMBOLIC:
case ADDRESS_CONST_INT:
/* Stores/Loads to/from constant addresses can be considered
harmless, since:
1) the address is always valid, even when taken speculatively.
2a) the location is (hopefully) never used as a dma-target, thus
there is no danger of cache-inconsistency.
2b) uncached loads/stores are guaranteed to be non-speculative. */
return 0;
case ADDRESS_LO_SUM:
;/* unsafe... !? */
}
return 1;
}
(besides, here "gcc_assert (mips_classify...);" would be more logical
than "if (mips_classify...)")
> Would you consider variable indexes from symbolic addresses to be
> OK too? Things like global[X], I mean? If so, you could extend
> the constant check to LO_SUM addresses and avoid the nomacro problem
> you describe above.
Things like global[X] (or stack[X]) are OK only for constant X (see "invalid
address").
Although some simple examples suggest, that at most constant subscripts would
generate (set(mem(lo_sum))), others (set(reg(lo_sum)) (set(mem(reg))) instead,
i'm not quite sure, if this is reliable and we can accept LO_SUM without doing
ex[pt]ensive backward analysis. (accepting 2.68% instead of 2.32%
cache-barriers seems easier)
>
> > +/* Return truth value whether we find (set (mem:M (non_stackpointer_address)
> > + ...)) in instruction-pattern BODY.
>
> The comment isn't quite accurate: you exempt more than stack pointer
> addresses. It's OK to reference check_p_mem_expr, etc.
>
> > +static int
> > +check_p_pattern_for_store (rtx *body, void *data)
> > +{
> > + if (*body && GET_CODE (*body) == SET)
> > + {
> > + /* Cache-barriers for SET_SRC may be requested as well. */
> > + if (!(TARGET_R10K_SPECEX & 2))
> > + body = &SET_DEST(*body);
> > +
> > + if (for_each_rtx (body, check_p_mem_expr, 0))
> > + return 1;
> > +
> > + /* Don't traverse sub-expressions again. */
> > + return -1;
> > + }
> > + return 0;
> > +}
>
> I think we've probably gone round something similar before,
> but I think you should:
>
> (1) use for_each_rtx (..., check_p_mem_expr, 0) on the whole
> body of an insn if !(TARGET_R10K_SPECEX & 2)
>
> (2) use a note_stores callback otherwise. The note_stores callback
> could itself use for_each_rtx (..., check_p_mem_expr, 0).
>
> This is more robust because things like unspec_volatile can appear
> outside a SET, and yet still read memory.
Good point (although currently there seems to be no such stuff in *.md).
Another possibility would be to always take unspec_volatile as dangerous,
like inline assembly.
I guess, you prefer to see something like this:
static void
cb_check_pattern_for_store (rtx dest, rtx insn, void *critical)
{
if (check_p_mem_expr (&dest, 0))
*(int*) critical |= 1;
}
static int
check_p_pattern_for_store (rtx *body, void *data)
{
if (*body)
{
/* Cache-barriers for SET_SRC may be requested as well. */
if (TARGET_R10K_SPECEX & 2)
{
if (for_each_rtx (body, check_p_mem_expr, 0))
return 1;
}
else
{
int critical = 0;
note_stores (*body, cb_check_pattern_for_store, &critical);
if (critical)
return 1;
}
/* Don't traverse sub-expressions again. */
if (GET_CODE (*body) == SET)
return -1;
}
return 0;
}
but i still consider the following being more transparent:
static int
check_p_pattern_for_store (rtx *body, void *data)
{
if (*body)
{
/* Cache-barriers for SET_SRC may be requested as well. */
if (TARGET_R10K_SPECEX & 2)
{
if (for_each_rtx (body, check_p_mem_expr, 0))
return 1;
}
else if (GET_CODE (*body) == SET
&& check_p_mem_expr (&SET_DEST(*body), 0))
return 1;
/* Don't traverse sub-expressions again. */
if (GET_CODE (*body) == SET)
return -1;
}
return 0;
}
>
> > ...
>
> I suggested earlier that the kernel use a builtin function that
> expands to the cacheop instruction. Is that completely out of
> the question? I really don't like the idea of matching text
> in inline asms.
Of course it is not out of question, but there's the problem of the hen
and the egg. The kernel code can not use the built-in functions before
they are available, and usually the compiler-version used is not the
latest one. Thus these changes will be done at the earliest a fair amount
of time later. I don't *like* matching text either...
As i (believe) did hint, we could just take inline assembly dangerous,
disregarding, what it contains, since it is rare and a few excessive cache
barriers (maybe ironically triggered by another cache op) won't hurt
performance noticably.
>
> > ...
>
> This function could still be made a lot tidier. Perhaps rather than all
The simplest way would be to avoid the (positive and negative) checks
alltogether and to slap a cache-barrier after each label, after each
branch. This works of course, it has the only disadvantage, that one out
of eight instructions would be a cache-barrier, flushing the pipeline
every two cycles...
> this backseat driving on my part -- trying to get the code to be written
> in a certain way -- I should take the patch and send back a revision.
> I can do that once the copyright stuff is done.
>
> Do you have any evidence that all this INSN_DELETED_P stuff is actually
> needed in current sources? It shouldn't be.
>
Indeed, the only INSN_DELETED_P should be the one on function entry
- + FOR_EACH_SUBINSN(insq, insn)
- + if (! INSN_DELETED_P (insq))
- + {
- + if (check_insn_for_store (insq) > 0)
- + return 1;
- + }
- + insq = SEQ_BEGIN(insn);
+ + FOR_EACH_SUBINSN(insq, insn)
+ + if (check_insn_for_store (insq) > 0)
+ + return 1;
+ + insq = SEQ_BEGIN(insn);
> Richard
>
kind regards
peter