This is the mail archive of the gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Request for discussion: Rewrite of inline assembler docs
- From: Richard Sandiford <rdsandiford at googlemail dot com>
- To: dw <limegreensocks at yahoo dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Sat, 22 Mar 2014 10:56:29 +0000
- Subject: Re: Request for discussion: Rewrite of inline assembler docs
- Authentication-results: sourceware.org; auth=none
- References: <530F1C69 dot 5050305 at redhat dot com> <87eh2oah7l dot fsf at sandifor-thinkpad dot stglab dot manchester dot uk dot ibm dot com> <530FE6CE dot 1000001 at yahoo dot com> <87ha7jd75c dot fsf at talisman dot default> <53144EC6 dot 2080600 at yahoo dot com> <87fvmzy0n9 dot fsf at sandifor-thinkpad dot stglab dot manchester dot uk dot ibm dot com> <53181FBE dot 1010306 at yahoo dot com>
Sorry for the slow response.
dw <limegreensocks at yahoo dot com> writes:
> On 3/3/2014 3:36 AM, Richard Sandiford wrote:
>> Well, like you say, things can be moved across branches. So, although
>> this is a very artificial example:
>> asm ("x");
>> asm ("y");
>> could become:
>> goto bar;
>> foo:
>> asm ("y");
>> ...
>> bar:
>> asm ("x");
>> goto foo;
>> This has reordered the instructions in the sense that they have a
>> different order in memory. But they are still _executed_ in the same
>> order. Actually reordering the execution would be a serious bug.
>> So I just want to avoid anything that gives the impression that "y" can
>> be executed before "x" in this example. I still think:
>>> Since the existing docs say "GCC's optimizer can move asm statements
>>> relative to other code", how would you feel about:
>>> "Do not expect a sequence of |asm| statements to remain perfectly
>>> consecutive after compilation. If you want to stop the compiler from
>>> reordering or inserting anything into a sequence of assembler
>>> instructions, use a single |asm| statement containing multiple
>>> instructions. Note that GCC's optimizer can move |asm| statements
>>> relative to other code, including across jumps."
>> ...this gives the impression that we might try to execute volatiles
>> in a different order.
> Ahh! Ok, I see what you mean. Hmm. Based on the description of
> "no-toplevel-reorder", I assumed that it actually *might* re-order them.
Well, -fno-toplevel-reorder only applies to asms outside functions,
where there's no execution order as such. Top-level asms are treated
more like function definitions.
> "GCC's optimizer can move asm statements relative to other
> code, including across jumps. This has implications for code
> that contains a sequence of asm statements. While the execution
> order of asm statements will be preserved, do not expect the sequence of asm
> statements to remain perfectly consecutive in the compiler's output.
This part sounds good to me FWIW.
> To ensure that assembler instructions maintain their order, use a
> single asm statement containing multiple instructions."
I'm still unsure about "maintain their order" here though. How about:
  If certain instructions need to remain consecutive in the output,
  put them in a single multi-instruction asm statement.
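To make the suggested wording concrete, here is a minimal sketch, assuming an x86-64 target and AT&T syntax. The helper name add_then_double is hypothetical, and other targets fall back to plain C. It uses __asm__ rather than asm only so that it also builds under -std=c99 and similar modes.

```c
/* Hypothetical helper: computes (a + b) * 2. */
static inline long add_then_double(long a, long b)
{
#if defined(__x86_64__)
    long result = a;
    /* Both instructions sit in one asm statement, so they stay
       consecutive in the compiler's output. */
    __asm__ ("add %1, %0\n\t"   /* result += b */
             "add %0, %0"       /* result += result */
             : "+r" (result)
             : "r" (b)
             : "cc");
    return result;
#else
    return (a + b) * 2;         /* portable fallback for other targets */
#endif
}
```

Splitting the two add instructions into two separate asm statements would still execute them in order, but other code could end up between them.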
>>>>>> In the extended section:
>>>>>> Unless an output operand has the '&' constraint modifier (see
>>>>>> Modifiers), GCC may allocate it in the same register as an unrelated
>>>>>> input operand, [...]
>>>>>> It could also use it for addresses in other (memory) outputs.
>>>>> Ok. But I'm not sure this really adds anything. Having warned people
>>>>> that the register may be re-used unless '&' is used seems sufficient.
>>>> It matters where it can be reused though. If you talk about input
>>>> operands only, people might think it is OK to write asms of the form:
>>>> foo tmp,[input0]
>>>> bar [output0],tmp
>>>> frob [output1],tmp
>>>> where output0 is a register and output1 is a memory. This safely avoids
>>>> using the input operand after assigning to output0, but the address in
>>>> output1 is still live and could be changed by bar.
>>> I'm not sure we're talking about the same problem. I'm borrowing this
>>> x86 example from someone else:
>>> static inline char *
>>> lcopy( char *dst, const char *src, long len )
>>> {
>>>     asm volatile (
>>>         "shr $3,%2; " /* how many qwords to copy */
>>>         "rep movsq; " /* copy that many */
>>>         "mov %3,%2; " /* how many bytes to copy */
>>>         "rep movsb" /* copy that many */
>>>         : "+D" (dst), "+S" (src), "+c" (len)
>>>         : "r" (len & 7)
>>>         : "memory");
>>>     return dst;
>>> }
>>> You might expect that "len" and "len & 7" are two different things.
>>> However if the function is called with a constant less than 8, the
>>> compiler knows that they are actually the same and uses rcx for both,
>>> giving mov rcx,rcx for mov %3,%2 and of course by then rcx is zero.
>>> Using & on len forces the use of two separate registers.
>>> This seems to me to be a different kind of problem than:
>>> asm ("xxx": "=r" (x), "=m" (x));
>>> Or am I missing your point?
>> Well, that code is just one instance of (and a good example of)
>> the principle that GCC assumes all inputs are consumed before any
>> outputs are written. And the point is that the "inputs" in that
>> description aren't restricted to input operands: they apply to any
>> rvalues in the output operands too.
>> E.g. the same thing could occur for an artificial case like:
>> asm ("...." : "+r" (ptr), "=m" (*x));
>> if GCC realises that x==ptr. Then the address in operand 1 might
>> be the same as operand 0. The same goes for:
>> asm ("...." : "=r" (ptr), "=m" (*x) : "0" (ptr));
>> which is really just another way of writing the same thing.
> So, more like this:
> "Unless an output operand has the '|&|' constraint modifier (see
> Modifiers), GCC may allocate it
> in the same register as an unrelated input operand, on the assumption
> that the assembler code will consume its inputs before producing
> outputs. This assumption may be false if the assembler code consists of
> more than one instruction. Further, if the compiler determines that two
> output operands refer to the same object, output operands can also be
> combined to use the same register. To prevent this combining, use '|&|'
> for each output operand that must not overlap another operand.
That makes it sound like two output operands writing to the same place,
which isn't valid. In the example above, %1 would produce the assembly
syntax for the memory at address x. Similarly %a1 would produce just
address x, without the dereference syntax. My point was that if x==ptr
before the asm, the register that holds x in those addresses could be
the same as the register chosen for %0, even though you might think that
using "+" would make the register exclusive to %0. So if the asm wrote
to operand 0 first, it might also change the address used when storing
to memory operand 1. A & would be needed on operand 0 to prevent that.
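A minimal sketch of that situation, assuming an x86-64 target and AT&T syntax. The name store_then_advance and the stored value 42 are made up for illustration, and other targets take a plain C fallback. The asm writes operand 0 before it stores through the address in operand 1, so operand 0 carries the & modifier.

```c
/* Hypothetical helper: stores 42 at the original pointer and
   returns the pointer advanced by one byte. */
static inline char *store_then_advance(char *ptr)
{
    char *x = ptr;   /* x == ptr before the asm */
#if defined(__x86_64__)
    __asm__ ("add $1, %0\n\t"   /* write register operand 0 first */
             "movb $42, %1"     /* then store through the address in operand 1 */
             : "+&r" (ptr), "=m" (*x)
             :
             : "cc");
#else
    ptr += 1;        /* portable fallback for other targets */
    *x = 42;
#endif
    return ptr;
}
```

Without the &, the register holding the address of *x could be the one chosen for %0, and the store could then land one byte past the intended location.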
The more general point is that, by default, an instruction is assumed to
read all its inputs before writing to its outputs. And "input" in this
case includes both input operands and addresses used in output operands.
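For the input-operand side of the same rule, here is a sketch of the quoted lcopy example with an earlyclobber added to len, assuming x86-64 and AT&T syntax. The name lcopy_fixed is hypothetical, and the memcpy fallback for other targets is an addition for portability, not part of the original example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of lcopy: copies len bytes and returns the
   advanced destination pointer. */
static inline char *
lcopy_fixed(char *dst, const char *src, long len)
{
#if defined(__x86_64__)
    __asm__ __volatile__ (
        "shr $3,%2; "   /* how many qwords to copy */
        "rep movsq; "   /* copy that many */
        "mov %3,%2; "   /* how many bytes to copy */
        "rep movsb"     /* copy that many */
        : "+D" (dst), "+S" (src), "+&c" (len)  /* & keeps %3 out of rcx */
        : "r" (len & 7)
        : "memory");
    return dst;
#else
    memcpy(dst, src, (size_t) len);   /* portable fallback */
    return dst + len;
#endif
}
```

With "+&c", the compiler has to put len & 7 in a register other than rcx, even when it can prove that len and len & 7 are equal.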
Whenever you have:
asm ("" : "=m" (*x), "=r" (y));
you have to assume that the address in %0 might use the same register as %1.
(If "y" isn't an input to the asm then the same register can be used for the
incoming "x" and outgoing "y" even when x!=y before the asm.)