This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Asm volatile causing performance regressions on ARM


On 03/03/14 11:49, Richard Biener wrote:
> On Mon, Mar 3, 2014 at 11:41 AM, David Brown <david@westcontrol.com> wrote:
>> On 28/02/14 13:19, Richard Sandiford wrote:
>>> Georg-Johann Lay <avr@gjlay.de> writes:
>>>> Notice that in code1, func might contain such asm-pairs to implement
>>>> atomic operations, but moving costly_func across func does *not*
>>>> affect the interrupt response times in such a disastrous way.
>>>>
>>>> Thus you must be *very* careful w.r.t. optimizing against asm volatile
>>>> + memory clobber.  It's too easy to miss some side effects of *real*
>>>> code.
>>>
>>> I understand the example, but I don't think volatile asms guarantee
>>> what you want here.
>>>
>>>> Optimizing code to scrap and pointing to some GCC internal reasoning or some
>>>> standard's wording does not help with real code.
>>>
>>> But how else can a compiler work?  It doesn't just regurgitate canned code,
>>> so it can't rely on human intuition as to what "makes sense".  We have to
>>> have specific rules and guarantees and say that anything outside those
>>> rules and guarantees is undefined.
>>>
>>> It sounds like you want an asm with an extra-strong ordering guarantee.
>>> I think that would need to be an extension, since it would need to consider
>>> cases where the asm is used in a function.  (Shades of carries_dependence
>>> or whatever in the huge atomic thread.)  I think anything where:
>>>
>>>   void foo (void) { X; }
>>>   void bar (void) { Y1; foo (); Y2; }
>>>
>>> has different semantics from:
>>>
>>>   void bar (void) { Y1; X; Y2; }
>>>
>>> is very dangerous.  And assuming that any function call could enable
>>> or disable interrupts, and therefore that nothing can be moved across
>>> a non-const function call, would limit things a bit too much.
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>
>> I think the problem stems from "volatile" being a barrier to /data flow/
>> changes,
> 
> What kind of /data flow/ changes?  It certainly isn't that currently,
> only two volatiles always conflict but not a volatile and a non-volatile mem:
> 
> static int
> true_dependence_1 (const_rtx mem, enum machine_mode mem_mode, rtx mem_addr,
>                    const_rtx x, rtx x_addr, bool mem_canonicalized)
> {
> ...
>   if (MEM_VOLATILE_P (x) && MEM_VOLATILE_P (mem))
>     return 1;
> 
> bool
> refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
> {
> ...
>   /* Two volatile accesses always conflict.  */
>   if (ref1->volatile_p
>       && ref2->volatile_p)
>     return true;
> 
>> but what is needed in this case is a barrier to /control flow/
>> changes.  To my knowledge, C does not provide any way of doing this, nor
>> are there existing gcc extensions to guarantee the ordering.  But it
>> certainly is the case that control flow ordering like this is important
>> - it can be critical in embedded systems (such as in the example here by
>> Georg-Johann), but it can also be important for non-embedded systems
>> (such as to minimise the time spend while holding a lock).
> 
> Can you elaborate on this?  I have a hard time thinking of a
> control flow transform that affects volatiles.
> 
> Richard.
> 

I am perhaps not expressing myself very clearly here (and I don't know
the internals of gcc well enough to use the source to help).

Normal (i.e., not "asm") volatile accesses are ordered with respect to
each other - if the source code has a volatile read of "x" followed by a
volatile read of "y", then the compiler has to issue those reads in that
order.  It can't re-order them, hoist them out of a loop, or apply any
other re-ordering optimisations.  Clobbers, inputs and outputs in inline
assembly impose a similar ordering on the data flow.  But none of this
affects the /control/ flow.  So the __attribute__((const)) "costly_func"
described by Georg-Johann can be moved freely by the compiler amongst
these volatile /data/ accesses.
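
For instance (a minimal sketch - the names here are invented for
illustration):

extern volatile int x, y;
extern int heavy_func(int) __attribute__((const));

int demo(int z) {
	int a = x;		/* volatile read - must be issued first */
	int r = heavy_func(z);	/* const call - gcc may schedule this before
				   the read of x or after the read of y */
	int b = y;		/* volatile read - must follow the read of x */
	return a + b + r;
}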

The C abstract machine does not have any concept of timings, only of
observable accesses (volatile accesses, calls to external code, and
entry/exit from main()).  So it does not distinguish between the sequences:

	volX = 1;
	y = costly_func(z);
	volX = 2;

and

	y = costly_func(z);
	volX = 1;
	volX = 2;

and

	volX = 1;
	volX = 2;
	y = costly_func(z);

(This assumes that costly_func is __attribute__((const)), and y and z
are non-volatile.)
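
Spelled out, the declarations I am assuming for those sequences would be
something like:

	extern volatile int volX;
	extern int costly_func(int) __attribute__((const));
	int y, z;

so the only observable accesses are the two writes to volX.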

For some real-world usage, however, these sequences are very different.
In "big" systems, it is unlikely to change correctness.  If "volX" were
part of a locking mechanism, for example, then each version of this code
would be correct - but they might differ in the length of time that the
locks were held, and that could seriously affect performance.  In
embedded systems, low performance could mean failure.  The problem is
exacerbated by small CPUs that need library functions for seemingly
simple operations - gcc might happily move a division operation around
without realising the massive time cost on an 8-bit processor.
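
To illustrate the locking case (just a sketch - read "lock" as whatever
flag or primitive the system actually uses):

extern volatile int lock;
extern volatile int shared;

void update(int a, int b) {
	int v = a / b;	/* costly - a library call on an 8-bit cpu */
	lock = 1;	/* "take" the lock (volatile store) */
	shared = v;
	lock = 0;	/* "release" the lock (volatile store) */
}

Nothing stops gcc from computing a / b between the two stores to "lock";
the result is still correct, but the lock is held for far longer than
the source suggests.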

In particular, I have seen code like this:

#include <stdbool.h>

extern volatile int v1, v2;
extern volatile bool interruptEnable;
int c;

void foo(int a) {
	int b = a / c;		/* division - a library call on small CPUs */
	interruptEnable = 0;
	v1 = b;
	v2 = b;
	interruptEnable = 1;
}

get transformed to move the division in between the interrupt disable
and writing to "v1".  This is a valid transform from C's viewpoint.
Putting an "asm volatile("" ::: "memory");" before disabling the
interrupts usually helps, but AFAIK the ordering is not guaranteed by
gcc.  Making "b" volatile /will/ help, but means extra memory and
instructions - something you often want to avoid in embedded systems.
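
For reference, the barrier version would look something like this (with
the caveat above that, as far as I know, gcc does not formally guarantee
the ordering):

#include <stdbool.h>

extern volatile int v1, v2;
extern volatile bool interruptEnable;
int c;

void foo(int a) {
	int b = a / c;
	asm volatile("" ::: "memory");	/* compiler barrier - usually keeps
					   the division above this point,
					   but not a hard guarantee */
	interruptEnable = 0;
	v1 = b;
	v2 = b;
	interruptEnable = 1;
}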


