[RFC] Do not consider volatile asms as optimization barriers #1
Richard Sandiford
rsandifo@linux.vnet.ibm.com
Mon Mar 3 13:12:00 GMT 2014
Eric Botcazou <ebotcazou@adacore.com> writes:
>> ...it's so loosely defined. If non-local labels are the specific problem,
>> I think it'd be better to limit the flush to that.
>
> No, I wrote "e.g.", so non-local labels are not the only problem.
What are the others though? As discussed in the other subthread,
I don't think prologue and epilogue barriers are quite the same.
>> I'm back to throwing examples around, sorry, but take the MIPS testcase:
>>
>> volatile int x = 1;
>>
>> void foo (void)
>> {
>>   x = 1;
>>   __builtin_mips_set_fcsr (0);
>>   x = 2;
>> }
>>
>> where __builtin_mips_set_fcsr is a handy way of getting unspec_volatile.
>> (I'm not interested in what the function does here.) Even at -O2,
>> the cse.c code successfully prevents %hi(x) from being shared,
>> as you'd expect:
>>
>>         li      $3,1            # 0x1
>>         lui     $2,%hi(x)
>>         sw      $3,%lo(x)($2)
>>         move    $2,$0
>>         ctc1    $2,$31
>>         li      $3,2            # 0x2
>>         lui     $2,%hi(x)
>>         sw      $3,%lo(x)($2)
>>         j       $31
>>         nop
>>
>> But put it in a loop:
>>
>> void frob (void)
>> {
>>   for (;;)
>>     {
>>       x = 1;
>>       __builtin_mips_set_fcsr (0);
>>       x = 2;
>>     }
>> }
>>
>> and we get the rather bizarre code:
>>
>>         lui     $2,%hi(x)
>>         li      $6,1            # 0x1
>>         move    $5,$0
>>         move    $4,$2
>>         li      $3,2            # 0x2
>>         .align  3
>> .L3:
>>         sw      $6,%lo(x)($2)
>>         ctc1    $5,$31
>>         sw      $3,%lo(x)($4)
>>         j       .L3
>>         lui     $2,%hi(x)
>>
>> Here the _second_ %hi(x), the 1 and the 2 have been hoisted but the first
>> %hi(x) is reloaded each time. So what's the correct behaviour here?
>> Should the hoisting of the second %hi(x) have been disabled because the
>> loop contains an unspec_volatile? What about the 1 (from the first store)
>> and the 2?
>
> Well, I personally wouldn't spend much time on the code generated in a loop
> containing an UNSPEC_VOLATILE. If an instruction or a builtin is supposed to
> be performance-sensitive, then by all means don't use an UNSPEC_VOLATILE;
> model it properly instead!
That doesn't really answer the question though. What's the correct
behaviour for an unspec volatile in a loop? I don't think it's what
we did in the example above, since it doesn't seem self-consistent.
And "not spending much time on it" is again too vague to tell us
what's right and what's wrong.
My point is that if the construct is well-defined enough to handle the
important things we want it to handle, the answer should be known to somebody,
even if it isn't to me. :-)
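For anyone without a MIPS toolchain, a target-independent variant of the two testcases might be sketched with an empty volatile asm standing in for __builtin_mips_set_fcsr. That substitution is an assumption, not an equivalent testcase: a volatile asm is not the same RTL construct as unspec_volatile, so GCC may treat the two differently. frob_n is a hypothetical bounded version of frob, added only so that the code terminates.

```c
/* Target-independent variant of the testcases.  An empty volatile asm
   stands in for __builtin_mips_set_fcsr; this is an assumption, since
   a volatile asm is not the same RTL construct as unspec_volatile.  */

volatile int x = 1;

/* Straight-line case: two stores to the volatile x separated by a
   volatile asm.  */
void foo (void)
{
  x = 1;
  __asm__ __volatile__ ("");
  x = 2;
}

/* Hypothetical bounded version of frob, so that the loop terminates.
   In the -O2 assembly, look at whether the address of x and the
   constants 1 and 2 are loaded inside or outside the loop body.  */
void frob_n (int n)
{
  for (int i = 0; i < n; i++)
    {
      x = 1;
      __asm__ __volatile__ ("");
      x = 2;
    }
}
```

Compiling this with something like `gcc -O2 -S` and comparing the loop body of frob_n with foo is one way to see which values get hoisted. The result will depend on the target and the GCC version.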
Thanks,
Richard