builtin fe[gs]etround

Mon Feb 24 13:27:00 GMT 2014

On Mon, Feb 24, 2014 at 1:43 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Mon, 24 Feb 2014, Richard Biener wrote:
>
>> On Sun, Feb 23, 2014 at 12:09 PM, Marc Glisse <marc.glisse@inria.fr>
>> wrote:
>>>
>>> Hello,
>>>
>>> a natural first step to optimize changes of rounding modes seems to be
>>> making these 2 functions builtins. I don't know exactly how far
>>> optimizations will be able to go (the fact that fesetround can fail
>>> complicates things a lot). What is included here:
>>>
>>> 1) fegetround is pure.
>>>
>>> 2) Neither function aliases (use or clobber) any memory. I expect this is
>>> likely not true on all platforms, some probably store the rounding mode in
>>> a global variable that is accessible through other means (though mixing
>>> direct accesses with calls to fe*etround seems a questionable style). Any
>>> opinion or advice here?
>>>
>>> Regtested on x86_64-linux-gnu, certainly not for 4.9.
>>
>>
>> Hohumm ... before making any of these functions less of a barrier than they
>> are (at least for loads and stores), shouldn't we think of, and fix, the
>> lack of any dependences between FP status word changes and actual arithmetic
>> instructions?
>
>
> I'd welcome such change, but it is beyond my gcc-foo (and my free time) for
> now.
>
>
>> In fact, using 'pure' or 'not use/clobber memory' here is exactly walking
>> on shaky ground.
>
>
> I have a hard time seeing how making fegetround pure can break anything that
> accidentally works now. fegetround really is pure, it is fine to move it
> across float operations.

You mean "const" ;)  But yes, if you declare FP state to be "global memory"
then pure works (and is needed - you have to retain the dependency on a
fesetround).
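
A minimal sketch of that const/pure distinction, using GCC's function
attributes. The names fake_mode, get_mode, set_mode and mode_changes are
hypothetical; a plain static variable stands in for the rounding mode,
which is an assumption for illustration and not how any libc stores it.

```c
#include <assert.h>

/* Hypothetical stand-in for the FP rounding mode: hidden state that is
   only reachable through the two accessors below. */
static int fake_mode;

/* 'pure': no side effects, but the result may depend on global memory,
   so two calls cannot be merged across a store to that memory. */
__attribute__((pure)) int get_mode(void) { return fake_mode; }

/* Writes the state, so it is neither pure nor const.
   Returns 0 on success, mirroring the fesetround convention. */
int set_mode(int m) { fake_mode = m; return 0; }

/* If get_mode were declared 'const' instead, the compiler would be
   allowed to reuse 'before' for 'after' and always return 0 here. */
int mode_changes(int m)
{
    int before = get_mode();
    int rc = set_mode(m);
    assert(rc == 0);          /* the stand-in setter cannot fail */
    int after = get_mode();
    return before != after;
}
```

Starting from the initial state, mode_changes(1) reports a change on the
first call and none on the second, which is only guaranteed because the
getter keeps its dependency on the setter.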

Then if you assume that the "global memory" the FP state is in cannot be
addressed directly but has to go through fe* routines then declaring all
of them not clobbering/using a ref you can "name" would work.  Luckily
we don't have any predicates that disambiguate calls against each other
(yet).

What would break, and only works accidentally now, is memory CSE across
fesetround/fegetround calls that exposes non-memory dependence chains
in arithmetic.  That's probably what makes most cases work today: the ones
that are isolated into separate functions and that operate on memory.

>> Simply because we lack a way to say that this stmt uses/clobbers the FP
>> state (fegetround would be 'const' when following your logic in 2)).
>
>
> Not exactly, the logic in 2 is to say that the FP rounding mode is still a
> global variable, but not one that is accessible directly, so the alias
> oracle can never be called on a ref to it.
>
> (note that we probably don't want a single FP state but separate rounding
> mode on one side and exception flags on the other, since they are
> preserved/modified very differently)
>
>
>> Otherwise, what is it worth optimizing^breaking things even more than
>> we do now?
>
>
> With just my patch, probably not much. For someone interested, the kind of
> thing that I would like:
>
> #include <fenv.h>
> double protect(double x){asm volatile("":"+mx"(x));return x;}
> double add(double x,double y){
>   int old=fegetround();
>   fesetround(FE_UPWARD);
>   double res = protect(protect(x)+protect(y));
>   fesetround(old);
>   return res;
> }
> double f(double x,double y,double z){
>   return add(add(x,y),z);
> }
>
> (in practice I might add: if(old!=FE_UPWARD) in front of both fesetround)
> becomes:
>
>   old_9 = fegetround ();
>   fesetround (2048);
>   __asm__ __volatile__("" : "=mx" x_10 : "0" x_2(D));
>   __asm__ __volatile__("" : "=mx" x_11 : "0" y_3(D));
>   _12 = x_10 + x_11;
>   __asm__ __volatile__("" : "=mx" res_13 : "0" _12);
>   fesetround (old_9);
>   old_14 = fegetround ();
>   fesetround (2048);
>   __asm__ __volatile__("" : "=mx" x_15 : "0" res_13);
>   __asm__ __volatile__("" : "=mx" x_16 : "0" z_6(D));
>   _17 = x_15 + x_16;
>   __asm__ __volatile__("" : "=mx" res_18 : "0" _17);
>   fesetround (old_14);
>   return res_18;
>
> The interesting part is the 3 instructions in the middle. It is "easy" to
> replace old_14 with old_9: the vuse has a def_stmt which is an fesetround,
> and that fesetround must have succeeded because its argument has a def_stmt
> which is an fegetround. We are left with:
>
>   fesetround (old_9);
>   fesetround (2048);
>
> If we know somehow that the second fesetround can't fail (hardcode a list of
> safe values per platform?), we can remove fesetround (old_9). If we also
> assume that only fesetround can modify the rounding mode, we can prove that
> the second fesetround is redundant and remove it. We could also imagine
> saying that in both blocks the rounding mode is what you get when it was
> old_9 and you try to set it to 2048, and thus remove both middle fesetround
> at once. In any case, that brings the desired state of both additions
> sharing a single pre/post fesetround.
>
> Obviously that's wrong since the inline asm can modify the rounding mode
> (though why would you mix that with calls to fe*etround?), so that would
> probably require a nicer "protect", or even the special additions you
> mention in the next email.

Well, your asm cannot modify it, as it has no use or clobber for FP
state (but then there is no way to express one...).  Also I think the
'volatile' in the asms isn't needed.

I see what you are after though.  I still haven't decided whether we want to
start optimizing FP state modification when we can't even honor dependence on
it ... (optimizing FP state inspection might be another thing and somewhat
more obvious).

Let's revisit this during stage1.  I'd like to see the full set of C99 FP state
handling as builtins though, not a piecemeal addition.

Thanks,
Richard.

> --
> Marc Glisse