Marc Glisse marc.glisse@inria.fr
Fri Aug 7 16:45:59 GMT 2020

On Fri, 7 Aug 2020, Richard Biener wrote:

>> I was mostly thinking of storing information like:
>> * don't care about the rounding mode for this operation
>> * may drop exceptions produced by this operation
>> * may produce extra exceptions
>> * don't care about signed zero
>> * may contract into FMA
>> * don't care about errno (for sqrt?)
>> etc
> So we could leverage the same mechanism for inlining a non-ffast-math
> function into a -ffast-math function, rewriting operations to IFNs?


> Though the resulting loss of optimization might offset any benefit we get 
> from the inlining...

I was hoping enough optimizations would still be possible. With the right 
flags, the function could be marked pure or const, could be vectorized, 
etc. We could go through the transformations in match.pd and copy each one 
for the IFN, checking the relevant set of flags (although they might need 
to be more manual in forwprop if match.pd cannot handle them).

> At least the above list somewhat suggests it wants to capture the 
> various -f*-math options.

Originally I only wanted rounding and exceptions, but this looked like a 
sensible generalization after a previous discussion.

>>> One complication with tracking data-flow is "unknown" stuff, I'd suggest
>>> to invent a mediator between memory state and FP state which would
>>> semantically be load and store operations of the FP state from/to memory.
>> All I can think of is make FP state a particular variable in memory, and
>> teach alias analysis that those functions only read/write to this
>> variable. What do you have in mind, splitting operations as:
>> fenv0 = read_fenv()
>> (res, fenv1) = oper(arg0, arg1, fenv0)
>> store_fenv(fenv1)
>> so that "oper" itself is const? (and hopefully simplify consecutive
>> read_fenv/store_fenv so there are fewer of them) I wonder if lying about
>> the constness of the operation may be problematic.
> Kind-of.  I thought to do this around "unknown" operations like function
> calls only:
> store_fenv(fenv0);
> foo ();
> fenv0 = read_fenv();

In what I described a few lines above, that's roughly what would remain 
after simplification, but instead you would generate it directly, saving 
some compile time if there are more floating-point operations than 
unknown ones. It may help to add them also for branch/join. And even then it 
may not be sufficient. If 2 branches start with read_fenv or end with 
store_fenv, we don't want an optimizer to move them into a single call 
outside of the branches, because then the operation itself, being const, 
could move outside of the branch. ISTR that there are ways to avoid this 
kind of transformation (mostly meant to avoid duplicating an inline asm 
containing a hardcoded label).

At expansion, I guess read_fenv/store_fenv would expand to nothing; they 
were mostly there to protect the true operation, and we could still expand
(res, fenv1) = oper(arg0, arg1, fenv0)
if we don't want to also model things in RTL for every target (at least to 
begin with).

> I guess there's nothing else but to try ...
> Suppose for example you have
> _3 = .IFN_PLUS (_1, _2, 0);
> _4 = .IFN_PLUS (_1, _2, 0);
> the first plus may alter FP state (set inexact) but since the second plus
> computes the same value we'd want to elide it(?).

Assuming there is nothing in between, I think so, yes.

> Now if there's a feclearexcept() in between we can't elide it - and that 
> works as proposed because the memory state is inspected by 
> feclearexcept().

The exact effect of feclearexcept depends on how we model things. It could 
be considered write-only. If the argument is FE_ALL_EXCEPT, things may 
also be easier.

In some cases, with

_3 = .IFN_PLUS (_1, _2, 0);
feclearexcept (...);
_4 = .IFN_PLUS (_1, _2, 0);

we may want to elide the first IFN...

> But I can't see how we can convince FRE that we can elide the second 
> plus when both are modifying memory.

Yes, that's certainly harder.

Actually, for optimization purposes, I would distinguish the case where we 
care about exceptions and the case where we don't. The few times I've used 
exceptions, it was only for a single operation, and I didn't expect any 
optimization. On the other hand, I often use hundreds of rounded 
operations where I don't care about exceptions. Those can be marked as 
pure (I expect querying if .FENV_PLUS is pure to involve looking at a bit 
in its last argument), and would fit much more easily with the current 
optimizations. I can't claim that my uses are representative of all uses 
though, some people may do long, regular computations and trap on 

I am not that interested in exceptions, but since just rounding does not 
match a standard feature, it seemed more sensible to handle both together. 
I did wonder about making 2 sets of functions, the ones with exceptions 
(much harder for optimization, although not completely hopeless if people 
are really motivated) and the pure ones without exceptions, so the first 
wouldn't hinder the second too much. But having the strictest version 
first looked reasonable.
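
As a rough illustration of "looking at a bit in its last argument", the 
per-operation options could be sketched as a bit mask in plain C. This is 
not GCC code: every name below (FENV_OP_*, fenv_op_is_pure, 
fenv_op_is_const) is hypothetical, and the rule "pure when exceptions may 
be both dropped and added, const when the rounding mode is ignored as 
well" is only one assumed reading of the discussion. errno is left out of 
the predicates for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-operation bits, one per item in the list quoted at
   the top of this message. */
enum fenv_op_flags {
    FENV_OP_IGNORE_ROUNDING = 1u << 0, /* don't care about the rounding mode */
    FENV_OP_MAY_DROP_EXCEPT = 1u << 1, /* may drop exceptions of this op */
    FENV_OP_MAY_ADD_EXCEPT  = 1u << 2, /* may produce extra exceptions */
    FENV_OP_NO_SIGNED_ZERO  = 1u << 3, /* don't care about signed zero */
    FENV_OP_MAY_CONTRACT    = 1u << 4, /* may contract into FMA */
    FENV_OP_IGNORE_ERRNO    = 1u << 5, /* don't care about errno */
    /* "not interested in exceptions at all" */
    FENV_OP_NO_EXCEPT = FENV_OP_MAY_DROP_EXCEPT | FENV_OP_MAY_ADD_EXCEPT
};

/* Assumed rule: with no interest in exceptions, the operation only reads
   the rounding mode, so it behaves like a pure function. */
bool fenv_op_is_pure(unsigned flags)
{
    return (flags & FENV_OP_NO_EXCEPT) == FENV_OP_NO_EXCEPT;
}

/* Assumed rule: if the rounding mode is ignored too, nothing in the FP
   environment is read or written, so the operation behaves like const. */
bool fenv_op_is_const(unsigned flags)
{
    return fenv_op_is_pure(flags) && (flags & FENV_OP_IGNORE_ROUNDING) != 0;
}

/* Usage: the "hundreds of rounded operations, no exceptions" case. */
void fenv_op_example(void)
{
    unsigned rounded_only = FENV_OP_NO_EXCEPT;
    assert(fenv_op_is_pure(rounded_only));
    assert(!fenv_op_is_const(rounded_only));
}
```

With such an encoding, the strict variant is simply flags == 0, and the 
more optimizable variants are obtained by setting bits.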

> There's no such thing currently as effects on memory state only depend 
> on arguments.

This reminds me of the initialization of static/thread_local variables in 
functions, when Jason tried to add an attribute, but I don't think it was 
ever committed, and the semantics were likely too different.

> I _think_ we don't have to say the mem out state depends on the mem in 
> state (FP ENV), well - it does, but the difference only depends on the 
> actual arguments.

A different rounding mode could cause different exceptions I believe.

> That said, tracking FENV together with memory will complicate things
> but explicitly tracking an (or multiple?) extra FP ENV register input/output
> makes the problem not go away (the second plus still has the mutated
> FP ENV from the first plus as input).  Instead we'd have to separately
> track the effect of a single operation and the overall FP state, like
> (_3, flags_5) = .IFN_PLUS (_1, _2, 0);
> fpexstate = merge (flags_5, fpexstate);
> (_4, flags_6) = .IFN_PLUS (_1, _2, 0);
> fpexstate = merge (flags_6, fpexstate);

We would have to be careful that lines 2 and 3 cannot be swapped (unless 
we keep all the merges and key expansion on those and not on the IFN? 
But we may end up with a use of the sum before the merge).

> or so and there we can CSE.

And I guess we would have a transformation
merge(f, merge(f, state)) --> merge(f, state)
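
A toy model in plain C may make this concrete. It assumes that merge is 
a bitwise OR of sticky exception flags, and it stands in for .IFN_PLUS 
with an 8-bit addition whose only "exception" is overflow. All names 
(toy_plus, toy_merge, TOY_*) are made up for the illustration and have 
nothing to do with actual GCC internals.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned toy_flags;
enum { TOY_INEXACT = 1u << 0, TOY_OVERFLOW = 1u << 1 };

struct toy_result { uint8_t value; toy_flags flags; };

/* Stand-in for (_3, flags_5) = .IFN_PLUS (_1, _2, 0): value and flags
   depend only on the arguments, not on the accumulated state. */
struct toy_result toy_plus(uint8_t a, uint8_t b)
{
    unsigned wide = (unsigned)a + b;
    struct toy_result r = { (uint8_t)wide,
                            wide > UINT8_MAX ? TOY_OVERFLOW : 0u };
    return r;
}

/* Assumed semantics of merge: sticky flags accumulate by OR. */
toy_flags toy_merge(toy_flags op_flags, toy_flags state)
{
    return op_flags | state;
}

/* The four statements quoted above, as written. */
toy_flags run_without_cse(uint8_t a, uint8_t b, toy_flags state)
{
    struct toy_result r3 = toy_plus(a, b);
    state = toy_merge(r3.flags, state);
    struct toy_result r4 = toy_plus(a, b);
    /* The property CSE relies on: same arguments, same result. */
    assert(r4.value == r3.value && r4.flags == r3.flags);
    state = toy_merge(r4.flags, state);
    return state;
}

/* After CSE of the second plus and merge(f, merge(f, s)) --> merge(f, s). */
toy_flags run_with_cse(uint8_t a, uint8_t b, toy_flags state)
{
    struct toy_result r3 = toy_plus(a, b);
    return toy_merge(r3.flags, state);
}
```

Under these assumptions both functions return the same state for any 
inputs, because OR is idempotent. A real model would also need the 
ordering constraints mentioned above.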

> We have to track exception state separately
> from the FP control word for rounding-mode for this to work.  Thus when
> we're not interested in the exception state then .IFN_PLUS would be 'pure'
> (only dependent on the FP CW)?
> So I guess we should think of somehow separating rounding mode tracking
> and exception state?  If we make the functions affect memory anyway
> we can have the FP state reg(s) modeled explicitly with a fake decl(s) and pass
> that by reference to the IFNs?  Then we can make use of the "fn spec" attribute
> to tell which function reads/writes which reg.  Across unknown functions we'd
> then have to use the store/load "trick" to merge them with the global
> memory state though.

Splitting the rounding mode from the exceptions certainly makes sense, 
since they are used quite differently.

_3 = .FENV_PLUS (_1, _2, 0, &fenv_round, &fenv_except)
or just
_3 = .FENV_PLUS (_1, _2, 1, &fenv_round, 0)
_3 = .FENV_PLUS (_1, _2, 2, 0, &fenv_except)
when we are not interested in everything.

with fake global decls for fenv_round and fenv_except (so "unknown" 
already possibly reads/writes it) and fn specs to say it doesn't look at 
other memory? I was more thinking of making that implicit, through magic 
in a couple relevant functions (the value in flags says if the global 
fenv_round or fenv_except is accessed), as a refinement of just "memory".

But IIUC, we would need something that does not use memory at all (not 
even one variable) if we wanted to avoid the big penalty in alias 
analysis, etc.

If we consider the case without exceptions:

round = get_fenv_round()
_3 = .FENV_PLUS (_1, _2, opts, round)

with .FENV_PLUS "const" and get_fenv_round "pure" (or even reading round 
from a fake global variable instead of a function call) would be tempting, 
but it doesn't work, since now .FENV_PLUS can migrate after a later call 
to fesetround. Even without exceptions we need some protection after, so 
it may be easier to keep the memory (fenv) read as part of .FENV_PLUS.
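
The migration problem can be mimicked with a small simulation in plain 
C. Everything here is hypothetical: hw_round stands in for the hardware 
control word, toy_fesetround and toy_get_fenv_round for fesetround and 
get_fenv_round, and toy_fenv_div for an expanded .FENV_PLUS. Integer 
division is used instead of addition only because it is the simplest 
integer operation that actually rounds. The key assumption is that after 
expansion the instruction obeys the current hardware mode, while the 
"round" operand is merely a data dependency seen by the optimizer.

```c
#include <assert.h>

enum toy_round { TOY_DOWN, TOY_UP };

/* Stands in for the rounding-mode bits of the FP control word. */
static enum toy_round hw_round = TOY_DOWN;

void toy_fesetround(enum toy_round m) { hw_round = m; }
enum toy_round toy_get_fenv_round(void) { return hw_round; }

/* Expanded form of the hypothetical const operation: the operand is
   ignored, the "hardware" mode decides how the quotient is rounded. */
int toy_fenv_div(int a, int b, enum toy_round round_operand)
{
    (void)round_operand;
    assert(a >= 0 && b > 0);   /* keeps truncation equal to floor */
    int q = a / b;
    if (hw_round == TOY_UP && a % b != 0)
        q += 1;
    return q;
}

/* Source order: the division happens before the mode change. */
int original_order(void)
{
    toy_fesetround(TOY_DOWN);
    enum toy_round round = toy_get_fenv_round();
    int r = toy_fenv_div(7, 2, round);
    toy_fesetround(TOY_UP);
    return r;
}

/* What the optimizer may legally produce if the operation is "const":
   its only input is `round`, so nothing pins it before the mode change. */
int after_migration(void)
{
    toy_fesetround(TOY_DOWN);
    enum toy_round round = toy_get_fenv_round();
    toy_fesetround(TOY_UP);
    int r = toy_fenv_div(7, 2, round);
    return r;
}
```

In this model original_order() yields 3 and after_migration() yields 4, 
even though both pass the same "round" value to the operation.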

Also, caring only about rounding doesn't match any standard #pragma, so 
such an option may see very little use in practice...

> Sorry for the incoherent brain-dump above ;)

It is great to have someone to discuss this with!

Marc Glisse
