Marc Glisse marc.glisse@inria.fr
Fri Aug 7 10:45:51 GMT 2020

Thank you for your comments.

On Fri, 7 Aug 2020, Richard Biener wrote:

>> Conversions look like
>> .FENV_CONVERT (arg, (target_type*)0, 0)
>> the pointer is there so we know the target type, even if the lhs
>> disappears at some point. The last 0 is the same as for all the others, a
>> place to store options about the operation (do we care about rounding,
>> about exceptions, etc), it is just a placeholder for now. I could rename
>> it to .FENV_NOP since we seem to generate NOP usually, but it looked
>> strange to me.
> You could carry the info in the existing flags operand if you make that a
> pointer ...

Ah, true, I forgot that some other trees already use this kind of trick.
Not super pretty, but probably better than an extra argument.
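For context on why conversions get the same treatment as arithmetic, here is a small C illustration. It is not from the patch: the helper names are made up, and it assumes IEEE binary64 double in the default round-to-nearest mode. An int64-to-double conversion can round. When it does, its result depends on the rounding mode and it raises the inexact flag, so a compiler honoring FENV access cannot fold or move it freely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert via a volatile copy so the conversion happens at run time, in
   whatever rounding mode is current, instead of being folded at compile
   time. */
static double int64_to_double(int64_t v) {
  volatile int64_t tmp = v;
  return (double) tmp;
}

/* True if converting V to double loses no information, i.e. the
   conversion would not raise FE_INEXACT. */
static bool int64_converts_exactly(int64_t v) {
  double d = int64_to_double(v);
  /* 2^63 does not fit in int64_t; converting it back would be undefined. */
  if (d >= 9223372036854775808.0)
    return false;
  return (int64_t) d == v;
}
```

Here `int64_converts_exactly(((int64_t) 1 << 53) + 1)` is false: 2^53 + 1 has no binary64 representation and rounds to 2^53 under ties-to-even.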

> Adding some info missing above from reading the patch.
> The idea seems to be to turn FP operations like PLUS_EXPR, FLOAT_EXPR
> but also (only?) calls to BUILT_IN_SQRT to internal functions named
> IFN_FENV_* where the internal function presumably has some extra
> information.

Sqrt does seem to have a special place in IEEE 754, and in practice some
targets have instructions (with rounding) for it.

> You have
> +/* float operations with rounding / exception flags.  */
> so with -fnon-call-exceptions they will not be throwing (but regular
> FP PLUS_EXPR would).

Hmm, ok, I guess I should remove ECF_NOTHROW then. The priority should
be correctness; we can carefully reintroduce optimizations later.

> They will appear to alter memory state - that's probably to have the
> extra dependence on FENV changing/querying operations but then why do you
> still need to emit asm()s?

The IFNs are for GIMPLE and represent the operations, while the asms are
simple passthroughs for RTL: at expansion, I replace the former with the
latter (plus the regular operation).
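At the source level, the effect of such passthroughs can be roughly approximated in GNU C with an empty asm that the optimizers cannot see through. This is only an analogy: `opaque` and `guarded_add` are hypothetical names, and the patch emits its asms during expansion to RTL, not in user code.

```c
/* Empty asm: the compiler must assume it may change X, so it can neither
   propagate a known value through it nor simplify across it.  "+m" forces
   X through memory, which works for any type on any GNU C target;
   volatile keeps the asm from being deleted or merged with another one. */
static inline double opaque(double x) {
  __asm__ volatile ("" : "+m" (x));
  return x;
}

/* Hide both operands and the result, mirroring "asm passthrough, then the
   regular operation". */
static double guarded_add(double a, double b) {
  double r = opaque(a) + opaque(b);
  return opaque(r);
}
```

Each operand gets its own asm, so two protected copies of the same value remain distinct as far as the optimizers can tell.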

> I suppose the (currently unused) flags parameter could be populated with
> some known FP ENV state and then limited optimization across stmts
> with the same non-zero state could be done?

I was mostly thinking of storing information like:
* don't care about the rounding mode for this operation
* may drop exceptions produced by this operation
* may produce extra exceptions
* don't care about signed zero
* may contract into FMA
* don't care about errno (for sqrt?)
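One possible way to pack the bullet list above into the flags operand is a bit set where 0 means "no relaxation at all", which is what makes 0 a safe default. The enum names and the `may_constant_fold` helper are hypothetical and only illustrate the idea.

```c
#include <stdbool.h>

/* Hypothetical per-operation flags.  Zero allows no relaxation, so an
   unset operand is always the conservative choice. */
enum fenv_op_flags {
  FENV_OP_ANY_ROUNDING     = 1u << 0, /* rounding mode of this op does not matter */
  FENV_OP_DROP_EXCEPTIONS  = 1u << 1, /* exceptions raised here may be lost */
  FENV_OP_EXTRA_EXCEPTIONS = 1u << 2, /* spurious exceptions may be raised */
  FENV_OP_NO_SIGNED_ZEROS  = 1u << 3, /* sign of a zero result does not matter */
  FENV_OP_ALLOW_CONTRACT   = 1u << 4, /* may be fused into an FMA */
  FENV_OP_NO_ERRNO         = 1u << 5  /* errno need not be set (sqrt) */
};

/* Folding an operation at compile time uses round-to-nearest and discards
   its exceptions, so in general it needs both relaxations.  (Operations
   with an exact result could still be folded without them.) */
static bool may_constant_fold(unsigned flags) {
  const unsigned needed = FENV_OP_ANY_ROUNDING | FENV_OP_DROP_EXCEPTIONS;
  return (flags & needed) == needed;
}
```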

With fenv_round, we would actually have to store the rounding mode of
the operation (upward, towards-zero, dynamic, don't-care, etc), a bit
less nice because 0 is not a safe fallback anymore. We could also store
it when we detect an earlier call to fesetround, but we have to be careful
that this doesn't result in even more calls to fesetround at expansion
for targets that do not have statically rounded operations.

If there are other, better things to store there, great.

> Using internal function calls paints us a bit into a corner since they are still
> subject to the single-SSA def restriction in case we'd want to make FENV
> dataflow more explicit.  What's the advantage of internal functions compared
> to using asms for the operations themselves if we wrap this class into
> a set of "nicer" helpers?

I wanted the representation on gimple to look a bit nice so it would be 
both easy to read in the dumps, and not too hard to write optimizations 
for, and a function call looked good enough. Making FENV dataflow explicit 
would mean having PHIs for FENV, etc? At most I thought FENV would be 
represented by one specific memory region which would not alias user 
variables of type float or double, in particular.

I don't really see what it would look like with asms and helpers. In some 
sense, the IFNs are already wrappers, that we unwrap at expansion. Your 
asms would take some FENV as input and output, so we have to track what
FENV to use where, similar to .MEM.

> One complication with tracking data-flow is "unknown" stuff, I'd suggest
> to invent a mediator between memory state and FP state which would
> semantically be load and store operations of the FP state from/to memory.

All I can think of is to make the FP state a particular variable in memory, and
teach alias analysis that those functions only read/write to this 
variable. What do you have in mind, splitting operations as:

fenv0 = read_fenv()
(res, fenv1) = oper(arg0, arg1, fenv0)

so that "oper" itself is const? (and hopefully simplify consecutive 
read_fenv/store_fenv so there are fewer of them) I wonder if lying about 
the constness of the operation may be problematic.

(and asm would be abused as a way to return a pair, with hopefully some 
marker so we know it isn't a real asm)
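To make that split concrete, here is a minimal C model of it. Everything is hypothetical: `struct fp_env` only tracks a sticky inexact flag, `read_fenv`/`store_fenv` just copy a global standing in for the memory-resident FP state, and inexactness is computed from the TwoSum error term (Knuth) rather than read from the hardware flags. It assumes binary64 arithmetic without excess precision (e.g. SSE2 rather than x87) and no -ffast-math.

```c
#include <stdbool.h>

/* Hypothetical explicit FP state: only the sticky "inexact" flag. */
struct fp_env {
  bool inexact;
};

/* The pair an operation returns once its FENV effect is explicit. */
struct fp_result {
  double value;
  struct fp_env env;
};

/* Memory copy of the state that read_fenv/store_fenv mediate. */
static struct fp_env fenv_memory;

static struct fp_env read_fenv(void) { return fenv_memory; }
static void store_fenv(struct fp_env e) { fenv_memory = e; }

/* "const" addition: the result depends only on the arguments, including
   the incoming state.  TwoSum recovers the exact rounding error in
   round-to-nearest (barring overflow); a nonzero error means inexact. */
static struct fp_result fenv_add(double a, double b, struct fp_env in) {
  double s = a + b;
  double bb = s - a;
  double err = (a - (s - bb)) + (b - bb);
  struct fp_result r = { s, in };
  if (err != 0.0)
    r.env.inexact = true; /* sticky: never cleared here */
  return r;
}
```

Because `fenv_add` is a pure function of its inputs, two identical calls with the same incoming state could be CSE'd, and a `store_fenv` immediately followed by `read_fenv` folds to the stored value, which is the kind of simplification hoped for above.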

> That said, you're the one doing the work and going with internal functions
is reasonable - I'm not sure to what extent optimization for FENV access
> code will ever be possible (or wanted/expected).  So going more precise
> might not have any advantage.

I think some optimizations are expected. For instance, not having to 
re-read the same number from memory many times just because there was an 
addition in between (which could write to fenv but that's it). Some may 
still want FMA (with a consistent rounding direction). For those (like me) 
who usually only care about rounding and not exceptions, making the 
operations pure would be great, and nothing says we cannot vectorize those 
rounded operations!

I am trying to be realistic with what I can achieve, but if you think the 
IFNs would paint us into a corner, then we can drop this approach.

> You needed to guard SQRT - will you need to guard other math functions?
> (round, etc.)

Maybe, but probably not many. I thought I might have to guard all of them 
(sin, cos, etc), but IIRC Joseph's comment seemed to imply that this 
wouldn't be necessary. I am likely missing FMA now...

> If we need to keep the IFNs use memory state they will count towards
> walk limits of the alias oracle even if they can be disambiguated against.
> This will affect both compile-time and optimizations.


> +  /* Careful not to end up with something like X - X, which could get
> +     simplified.  */
> +  if (!skip0 && already_protected (op1))
> we're already relying on RTL not optimizing (x + 0.5) - 0.5 but since
> that would involve association the simple X - X case might indeed
> be optimized (but wouldn't that be a bug if it is not correct?)

Indeed we do not currently simplify X-X without -ffinite-math-only. 
However, I am trying to be safe, and whether we can simplify or not is 
something that depends on each operation (what the pragma said at that 
point in the source code), while flag_finite_math_only is at best per 
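As a reminder of why even X - X is off limits: without -ffinite-math-only, X - X is NaN rather than 0 when X is infinite or NaN, and (x + 0.5) - 0.5 differs from x in general because the intermediate sum rounds. A quick C check, assuming binary64 arithmetic without excess precision; the function names are made up for the example.

```c
#include <math.h>

/* volatile keeps the compiler from folding these expressions, so the
   run-time arithmetic is what gets tested. */
static double x_minus_x(double x) {
  volatile double v = x;
  return v - v;
}

static double add_sub_half(double x) {
  volatile double v = x;
  volatile double t = v + 0.5;
  return t - 0.5;
}
```

For example, `x_minus_x(INFINITY)` is NaN, and `add_sub_half(1e-17)` is 0 because 1e-17 is below half an ulp of 0.5. `add_sub_half(-0.0)` also returns +0, which is one place where "don't care about signed zero" matters.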

Marc Glisse
