Fri Aug 7 12:01:55 GMT 2020
On Fri, Aug 7, 2020 at 12:45 PM Marc Glisse <firstname.lastname@example.org> wrote:
> Thank you for your comments.
> On Fri, 7 Aug 2020, Richard Biener wrote:
> >> Conversions look like
> >> .FENV_CONVERT (arg, (target_type*)0, 0)
> >> the pointer is there so we know the target type, even if the lhs
> >> disappears at some point. The last 0 is the same as for all the others, a
> >> place to store options about the operation (do we care about rounding,
> >> about exceptions, etc), it is just a placeholder for now. I could rename
> >> it to .FENV_NOP since we seem to generate NOP usually, but it looked
> >> strange to me.
> > You could carry the info in the existing flags operand if you make that a
> > pointer ...
> Ah, true, I forgot that some other trees already use this kind of trick.
> Not super pretty, but probably better than an extra argument.
> > Adding some info missing above from reading the patch.
> > The idea seems to be to turn FP operations like PLUS_EXPR, FLOAT_EXPR
> > but also (only?) calls to BUILT_IN_SQRT to internal functions named
> > IFN_FENV_* where the internal function presumably has some extra
> > information.
> Sqrt does seem to have a special place in IEEE 754, and in practice some
> targets have instructions (with rounding) for it.
> > You have
> > +/* float operations with rounding / exception flags. */
> > +DEF_INTERNAL_FN (FENV_PLUS, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_MINUS, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_MULT, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_RDIV, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_FLOAT, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_CONVERT, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_SQRT, ECF_LEAF | ECF_NOTHROW, NULL)
> > so with -fnon-call-exceptions they will not be throwing (but regular
> > FP PLUS_EXPR would).
> Hmm, ok, I guess I should remove ECF_NOTHROW then, the priority should
> be to be correct, we can carefully reintroduce optimizations later.
> > They will appear to alter memory state - that's probably to have the
> > extra dependence on FENV changing/querying operations but then why do you
> > still need to emit asm()s?
> The IFNs are for GIMPLE and represent the operations, while the asm are
> simple passthrough for RTL, I replace the first with the second (plus the
> regular operation) at expansion.
> > I suppose the (currently unused) flags parameter could be populated with
> > some known FP ENV state and then limited optimization across stmts
> > with the same non-zero state could be done?
> I was mostly thinking of storing information like:
> * don't care about the rounding mode for this operation
> * may drop exceptions produced by this operation
> * may produce extra exceptions
> * don't care about signed zero
> * may contract into FMA
> * don't care about errno (for sqrt?)
So we could leverage the same mechanism for inlining a non-ffast-math
function into a -ffast-math function, rewriting operations to IFNs? Though
the resulting less optimization might offset any benefit we get from the
inlining... At least the above list somewhat suggests it want's to
capture the various -f*-math options.
> With fenv_round, we would actually have to store the rounding mode of
> the operation (upward, towards-zero, dynamic, don't-care, etc), a bit
> less nice because 0 is not a safe fallback anymore. We could also store
> it when we detect a call to fesetround before, but we have to be careful
> that this doesn't result in even more calls to fesetround at expansion
> for targets that do not have statically rounded operations.
> If there are other, better things to store there, great.
> > Using internal function calls paints us a bit into a corner since they are still
> > subject to the single-SSA def restriction in case we'd want to make FENV
> > dataflow more explicit. What's the advantage of internal functions compared
> > to using asms for the operations themselves if we wrap this class into
> > a set of "nicer" helpers?
> I wanted the representation on gimple to look a bit nice so it would be
> both easy to read in the dumps, and not too hard to write optimizations
> for, and a function call looked good enough. Making FENV dataflow explicit
> would mean having PHIs for FENV, etc? At most I thought FENV would be
> represented by one specific memory region which would not alias user
> variables of type float or double, in particular.
Uh, yeah - didn't think of PHIs.
> I don't really see what it would look like with asms and helpers. In some
> sense, the IFNs are already wrappers, that we unwrap at expansion. Your
> asms would take some FENV as intput and output, so we have to track what
> FENV to use where, similar to .MEM.
True. The main advantage (if it is any...) would be it not be .MEM
> > One complication with tracking data-flow is "unknown" stuff, I'd suggest
> > to invent a mediator between memory state and FP state which would
> > semantically be load and store operations of the FP state from/to memory.
> All I can think of is make FP state a particular variable in memory, and
> teach alias analysis that those functions only read/write to this
> variable. What do you have in mind, splitting operations as:
> fenv0 = read_fenv()
> (res, fenv1) = oper(arg0, arg1, fenv0)
> so that "oper" itself is const? (and hopefully simplify consecutive
> read_fenv/store_fenv so there are fewer of them) I wonder if lying about
> the constness of the operation may be problematic.
Kind-of. I thought to do this around "unknown" operations like function
fenv0 = read_fenv();
because all functions can set/query the FP state. You get this
implicitely by tracking FENV as memory itself.
> (and asm would be abused as a way to return a pair, with hopefully some
> marker so we know it isn't a real asm)
Yeah, I still have the idea to eventually do some of the RTL expansion on
GIMPLE by having GIMPLE asm()s with the actual assembly be a reference
to the define_insns or so (thus an enum). For our case it would be just
PLUS_EXPR or maybe even add2sf.
We've used _Complex to have two-valued returns but obviously the
FP stuff should also work for _Complex FP types and having the FENV
state as 'float' is quite ugly. Which means we would need to invent
some way to have a SSA name be a tuple of something. Maybe useful
elsewhere as well.
> > That said, you're the one doing the work and going with internal functions
> > is reasonable - I'm not sure to what extent optimization for FENV acccess
> > code will ever be possible (or wanted/expected). So going more precise
> > might not have any advantage.
> I think some optimizations are expected. For instance, not having to
> re-read the same number from memory many times just because there was an
> addition in between (which could write to fenv but that's it). Some may
> still want FMA (with a consistent rounding direction). For those (like me)
> who usually only care about rounding and not exceptions, making the
> operations pure would be great, and nothing says we cannot vectorize those
> rounded operations!
> I am trying to be realistic with what I can achieve, but if you think the
> IFNs would paint us into a corner, then we can drop this approach.
Well, I'm not sure - both asm() and IFNs are a separate class of what passes
are used to see (assigns). But indeed passes know how to handle function
calls, no pass knows how to handle asms.
I guess there's nothing else but to try ...
Suppose for example you have
_3 = .IFN_PLUS (_1, _2, 0);
_4 = .IFN_PLUS (_1, _2, 0);
the first plus may alter FP state (set inexact) but since the second plus
computes the same value we'd want to elide it(?). Now if there's a
feclearexcept() inbetween we can't elide it - and that works as proposed
because the memory state is inspected by feclearexcept(). But I can't
see how we can convince FRE that we can elide the second plus when
both are modifying memory. There's no such thing currently as
effects on memory state only depend on arguments. I _think_ we don't
have to say the mem out state depends on the mem in state (FP ENV),
well - it does, but the difference only depends on the actual arguments.
That said, tracking FENV together with memory will complicate things
but explicitely tracking an (or multiple?) extra FP ENV register input/output
makes the problem not go away (the second plus still has the mutated
FP ENV from the first plus as input). Instead we'd have to separately
track the effect of a single operation and the overall FP state, like
(_3, flags_5) = .IFN_PLUS (_1, _2, 0);
fpexstate = merge (flags_5, fpexstate);
(_4, flags_6) = .IFN_PLUS (_1, _2, 0);
fpexstate = merge (flage_6, fpexstate);
or so and there we can CSE. We have to track exception state separately
from the FP control word for rounding-mode for this to work. Thus when
we're not interested in the exception state then .IFN_PLUS would be 'pure'
(only dependent on the FP CW)?
So I guess we should think of somehow separating rounding mode tracking
and exception state? If we make the functions affect memory anyway
we can have the FP state reg(s) modeled explicitely with a fake decl(s) and pass
that by reference to the IFNs? Then we can make use of the "fn spec" attribute
to tell which function reads/writes which reg. Across unknown functions we'd
then have to use the store/load "trick" to merge them with the global
> > You needed to guard SQRT - will you need to guard other math functions?
> > (round, etc.)
> Maybe, but probably not many. I thought I might have to guard all of them
> (sin, cos, etc), but IIRC Joseph's comment seemed to imply that this
> wouldn't be necessary. I am likely missing FMA now...
Ah, as far as correctness is concerned. Yes, you're probably right.
> > If we need to keep the IFNs use memory state they will count towards
> > walk limits of the alias oracle even if they can be disambiguated against.
> > This will affect both compile-time and optimizations.
> > + /* Careful not to end up with something like X - X, which could get
> > + simplified. */
> > + if (!skip0 && already_protected (op1))
> > we're already relying on RTL not optimizing (x + 0.5) - 0.5 but since
> > that would involve association the simple X - X case might indeed
> > be optimized (but wouldn't that be a bug if it is not correct?)
> Indeed we do not currently simplify X-X without -ffinite-math-only.
> However, I am trying to be safe, and whether we can simplify or not is
> something that depends on each operation (what the pragma said at that
> point in the source code), while flag_finite_math_only is at best per
OK, fair enough.
Sorry for the incoherent brain-dump above ;)
> Marc Glisse
More information about the Gcc-patches