Local optimization options
Mon Jul 6 07:42:52 GMT 2020
On Sun, Jul 5, 2020 at 4:37 PM Marc Glisse <email@example.com> wrote:
> On Sun, 5 Jul 2020, Thomas König wrote:
> >> Am 04.07.2020 um 19:11 schrieb Richard Biener <firstname.lastname@example.org>:
> >> On July 4, 2020 11:30:05 AM GMT+02:00, "Thomas König" <email@example.com> wrote:
> >>> What could be a preferred way to achieve that? Could optimization
> >>> options like -ffast-math be applied to blocks instead of functions?
> > >>> Could we set flags on the TREE codes to allow certain optimizations?
> >>> Other things?
> >> The middle end can handle those things on function granularity only.
> >> Richard.
> > OK, so that will not work (or not without a disproportionate
> > amount of effort). Would it be possible to set something like a
> > TREE_FAST_MATH flag on TREEs? An operation could then be
> > optimized according to these rules iff both operands
> > had that flag, and would also have it then.
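The propagation rule proposed above can be modeled in a few lines of C. This is a minimal toy sketch under stated assumptions: `fp_value`, `fp_add` and `may_reassociate` are hypothetical names, and the `fast_math` bit only stands in for the suggested `TREE_FAST_MATH` flag. None of this is GCC's actual tree API.

```c
#include <stdbool.h>

/* Hypothetical toy model, not GCC's tree representation: each value
   carries a fast_math bit standing in for the proposed TREE_FAST_MATH. */
typedef struct {
    double value;
    bool fast_math;
} fp_value;

/* The result of an operation carries the flag only if both operands do. */
static fp_value fp_add(fp_value a, fp_value b) {
    fp_value r = { a.value + b.value, a.fast_math && b.fast_math };
    return r;
}

/* A fast-math transformation such as reassociating (a + b) + c into
   a + (b + c) is permitted only when every operand carries the flag. */
static bool may_reassociate(fp_value a, fp_value b, fp_value c) {
    return a.fast_math && b.fast_math && c.fast_math;
}
```

Under this rule a single operand without the flag "poisons" every result computed from it, so optimizations stay conservative wherever strict and relaxed code mix.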
> In order to support various semantics on floating point operations, I was
> planning to replace some trees with internal functions, with an extra
> operand to specify various behaviors (rounding, exception, etc). Although
> at least in the beginning, I was thinking of only using those functions in
> safe mode, to avoid perf regressions.
Note this tackles the dependency on fesetround and friends which is
of course another issue (tracking FP control and exception state).
> This may never happen now, but it sounds similar to setting flags like
> TREE_FAST_MATH that you are suggesting. I was going with functions for
> more flexibility, and to avoid all the existing assumptions about trees.
> Although I guess that for fast-math, the worst the assumptions could do is
> clear the flag, which would make us optimize less than possible; not so bad.
Indeed going with tree/gimple stmt flags or alternate tree codes
(PLUS_NONTRAP_EXPR?) isn't likely to scale for the myriads of
FP behavior controls we have. So using an internal function sounds
reasonable, though given your referenced patch above, one might
want to think about the extra input (FP env) and output (FP state)
those functions will have as well. Also, extracting the important
bits from "fast-math" and thoroughly documenting the semantics of
the flags we use would be required.
To prevent too many bad effects on optimization, one might think
of using a regular PLUS_EXPR when the global flags match the
specific ones on an internal function ...
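The internal-function idea, with an extra operand describing per-operation FP behavior and a fallback to a plain PLUS_EXPR when that operand matches the function-wide flags, might look roughly like this. It is a hedged sketch only: the `fp_behavior` bits, `OP_IFN_FP_ADD` and `lower_fp_add` are invented for illustration and do not correspond to existing GCC internal functions or flags.

```c
/* Hypothetical behavior bits carried by the extra operand;
   names are invented for illustration, not real GCC flags. */
enum fp_behavior {
    FPB_ASSUME_NO_NAN    = 1u << 0,
    FPB_ASSUME_NO_INF    = 1u << 1,
    FPB_ALLOW_REASSOC    = 1u << 2,
    FPB_TRAPPING         = 1u << 3,
    FPB_DYNAMIC_ROUNDING = 1u << 4,
};

/* Which form the addition ends up in. */
enum op_kind { OP_PLUS_EXPR, OP_IFN_FP_ADD };

typedef struct {
    enum op_kind kind;
    unsigned behavior; /* meaningful only for OP_IFN_FP_ADD */
} fp_add_stmt;

/* If the per-operation flags equal the global flags of the function,
   a plain PLUS_EXPR already has exactly those semantics, so existing
   optimizations keep working; otherwise keep the internal function
   and record the requested behavior in its extra operand. */
static fp_add_stmt lower_fp_add(unsigned op_flags, unsigned global_flags) {
    fp_add_stmt s;
    if (op_flags == global_flags) {
        s.kind = OP_PLUS_EXPR;
        s.behavior = 0;
    } else {
        s.kind = OP_IFN_FP_ADD;
        s.behavior = op_flags;
    }
    return s;
}
```

The design choice is that only operations whose semantics differ from the function-wide default pay for the more opaque representation.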
Btw, instead of using the _Complex and __real/__imag trick
for multiple defs we might want to go with more general SSA projections
or allow multiple defs on functions at least.
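The "_Complex and __real/__imag trick" refers to packing two results into one complex-typed value. As I understand it, GCC's internal `.ADD_OVERFLOW` already works this way in GIMPLE (result in the real part, overflow flag in the imaginary part). The source-level C sketch below mimics that shape using GCC's `_Complex int` and `__real__`/`__imag__` extensions together with `__builtin_add_overflow`; `add_with_overflow` is a hypothetical helper name.

```c
/* Mimic a two-output operation with a single complex-typed result:
   the real part holds the (wrapped) sum and the imaginary part holds
   the overflow flag.  Uses GNU C extensions (_Complex int,
   __real__/__imag__ as lvalues, __builtin_add_overflow). */
static _Complex int add_with_overflow(int a, int b) {
    int sum;
    int overflowed = __builtin_add_overflow(a, b, &sum);
    _Complex int r;
    __real__ r = sum;
    __imag__ r = overflowed;
    return r;
}
```

An FP internal function returning both its value and the resulting FP exception state could be encoded the same way, which is exactly the workaround that more general SSA projections or multiple defs would make unnecessary.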
> Marc Glisse