[tree-ssa] fold-const.c merge from mainline

Tue Oct 14 00:11:00 GMT 2003

> Roger -- did you get my message from about a week or so ago on this
> subject?

No, I didn't.  I was wondering why you were so silent :>

> Basically we broke fold into two pieces for two reasons:
>
>   1. To avoid the destructive behavior of fold.
>
>   2. Performance.  The only cases CCP cares about are when we can fold
>   something down to a constant.  Nothing else is interesting.  We don't
>   need the full blown folder for that.  In fact, using the full blown
>   folder gets to be rather expensive.

Excellent.

Firstly, you'll appreciate that since you broke fold into two pieces
on the tree-ssa branch, that the fold on mainline has been modified
such that it is no longer destructive and the input tree is considered
immutable.  Indeed, there's now even an --enable-checking="fold" mode,
where the tree passed to fold is hashed both before and afterwards to
ensure that nothing has changed.

Can we now assume that fold is non-destructive, and if not any remaining
tree modifying transformations are bugs?  If you like I could prepare a
patch to the comment above "fold" to document these new requirements.

Then all that remains is the "Performance" argument, to which there
are three strong counter arguments.

(1) Whilst it is true that CCP only cares about cases when we can fold
down to a constant, it is frequently the case that fold reduces an
tree expression to a constant but in a number of steps.  For example,
"(x + 3) - x" can be reduced by "fold" via the recursive call to
"(x - x) + 3".  Unfortunately, nondestructive_fold_binary_to_constant
on the tree-ssa branch isn't able to do this.

(2) I strongly doubt that performance is an issue.  Although, "fold"
as a function is large, it is principally a single switch statement
meaning that the only code executed are the transformations applicable
to the current TREE_CODE.  I'll agree there's plenty that could be
done to manually factor tests [Indeed my original motivation for GCC's
jump bypassing pass was to optimize control flow through fold].

However, as with all good discussions on performance, the ultimate
arbiter is experimental measurement.  At the time you decided to split
fold, its destructive nature mean't that it was impossible to assess
overhead.  However, it should now be possible to time a tree-ssa
variant that implements nondestructive_fold_* as wrappers around fold.

(3) But the most significant problem I see with the approach taken on
the tree-ssa branch is the duplication of tree-level constant folding
logic.  For example, plans of mine to add constant folding support
for "local" range propagation, "((unsigned int)x | 8) > 2" will either
be useless to SSA optimizers (such as ssa-vrp!) or duplicated.
As an example of this leakage look at the many constant cases returned
by fold_inf_compare that are unreachable from either of the new functions
fold_relational_hi_lo or fold_relational_const.

To summarise, if we can confirm that there isn't a significant performance
overhead to using "full blown fold" (that can't be fixed), the improved
level of optimization and significantly reduced code duplication will
produce a better and much easier to maintain compiler.

As a starting point it might be worthwhile getting a timevar for fold
so we can determine exactly how much of expand's time is spent there.
Unlike cse_insn, it often doesn't even show up on GCC's hot function
lists.

Roger
--