Its disappointing the way that we now have to speculatively construct
trees (representing the CALL_EXPRs) to test whether a built-in function
can be folded. One of the recent pushes with tree-level optimizers has
been to move towards APIs that allow us to fold/simplify them without
having to explicitly construct them first, and thereby leak memory to
the garbage collector.
I suspect a more agressive restructuring of this code, introducing new
APIs, such as fold_unary_builtin, fold_binary_builtin, etc... where we
explicitly pass arg0, arg1 ... argN in the common cases (most builtins
don't have a variable number of arguments, and often only one or two
operands/arguments) would address the majority of the memory regression
incurred by this change.