I see patch one as the most important - maybe not the approach, but
the result, i.e. that we remove the abstraction penalty. I tried to do
this by only mucking with the RETURN_EXPR handling, but did not
succeed, because we have an extra copy here in the caller, too. I tried
to account for this, but that didn't work. Ignoring nodes that don't result
in code generation (i.e., effectively counting what TER would give us
before RTL expansion) did produce size estimates more similar to the 3.4
ones, and - magically - solved the abstraction problem. It is "magic"
to me, but it looks appealing because of the simplicity of the patch.
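
To illustrate the idea of not counting nodes that generate no code, here
is a toy sketch. It is not the actual patch: the node kinds, the struct
and both function names are made up for this example and are not GCC's
tree codes or estimator, and which kinds count as "free" is purely an
assumption.

```c
#include <stddef.h>

/* Hypothetical node kinds for a toy expression tree (not GCC tree codes). */
enum kind { K_VAR, K_ADD, K_COPY, K_NOP_CONVERT, K_RETURN };

struct node {
    enum kind kind;
    const struct node *op0;   /* first operand, or NULL */
    const struct node *op1;   /* second operand, or NULL */
};

/* Assumption for this sketch: plain copies and no-op conversions
   are expected to disappear before code generation. */
static int generates_code(enum kind k)
{
    return k != K_COPY && k != K_NOP_CONVERT;
}

/* Count nodes in the tree.  With skip_free != 0, nodes assumed to
   generate no code contribute nothing to the estimate. */
static int estimate_size(const struct node *n, int skip_free)
{
    int self;

    if (n == NULL)
        return 0;
    self = (skip_free && !generates_code(n->kind)) ? 0 : 1;
    return self + estimate_size(n->op0, skip_free)
                + estimate_size(n->op1, skip_free);
}
```

For a wrapper body shaped like RETURN(COPY(ADD(VAR, VAR))), the plain
count is 5 and the adjusted count is 4, so the extra copy no longer makes
the wrapper look more expensive than it is.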