[RFC] Using function clones for Pointer Bounds Checker
Tue Jan 14 14:36:00 GMT 2014
2014/1/14 Richard Biener <firstname.lastname@example.org>:
> On Tue, Jan 14, 2014 at 1:47 PM, Ilya Enkovich <email@example.com> wrote:
>> 2014/1/14 Richard Biener <firstname.lastname@example.org>:
>>> On Tue, Jan 14, 2014 at 10:15 AM, Ilya Enkovich <email@example.com> wrote:
>>>> I've been working for some time on the prototype of the Pointer Bounds
>>>> Checker which uses function clones for instrumentation
>>>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After
>>>> several experiments with this approach I want to share my results and
>>>> ask for some feedback to make a decision about the future steps.
>>>> Firstly I want to remind the reasons for digging in this direction. In
>>>> the original approach bounds of call arguments and input parameters
>>>> are associated with arguments via special built-in calls. It creates
>>>> implicit data flow compiler is not aware about which confuses some
>>>> optimizations resulting in miss-optimization and breaks bounds data
>>>> flow. Thus optimizations have to be fixed to get better pointers
>>>> Clones approach does not use special built-in function calls to
>>>> associate bounds with call arguments and input parameters. Each
>>>> function which should be instrumented gets an additional version and
>>>> only this special version will be instrumented.This new version gets
>>>> additional bound arguments to express input bounds. When function call
>>>> is instrumented, it is redirected to instrumented version and all
>>>> bounds are passed as explicit call arguments. Thus we have explicit
>>>> pointer bounds flow similar to regular function parameters. It should
>>>> allow to avoid changes in optimization, avoid miss-optimizations,
>>>> allow existing IPA optimizations to work with bound args (e.g.
>>>> propagate constant bounds value and remove checks in called function).
>>>> I made a prototype implementation of this approach in the following way:
>>>> - Add new IPA pass before early local passes to produce versions for
>>>> all functions to be instrumented.
>>>> - Put instrumentation pass after SSA pass.
>>>> - Add new pass after IPA passes to remove bodies of functions which
>>>> have instrumented versions. Function nodes may still be required for
>>>> calls in not instrumented code. But we do not emit this code and
>>>> therefore function bodies are not needed.
>>>> Positive changes are:
>>>> - IPA optimizations are not confused by bound parameters
>>>> - bounds are now more like regular arguments; it makes their
>>>> processing in expand easier
>>>> - functions with bounds not attached to any pointer are allowed
>>> First of all thanks for trying to work in this direction. Comments on the
>>> issues you encountered below (also CCed Honza as he should be more
>>> familiar with reachability and clone issues).
>>>> On simple codes this approach worked well but on a bigger tests some
>>>> issues were revealed.
>>>> 1. Nodes reachability. Instrumented version is actually always
>>>> reachable when original function is reachable because it is always
>>>> emitted instead of the original. Thus I had to fix reachability
>>>> analysis to achieve it. Another similar problem is check whether node
>>>> can be removed after inline when inlining instrumented function. Not
>>>> hard to fix but probably other similar problems exist.
>>> I suppose you do not update the callgraph / the call stmts when
>>> creating the clones? Btw, is it desirable to inline the uninstrumented
>>> function and then instrument the result (thus run cloning and
>>> instrumentation after early inlining?)? Especially handling always_inlines
>>> before cloning/isntrumentation looks very sensible.
>> Right. Created clones have the same code as the original function and
>> therefore same cgraph edges. I suppose instrumentation after early
>> inlining is OK and may be preferred because inline shouldn't lead to
>> any losses of bounds information. I tried variant when instrumentation
>> works right after early inlining but with cloning still before early
>> local passes. In general it looked OK.
>>>> 2. Function processing order. Function processing order is determined
>>>> before early local passes. But during function instrumentation call
>>>> graph is modified significantly and used topological order becomes
>>>> outdated. That causes some troubles. E.g. function marked as 'always
>>>> inline' cannot be inlined because it is not in SSA form yet. Surely
>>>> inlining problem may be solved by just putting instrumentation after
>>>> early inline, but similar problem may exist in other passes too. To
>>>> resolve this problem I tried to split early local passes into three
>>>> parts. The first one builds SSA, the second one performs
>>>> instrumentation, the last one does the rest. Each part is performed on
>>>> all functions before the next one starts. Thus I get all functions in
>>>> SSA form and all instrumentation performed before starting early
>>>> optimizations. Unfortunately such passes order leads to invalid SSA
>>>> because of local_pure_const optimization affecting callers correctness
>>>> (in case caller SSA was built before optimization revealed 'pure' or
>>>> 'const' flag).
>>> Generally the processing order of early_local_passes is chosen
>>> to get better optimization - changing it shouldn't affect correctness
>>> and thus the issues you observe demand fixing anyway.
>>> (I've noted in the past that the early_local_passes processing order
>>> should more explicitely honor callgraph SCCs, eventually even iterating).
>>> Moving SSA build out of the early_local_passes and into a
>>> separate lowering stage is possible, the pure-const stuff is
>>> handled by keeping pass_fixup_cfg where it is now I think.
>>> In theory you can go into SSA form in all_lowering_passes
>>> already (but you have to adjust some pieces dealing with
>>> late created functions).
>> Currently pass_fixup_cfg is called at the beginning of the early local
>> passes and therefore function processing order mattes much here. If
>> caller is processed before the callee marked as pure then there is no
>> additional pass_fixup_cfg for the caller after pure flag for callee is
>> set. Probably additional IPA pass is required after early_local_passes
>> to fixup all functions?
> You probably need an additional fixup_cfg pass somewhere.
>>> Note that the idea of having stuff inside a single early-local-passes
>>> pipeline is that it increases cache locality and reduces memory
>>> usage by reclaiming functions no longer needed after inlining.
>>>> In general I feel that having special function version for
>>>> instrumentation has a better potential, should lead to less intrusive
>>>> changes in the compiler and better code quality.
>>>> But before continue this implementation I would like to get some
>>>> feedback and probably some advice on how to order passes to get the
>>>> best result. Currently I incline to have all functions instrumented
>>>> before any local optimizations and solve pure_const problem by
>>>> modifying all callers when attribute is computed for some function.
>>> As noted above the pure_const proble is fixed by the fixup_cfg pass.
>>> Generally if instrumentation is not strictly required before inlining
>>> then doing cloning and instrumentation as a "real" IPA pass would
>>> be the way to go. You'd create clones during IPA analysis,
>>> fixup clones during IPA propagate (IPA inline should notify you
>>> of any decisions it makes), and instantiate and instrument the
>>> remaining required clones at IPA transform time.
>> Instrumentation is not required before inlining but is required before
>> any optimization pass. Optimizations may change pointers flow and
>> therefore break bounds flow, resulting in possible false bound
>> violations. It means instrumentation has to be performed before
> Ah, ok ...
>> Thus I may split early_local_passes into three
>> IPA passes:
>> 1. build SSA + early_inline
>> 2. Make instrumented versions + release bodies of original functions
>> (for functions having instrumented version)
>> 3. Perform early_optimizations.
> That defeats the purpose and functioning of early inlining though.
> early inlining wants the callee to have gone through all early local
> passes - doing the above breaks this.
> So I suppose your original ordering is the only sensible thing unless
> somebody comes up with new great ideas ;)
Thanks for your comments!
I'll continue my experiments with with my initial early_local_passes
splitting. Will just put there original functions bodies release to
avoid overhead for their useless optimizations. So, it will be 3 IPA
1. SSA build
2. Make instrumented versions + release bodies of original functions
3. early_inline + early optimizations.
Will be back with more results.
> (eventually inlining always-inline functions can be split off and done
>>> Generally if doing it before early opts is not too intrusive making it
>>> a real IPA pass later shouldn't be too difficult (heh ...).
>>> Honza may have more comments here, of course.
More information about the Gcc-patches