This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Using function clones for Pointer Bounds Checker


On Tue, Jan 14, 2014 at 1:47 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2014/1/14 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Jan 14, 2014 at 10:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> Hi,
>>>
>>> I've been working for some time on the prototype of the Pointer Bounds
>>> Checker which uses function clones for instrumentation
>>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After
>>> several experiments with this approach I want to share my results and
>>> ask for some feedback to make a decision about the future steps.
>>>
>>> Firstly I want to remind the reasons for digging in this direction. In
>>> the original approach bounds of call arguments and input parameters
>>> are associated with arguments via special built-in calls. It creates
>>> implicit data flow compiler is not aware about which confuses some
>>> optimizations resulting in miss-optimization and breaks bounds data
>>> flow. Thus optimizations have to be fixed to get better pointers
>>> protection.
>>>
>>> Clones approach does not use special built-in function calls to
>>> associate bounds with call arguments and input parameters. Each
>>> function which should be instrumented gets an additional version and
>>> only this special version will be instrumented.This new version gets
>>> additional bound arguments to express input bounds. When function call
>>> is instrumented, it is redirected to instrumented version and all
>>> bounds are passed as explicit call arguments. Thus we have explicit
>>> pointer bounds flow similar to regular function parameters. It should
>>> allow to avoid changes in optimization, avoid miss-optimizations,
>>> allow existing IPA optimizations to work with bound args (e.g.
>>> propagate constant bounds value and remove checks in called function).
>>>
>>> I made a prototype implementation of this approach in the following way:
>>>
>>> - Add new IPA pass before early local passes to produce versions for
>>> all functions to be instrumented.
>>> - Put instrumentation pass after SSA pass.
>>> - Add new pass after IPA passes to remove bodies of functions which
>>> have instrumented versions. Function nodes may still be required for
>>> calls in not instrumented code. But we do not emit this code and
>>> therefore function bodies are not needed.
>>>
>>> Positive changes are:
>>>
>>> - IPA optimizations are not confused by bound parameters
>>> - bounds are now more like regular arguments; it makes their
>>> processing in expand easier
>>> - functions with bounds not attached to any pointer are allowed
>>
>> First of all thanks for trying to work in this direction.  Comments on the
>> issues you encountered below (also CCed Honza as he should be more
>> familiar with reachability and clone issues).
>>
>>> On simple codes this approach worked well but on a bigger tests some
>>> issues were revealed.
>>>
>>> 1. Nodes reachability. Instrumented version is actually always
>>> reachable when original function is reachable because it is always
>>> emitted instead of the original. Thus I had to fix reachability
>>> analysis to achieve it. Another similar problem is check whether node
>>> can be removed after inline when inlining instrumented function. Not
>>> hard to fix but probably other similar problems exist.
>>
>> I suppose you do not update the callgraph / the call stmts when
>> creating the clones?  Btw, is it desirable to inline the uninstrumented
>> function and then instrument the result (thus run cloning and
>> instrumentation after early inlining?)?  Especially handling always_inlines
>> before cloning/isntrumentation looks very sensible.
>
> Right. Created clones have the same code as the original function and
> therefore same cgraph edges. I suppose instrumentation after early
> inlining is OK and may be preferred because inline shouldn't lead to
> any losses of bounds information. I tried variant when instrumentation
> works right after early inlining but with cloning still before early
> local passes. In general it looked OK.
>
>>
>>> 2. Function processing order. Function processing order is determined
>>> before early local passes. But during function instrumentation call
>>> graph is modified significantly and used topological order becomes
>>> outdated. That causes some troubles. E.g. function marked as 'always
>>> inline' cannot be inlined because it is not in SSA form yet. Surely
>>> inlining problem may be solved by just putting instrumentation after
>>> early inline, but similar problem may exist in other passes too. To
>>> resolve this problem I tried to split early local passes into three
>>> parts. The first one builds SSA, the second one performs
>>> instrumentation, the last one does the rest. Each part is performed on
>>> all functions before the next one starts. Thus I get all functions in
>>> SSA form and all instrumentation performed before starting early
>>> optimizations. Unfortunately such passes order leads to invalid SSA
>>> because of local_pure_const optimization affecting callers correctness
>>> (in case caller SSA was built before optimization revealed 'pure' or
>>> 'const' flag).
>>
>> Generally the processing order of early_local_passes is chosen
>> to get better optimization - changing it shouldn't affect correctness
>> and thus the issues you observe demand fixing anyway.
>> (I've noted in the past that the early_local_passes processing order
>> should more explicitely honor callgraph SCCs, eventually even iterating).
>>
>> Moving SSA build out of the early_local_passes and into a
>> separate lowering stage is possible, the pure-const stuff is
>> handled by keeping pass_fixup_cfg where it is now I think.
>> In theory you can go into SSA form in all_lowering_passes
>> already (but you have to adjust some pieces dealing with
>> late created functions).
>
> Currently pass_fixup_cfg is called at the beginning of the early local
> passes and therefore function processing order mattes much here. If
> caller is processed before the callee marked as pure then there is no
> additional pass_fixup_cfg for the caller after pure flag for callee is
> set. Probably additional IPA pass is required after early_local_passes
> to fixup all functions?

You probably need an additional fixup_cfg pass somewhere.

>> Note that the idea of having stuff inside a single early-local-passes
>> pipeline is that it increases cache locality and reduces memory
>> usage by reclaiming functions no longer needed after inlining.
>>
>>> In general I feel that having special function version for
>>> instrumentation has a better potential, should lead to less intrusive
>>> changes in the compiler and better code quality.
>>>
>>> But before continue this implementation I would like to get some
>>> feedback and probably some advice on how to order passes to get the
>>> best result. Currently I incline to have all functions instrumented
>>> before any local optimizations and solve pure_const problem by
>>> modifying all callers when attribute is computed for some function.
>>
>> As noted above the pure_const proble is fixed by the fixup_cfg pass.
>>
>> Generally if instrumentation is not strictly required before inlining
>> then doing cloning and instrumentation as a "real" IPA pass would
>> be the way to go.  You'd create clones during IPA analysis,
>> fixup clones during IPA propagate (IPA inline should notify you
>> of any decisions it makes), and instantiate and instrument the
>> remaining required clones at IPA transform time.
>
> Instrumentation is not required before inlining but is required before
> any optimization pass. Optimizations may change pointers flow and
> therefore break bounds flow, resulting in possible false bound
> violations. It means instrumentation has to be performed before
> early_optimizations.

Ah, ok ...

> Thus I may split early_local_passes into three
> IPA passes:
>   1. build SSA + early_inline
>   2. Make instrumented versions + release bodies of original functions
> (for functions having instrumented version)
>   3. Perform early_optimizations.

That defeats the purpose and functioning of early inlining though.
early inlining wants the callee to have gone through all early local
passes - doing the above breaks this.

So I suppose your original ordering is the only sensible thing unless
somebody comes up with new great ideas ;)

(eventually inlining always-inline functions can be split off and done
earlier)

Richard.

>
> Thanks,
> Ilya
>
>>
>> Generally if doing it before early opts is not too intrusive making it
>> a real IPA pass later shouldn't be too difficult (heh ...).
>>
>> Honza may have more comments here, of course.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Ilya


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]