Add support to trace comparison instructions and switch statements

Sat Jul 15 07:43:00 GMT 2017

On Sat, Jul 15, 2017 at 9:21 AM, 吴潍浠(此彼) <weixi.wwx@antfin.com> wrote:
> Hi
>
> Implementing __sanitizer_cov_trace_cmp[1248]_const is OK .
> And I will try to find some determinate way to judge this comparison is for loop or not.
> Because all the loops(for() or while()) seem to be transformed to "if" and "goto" before running sancov pass.
> Does it necessary to include APIs for float and double comparison ? like __sanitizer_cov_trace_cmpf(float, float)
> Why you do not include them ?

Note you don't need to implement any of this right now, since we can
make it [almost] backwards compatible. Unless, of course, it's simple
and you find it useful and want to do this.
I think it's better to get a first version of the change in as is
(implement current clang API), and then do improvements in subsequent
patches. I would expect that detecting consts should be simple. Re
loops, I don't know, I would expect that gcc has some existing
analysis for loop (it does lots of optimization with counting loop).
Re floats/doubles, I am not sure, we managed to live without them in
go-fuzz and then in libfuzzer, and I still don't see lots of value in
them. Anyway, it should be a separate patch.

> Now, I am following some suggests about my codes from Gribow. Final patch is still on processing.
> I am also waiting for copyright assignment of FSF which is requested by Jeff.
>
> With Regards
> Wish Wu
>
> ------------------------------------------------------------------
> From:Dmitry Vyukov <dvyukov@google.com>
> Time:2017 Jul 15 (Sat) 13:41
> To:Kostya Serebryany <kcc@google.com>
> Cc:Wish Wu <wishwu007@gmail.com>; gcc <gcc@gcc.gnu.org>; gcc-patches <gcc-patches@gcc.gnu.org>; Wish Wu <weixi.wwx@antfin.com>; Alexander Potapenko <glider@google.com>; andreyknvl <andreyknvl@google.com>; Victor Chibotaru <tchibo@google.com>; Yuri Gribov <tetra2005@gmail.com>
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> On Fri, Jul 14, 2017 at 11:17 PM, Kostya Serebryany <kcc@google.com> wrote:
>>>>> > Hi
>>>>> >
>>>>> > I wrote a test for "-fsanitize-coverage=trace-cmp" .
>>>>> >
>>>>> > Is there anybody tells me if these codes could be merged into gcc ?
>>>>>
>>>>>
>>>>> Nice!
>>>>>
>>>>> We are currently working on Linux kernel fuzzing that use the
>>>>> comparison tracing. We use clang at the moment, but having this
>>>>> support in gcc would be great for kernel land.
>>>>>
>>>>> One concern I have: do we want to do some final refinements to the API
>>>>> before we implement this in both compilers?
>>>>>
>>>>> 2 things we considered from our perspective:
>>>>>  - communicating to the runtime which operands are constants
>>>>>  - communicating to the runtime which comparisons are counting loop checks
>>>>>
>>>>> First is useful if you do "find one operand in input and replace with
>>>>> the other one" thing. Second is useful because counting loop checks
>>>>> are usually not useful (at least all but one).
>>>>> In the original Go implementation I also conveyed signedness of
>>>>> operands, exact comparison operation (<, >, etc):
>>>>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-defs/defs.go#L13
>>>>> But I did not find any use for that.
>>>>> I also gave all comparisons unique IDs:
>>>>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-dep/sonar.go#L24
>>>>> That turned out to be useful. And there are chances we will want this
>>>>> for C/C++ as well.
>>>>>
>>>>> Kostya, did anything like this pop up in your work on libfuzzer?
>>>>> Can we still change the clang API? At least add an additional argument
>>>>> to the callbacks?
>>>>
>>>>
>>>> I'd prefer not to change the API, but extend it (new compiler flag, new
>>>> callbacks), if absolutely needed.
>>>> Probably make it trace-cmp-guard (similar to trace-pc-guard, with an extra
>>>> parameter that has the ID).
>>>> I don't like the approach with compiler-generated constant IDs.
>>>
>>> Yes, if we do it for C/C++, we need to create globals and pass pointer
>>> to a global to the callbacks. IDs do not work for C/C++.
>>>
>>>> Yes, it's a bit more efficient, but much less flexible in presence of
>>>> multiple modules, DSOs, dlopen, etc.
>>>>
>>>> I was also looking at completely inlining this instrumentation because it's
>>>> pretty expensive even in it's current form
>>>> (adding more parameters will make things worse).
>>>> This is going to be much less flexible, of course, so I'll attack it only
>>>> once I settle on the algorithm to handle CMPs in libFuzzer.
>>>
>>> This will require a new, completely different API for
>>> compiler<->runtime anyway, so we can put this aside for now.
>>>
>>>
>>>>> At the very least I would suggest that we add an additional arg that
>>>>> contains some flags (1/2 arg is a const, this is counting loop check,
>>>>> etc). If we do that we can also have just 1 callback that accepts
>>>>> uint64's for args because we can pass operand size in the flags:
>>>>
>>>>
>>>> How many flag combinations do we need and do we *really* need them?
>>>>
>>>> If the number of flag combinations is small, I'd prefer to have separate
>>>> callbacks (__sanitizer_cov_trace_cmp_loop_bound ?)
>>>>
>>>> Do we really need to know that one arg is a const?
>>>> It could well be a constant in fact, but compiler won't see it.
>>>>
>>>> int foo(int n) {   ... if (i < n) ... }
>>>> ...
>>>> foo(42);  // not inlined.
>>>>
>>>> We need to handle both cases the same way.
>>>
>>>
>>> Well, following this line of reasoning we would need only
>>> __asan_load/storeN callbacks for asan and remove
>>> __asan_load/store1/2/4/8, because compiler might not know the size at
>>> compile time. Constant-ness is only an optimization. If compiler does
>>> not know that something is a const, fine. Based on my experience with
>>> go-fuzz and our early experience with kernel, we badly need const
>>> hint.
>>> Otherwise fuzzer generates gazillions of candidates based on
>>> comparison arguments. Note that kernel is several order of magnitude
>>> larger than what people usually fuzz in user-space, inputs are more
>>> complex and at the same time execution speed is several order of
>>> magnitude lower. We can't rely on raw speed.
>>>
>>> Thinking of this more, I don't thing that globals will be useful in
>>> the kernel context (the main problem is that we have multiple
>>> transient isolated kernels). If we track per-comparison site
>>> information, we will probably use PCs. So I am ready to give up on
>>> this.
>>>
>>> Both of you expressed concerns about performance. Kostya says we
>>> should not break existing clang API.
>>> If we limit this to only constant-ness, then I think we can make this
>>> both forward and backward compatible, which means we don't need to
>>> handle it now. E.g. we can:
>>>  - if both operands are const (if it's possible at all), don't emit any callback
>>>  - if only one is const, emit __sanitizer_cov_trace_cmp_const1 and
>>> pass the const in a known position (i.e. always first/second arg)
>>>  - if none are const, emit callback __sanitizer_cov_trace_cmp_dyn1
>>> Then compiler emits weak aliases form
>>> __sanitizer_cov_trace_cmp_const/dyn1 to the old
>>> __sanitizer_cov_trace_cmp1, which makes it backwards compatible.
>>> New runtimes that implement __sanitizer_cov_trace_cmp_const/dyn1, also
>>> need to provide the old __sanitizer_cov_trace_cmp1 for old compilers.
>>
>> SGTM++
>>
>> Although, maybe we can actually break the API this way to keep things
>> simpler (no weak aliases)
>> * leave __sanitizer_cov_trace_cmp[1248] for general case
>> * add __sanitizer_cov_trace_cmp[1248]_const for constant in the 2-nd parameter
>> * add __sanitizer_cov_trace_cmp[1248]_loop for loop bound in the 2nd parameter
>
> This would work for us.
>
>> * do we need __sanitizer_cov_trace_cmp[1248]_const_loop ?
>
> We don't know yet. Potentially yes, because const is reading/writing
> to a static buffer, while non-const is reading/writing to a buffer
> with user-provided size.