This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Frame pointer for arm with THUMB2 mode


Hi Wilco,
thanks for the answer.

> Adding support for a frame chain would require an ABI change. It would have to
> work across GCC, LLVM, Arm, Thumb-1 and Thumb-2 - not a trivial amount of
> effort.

Clang already works that way.
Please look at this commit:
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMCallingConv.td?r1=269459&r2=269458&pathrev=269459

This is an example of code which clang generates
with -mthumb -fno-omit-frame-pointer -O2.

 @ %bb.0:
         push    {r4, r5, r6, r7, lr}
         add     r7, sp, #12
         push.w  {r8, r9, r10}
         sub     sp, #56
         mov     r8, r0
         movw    r0, :lower16:_ZTVN6sensor14sensor_handlerE
         movt    r0, :upper16:_ZTVN6sensor14sensor_handlerE
         mov     r9, r8
         adds    r0, #8
         str     r0, [r9], #4

The only difference is clang sets frame pointer to the r7 on the stack instead gcc sets to lr,
but it already handles by the Sanitizers.

>
> So this sounds like the first thing to do is reducing the size of the stack traces. > The default is 30 which is far larger than useful. Using 1 for example should > always be fast (since you can use __builtin_return_address(0)) and still get > the function that called malloc. Also if the unwinder happens to be too slow,
> it should be optimized and caching added etc.
>

If we change the size of the traces to 2, it could be something like this:

0xb42a50a0 is located 0 bytes inside of 88-byte region [0xb42a50a0,0xb42a50f8)
freed by thread T0 here:
    #0 0xb6a35cc7 in __interceptor_free (/usr/lib/libasan.so+0x127cc7)
#1 0xb5fa64e3 in ipc::message::unref() (/lib/libsensord-shared.so+0xe4e3)

previously allocated by thread T0 here:
    #0 0xb6a36157 in malloc (/usr/lib/libasan.so+0x128157)
#1 0xb2f8852d in accel_device::get_data(unsigned int, sensor_data_t**, int*) (/usr/lib/sensor/libsensor-hal-tm1.so+0x1052d)

Instead this:

0xb42250a0 is located 0 bytes inside of 88-byte region [0xb42250a0,0xb42250f8)
freed by thread T0 here:
    #0 0xb6989cff in __interceptor_free (/usr/lib/libasan.so+0x127cff)
#1 0xb5fa64e3 in ipc::message::unref() (/lib/libsensord-shared.so+0xe4e3) #2 0xb6efbf47 in sensor::sensor_handler::notify(char const*, sensor_data_t*, int) (/usr/bin/sensord+0x1ef47) #3 0xb6efad43 in sensor::sensor_event_handler::handle(int, unsigned int) (/usr/bin/sensord+0x1dd43)
    #4 0xb5fa3bbb  (/lib/libsensord-shared.so+0xbbbb)
#5 0xb62d9a15 in g_main_context_dispatch (/lib/libglib-2.0.so.0+0x91a15)
    #6 0xb62da2d9  (/lib/libglib-2.0.so.0+0x922d9)
    #7 0xb62da9a9 in g_main_loop_run (/lib/libglib-2.0.so.0+0x929a9)
#8 0xb5fa4e1b in ipc::event_loop::run(int) (/lib/libsensord-shared.so+0xce1b)
    #9 0xb6eec9a5 in main (/usr/bin/sensord+0xf9a5)
    #10 0xb5cb663b in __libc_start_main (/lib/libc.so.6+0x1663b)

previously allocated by thread T0 here:
    #0 0xb698a18f in malloc (/usr/lib/libasan.so+0x12818f)
#1 0xb2f8852d in accel_device::get_data(unsigned int, sensor_data_t**, int*) (/usr/lib/sensor/libsensor-hal-tm1.so+0x1052d) #2 0xb6ef848f in sensor::physical_sensor_handler::get_data(sensor_data_t**, int*) (/usr/bin/sensord+0x1b48f) #3 0xb6efaa51 in sensor::sensor_event_handler::handle(int, unsigned int) (/usr/bin/sensord+0x1da51)
    #4 0xb5fa3bbb  (/lib/libsensord-shared.so+0xbbbb)
#5 0xb62d9a15 in g_main_context_dispatch (/lib/libglib-2.0.so.0+0x91a15)
    #6 0xb62da2d9  (/lib/libglib-2.0.so.0+0x922d9)
    #7 0xb62da9a9 in g_main_loop_run (/lib/libglib-2.0.so.0+0x929a9)
#8 0xb5fa4e1b in ipc::event_loop::run(int) (/lib/libsensord-shared.so+0xce1b)
    #9 0xb6eec9a5 in main (/usr/bin/sensord+0xf9a5)
    #10 0xb5cb663b in __libc_start_main (/lib/libc.so.6+0x1663b)


At the first example we lost the full context, from where the control/data flow comes from.

>
> The issue is that the frame pointer and frame chain always add a large
> overhead even when you do not use any sanitizers. This is especially bad
> for the proposed patch - you lose much of the benefit of using Thumb-2...
>

The stack layout like this enables only with compile time flag (-mthumb-fp and works only together with -mthumb and
-fno-omit-frame-pointer). It does not affect other codegen.

Thanks.

On 09/05/2018 03:11 PM, Wilco Dijkstra wrote:
Hi Denis,

We are working on applying Address/LeakSanitizer for the full Tizen OS
distribution. It's about ~1000 packages, ASan/LSan runtime is installed
to ld.so.preload. As we know ASan/LSan has interceptors for
allocators/deallocators such as (malloc/realloc/calloc/free) and so on.
On every allocation from user space program, ASan calls
GET_STACK_TRACE_MALLOC;
which unwinds the stack frame, and by default uses frame based stack
unwinder. So, it requires to build with "-fno-omit-frame-pointer",
switching it to default unwinder really hits the performance in our case.

So this sounds like the first thing to do is reducing the size of the stack traces.
The default is 30 which is far larger than useful. Using 1 for example should
always be fast (since you can use __builtin_return_address(0)) and still get
the function that called malloc. Also if the unwinder happens to be too slow,
it should be optimized and caching added etc.

Doing real unwinding is also far more accurate than frame pointer based
unwinding (the latter doesn't handle leaf functions correctly,
entry/exit in
non-leaf functions and shrinkwrapped functions - and this breaks
callgraph
profiling).

I agree, but in our case, all interceptors for allocators are
leaf functions, so the frame based stack unwinder works well for us.

Yes a frame chain would work for this case. But it's not currently supported.

By default we build packages with ("-marm" "-fno-omit-frame-pointer"),
because need frame based stack unwinder for every allocation, as I said
before. As we know GCC sets fp to lr on the stack with
("-fno-omit-frame-pointer" and "-marm") and I don't really know why.
But the binary size is bigger than for thumb, so, we cannot use default
thumb frame pointer and want to reduce binary size for the full
sanitized image.

The issue is that the frame pointer and frame chain always add a large
overhead even when you do not use any sanitizers. This is especially bad
for the proposed patch - you lose much of the benefit of using Thumb-2...

Using normal unwinding means your code runs at full speed and still can be
used by the sanitizer.

In other case clang works the same way, as I offered at the patch.
It has the same issue, but it was fixed at the end of 2017
https://bugs.llvm.org/show_bug.cgi?id=18505 (The topics starts from
discussion about APCS, but it is not the main point.)

Also, unresolved issue related to this
https://github.com/google/sanitizers/issues/640

Adding support for a frame chain would require an ABI change. It would have to
work across GCC, LLVM, Arm, Thumb-1 and Thumb-2 - not a trivial amount of
effort.

Wilco




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]