Adding Thread-Safety Annotations to DWARF for Use by Static/Dynamic Analysis Tools
Author: Jaeheon Yi (jaeheon@google.com)
Date: August 14, 2008
Introduction
We have created program annotations for C/C++ (based on GCC attributes) to help developers document the intended locking policy in their multi-threaded code. In addition to documentation, these annotations are currently used by a new GCC analysis pass to identify potential thread-safety issues at compile time. It would be useful (and in fact has been requested) to push the annotation information to the binary so that other dynamic (or static) tools can use the information in their analysis. For example, Helgrind, a dynamic race detector based on Valgrind, could potentially benefit from the GUARDED_BY annotations and emit more accurate warnings earlier in its analysis. We decided to emit the annotation information in DWARF to take advantage of the existing DWARF support in GCC. The rest of this document details our initial design. Note that in our first implementation, we focus only on the annotations documenting the lock requirements for variables and functions as they are more useful to other analysis tools.
DWARF Modification Proposal
We propose adding seven DWARF attributes, a form class, and a section to represent thread-safety annotations. The list of modifications below is not yet complete.
New Thread-Safety Annotation Attributes
| Attribute name | Value | Classes | Applies to | Description |
|---|---|---|---|---|
| DW_AT_GNU_guarded_by | 0x2108 | mutexlistptr | DW_TAG_variable, DW_TAG_member | The mutexes that guard a variable |
| DW_AT_GNU_pt_guarded_by | 0x2109 | mutexlistptr | DW_TAG_variable, DW_TAG_member | The mutexes that guard a pointed-to variable |
| DW_AT_GNU_guarded | 0x210a | mutexlistptr | DW_TAG_variable, DW_TAG_member | |
| DW_AT_GNU_pt_guarded | 0x210b | mutexlistptr | DW_TAG_variable, DW_TAG_member | |
| DW_AT_GNU_locks_excluded | 0x210c | mutexlistptr | DW_TAG_subprogram | The mutexes that must not be acquired for this function to be called |
| DW_AT_GNU_exclusive_locks_required | 0x210d | mutexlistptr | DW_TAG_subprogram | The mutexes that must be write-acquired for this function to be called |
| DW_AT_GNU_shared_locks_required | 0x210e | mutexlistptr | DW_TAG_subprogram | The mutexes that must be shared-acquired for this function to be called |
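A consumer of the proposed attributes could mirror the table as constants. The following is a minimal sketch: the numeric values are taken from the table, while the enum name and the two helper functions are hypothetical and not part of the proposal.

```cpp
#include <cstdint>

// Proposed attribute codes, copied from the table above. All of them lie in
// the DWARF vendor-extension range DW_AT_lo_user (0x2000) .. DW_AT_hi_user
// (0x3fff).
enum ThreadSafetyAttribute : std::uint16_t {
  DW_AT_GNU_guarded_by = 0x2108,
  DW_AT_GNU_pt_guarded_by = 0x2109,
  DW_AT_GNU_guarded = 0x210a,
  DW_AT_GNU_pt_guarded = 0x210b,
  DW_AT_GNU_locks_excluded = 0x210c,
  DW_AT_GNU_exclusive_locks_required = 0x210d,
  DW_AT_GNU_shared_locks_required = 0x210e,
};

// Hypothetical helper: true if `attr` is one of the proposed attributes.
constexpr bool IsThreadSafetyAttribute(std::uint16_t attr) {
  return attr >= DW_AT_GNU_guarded_by &&
         attr <= DW_AT_GNU_shared_locks_required;
}

// Hypothetical helper: true if `attr` is one of the attributes that the table
// attaches to DW_TAG_subprogram rather than to variables and members.
constexpr bool AppliesToSubprogram(std::uint16_t attr) {
  return attr >= DW_AT_GNU_locks_excluded &&
         attr <= DW_AT_GNU_shared_locks_required;
}

static_assert(IsThreadSafetyAttribute(DW_AT_GNU_pt_guarded), "in range");
static_assert(!AppliesToSubprogram(DW_AT_GNU_guarded_by), "variable attribute");
```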
New Form Class
mutexlistptr: Refers to a location in the .debug_mutex section (described below), which holds lists of (reference, constant) pairs.
For now, this class contains only the specific forms DW_FORM_data4 or DW_FORM_data8.
New Section
Name: .debug_mutex
A separate object file section, containing lists of (reference, constant) pairs.
A thread-safety annotation attribute identifies its mutex list by value: the value is the offset from the beginning of the section to the first byte of the mutex list. Each list terminates in a null pair, i.e., a pair whose reference and constant are both 0.
A mutex list entry has the form (reference, constant), and is intended to refer to a specific mutex instance. The first part refers to a DIE which is either a mutex instance, or the outermost enclosing entity that contains the mutex. The second part is the mutex's byte offset from the base address of the referred-to DIE. Each entry is a pair because the outermost enclosing entity's base address is not enough to always distinguish the mutex's address, which is necessary for data-race detection purposes.
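To make the list layout concrete, the sketch below reads one mutex list out of the raw bytes of a .debug_mutex section. It assumes that both halves of a pair are 4 bytes wide (as the example dumps later in this document suggest) and are stored in host byte order; the proposal itself does not fix either point. The names MutexEntry and ReadMutexList are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One (reference, constant) pair. Field widths are an assumption.
struct MutexEntry {
  std::uint32_t die_ref;      // offset of the referred-to DIE
  std::uint32_t byte_offset;  // mutex offset from that DIE's base address
};

// Reads the mutex list that starts `list_offset` bytes into `section`.
// Reading stops at the terminating null pair (0, 0) or at the end of the
// section, whichever comes first.
std::vector<MutexEntry> ReadMutexList(const std::vector<std::uint8_t>& section,
                                      std::size_t list_offset) {
  assert(list_offset <= section.size());
  std::vector<MutexEntry> entries;
  for (std::size_t pos = list_offset; pos + 8 <= section.size(); pos += 8) {
    MutexEntry entry;
    std::memcpy(&entry.die_ref, &section[pos], 4);
    std::memcpy(&entry.byte_offset, &section[pos + 4], 4);
    if (entry.die_ref == 0 && entry.byte_offset == 0) break;  // null pair
    entries.push_back(entry);
  }
  return entries;
}
```

A tool would call ReadMutexList with the value of, say, a DW_AT_GNU_guarded_by attribute as list_offset.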
Some examples of "outermost enclosing entities": a mutex may be a (nested) member of an object - then the corresponding entry's first element refers to the outermost enclosing object's DIE, and the second element is the offset to access the mutex. Another example is where a mutex is the 4th element of an array - then the entry's first element refers to the enclosing array's DIE, and the second element is the byte offset into the array.
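The nested-member case can be sketched as follows. The types Mutex, Inner, and Outer are hypothetical stand-ins, and ResolveMutexAddress is an illustrative helper; how a tool obtains the base address of the referred-to DIE (for example by evaluating its DW_AT_location) is outside the scope of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Mutex { char state; };          // stand-in for the real Mutex class
struct Inner { int x; Mutex mu; };
struct Outer { int y; Inner inner; };  // outermost enclosing entity

// Address of the mutex = base address of the referred-to DIE's object
//                        + byte offset recorded in the list entry.
std::uintptr_t ResolveMutexAddress(std::uintptr_t base_address,
                                   std::uint32_t byte_offset) {
  return base_address + byte_offset;
}

// Offset that an entry for Outer::inner.mu would record: the offsets of the
// individual nesting levels add up.
std::uint32_t NestedMutexOffset() {
  return static_cast<std::uint32_t>(offsetof(Outer, inner) +
                                    offsetof(Inner, mu));
}

// Usage: the resolved address matches the real address of the nested mutex.
void Example() {
  Outer object{};
  const auto base = reinterpret_cast<std::uintptr_t>(&object);
  assert(ResolveMutexAddress(base, NestedMutexOffset()) ==
         reinterpret_cast<std::uintptr_t>(&object.inner.mu));
}
```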
Examples
We provide three code listings and their respective DWARF fragments to illustrate the proposed changes. j1.cc is a simple example of how thread-safety annotations are used; j5.cc has multiple nested mutexes that show the effectiveness of offsets and lists. k05.cc is an example usage of PT_GUARDED_BY.
j1.cc code listing
j1.dwarf: Existing DIEs for a_, mu_, and bar()
<2><f6>: Abbrev Number: 14 (DW_TAG_member)
<f7> DW_AT_name : a_
<fa> DW_AT_decl_file : 1
<fb> DW_AT_decl_line : 7
<fc> DW_AT_type : <e3>
<100> DW_AT_data_member_location: 2 byte block: 23 0 (DW_OP_plus_uconst: 0)
<103> DW_AT_accessibility: 3 (private)
<2><104>: Abbrev Number: 14 (DW_TAG_member)
<105> DW_AT_name : mu_
<109> DW_AT_decl_file : 1
<10a> DW_AT_decl_line : 8
<10b> DW_AT_type : <25>
<10f> DW_AT_data_member_location: 2 byte block: 23 4 (DW_OP_plus_uconst: 4)
<112> DW_AT_accessibility: 3 (private)
<2><113>: Abbrev Number: 15 (DW_TAG_subprogram)
<114> DW_AT_external : 1
<115> DW_AT_name : bar
<119> DW_AT_decl_file : 1
<11a> DW_AT_decl_line : 11
<11b> DW_AT_MIPS_linkage_name: (indirect string, offset: 0x13b): _ZN3Foo3barEv
<11f> DW_AT_declaration : 1
j1.dwarf: Proposed DIEs for a_, mu_, and bar(), with additional tag entry abbreviations and separate mutex section
...
22 DW_TAG_member [no children]
DW_AT_name DW_FORM_string
DW_AT_decl_file DW_FORM_data1
DW_AT_decl_line DW_FORM_data1
DW_AT_type DW_FORM_ref4
DW_AT_data_member_location DW_FORM_block1
DW_AT_accessibility DW_FORM_data1
DW_AT_GNU_guarded_by DW_FORM_data4
23 DW_TAG_subprogram [has children]
DW_AT_external DW_FORM_flag
DW_AT_name DW_FORM_strp
DW_AT_decl_file DW_FORM_data1
DW_AT_decl_line DW_FORM_data1
DW_AT_type DW_FORM_ref4
DW_AT_low_pc DW_FORM_addr
DW_AT_high_pc DW_FORM_addr
DW_AT_frame_base DW_FORM_data4
DW_AT_GNU_locks_excluded DW_FORM_data4
...
<2><f6>: Abbrev Number: 22 (DW_TAG_member)
<f7> DW_AT_name : a_
<fa> DW_AT_decl_file : 1
<fb> DW_AT_decl_line : 7
<fc> DW_AT_type : <e3>
<100> DW_AT_data_member_location: 2 byte block: 23 0 (DW_OP_plus_uconst: 0)
<103> DW_AT_accessibility: 3 (private)
<104> DW_AT_GNU_guarded_by : 0 (mutex list)
<2><108>: Abbrev Number: 14 (DW_TAG_member)
<109> DW_AT_name : mu_
<10c> DW_AT_decl_file : 1
<10d> DW_AT_decl_line : 8
<10e> DW_AT_type : <25>
<112> DW_AT_data_member_location: 2 byte block: 23 4 (DW_OP_plus_uconst: 4)
<115> DW_AT_accessibility: 3 (private)
<2><116>: Abbrev Number: 23 (DW_TAG_subprogram)
<117> DW_AT_external : 1
<118> DW_AT_name : bar
<11a> DW_AT_decl_file : 1
<11c> DW_AT_decl_line : 11
<11d> DW_AT_MIPS_linkage_name: (indirect string, offset: 0x13b): _ZN3Foo3barEv
<121> DW_AT_declaration : 1
<125> DW_AT_GNU_locks_excluded: 16 (mutex list)
...
Contents of the .debug_mutex section:
ref offset
00000108 00000000
00000000 00000000
00000108 00000000
00000000 00000000
Note. As an optimization, we can reduce the number of mutex lists in .debug_mutex by letting thread-safety attributes that name the same mutexes point to a single shared list. For example, in the above DWARF fragment, DW_AT_GNU_locks_excluded at offset <125> could take the value 0 instead of 16, which makes the second list unnecessary.
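One way a producer might implement this sharing is to intern each mutex list before emitting it. The sketch below is an illustration under the assumption of 4-byte references and constants (8 bytes per pair); MutexSectionBuilder and its members are hypothetical names, not part of GCC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using MutexEntry = std::pair<std::uint32_t, std::uint32_t>;  // (DIE ref, offset)
using MutexList = std::vector<MutexEntry>;

class MutexSectionBuilder {
 public:
  // Returns the section offset of `list`. The list is appended only the first
  // time it is seen; later requests for an identical list reuse that offset.
  std::uint32_t Intern(const MutexList& list) {
    for (const MutexEntry& e : list) {
      // A (0, 0) entry would be indistinguishable from the terminator.
      assert(!(e.first == 0 && e.second == 0));
    }
    auto it = offsets_.find(list);
    if (it != offsets_.end()) return it->second;
    const std::uint32_t offset =
        static_cast<std::uint32_t>(entries_.size() * 8);
    entries_.insert(entries_.end(), list.begin(), list.end());
    entries_.push_back(MutexEntry(0, 0));  // terminating null pair
    offsets_.emplace(list, offset);
    return offset;
  }

  std::size_t size_in_bytes() const { return entries_.size() * 8; }

 private:
  std::vector<MutexEntry> entries_;             // section contents, in order
  std::map<MutexList, std::uint32_t> offsets_;  // list contents -> offset
};
```

Applied to the j1 example, both DW_AT_GNU_guarded_by and DW_AT_GNU_locks_excluded would receive offset 0, and the section would shrink from 32 to 16 bytes.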
j5.cc code listing
j5.dwarf: DIEs for a_, bar(), and arrayMu[]
<1><25>: Abbrev Number: 2 (DW_TAG_class_type)
<26> DW_AT_name : (indirect string, offset: 0x163): Mutex
<2a> DW_AT_byte_size : 1
<2b> DW_AT_decl_file : 2
<2c> DW_AT_decl_line : 53
<2d> DW_AT_sibling : <cb>
...
<2><f6>: Abbrev Number: 14 (DW_TAG_member)
<f7> DW_AT_name : a_
<fa> DW_AT_decl_file : 1
<fb> DW_AT_decl_line : 11
<fc> DW_AT_type : <e3>
<100> DW_AT_data_member_location: 2 byte block: 23 0 (DW_OP_plus_uconst: 0)
<103> DW_AT_accessibility: 3 (private)
<2><104>: Abbrev Number: 15 (DW_TAG_subprogram)
<105> DW_AT_external : 1
<106> DW_AT_name : bar
<10a> DW_AT_decl_file : 1
<10b> DW_AT_decl_line : 14
<10c> DW_AT_MIPS_linkage_name: (indirect string, offset: 0x155): _ZN3Foo3barEv
<110> DW_AT_declaration : 1
<3><111>: Abbrev Number: 4 (DW_TAG_formal_parameter)
<112> DW_AT_type : <119>
<116> DW_AT_artificial : 1
...
<1><15f>: Abbrev Number: 20 (DW_TAG_array_type)
<160> DW_AT_type : <25>
<164> DW_AT_sibling : <16f>
<2><168>: Abbrev Number: 21 (DW_TAG_subrange_type)
<169> DW_AT_type : <16f>
<16d> DW_AT_upper_bound : 9
<1><16f>: Abbrev Number: 22 (DW_TAG_base_type)
<170> DW_AT_byte_size : 4
<171> DW_AT_encoding : 7 (unsigned)
<1><172>: Abbrev Number: 23 (DW_TAG_variable)
<173> DW_AT_name : (indirect string, offset: 0xe6): arrayMu
<177> DW_AT_decl_file : 1
<178> DW_AT_decl_line : 8
<179> DW_AT_type : <15f>
<17d> DW_AT_external : 1
<17e> DW_AT_location : 5 byte block: 3 0 0 0 0 (DW_OP_addr: 0)
j5.dwarf: Proposed DIEs for a_, bar(), and arrayMu[] with additional tag entry abbreviations
...
24 DW_TAG_member [no children]
DW_AT_name DW_FORM_string
DW_AT_decl_file DW_FORM_data1
DW_AT_decl_line DW_FORM_data1
DW_AT_type DW_FORM_ref4
DW_AT_data_member_location DW_FORM_block1
DW_AT_accessibility DW_FORM_data1
DW_AT_GNU_guarded_by DW_FORM_data4
25 DW_TAG_subprogram [has children]
DW_AT_external DW_FORM_flag
DW_AT_name DW_FORM_string
DW_AT_decl_file DW_FORM_data1
DW_AT_decl_line DW_FORM_data1
DW_AT_MIPS_linkage_name DW_FORM_strp
DW_AT_declaration DW_FORM_flag
DW_AT_GNU_exclusive_locks_required DW_FORM_data4
...
<1><25>: Abbrev Number: 2 (DW_TAG_class_type)
<26> DW_AT_name : (indirect string, offset: 0x163): Mutex
<2a> DW_AT_byte_size : 1
<2b> DW_AT_decl_file : 2
<2c> DW_AT_decl_line : 53
<2d> DW_AT_sibling : <cb>
...
<2><f6>: Abbrev Number: 24 (DW_TAG_member)
<f7> DW_AT_name : a_
<fa> DW_AT_decl_file : 1
<fb> DW_AT_decl_line : 11
<fc> DW_AT_type : <e3>
<100> DW_AT_data_member_location: 2 byte block: 23 0 (DW_OP_plus_uconst: 0)
<103> DW_AT_accessibility: 3 (private)
<104> DW_AT_GNU_guarded_by : 0 (mutex list)
<2><108>: Abbrev Number: 25 (DW_TAG_subprogram)
<109> DW_AT_external : 1
<10a> DW_AT_name : bar
<10e> DW_AT_decl_file : 1
<10f> DW_AT_decl_line : 14
<110> DW_AT_MIPS_linkage_name: (indirect string, offset: 0x155): _ZN3Foo3barEv
<114> DW_AT_declaration : 1
<115> DW_AT_GNU_exclusive_locks_required: 16 (mutex list)
<3><119>: Abbrev Number: 4 (DW_TAG_formal_parameter)
<11a> DW_AT_type : <119>
<11e> DW_AT_artificial : 1
...
<1><15f>: Abbrev Number: 20 (DW_TAG_array_type)
<160> DW_AT_type : <25>
<164> DW_AT_sibling : <16f>
<2><168>: Abbrev Number: 21 (DW_TAG_subrange_type)
<169> DW_AT_type : <16f>
<16d> DW_AT_upper_bound : 9
<1><16f>: Abbrev Number: 22 (DW_TAG_base_type)
<170> DW_AT_byte_size : 4
<171> DW_AT_encoding : 7 (unsigned)
<1><172>: Abbrev Number: 23 (DW_TAG_variable)
<173> DW_AT_name : (indirect string, offset: 0xe6): arrayMu
<177> DW_AT_decl_file : 1
<178> DW_AT_decl_line : 8
<179> DW_AT_type : <15f>
<17d> DW_AT_external : 1
<17e> DW_AT_location : 5 byte block: 3 0 0 0 0 (DW_OP_addr: 0)
...
Contents of the .debug_mutex section:
ref offset
0000016f 00000004
00000000 00000000
0000016f 00000004
0000016f 00000005
00000000 00000000
Note. In the above code listing, the size of a Mutex object is 1 byte (its DW_AT_byte_size is 1), which is why the byte offsets in .debug_mutex coincide with the array indices.
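The relationship between array index and recorded byte offset can be written out as a one-line computation. This is only a sketch: ElementByteOffset is a hypothetical helper, and the 40-byte element size in the last check is an arbitrary illustrative value, not the size of any particular mutex type.

```cpp
#include <cstddef>
#include <cstdint>

// Byte offset of element `index` in an array whose elements each occupy
// `element_size` bytes. This is the constant a mutex list entry would record
// for a mutex that is an array element.
constexpr std::uint32_t ElementByteOffset(std::size_t index,
                                          std::size_t element_size) {
  return static_cast<std::uint32_t>(index * element_size);
}

// With a 1-byte Mutex, as in j5, offsets and indices coincide.
static_assert(ElementByteOffset(4, 1) == 4, "offset equals index");
static_assert(ElementByteOffset(5, 1) == 5, "offset equals index");
// With a larger (here: hypothetical 40-byte) mutex type they do not.
static_assert(ElementByteOffset(4, 40) == 160, "offset is index * size");
```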
k05.cc code listing
class Foo {
  Mutex m;
  int *p PT_GUARDED_BY(m);

 public:
  Foo(int *q) {
    p = q;
  }

  void inc() LOCKS_EXCLUDED(m) {
    m.Lock();
    (*p)++;
    m.Unlock();
  }
};

int *a = new int(0);  // Initialized so that the increment in inc() is well defined
Foo f(a);  // Share f amongst many threads

int main() {
  f.inc();
  delete a;
  return 0;
}
k05.dwarf: DIEs for m, p, Foo.inc()
...
<1><29>: Abbrev Number: 2 (DW_TAG_class_type)
<2a> DW_AT_name : (indirect string, offset: 0x1ad): Mutex
<2e> DW_AT_byte_size : 1
<2f> DW_AT_decl_file : 2
<30> DW_AT_decl_line : 53
<31> DW_AT_sibling : <cf>
...
<1><cf>: Abbrev Number: 8 (DW_TAG_pointer_type)
<d0> DW_AT_byte_size : 4
<d1> DW_AT_type : <29>
<1><d5>: Abbrev Number: 9 (DW_TAG_base_type)
<d6> DW_AT_byte_size : 1
<d7> DW_AT_encoding : 2 (boolean)
<d8> DW_AT_name : (indirect string, offset: 0x12a): bool
<1><dc>: Abbrev Number: 10 (DW_TAG_reference_type)
<dd> DW_AT_byte_size : 4
<de> DW_AT_type : <e2>
<1><e2>: Abbrev Number: 11 (DW_TAG_const_type)
<e3> DW_AT_type : <e7>
<1><e7>: Abbrev Number: 12 (DW_TAG_base_type)
<e8> DW_AT_byte_size : 4
<e9> DW_AT_encoding : 5 (signed)
<ea> DW_AT_name : int
<1><ee>: Abbrev Number: 13 (DW_TAG_class_type)
<ef> DW_AT_name : Foo
<f3> DW_AT_byte_size : 8
<f4> DW_AT_decl_file : 1
<f5> DW_AT_decl_line : 6
<f6> DW_AT_sibling : <142>
<2><fa>: Abbrev Number: 14 (DW_TAG_member)
<fb> DW_AT_name : m
<fd> DW_AT_decl_file : 1
<fe> DW_AT_decl_line : 7
<ff> DW_AT_type : <29>
<103> DW_AT_data_member_location: 2 byte block: 23 0 (DW_OP_plus_uconst: 0)
<106> DW_AT_accessibility: 3 (private)
<2><107>: Abbrev Number: 14 (DW_TAG_member)
<108> DW_AT_name : p
<10a> DW_AT_decl_file : 1
<10b> DW_AT_decl_line : 8
<10c> DW_AT_type : <142>
<110> DW_AT_data_member_location: 2 byte block: 23 4 (DW_OP_plus_uconst: 4)
<113> DW_AT_accessibility: 3 (private)
...
<2><12d>: Abbrev Number: 16 (DW_TAG_subprogram)
<12e> DW_AT_external : 1
<12f> DW_AT_name : inc
<133> DW_AT_decl_file : 1
<134> DW_AT_decl_line : 15
<135> DW_AT_MIPS_linkage_name: (indirect string, offset: 0x19f): _ZN3Foo3incEv
<139> DW_AT_declaration : 1
<3><13a>: Abbrev Number: 4 (DW_TAG_formal_parameter)
<13b> DW_AT_type : <148>
<13f> DW_AT_artificial : 1
<1><142>: Abbrev Number: 8 (DW_TAG_pointer_type)
<143> DW_AT_byte_size : 4
<144> DW_AT_type : <e7>
<1><148>: Abbrev Number: 8 (DW_TAG_pointer_type)
<149> DW_AT_byte_size : 4
<14a> DW_AT_type : <ee>
k05.dwarf: Proposed DIEs for m, p, Foo.inc() with additional tag abbreviations
29 DW_TAG_member [no children]
DW_AT_name DW_FORM_string
DW_AT_decl_file DW_FORM_data1
DW_AT_decl_line DW_FORM_data1
DW_AT_type DW_FORM_ref4
DW_AT_data_member_location DW_FORM_block1
DW_AT_accessibility DW_FORM_data1
DW_AT_GNU_pt_guarded_by DW_FORM_data4
30 DW_TAG_subprogram [has children]
DW_AT_external DW_FORM_flag
DW_AT_name DW_FORM_string
DW_AT_decl_file DW_FORM_data1
DW_AT_decl_line DW_FORM_data1
DW_AT_MIPS_linkage_name DW_FORM_strp
DW_AT_declaration DW_FORM_flag
DW_AT_GNU_locks_excluded DW_FORM_data4
...
<1><29>: Abbrev Number: 2 (DW_TAG_class_type)
<2a> DW_AT_name : (indirect string, offset: 0x1ad): Mutex
<2e> DW_AT_byte_size : 1
<2f> DW_AT_decl_file : 2
<30> DW_AT_decl_line : 53
<31> DW_AT_sibling : <cf>
...
<1><cf>: Abbrev Number: 8 (DW_TAG_pointer_type)
<d0> DW_AT_byte_size : 4
<d1> DW_AT_type : <29>
<1><d5>: Abbrev Number: 9 (DW_TAG_base_type)
<d6> DW_AT_byte_size : 1
<d7> DW_AT_encoding : 2 (boolean)
<d8> DW_AT_name : (indirect string, offset: 0x12a): bool
<1><dc>: Abbrev Number: 10 (DW_TAG_reference_type)
<dd> DW_AT_byte_size : 4
<de> DW_AT_type : <e2>
<1><e2>: Abbrev Number: 11 (DW_TAG_const_type)
<e3> DW_AT_type : <e7>
<1><e7>: Abbrev Number: 12 (DW_TAG_base_type)
<e8> DW_AT_byte_size : 4
<e9> DW_AT_encoding : 5 (signed)
<ea> DW_AT_name : int
<1><ee>: Abbrev Number: 13 (DW_TAG_class_type)
<ef> DW_AT_name : Foo
<f3> DW_AT_byte_size : 8
<f4> DW_AT_decl_file : 1
<f5> DW_AT_decl_line : 6
<f6> DW_AT_sibling : <142>
<2><fa>: Abbrev Number: 14 (DW_TAG_member)
<fb> DW_AT_name : m
<fd> DW_AT_decl_file : 1
<fe> DW_AT_decl_line : 7
<ff> DW_AT_type : <29>
<103> DW_AT_data_member_location: 2 byte block: 23 0 (DW_OP_plus_uconst: 0)
<106> DW_AT_accessibility: 3 (private)
<2><107>: Abbrev Number: 29 (DW_TAG_member)
<108> DW_AT_name : p
<10a> DW_AT_decl_file : 1
<10b> DW_AT_decl_line : 8
<10c> DW_AT_type : <142>
<110> DW_AT_data_member_location: 2 byte block: 23 4 (DW_OP_plus_uconst: 4)
<113> DW_AT_accessibility: 3 (private)
<114> DW_AT_GNU_pt_guarded_by: 0 (mutex list)
...
<2><12d>: Abbrev Number: 30 (DW_TAG_subprogram)
<12e> DW_AT_external : 1
<12f> DW_AT_name : inc
<133> DW_AT_decl_file : 1
<134> DW_AT_decl_line : 15
<135> DW_AT_MIPS_linkage_name: (indirect string, offset: 0x19f): _ZN3Foo3incEv
<139> DW_AT_declaration : 1
<13a> DW_AT_GNU_locks_excluded: 16 (mutex list)
<3><13e>: Abbrev Number: 4 (DW_TAG_formal_parameter)
<13f> DW_AT_type : <148>
<143> DW_AT_artificial : 1
<1><146>: Abbrev Number: 8 (DW_TAG_pointer_type)
<147> DW_AT_byte_size : 4
<148> DW_AT_type : <e7>
<1><14c>: Abbrev Number: 8 (DW_TAG_pointer_type)
<14d> DW_AT_byte_size : 4
<14e> DW_AT_type : <ee>
...
Contents of the .debug_mutex section:
ref offset
000000fa 00000000
00000000 00000000
000000fa 00000000
00000000 00000000
Use Cases
In the initial design, we support only a subset of the thread-safety annotations [1], focusing on annotated locking requirements.
1. A class member or global variable can be ...
- GUARDED_BY some mutex
- a pointer to some variable that is PT_GUARDED_BY some mutex
2. A guarding mutex can be ...
- class member
- global variable
- nested member of an object
- array member
- referenced by a pointer
- any combination of the above
3. Class methods and global functions can have ...
- LOCKS_EXCLUDED
- EXCLUSIVE_LOCKS_REQUIRED
- SHARED_LOCKS_REQUIRED
- In addition, each category may contain multiple mutexes for a given function
We aim to support combining at least the above usages, and believe that this covers the majority of cases that could be useful to tools like Helgrind.
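As an illustration of how several of these usages combine in source code, consider the sketch below. The annotation macros are defined here as no-op stand-ins so that the example compiles with any compiler; in the real annotation header [1] they expand to GCC attributes. The Mutex wrapper, the Counter class, and all variable names are hypothetical.

```cpp
#include <cassert>
#include <mutex>

// No-op stand-ins for the real annotation macros (assumption, see lead-in).
#define GUARDED_BY(x)
#define PT_GUARDED_BY(x)
#define LOCKS_EXCLUDED(...)
#define EXCLUSIVE_LOCKS_REQUIRED(...)
#define SHARED_LOCKS_REQUIRED(...)

class Mutex {  // minimal stand-in with the Lock/Unlock interface used in k05.cc
 public:
  void Lock() { mu_.lock(); }
  void Unlock() { mu_.unlock(); }
 private:
  std::mutex mu_;
};

Mutex global_mu;                             // guarding mutex: global variable
int global_count GUARDED_BY(global_mu) = 0;  // guarded global variable

class Counter {
 public:
  explicit Counter(int* shared) : shared_(shared) {}

  // Multiple mutexes in one category: neither may be held by the caller,
  // because both are acquired here.
  void Increment() LOCKS_EXCLUDED(mu_, global_mu) {
    mu_.Lock();
    IncrementLocked();
    mu_.Unlock();
    global_mu.Lock();
    ++global_count;
    global_mu.Unlock();
  }

  int Get() LOCKS_EXCLUDED(mu_) {
    mu_.Lock();
    const int value = ReadLocked();
    mu_.Unlock();
    return value;
  }

 private:
  void IncrementLocked() EXCLUSIVE_LOCKS_REQUIRED(mu_) { ++value_; ++*shared_; }
  int ReadLocked() const SHARED_LOCKS_REQUIRED(mu_) { return value_; }

  Mutex mu_;                        // guarding mutex: class member
  int value_ GUARDED_BY(mu_) = 0;   // guarded class member
  int* shared_ PT_GUARDED_BY(mu_);  // the pointed-to int is guarded by mu_
};

// Usage: single-threaded smoke check of the sketch.
void Example() {
  int target = 0;
  Counter counter(&target);
  counter.Increment();
  assert(counter.Get() == 1 && target == 1);
}
```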
Limitations and Caveats
1. For mutexes that are reached through multiple pointer dereferences, it may be difficult to express the base address in terms of a single DIE reference. As a result, the recorded offset in .debug_mutex might be meaningless. We are looking into DWARF location expressions to see whether they are useful in this situation.
2. Some lock expressions cannot be expressed at all in the current annotation mechanism (GCC attributes).
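The pointer-dereference limitation can be illustrated with a small sketch. A mutex that is reached through a pointer lives in a separate allocation, so its address is not the enclosing object's base address plus a compile-time constant. All type and function names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

struct Mutex { char state; };  // stand-in for the real Mutex class
struct Node { Mutex mu; };
struct Registry {
  std::unique_ptr<Node> node;  // the guarding mutex would be node->mu
  int data;                    // conceptually GUARDED_BY(node->mu)
};

// True if `addr` lies inside the object [base, base + size), i.e. if it can
// be described as "base address of one DIE + constant byte offset".
bool ExpressibleAsOffset(const void* base, std::size_t size, const void* addr) {
  const auto b = reinterpret_cast<std::uintptr_t>(base);
  const auto a = reinterpret_cast<std::uintptr_t>(addr);
  return a >= b && a - b < size;
}

// Usage: a direct member fits the (reference, constant) encoding, whereas the
// heap-allocated mutex behind the pointer does not.
void Example() {
  Registry registry{std::unique_ptr<Node>(new Node{}), 0};
  assert(ExpressibleAsOffset(&registry, sizeof(registry), &registry.data));
  assert(!ExpressibleAsOffset(&registry, sizeof(registry), &registry.node->mu));
}
```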
References
[1] C/C++ Thread-Safety Annotations
[2] DWARF tutorial by M. Eager