Dwarf2EHNewbiesHowto

This document is taken from the version revised by Aldy Hernandez and posted to the GCC mailing list here.

Until this document is fully marked-up, here are clickable versions of the embedded links:

Exception Handling on HP-UX

http://www.usenix.org/events/osdi2000/wiess2000/full_papers/dinechin/dinechin_html/

C++ ABI for Itanium: Exception Handling

http://www.codesourcery.com/cxx-abi/abi-eh.html

Dwarf2 Exception Handler HOWTO

Authors: Will Cohen and Andrew Macleod, Red Hat Inc.

Description of EH

Exception Handling (EH) is provided by languages such as C++ and Java to indicate something unusual has happened in a called function and special action needs to be taken to resolve the problem. C++ and Java surround code that may produce an exception with a try block. Following the try block will be a catch block. The code in the catch block is only executed if a throw is encountered within the scope of the try block. A throw is used to start the exception handling process. If a throw is encountered, the processor goes up the call chain looking for an appropriate catch. Once an appropriate catch is found, the registers and stack are fixed up to resume execution in the catch block.

How EH works

Although the mechanism is simple in concept, the details introduced by the compiler complicate it. Most compiler ports use the processor registers to store values. The register values are often saved on the stack in a function's prologue and restored in the function's epilogue to give the function additional working registers. When the throw occurs, the processor does not complete the execution of the functions on the call chain between the function that contains the catch and the function performing the throw. However, the processor must restore the register values (including the frame and stack pointers) even though epilogues are not executed for functions on the call chain. In GCC the exception handling process can be implemented either with a setjmp/longjmp mechanism or a stack unwinder that uses dwarf2 debugging information to determine how to restore the registers. Because the dwarf2 mechanism leads to more compact code that executes more efficiently, the discussion will be limited to the dwarf2 mechanism.

Consider the following function call chain:

a() calls b() calls c()

If c() throws, and the throw is caught by a(), we have to restore all the registers which c() saved in its prologue, then restore all the registers which b()'s prologue saved. Finally, control is transferred back to the exception handler located in the catch of function a(). This puts all the correct values back in their places so that a() will execute properly.

The epilogues cannot be run for each function because they will do other unrelated things (for example produce return values), as well as assume a certain start state themselves. Thus, a stack unwinding mechanism is used to restore the registers. The compiler marks every prologue instruction that adjusts the stack pointer, changes the frame pointer, or stores a saved register by setting the RTX_FRAME_RELATED_P flag on it, optionally with a REG_FRAME_RELATED_EXPR note (described later in this document). The information needed to unwind the stack is then recorded in the dwarf2 debugging information.

The throw can use an object to pass information back to a catch. Because the portion of the stack being used by the function initiating the throw will soon be removed, memory for the object being thrown is allocated on the heap rather than the stack, so the object will still exist at the catch. This is implemented with a call to the function __cxa_allocate_exception() followed by a call to a constructor to initialize the object. After the object is created, the throw is initiated with __cxa_throw(), and the processor will never return from this call.

The unwinding is performed in two phases. The first phase searches for a frame with an associated catch and tracks the values for the registers. Much of the first phase is managed by _Unwind_RaiseException() in unwind.inc. The second phase installs the restored values for the registers, adjusts the stack and jumps to the catch. This is implemented by special code in the function epilogue of _Unwind_RaiseException().

The actual jump that transfers to the catch usually jumps to a landing pad rather than to the catch directly. The landing pad may perform fixup code within the function due to the optimizations performed within the function with the catch. After the landing pad code is executed, a switch case is executed to determine what action to take. There may be multiple catches associated with a particular try, each for a different type of object thrown. It is possible that none of the catches in that frame match, in which case the unwinding must resume to find an appropriate catch further up the call chain. The information on the type of object being thrown is passed in one of the processor registers.

For more information about the operation of the EH mechanism read the following documents:

Exception Handling on HP-UX
C++ ABI for Itanium: Exception Handling
Exception Handling for a C++ on Tahoe

Requirements for EH

Exception handling via dwarf2 debugging information requires several things to work:

- Two registers to pass information into the catch (EH_RETURN_DATA_REGNO):
  - exc_ptr -- EH_RETURN_DATA_REGNO(0)
  - filter (select appropriate catch) -- EH_RETURN_DATA_REGNO(1)
- Two registers in the epilogue described by RTL:
  - EH_RETURN_STACKADJ_RTX (how much to bump the stack pointer)
  - EH_RETURN_HANDLER_RTX (where the jump should return to)
- Unaligned accesses to read dwarf2 information
- Binutils that understand the dwarf2 object file format for the target

Implementing Dwarf2 EH

Given a function call chain as described above, the RTX_FRAME_RELATED_P flag (and, where needed, a REG_FRAME_RELATED_EXPR note) is set on a handful of significant instructions which affect registers the unwinder cares about. In general, these are:

The stack pointer.
The frame pointer.
The return address register, if it is not in memory.
Any call preserved register whose value is saved and is restored by the epilogue.

The compiler then examines these instructions and generates dwarf2 unwind information in the .eh_frame section. This information, encoded in dwarf2, is a set of instructions for a state machine which describes where values are saved and how to get at them. The library libgcc contains a dwarf2 interpreter which is used at runtime to actually get these values and restore them to their correct registers. Once described properly, this all happens automatically. Most of the time, ports do not need to modify the runtime at all; they just set things up in their config directory and flag the proper instructions.

In order to understand which instructions are actually significant, it will help if you understand approximately how the dwarf2 unwind code works.

In order to restore the registers for function c() (which is executing a throw), we need to execute the dwarf2 code representing instructions up to the point where the throw happens. I.e., interpret the dwarf2 code which will "undo" the register saves which have been performed up until the point of the throw. This part is taken care of automatically as well. What is important is that we tag all the correct instructions. Again, we only care about prologue instructions. Any other ones which the unwinder might care about it will find itself. Usually, these are just other stack bumps within the body of a function.

The dwarf2 unwinder keeps a value called the Canonical Frame Address (CFA). All memory references it makes are relative to this address. Initially, this value is defined as the value of the stack pointer upon entry to the function (i.e. before any instructions are actually executed). Any memory references in the prologue that save the values of registers are stored in the dwarf2 info as a POSITIVE offset to the value of the CFA. The unwinder looks at STACK_GROWS_DOWNWARD to determine whether that offset is added to or subtracted from the CFA.

The emitter tracks the value of the CFA by remembering what register it is based on, and an offset to this register. By default this offset is 0, so the CFA is defined upon entry to the function as SP + 0. If the target stack frame stores any values at a negative offset to this (for example, some targets store the return address in the previous 4 bytes), we need to define an initial offset for the CFA such that the offset is always non-negative. This is accomplished by defining the value of the following macro in the target's header file.

#define INCOMING_FRAME_SP_OFFSET x

Where x would normally be 0, but in this case we need to specify 4 or -4 in order for all the offsets to be positive. The unwinder looks to see if STACK_GROWS_DOWNWARD is defined to determine which way the stack goes, and adjusts the sign of all its offsets appropriately. If the stack does grow downward, it knows all saves/load offsets will actually be subtracted from the CFA, or if the stack grows upward, the offsets will all be added to the CFA. In order to get the initial offset value of the CFA correct, you will need to subtract 4 bytes if the stack grows upward, or add 4 bytes if the stack grows downward.

We have to flag any instruction in the prologue which executes a register save which is in turn restored in the epilogue. The dwarf2 emitter needs to be able to examine the instruction and determine at what offset from the CFA this store is going to happen. The register number and this offset are then inserted into the dwarf2 code stream. The key here is that the emitter needs to be able to tell from the instruction what the address is. Currently, the emitter only handles simple cases, so the instruction needs to be relatively self-explanatory.

Since the CFA is initially defined as the value of the stack pointer, it is easy if the instruction stores at a constant offset from the stack pointer. For example:

(set (mem:SI
        (plus:SI (reg:SI SP) (const_int 8)))
     (reg:SI 8))

This saves register 8 at CFA + 8. It is easy to figure this out from looking at the instruction, as long as we know of any modifications we have made to SP since the start of the function. So the above instruction will need no annotations.

However, if our save instruction instead looks like:

(set (mem:SI
        (plus:SI (reg:SI SP) (reg:SI 6)))
     (reg:SI 8))

In this case, the unwinder has no way of knowing at what offset register 8 is being saved, so the prologue code should emit a REG_FRAME_RELATED_EXPR note that describes the action.

This is done by setting the RTX_FRAME_RELATED_P flag on the instruction, and attaching a REG_FRAME_RELATED_EXPR note to it. If this note is present, the unwinder ignores the instruction itself, and looks at the note to determine the instruction's action. For example, if the value of register 6 is 24, we would attach a note to our initial instruction such that the resulting instruction looks like this:

  (insn x x x x (set (mem:SI
                       (plus:SI (reg:SI SP) (reg:SI 6)))
                     (reg:SI 8))
                (expr_list:REG_FRAME_RELATED_EXPR
                  (set (mem:SI
                         (plus:SI (reg:SI SP) (const_int 24)))
                       (reg:SI 8))))

It is very important that at any given point in the function, the unwinder knows how to find the value of the CFA. Sometimes it is easy, as this value might be contained in the frame pointer for the duration of the function, but other times we might have to calculate it. The runtime part takes care of this, provided that we flag the appropriate instructions.

Relevant Macros:

RETURN_ADDR_RTX
This macro must be defined for a frame value of 0. It must be possible to retrieve the return address pointer in the current function in order to throw properly. This means that the prologue and epilogue must be structured in such a way that the return address is stored at a known offset or available in a register. You must define this macro, or the call to __builtin_return_address(0) will assume that your return address is at SP + sizeof(Pmode).
INCOMING_RETURN_ADDR_RTX
This is the macro that enables dwarf2 EH. If it is not defined, exception handling will be implemented using setjmp and longjmp. The value of the macro is an RTX expression which is used to determine where the return address is located upon entry to a function, before any prologue instructions are executed. If the return address is passed in a register, it would look something like:
#define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, 26)
DWARF_FRAME_RETURN_COLUMN
This tells the dwarf2 unwind mechanism in which dwarf2 register slot the return address can be found. This is only a temporary internal storage location the unwinder uses to track things, so it doesn't have to be the correct location; it just has to be one which is not going to be used for anything else. The dwarf unwind mechanism keeps its own internal register mapping list to track where various hardware registers are saved. This column number is simply the index in this internal list that we can use as a placeholder for the return address value. If the return address is kept in a dedicated register, you should define this macro to refer to that register. For example, the arm backend defines this as:
#define DWARF_FRAME_RETURN_COLUMN DWARF_FRAME_REGNUM (LR_REGNUM)

By default, this will be either the PC slot, or the first value past the end of the hard registers. Generally, you only have to worry about this value if your port has a lot of registers. The dwarf2 unwind spec requires that the return address column number be a single byte value, so it must be less than 256. The only times you will have to define this will be if:
- the return address is stored in a hard register OR
- you have more than 255 physical registers AND
- the PC register has a register number greater than 255.
When setting this macro, you need to choose some register whose gcc register number is less than 256, a register which will never be saved and restored in the prologue.
EH_RETURN_DATA_REGNO(N)
The new implementation of the EH uses two registers to pass information back to catch. The macro EH_RETURN_DATA_REGNO maps the values to hard registers. These registers require stack slots, so they cannot be scratch registers that are not saved across function calls. The macro EH_RETURN_DATA_REGNO will need to be defined and return appropriate register numbers for the values 0 and 1. Below is an example:
```
#define EH_RETURN_DATA_REGNO(N) \
        ((N) == 0 ? GPR_R7 : (N) == 1 ? GPR_R8 : INVALID_REGNUM)
```
EH_RETURN_HANDLER_RTX
The EH_RETURN_HANDLER_RTX macro returns RTL which describes the location used to store the address the processor should jump to in order to catch the exception. This is usually a register that is available from the end of the function's body to the end of the epilogue. Thus, this cannot be a register used as a temporary by the epilogue.

EPILOGUE COMMUNICATION

A key function in the unwinder is _Unwind_RaiseException(). In order for _Unwind_RaiseException() to work properly, it needs to be able to transfer control to the appropriate catch handler. Consequently, _Unwind_RaiseException() is processed in a special manner. First, it is compiled such that every possible preserved register is saved in the prologue and restored in the epilogue (this is done by virtue of _Unwind_RaiseException() calling uw_init_context(), which calls __builtin_unwind_init(), which sets the function as having a nonlocal receiving function, causing gcc to mark all registers as live).

Next, the unwinder uses the EH tables to determine where this throw should transfer control to. The dwarf2 unwind interpreter is used to figure out what values are supposed to be in which registers. As the various values of the register are determined during unwinding, they are saved in a table which tracks the position in the frame of each value. We are not unwinding the stack yet, just determining where the right values for each register are currently located.

When we are ready to unwind the stack, we go through this table, and if any register does not already have the right value (for example it was saved in some prologue), we know where it is saved, and we copy it from that location into the place where _Unwind_RaiseException's prologue stored it. So we overwrite the value _Unwind_RaiseException() saved with the value we think the register should have when we transfer control to our selected handler. When we return from _Unwind_RaiseException(), the epilogue will restore these values.

There are still a few values which need to be fixed up. First, the return address of the function that called _Unwind_RaiseException() must be replaced with the address of the desired handler. When the return from _Unwind_RaiseException occurs, it will actually transfer control to the desired handler instead of returning to where _Unwind_RaiseException was called. The value of the stack pointer also needs to be adjusted.

Most of this is handled automatically, but you will have to adjust the return address and stack pointer in the epilogue. You will need to define an eh_return pattern in your md file which will save the stack adjustment and return address values to the appropriate temporary registers. The code to generate the function epilogue will use the values in these registers to adjust the stack and jump to the appropriate location.

The trick with the eh_return instruction is that you will need to find 2 registers to use from the end of the function to the end of the epilogue. The last thing _Unwind_RaiseException() does is process the eh_return instruction which will set the stack offset and return address into the 2 registers specified in the eh_epilogue. Until needed at the end of the epilogue, these values cannot be overwritten. Typically, you pick 2 registers which are not preserved over calls, nor used as temporaries during epilogue processing. It is possible to use the register holding the return function value for the stack adjustment value. If the processor uses a register to hold the return address and you can prevent the epilogue from reloading the register from the stack, you can store the target address in the return address register.

You will need two additional registers to communicate information to the catch. These registers require stack slots, so they cannot be scratch registers that are not saved across function calls. The macro EH_RETURN_DATA_REGNO will need to be defined and return appropriate register numbers for the values 0 and 1.

REGISTERING FRAME INFORMATION

In order for exception handling to work, you need to register the frame information at runtime. This is accomplished in the same way that constructors and destructors are registered.

__register_frame_info() needs to be called in exactly the same way that a static constructor would be called. Each object file has a .eh_frame section which contains the frame information. If the given port does not support named sections, then the frame information will be issued in a text section with the label __FRAME_BEGIN__.

__register_frame_info should be called before any constructors, in case a constructor throws an exception.

The frame section's address is the first argument passed to __register_frame_info(). The second argument is the address of a local frame object. This is the struct object defined in frame.h.

After all the destructors have been called, __deregister_frame_info() must be called with the address of the start of the section.

DEBUGGING TIPS

First, compile a simple test case:

#include <stdio.h>

int main() {
  try {
    throw 1;
  }
  catch (...) {
    printf(" in catch\n");
    return 1;
  }
  printf(" back in main\n");
  return 10;
}

When compiled and run like the following it should generate the output "in catch":

bash-2.04$ gcc -o throw0.x86 throw0.C
bash-2.04$ ./throw0.x86
 in catch

Compile it with -S -dA and look at the .s file. There should be an .eh_frame section/area beginning with __FRAME_BEGIN__ and it should be annotated with comments about what each dwarf2 instruction is.
If there is EH data, you should also see a .gcc_except_table section/area, beginning with the label __EXCEPTION_TABLE__. If neither of these is present, this is the first aspect to fix. Usually these will show up on their own, but if you do not have named sections you might need to coerce it a bit. Make sure you have enabled DWARF2 EH by defining INCOMING_RETURN_ADDR_RTX, or you will not get these tables. Also look in except.h to make sure that all the conditions hold so that MUST_USE_SJLJ_EXCEPTIONS is defined to be 0. If MUST_USE_SJLJ_EXCEPTIONS is set to 1, then the setjmp/longjmp mechanism will be used.
Also check to make sure the instructions in the prologue are properly marked so the unwinder can track register values. This can be checked by using readelf. Look for the saves of the appropriate registers in the Frame Description Entry (FDE). The following is the output of readelf for an x86 program. The FDE shows saves of r3 and r5:

bash-2.04$ readelf --debug=frames throw1.x86
The section .eh_frame contains:

00000000 00000014 00000000 CIE
  Version:               1
  Augmentation:          "eh"
  Code alignment factor: 1
  Data alignment factor: -4
  Return address column: 8

  DW_CFA_def_cfa: r4 ofs 4
  DW_CFA_offset: r8 at cfa-4

00000018 0000002c 0000001c FDE cie=00000000 pc=08048730..080487ce
  DW_CFA_advance_loc: 1 to 08048731
  DW_CFA_def_cfa_offset: 8
  DW_CFA_offset: r5 at cfa-8
  DW_CFA_advance_loc: 2 to 08048733
  DW_CFA_def_cfa_reg: r5
  DW_CFA_advance_loc: 1 to 08048734
  DW_CFA_offset: r3 at cfa-12
  DW_CFA_advance_loc: 11 to 0804873f
  DW_CFA_GNU_args_size: 16
  DW_CFA_advance_loc: 37 to 08048764
  DW_CFA_GNU_args_size: 8
  DW_CFA_advance_loc: 12 to 08048770
  DW_CFA_GNU_args_size: 16
  DW_CFA_advance_loc: 8 to 08048778
  DW_CFA_GNU_args_size: 0
  DW_CFA_advance_loc: 16 to 08048788
  DW_CFA_GNU_args_size: 16
  DW_CFA_advance_loc: 20 to 0804879c
  DW_CFA_GNU_args_size: 0
  DW_CFA_advance_loc: 18 to 080487ae
  DW_CFA_GNU_args_size: 16

If the EH is using the dwarf2 stack unwinding, there should not be calls to setjmp or longjmp in the assembly language code.
Run the executable with gdb, and put a breakpoint in __register_frame_info (). If the routine does not exist, or it is never called, then the problem is that the unwind frames are not being registered at startup. Generally, what I will do here is compile a short test case which contains a static constructor:

class  A {
public:
  A() { }
  ~A () { }
};

A a;

int main () {
}

Compile and run it through gdb, setting a breakpoint in A::A(). If that does not work, then constructors and destructors in general are broken and need to be fixed. Until this works, EH frames will not get registered. Assuming this does work, you can look at the traceback in gdb and see how static constructors are initialized and work on getting the eh_frames registered via a similar mechanism.
If the __register_frame_info() breakpoint gets hit, then the problem is most likely in the actual unwinding. Typically, something in the prologue is incorrect, but it could be your eh_epilogue. Also check to make sure that RETURN_ADDR_RTX is defined properly. In any case, now you have to debug _Unwind_RaiseException() in unwind.inc. If you are lucky, running your test program under a debugger will actually give you a decent traceback and you can track it back from there to see what has actually gone wrong. This is the point at which it is hard to write down what to look for in a document, but you want to watch for a few things:
- Is _Unwind_RaiseException() unwinding the stack correctly? If everything is operating correctly, the processor should execute the uw_install_context() at the end of _Unwind_RaiseException() to restore the registers to the proper values and jump to the appropriate exception handling code. _Unwind_RaiseException() may not find a frame that has a catch, and unwind the stack until there is no stack left. It would return _URC_END_OF_STACK in this case.
- Verify that the unaligned loads are working properly for the gcc port. Check that the tests gcc.c-torture/execute/packed-1.c and gcc.c-torture/execute/packed-2.c work. The dwarf2 data is not aligned. Thus, the debugging information is not correctly read if unaligned data accesses do not work.
- One possible cause of _Unwind_RaiseException() not correctly unwinding the stack is incorrect return addresses. Because the unwinding mechanism uses the return addresses to determine which FDE to use to track the stack unwinding, you will want to verify that the correct return addresses are being used. Obtain a disassembled version of the executable with objdump (-d option), so you can map the return address back to the original code. In the function uw_frame_state_for() print out the value of context->ra. The first time _Unwind_RaiseException() calls uw_frame_state_for() (from uw_init_context()) it should produce a return address within _Unwind_RaiseException(). Each following call to uw_frame_state_for() should go up the call chain, so initially _Unwind_RaiseException(), then __cxa_throw() and then whatever function performed the throw. If the unwinding gets the return address wrong, it cannot find the correct FDE to figure out how to get the next frame.
- Another cause of incorrect stack unwinding is not getting the CFA to update the registers. The context should have a slot that points to the frame pointer register. You should be able to set breakpoints in the function that does the throw and anything else it calls. Print out the stack and frame pointer after the execution of the prologue of these functions. Compare these values to the values of context->cfa for the various iterations of the loop in _Unwind_RaiseException().
- If the processor makes it to the uw_install_context() at the end of _Unwind_RaiseException(), but does not seem to be executing the code in the catch, step through the machine instructions in the epilogue of _Unwind_RaiseException(). Examine the values that the stack pointer and frame pointer are set to. The frame pointer should be set to the same value as the frame pointer for the function containing the catch. Step through the return, which transfers control to the catch. Does it jump to a reasonable place?
- Sometimes there are problems in the C++ specific part of the port, typically the rtti (run time type info) stuff that the handler uses to figure out the type of the throw, etc. If the program is crashing in __is_pointer() or __cplus_type_matcher(), this is likely your cause.
Once the simple test case works (simple because the throw is in the same function as the handler, so no other stack frames need to be unwound and only the mechanism needs to be present), try this slightly more complex one:

#include <stdio.h>


void f ()
{
  printf (" in f()\n");
  throw 1;
}


int main() {
  try {
    printf(" before throw\n");
    f();
  }
  catch (...) {
    printf(" in catch\n");
    return 6;
  }
  printf(" back in main\n");
  return 10;
}

Running this one requires unwinding through f() and adjusting the stack, so it exercises most of the unwind mechanism.
Check to see that the C++ specific part of EH works and that constructors and destructors are being called in the EH process.

// Testcase for proper handling of
// c++ type, constructors and destructors.

#include <stdio.h>

int c, d;

struct A
{
  int i;
  A () { i = ++c; printf ("A() %d\n", i); }
  A (const A&) { i = ++c; printf ("A(const A&) %d\n", i); }
  ~A() { printf ("~A() %d\n", i); ++d; }
};

void
f()
{
  printf ("Throwing 1...\n");
  throw A();
}


int
main ()
{
  try
    {
      f();
    }
  catch (A)
    {
      printf ("Caught.\n");
    }
  printf ("c == %d, d == %d\n", c, d);
  return c != d;
}

You should get the following output:

Throwing 1...
A() 1
A(const A&) 2
~A() 1
A(const A&) 3
Caught.
~A() 3
~A() 2
c == 3, d == 3