Exception handling and embedded `gcj'

Jon Olson olson@mmsi.com
Wed Feb 3 11:33:00 GMT 1999


I'm including the `netwinder' group on this email because
exception support on the StrongARM is currently very broken.
Hopefully, this email will stimulate some conversation regarding
this topic.

Now that I've got `gcj' happily generating code for both our A29K and
StrongARM embedded targets, I thought that I'd share one of my biggest
problems.

In C++, exception handling is useful but not necessary.  Most
applications don't use it, and in an embedded environment you almost
always turn it off.  When compiling most C++ apps on a workstation, I
normally turn off exceptions because of the code bloat they cause.
Note that C++ compilers, including g++, generate a lot of exception
code even if you never throw or catch an exception!  This is because
C++ is required to call destructors even if some called method or
function throws an exception.  Thus, each and every destructor must
get wrapped into the equivalent of a `finally' clause just in case one
of your methods throws an exception.

In Java, on the other hand, exception handling is required.  All Java
code uses exceptions, most methods throw one or more exceptions, and
the API is designed to throw exceptions rather than return error
codes.

In g++, exceptions are implemented in one of two ways.  In my opinion,
neither of these mechanisms is appropriate for embedded targets.

1) Range tables

   This method has the advantage that it causes zero run-time overhead
   if exceptions are not thrown.  Throwing an exception causes
   considerable run-time cost, but the idea is that you don't often
   throw exceptions.

   In this method, the compiler emits range tables containing the
   starting address, ending address, entry code label, and
   run-time type.  Associated with each of these range tables is an
   exception matching function which returns TRUE if the currently
   thrown exception matches the run-time type.  Range tables
   closely mirror the structure of Java class files, which also
   use an exception range table to represent their exceptions.

   The problem with this strategy is that it requires the run-time
   environment be able to unwind the hardware call stack to determine
   the PC within each frame and ultimately reload the registers
   in the context of the catching method.  On many, if not most,
   architectures this requires debugging information be contained
   within the binary that specifies which registers were saved
   at what point in the stack frame.  The currently supported
   unwind methods for X86 and SPARC utilize DWARF-2 unwind information,
   which is essentially a set of opcodes for a stack unwinding state
   machine.  Most embedded GCC targets do not support DWARF-2
   unwinding information.  From what I can tell, unwinding
   information is currently only available for X86 and SPARC.

   In a typical C++ program containing lots of methods, this unwind
   information can add 50% or more to the size of the resulting
   binary.  All code must be compiled with exceptions enabled, or else
   the entire call stack cannot be unwound and exceptions will break.
   In a system where libraries compiled without the proper unwind
   information make callbacks to methods that throw exceptions,
   you lose!  In a system where disk space is cheap and the unwind
   information seldom gets paged in, the range table is a good
   approach since it is merely a data table on disk; in an embedded
   system where ROM or FLASH is expensive, the range table approach
   is too costly.
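
As a rough illustration of the range-table lookup described in 1),
here is a minimal sketch.  It is not the layout g++ actually emits:
the `RangeEntry' fields, the half-open [start, end) convention, the
innermost-first ordering, and the `matches' and `findHandler' helpers
are all assumptions made for the example.  The unwinder would call
something like this once per stack frame, using the PC it recovered
for that frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout of one range-table entry (names are illustrative).
struct RangeEntry {
    std::uintptr_t start;    // first PC covered by the try region
    std::uintptr_t end;      // one past the last PC covered (assumed)
    std::uintptr_t handler;  // address of the entry code label
    const void *type;        // run-time type caught; NULL = catch all
};

// Hypothetical matching function: identity only.  A real matcher
// would also accept subclasses of the caught type.
static bool matches(const void *thrown, const void *caught) {
    return caught == nullptr || thrown == caught;
}

// Returns the handler address for a frame executing at `pc', or 0 if
// this frame has no matching handler and the unwinder must move on to
// the caller's frame.  Entries are assumed to be ordered innermost
// region first.
static std::uintptr_t findHandler(const RangeEntry *table, std::size_t n,
                                  std::uintptr_t pc, const void *thrown) {
    assert(table != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        const RangeEntry &e = table[i];
        if (pc >= e.start && pc < e.end && matches(thrown, e.type))
            return e.handler;
    }
    return 0;
}
```

Note that nothing here runs until an exception is thrown, which is
where the zero run-time overhead comes from; the cost is the table
itself plus the unwind information needed to find `pc' for each frame.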

2) setjmp/longjmp

   This method has the advantage that it can be supported with only
   the trusty setjmp/longjmp functions which are already in common
   existence.  Code which does not catch exceptions incurs no
   performance or code size penalty and need not be compiled with
   exceptions enabled.

   At run-time, an exception gets pushed onto a singly linked list of
   exception frames.  Each exception frame contains basically just a
   jmp_buf and a link to the next exception frame.  Thus, for a simple
   try block, the compiler emits the equivalent of the following:

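   ExceptionFrame frame;
   int res;

   /* Push this frame onto the exception stack. */
   frame.next = exceptionStack;
   exceptionStack = &frame;
   if( (res = setjmp(frame.jmpbuf)) == 0 )
   {
      /* Normal path: run the try body, then pop the frame. */
      doTryStuff();
      exceptionStack = frame.next;
   }
   else
   {
      /* Reached via longjmp(); `res' carries the thrown exception. */
      Exception *e = (Exception *)res;

      exceptionStack = frame.next;
      if( matches(e, type) )
         doExceptionStuff();
      else
         rethrow(e);
   }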

   This strategy has many disadvantages:

   a) Code which catches exceptions incurs incredible code bloat.
      All the exception stack manipulation, type matching, setjmp
      calling, and result checking gets performed inline.  The
      resulting code is usually 200% larger than the same code
      compiled without exceptions.
   b) The setjmp/longjmp support in GCC is buggy on many targets.
      On StrongARMs, C++ code can currently only be built with
      exceptions disabled.  Enabling exceptions causes core dumps
      during inline stack manipulation.  This makes debugging
      difficult when `GDB' tells you that your core dump is on a
      line containing just a `}' bracket.
   c) setjmp() is unnecessarily slow and inefficient for supporting
      an exception handling mechanism.  Many implementations of
      setjmp/longjmp save and restore the Unix signal mask.  Even
      on architectures which have a fast version, setjmp() and
      longjmp() typically save the entire register set, causing
      large ExceptionFrames on the stack, and slower execution.
      An exception need not save the entire register set, since
      GCC can easily be configured to know what registers are
      clobbered on entry to an exception handler label.  The stack
      size bloat gets particularly severe with exceptions, since
      GCC allocates a separate ExceptionFrame for each try block
      encountered, even when they're not nested.  A Java
      try/catch/finally block requires two ExceptionFrame objects
      on the stack.
   d) The `exceptionStack' list pointer must be stored in thread
      local storage for this implementation to be thread safe.
      Typically, GCC makes an additional function call just to
      get the address of where `exceptionStack' is stored.

   In summary, for Java code and C++ code that catches a lot of
   exceptions, the setjmp/longjmp mechanism creates far too much
   code bloat, and is too inefficient and buggy for an embedded
   implementation.
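
For readers who want to experiment with the setjmp/longjmp pattern
from 2), the following is a minimal compilable sketch, not the code
GCC generates.  Assumptions: exceptions are plain `int' codes, the
thrown value travels in a hypothetical global `pendingException'
rather than being cast through setjmp()'s `int' return, and the
helper names `throwException', `doTryStuff', and `tryBlock' are
invented for the example.  The list head is a plain global here; a
real implementation would keep it in thread local storage.

```cpp
#include <cassert>
#include <csetjmp>

struct ExceptionFrame {
    ExceptionFrame *next;  // link to the enclosing try block's frame
    std::jmp_buf buf;      // full register context saved by setjmp()
};

static ExceptionFrame *exceptionStack = nullptr;  // assumed single thread
static int pendingException = 0;                  // hypothetical side channel

// Transfer control to the innermost active try block.
static void throwException(int code) {
    assert(exceptionStack != nullptr);  // an uncaught throw is not handled
    pendingException = code;
    std::longjmp(exceptionStack->buf, 1);
}

// Stand-in for the body of the try block.
static void doTryStuff(bool fail) {
    if (fail)
        throwException(42);
}

// Equivalent of: try { doTryStuff(fail); return 0; } catch (...) { return code; }
static int tryBlock(bool fail) {
    ExceptionFrame frame;
    frame.next = exceptionStack;        // push the frame
    exceptionStack = &frame;
    if (setjmp(frame.buf) == 0) {
        doTryStuff(fail);
        exceptionStack = frame.next;    // normal exit: pop the frame
        return 0;
    }
    exceptionStack = frame.next;        // handler entry: pop the frame
    return pendingException;
}
```

Every statement in `tryBlock' other than the call to `doTryStuff' is
bookkeeping that would be repeated inline at each try block, which is
the source of the code bloat described in a).  On most hosts
sizeof(std::jmp_buf) is also far larger than the handful of registers
a handler actually needs.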

What I've implemented in `gcj' is a mechanism which is sort of a
hybrid between the range table approach and the setjmp approach.
Like the range table approach, it sets up exception tables which
specify code labels and types; like the setjmp approach, it explicitly
marks the beginning and end of an exception region by pushing
an ExceptionFrame onto a thread local exception stack.  Here's
an example of the run-time declarations for this mechanism.


    class ExceptionHandler
    {
      friend class ExceptionFrame;
      ExceptionType *type;	// Class metadata identifying the caught type
      void (*handler)();	// Handler invoked for matched types
    };

    class ExceptionFrame
    {
      ExceptionFrame *next;
      ExceptionHandler *handlers;
      ExceptionJmpBuf jmpbuf;
    public:
      void push(ExceptionHandler *h);
      static ExceptionFrame *pop();
      static void throwException(ExceptionObject *);
      u_int &pc() { return jmpbuf[0].jmpbuf[JMP_BUF_PC]; }
      u_int &sp() { return jmpbuf[0].jmpbuf[JMP_BUF_SP]; }
    };

The ExceptionHandler objects contain the types and entries emitted by
the compiler for each try/catch block.  The compiler emits these
entries in order, with the last entry containing a NULL `type' field.
If this last entry contains a `handler', then the handler points to
the start of the finally block for this try/catch sequence.

At the start of a try block, the compiler emits a call to
`ExceptionFrame::push()', passing it a stack allocated exception block;
at the end of a try block, the compiler emits a call to ExceptionFrame::pop().
This requires no arguments, since it implicitly pops the top of the
exception stack.  On a RISC implementation, this typically requires
3 instructions for entry and 1 instruction for try exit.  Obviously,
there are additional instructions executed by push() and pop(), but
these are very short methods and aren't expanded inline.

In both RISC implementations, I've only needed to save 3 registers
in ExceptionJmpBuf: PC, FP, and SP.  Thus, the size of an ExceptionFrame
is 20 bytes: two pointers plus three saved registers, at 4 bytes each.
This is significantly smaller than would be required for a complete
`jmp_buf', since the compiler knows, via an `exception_receiver'
RTL pattern, that all registers get clobbered at entry to an exception
handler.
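
The 20-byte figure can be checked with a small model of a 32-bit
target.  This is only a sketch: the `ExceptionJmpBuf32' and
`ExceptionFrame32' names are invented, and every pointer and register
is represented as a 4-byte integer so that the arithmetic holds even
when compiled on a 64-bit host.

```cpp
#include <cassert>
#include <cstdint>

// Model of the three saved registers on a 32-bit target.
struct ExceptionJmpBuf32 {
    std::uint32_t pc;  // where to resume (handler entry)
    std::uint32_t fp;  // frame pointer of the catching method
    std::uint32_t sp;  // stack pointer of the catching method
};

// Model of the frame: two 32-bit pointers plus the saved registers.
struct ExceptionFrame32 {
    std::uint32_t next;      // stands in for ExceptionFrame *next
    std::uint32_t handlers;  // stands in for ExceptionHandler *handlers
    ExceptionJmpBuf32 jmpbuf;
};

static_assert(sizeof(ExceptionJmpBuf32) == 12, "3 registers * 4 bytes");
static_assert(sizeof(ExceptionFrame32) == 20, "2 pointers + 3 registers");
```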

When throwing an exception, the compiler invokes
ExceptionFrame::throwException, passing it the exception being thrown.
Throwing an exception does not need to unwind the hardware call stack,
but only the exception stack.  For each ExceptionFrame, it matches the
ExceptionObject's type against the handlers for that frame.  If it
finds a matching type, it either advances `handlers' to point to
the finally block or, if no finally block exists, pops the exception
frame; it then executes the exception handler.  Thus, no explicit
`pop()' is required in exception handlers.  If it finds no matching
type, but finds a finally block, it pops the exception frame and
executes the finally handler.  The finally handler need only rethrow
the exception.
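
The matching rules above can be sketched portably as follows.  This
is a simulation under stated assumptions, not the gcj run-time:
instead of reloading PC, FP, and SP and jumping, the hypothetical
`selectHandler' just returns the handler that would be entered.  The
`ExceptionType' layout (a single `base' pointer), the `isA' helper,
and the free functions `push' and `pop' are invented for the example.

```cpp
#include <cassert>

// Hypothetical class metadata: single inheritance via a base pointer.
struct ExceptionType { const ExceptionType *base; };

using Handler = void (*)();

struct ExceptionHandler {
    const ExceptionType *type;  // NULL marks the last entry
    Handler handler;            // in the last entry: finally block or NULL
};

struct ExceptionFrame {
    ExceptionFrame *next;
    const ExceptionHandler *handlers;
};

static thread_local ExceptionFrame *exceptionStack = nullptr;

static void push(ExceptionFrame *f, const ExceptionHandler *h) {
    f->handlers = h;
    f->next = exceptionStack;
    exceptionStack = f;
}

static ExceptionFrame *pop() {
    ExceptionFrame *f = exceptionStack;
    assert(f != nullptr);  // the one place to sanity-check the stack
    exceptionStack = f->next;
    return f;
}

static bool isA(const ExceptionType *t, const ExceptionType *caught) {
    for (; t != nullptr; t = t->base)
        if (t == caught) return true;
    return false;
}

// Walks the exception stack (not the hardware call stack) and returns
// the handler to enter, or NULL if the exception is uncaught.
static Handler selectHandler(const ExceptionType *thrown) {
    while (exceptionStack != nullptr) {
        ExceptionFrame *f = exceptionStack;
        assert(f->handlers != nullptr);
        const ExceptionHandler *last = f->handlers;
        while (last->type != nullptr) ++last;  // find the NULL-type entry
        for (const ExceptionHandler *h = f->handlers; h != last; ++h) {
            if (isA(thrown, h->type)) {
                if (last->handler != nullptr)
                    f->handlers = last;  // keep frame; only finally remains
                else
                    pop();               // no finally: frame is finished
                return h->handler;
            }
        }
        pop();                           // no catch clause matched
        if (last->handler != nullptr)
            return last->handler;        // finally block, which rethrows
    }
    return nullptr;
}

// Hypothetical handler bodies used only to exercise the sketch.
static int caughtCount = 0, finallyCount = 0;
static void catchIO() { ++caughtCount; }
static void runFinally() { ++finallyCount; }
```

Keeping the frame pushed while a catch clause runs means that an
exception thrown from inside the catch clause still reaches the
finally block, because the frame's only remaining entry is the
terminating one.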

This exception handling mechanism has many advantages over the
existing ones used both in g++ and gcj.

  1) It has zero overhead both in code size and run-time performance
     if exceptions are not used.
  2) It has small code size overhead (approximately 4 instructions)
     when exceptions are caught.
  3) It has modest stack space requirements compared with using
     setjmp/longjmp.
  4) It provides a single location where exception stacks are
     manipulated.  This allows us to perform sanity checks on the
     exception stack without inlining it everywhere.
  5) It does not require the hardware dependent stack unwinding
     which is costly or impossible with optimized embedded code.
  6) Although slightly slower than range tables (we can't beat zero
     run-time overhead), it's significantly faster than using
     setjmp/longjmp because it saves fewer registers.

I currently have it working quite nicely for Java exceptions.  C++
exceptions are a bit uglier, since C++ allows throwing objects instead
of just references to objects.

Comments???

-- 
Jon Olson, Modular Mining Systems
	   3289 E. Hemisphere Loop
	   Tucson, AZ 85706
INTERNET:  olson@mmsi.com
PHONE:     (520)746-9127
FAX:       (520)889-5790


