Exception handling and embedded `gcj'
Jon Olson
olson@mmsi.com
Wed Feb 3 11:33:00 GMT 1999
I'm including the `netwinder' group on this email because
exception support on the StrongARM is currently very broken.
Hopefully, this email will stimulate some conversation regarding
this topic.
Now that I've got `gcj' happily generating code for both our A29K and
StrongARM embedded targets, I thought that I'd share one of my biggest
problems.
In C++, exception handling is useful but not necessary. Most
applications don't use it, and in an embedded environment you almost
always turn it off. Even when compiling most C++ apps on a workstation,
I normally turn off exceptions because of the code bloat they cause.
Note that C++ compilers, including g++, generate a lot of exception
code even if you never throw or catch an exception! This is because
C++ is required to call destructors even if some called method or
function throws an exception. Thus, each and every destructor must
get wrapped into the equivalent of a `finally' clause just in case one
of your methods throws an exception.
In Java, on the other hand, exception handling is required. All Java
code uses exceptions, most methods throw one or more exceptions, and
the API is designed to throw exceptions rather than return error
codes.
In g++, exceptions are implemented in one of two ways. In my opinion,
neither of these mechanisms is appropriate for embedded targets.
1) Range tables
This method has the advantage that it causes zero run-time overhead
if exceptions are not thrown. Throwing an exception causes
considerable run-time cost, but the idea is that you don't often
throw exceptions.
In this method, the compiler emits range tables containing the
starting address, ending address, entry code label, and
run-time type. Associated with each of these range tables is an
exception matching function which returns TRUE if the currently
thrown exception matches the run-time type. Range tables
closely mirror the structure of Java class files, which also
use an exception range table to represent their exceptions.
The problem with this strategy is that it requires the run-time
environment be able to unwind the hardware call stack to determine
the PC within each frame and ultimately reload the registers
in the context of the catching method. On many, if not most,
architectures this requires debugging information be contained
within the binary that specifies which registers were saved
at what point in the stack frame. The currently supported
unwind methods for X86 and SPARC utilize DWARF-2 unwind information,
which is essentially a set of opcodes for a stack unwinding state
machine. In addition, most embedded GCC targets do not support
DWARF-2 unwinding information. From what I can tell, unwinding
information is currently only available for X86 and SPARC.
In a typical C++ program containing lots of methods, this unwind
information can add 50% or more to the size of the resulting
binary. All code must be compiled with exceptions enabled, or else
the entire call stack cannot be unwound and exceptions will break.
If a library that was not compiled with the proper unwind
information makes callbacks to methods that throw exceptions,
you lose! In a system where disk space is cheap and the unwind
information seldom gets paged in, the range table is a good
approach since it is merely a data table on disk; in an embedded
system where ROM or FLASH is expensive, the range table approach
is too costly.
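
As a rough illustration of the lookup a range-table runtime performs
once it has recovered the PC of one frame, here is a minimal C++
sketch. The names `TypeInfo', `RangeEntry', `matches' and
`findHandler' are hypothetical, the ending address is assumed to be
exclusive, and the hard part (unwinding the hardware stack to obtain
each frame's PC and registers) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical run-time type descriptor with single inheritance.
struct TypeInfo {
    const TypeInfo *parent;
};

// One range table entry, holding the four fields described above.
struct RangeEntry {
    std::uintptr_t start;    // starting address of the protected region
    std::uintptr_t end;      // ending address (assumed exclusive here)
    std::uintptr_t handler;  // entry code label of the handler
    const TypeInfo *type;    // run-time type the handler catches
};

// Exception matching function: true if `thrown` is `caught` or a subtype.
bool matches(const TypeInfo *thrown, const TypeInfo *caught) {
    for (; thrown != nullptr; thrown = thrown->parent)
        if (thrown == caught) return true;
    return false;
}

// Returns the handler address for `pc`, or 0 if this frame has no
// matching entry and the unwinder must move on to the caller's frame.
std::uintptr_t findHandler(const RangeEntry *table, std::size_t count,
                           std::uintptr_t pc, const TypeInfo *thrown) {
    assert(table != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        const RangeEntry &r = table[i];
        if (pc >= r.start && pc < r.end && matches(thrown, r.type))
            return r.handler;
    }
    return 0;
}
```

Note that the table itself costs nothing until a throw happens; the
size problem discussed above comes from the unwind information needed
to feed `pc' into this lookup for every frame.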
2) setjmp/longjmp
This method has the advantage that it can be supported with only
the trusty setjmp/longjmp functions which are already in common
existence. Code which does not catch exceptions incurs no
performance or code size penalty and need not be compiled with
exceptions enabled.
At run-time, each try block pushes an exception frame onto a singly
linked list of exception frames. Each exception frame contains
basically just a jmp_buf and a link to the next exception frame.
Thus, for a simple try block, the compiler emits the equivalent of
the following:
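
    ExceptionFrame frame;
    int res;

    /* Link this frame onto the exception stack. */
    frame.next = exceptionStack;
    exceptionStack = &frame;
    if( (res = setjmp(frame.jmp_buf)) == 0 )
    {
        /* Normal path: run the try body, then unlink the frame. */
        doTryStuff();
        exceptionStack = frame.next;
    }
    else
    {
        /* A throw longjmp'ed back here; res carries the exception. */
        Exception *e = (Exception *)res;
        exceptionStack = frame.next;
        if( matches(e, type) )
            doExceptionStuff();
        else
            rethrow(e);
    }
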
This strategy has many disadvantages:
a) Code which catches exceptions incurs incredible code bloat.
All the exception stack manipulation, type matching, setjmp
calling, and result checking gets performed inline. The
resulting code is usually 200% larger than the same code
compiled without exceptions.
b) The setjmp/longjmp support in GCC is buggy on many targets.
On StrongARMs, C++ code can currently only be built with
exceptions disabled. Enabling exceptions causes core dumps
during inline stack manipulation. This makes debugging difficult
when `GDB' tells you your core dump is on a line containing just
a `}' bracket.
c) setjmp() is unnecessarily slow and inefficient for supporting
an exception handling mechanism. Many implementations of
setjmp/longjmp save and restore the Unix signal mask. Even
on architectures which have a fast version, setjmp() and
longjmp() typically save the entire register set, causing
large ExceptionFrames on the stack, and slower execution.
An exception need not save the entire register set, since
GCC can easily be configured to know what registers are
clobbered on entry to an exception handler label. The stack
size bloat gets particularly severe with exceptions, since
GCC allocates a separate ExceptionFrame for each try block
encountered, even when they're not nested. A Java
try/catch/finally block requires two ExceptionFrame objects
on the stack.
d) The `exceptionStack' list pointer must be stored in thread
local storage for this implementation to be thread safe.
Typically, GCC makes an additional function call just to
get the address of where `exceptionStack' is stored.
In summary, for Java code and C++ code that catches a lot of
exceptions, the setjmp/longjmp mechanism creates far too much
code bloat, and is too inefficient and buggy for an embedded
implementation.
What I've implemented in `gcj' is a mechanism which is sort of a
hybrid between the range table approach and the setjmp approach.
Like the range table approach, it sets up exception tables which
specify code labels and types; like the setjmp approach, it explicitly
marks the beginning and end of an exception region by pushing
an ExceptionFrame onto a thread local exception stack. Here's
an example of the run-time declarations for this mechanism.

    class ExceptionHandler
    {
        friend class ExceptionFrame;
        ExceptionType *type;    // Class metadata identifying the caught type
        void (*handler)();      // Handler invoked for matched types
    };

    class ExceptionFrame
    {
        ExceptionFrame *next;
        ExceptionHandler *handlers;
        ExceptionJmpBuf jmpbuf;
    public:
        void push(ExceptionHandler *h);
        static ExceptionFrame *pop();
        static void throwException(ExceptionObject *);
        u_int &pc() { return jmpbuf[0].jmpbuf[JMP_BUF_PC]; }
        u_int &sp() { return jmpbuf[0].jmpbuf[JMP_BUF_SP]; }
    };

The ExceptionHandler objects contain the types and entries emitted by
the compiler for each try/catch block. The compiler emits these
entries in order, with the last entry containing a NULL `type' field.
If this last entry contains a `handler', then the handler points to
the start of the finally block for this try/catch sequence.
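
To make the table layout concrete, here is a small C++ sketch of what
the emitted entries could look like for two try/catch sequences, one
with a finally block and one without. It uses plain structs instead
of the private members shown earlier, and the handler functions, the
type objects and the helper `finallyOf' are all hypothetical
stand-ins for compiler-generated labels and class metadata.

```cpp
#include <cassert>

struct ExceptionType { const char *name; };  // hypothetical class metadata

struct ExceptionHandler {
    const ExceptionType *type;  // caught type; NULL marks the last entry
    void (*handler)();          // handler entry point
};

// Hypothetical entry points standing in for compiler-generated labels.
void catchIOException() {}
void catchArithmeticException() {}
void finallyBlock() {}

const ExceptionType IOException{"java.io.IOException"};
const ExceptionType ArithmeticException{"java.lang.ArithmeticException"};

// try { ... } catch (IOException e) { ... }
// catch (ArithmeticException e) { ... } finally { ... }
const ExceptionHandler withFinally[] = {
    {&IOException, catchIOException},
    {&ArithmeticException, catchArithmeticException},
    {nullptr, finallyBlock},  // last entry: NULL type, handler = finally
};

// try { ... } catch (IOException e) { ... }
const ExceptionHandler withoutFinally[] = {
    {&IOException, catchIOException},
    {nullptr, nullptr},       // last entry: no finally block
};

// Walks to the last entry and returns its handler: the finally block,
// or nullptr when the try/catch sequence has none.
void (*finallyOf(const ExceptionHandler *h))() {
    assert(h != nullptr);
    while (h->type != nullptr) ++h;
    return h->handler;
}
```

Because the tables are constant data, they can live in ROM next to
the code they describe.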
At the start of a try block, the compiler emits a call to
`ExceptionFrame::push()', passing it a stack allocated exception block;
at the end of a try block, the compiler emits a call to ExceptionFrame::pop().
This requires no arguments, since it implicitly pops the top of the
exception stack. On a RISC implementation, the inline code
typically requires 3 instructions for try entry and 1 instruction for
try exit. Obviously,
there are additional instructions executed by push() and pop(), but
these are very short methods and aren't expanded inline.
In both RISC implementations, I've only needed to save 3 registers
in ExceptionJmpBuf: PC, FP, and SP. Thus, the size of an ExceptionFrame
is 20 bytes. This is significantly smaller than would be required
for a complete `jmpbuf', since the compiler knows, via an `exception_receiver'
RTL pattern, that all registers get clobbered at entry to an exception
handler.
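
The 20 byte figure follows from two link pointers plus three saved
registers on a 32-bit target. The C++ sketch below checks that
arithmetic at compile time. `ExceptionJmpBuf32', `ExceptionFrame32',
`frameBytes' and `hostJmpBufBytes' are hypothetical names, pointers
are modelled as 32-bit words so the check also holds on a 64-bit
host, and the flat struct is a simplification of the real
ExceptionJmpBuf layout.

```cpp
#include <csetjmp>
#include <cstddef>
#include <cstdint>

// The three registers saved per frame, modelled for a 32-bit target.
struct ExceptionJmpBuf32 {
    std::uint32_t pc;
    std::uint32_t fp;
    std::uint32_t sp;
};

// Frame layout with target pointers modelled as 32-bit words.
struct ExceptionFrame32 {
    std::uint32_t next;        // ExceptionFrame *
    std::uint32_t handlers;    // ExceptionHandler *
    ExceptionJmpBuf32 jmpbuf;  // PC, FP, SP
};

// Bytes of stack per frame: two link words plus the saved registers.
constexpr std::size_t frameBytes(std::size_t savedRegs,
                                 std::size_t wordBytes) {
    return (2 + savedRegs) * wordBytes;
}

static_assert(frameBytes(3, 4) == 20, "2 links + 3 registers, 4 bytes each");
static_assert(sizeof(ExceptionFrame32) == 20, "layout matches the arithmetic");

// For comparison: size of a full jmp_buf on whatever host compiles this.
// The value is platform specific, so no number is claimed here.
constexpr std::size_t hostJmpBufBytes = sizeof(std::jmp_buf);
```

Printing `hostJmpBufBytes' on your own target gives the per-try-block
stack cost of the setjmp/longjmp scheme to compare against.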
When throwing an exception, the compiler invokes
ExceptionFrame::throwException, passing it the exception being thrown.
Throwing an exception does not need to unwind the hardware call stack,
but only the exception stack. For each ExceptionFrame, it matches the
ExceptionObject's type against the handlers for that frame. If it
finds a matching type, it either advances the frame's `handlers'
pointer to the finally block or, if no finally block exists, pops
the frame; it then executes the exception handler. Thus, no explicit
`pop()' is required in exception handlers. If it finds no matching
type, but finds a finally block, it pops the frame and executes the
finally handler.
The finally handler need only rethrow the exception.
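
The walk described above can be simulated in portable C++. The sketch
below is not the gcj implementation: it uses setjmp/longjmp purely as
a stand-in for the three-register ExceptionJmpBuf (which needs
compiler support), integer labels instead of handler addresses, and
free functions instead of class members. `isInstance', `current',
`label', `demo', `finallyRuns' and the type objects are hypothetical.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstdlib>

struct ExceptionType { const ExceptionType *parent; };  // hypothetical metadata
struct ExceptionObject { const ExceptionType *type; };
struct ExceptionHandler {
    const ExceptionType *type;  // NULL marks the last entry
    int label;                  // stand-in for a code label; 0 means none
};
struct ExceptionFrame {
    ExceptionFrame *next;
    const ExceptionHandler *handlers;
    std::jmp_buf jmpbuf;        // stand-in for the PC/FP/SP ExceptionJmpBuf
};

ExceptionFrame *exceptionStack = nullptr;  // thread-local in a real runtime
ExceptionObject *current = nullptr;        // exception handed to the handler

void push(ExceptionFrame *f, const ExceptionHandler *h) {
    f->handlers = h;
    f->next = exceptionStack;
    exceptionStack = f;
}

ExceptionFrame *pop() {
    assert(exceptionStack != nullptr);  // single place for sanity checks
    ExceptionFrame *f = exceptionStack;
    exceptionStack = f->next;
    return f;
}

bool isInstance(const ExceptionType *t, const ExceptionType *want) {
    for (; t != nullptr; t = t->parent)
        if (t == want) return true;
    return false;
}

[[noreturn]] void throwException(ExceptionObject *e) {
    current = e;
    while (ExceptionFrame *f = exceptionStack) {
        const ExceptionHandler *h = f->handlers;
        while (h->type != nullptr && !isInstance(e->type, h->type)) ++h;
        const ExceptionHandler *last = h;
        while (last->type != nullptr) ++last;
        if (h->type != nullptr && last->label != 0)
            f->handlers = last;  // catch matched, finally pending: keep frame
        else
            pop();               // frame is finished before its handler runs
        if (h->label != 0) std::longjmp(f->jmpbuf, h->label);
    }
    std::abort();  // no frame handled the exception
}

const ExceptionType Throwable{nullptr}, IOException{&Throwable},
    FileNotFound{&IOException}, Arithmetic{&Throwable};
int finallyRuns = 0;

// try { try { throw } catch (IOException) {} finally {} } catch (Throwable) {}
// Returns the label of the catch handler that ended up running.
int demo(const ExceptionType *toThrow) {
    static const ExceptionHandler outerTable[] = {{&Throwable, 3}, {nullptr, 0}};
    static const ExceptionHandler innerTable[] = {{&IOException, 1}, {nullptr, 2}};
    static ExceptionObject e;
    ExceptionFrame outer, inner;
    push(&outer, outerTable);
    switch (setjmp(outer.jmpbuf)) {
    case 0: break;
    default: return 3;           // catch (Throwable): frame already popped
    }
    push(&inner, innerTable);
    switch (setjmp(inner.jmpbuf)) {
    case 0:                      // try body
        e.type = toThrow;
        throwException(&e);
    case 1:                      // catch (IOException): frame kept for finally
        pop();                   // leave the inner region normally
        ++finallyRuns;           // finally body
        pop();                   // leave the outer region normally
        return 1;
    default:                     // finally reached by an unmatched exception
        ++finallyRuns;
        throwException(current); // the finally handler only rethrows
    }
}
```

Calling `demo(&FileNotFound)' takes the inner catch path, while
`demo(&Arithmetic)' runs the inner finally block and is then caught
by the outer frame; in both cases the exception stack is empty
afterwards. Only the exception stack is walked; no hardware stack
frames are inspected.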
This exception handling mechanism has many advantages over the
existing ones used both in g++ and gcj.
1) It has zero overhead both in code size and run-time performance
if exceptions are not used.
2) It has small code size overhead (approximately 4 instructions)
when exceptions are caught.
3) It has modest stack space requirements compared with using
setjmp/longjmp.
4) It provides a single location where exception stacks are
manipulated. This allows us to perform sanity checks on the
exception stack without inlining it everywhere.
5) It does not require the hardware dependent stack unwinding
which is costly or impossible with optimized embedded code.
6) Although slightly slower than range tables (we can't beat zero
run-time overhead), it's significantly faster than using
setjmp/longjmp because it saves fewer registers.
I have it currently working quite nicely for Java exceptions. C++
exceptions are a bit uglier, since C++ allows throwing objects instead
of just references to objects.
Comments???
--
Jon Olson, Modular Mining Systems
3289 E. Hemisphere Loop
Tucson, AZ 85706
INTERNET: olson@mmsi.com
PHONE: (520)746-9127
FAX: (520)889-5790