[PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

This patch adds a custom HTM fast path for RTM on x86_64, which moves
the core HTM fast path bits from gtm_thread::begin_transaction into the
x86-specific _ITM_beginTransaction implementation.  It extends/changes
the previous patch by Andi:

The custom fast path decreases the overheads of using HW transactions.
gtm_thread::begin_transaction remains responsible for handling the retry
policy after aborts of HW transactions, including when to switch to the
fallback execution method.  Right now, the C++ retry code isn't aware of
the specific abort reason and just counts the number of retries for a
particular transaction; it might make sense to take the abort reason
into account in the future.
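
To illustrate the retry policy described above, here is a minimal sketch
of a count-only retry loop.  It is not the libitm code: the function name
run_with_retry_count, the try_hw and fallback callbacks, and the
max_hw_attempts parameter are all hypothetical and only model the
"count retries, then switch to the fallback" behavior.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch, not the libitm implementation.
// try_hw:   attempts the transaction in hardware; returns true on commit,
//           false on abort.  The abort reason is deliberately ignored.
// fallback: runs the transaction with the fallback execution method.
// Returns the number of hardware attempts that were made.
inline unsigned run_with_retry_count(const std::function<bool()>& try_hw,
                                     const std::function<void()>& fallback,
                                     unsigned max_hw_attempts)
{
  assert(try_hw && fallback);  // both callbacks must be callable

  unsigned attempts = 0;
  while (attempts < max_hw_attempts) {
    ++attempts;
    if (try_hw())
      return attempts;         // committed in hardware
  }

  fallback();                  // retry budget exhausted, stop using HTM
  return attempts;
}
```

An abort-reason-aware policy would presumably have try_hw return the
abort status instead of a bool, so that the loop could stop retrying
early when a retry is unlikely to succeed.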

Tested on Haswell with microbenchmarks and STAMP Vacation.  OK for
trunk?  (Please take a closer look at the asm pieces of this.)

(I've seen failures for STAMP Genome during my recent tests, but those
also happen with just ITM_DEFAULT_METHOD=serialirr and a single thread,
and AFAICT they don't seem to be related to the changes in
_ITM_beginTransaction.  I'll have a look...)

Andreas and Peter: Is this sufficient as a proof of concept for custom
fast paths on your architectures, or would you like to see any changes?


