This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.


Attached is an updated patch.  Changes explained below.

On Wed, 2013-08-21 at 10:50 -0700, Richard Henderson wrote:
> > -#if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH)
> > +#ifdef USE_HTM_FASTPATH
> >    // HTM fastpath.  Only chosen in the absence of transaction_cancel to allow
> >    // using an uninstrumented code path.
> >    // The fastpath is enabled only by dispatch_htm's method group, which uses
> > @@ -187,6 +187,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
> >    // indeed in serial mode, and HW transactions should never need serial mode
> >    // for any internal changes (e.g., they never abort visibly to the STM code
> >    // and thus do not trigger the standard retry handling).
> > +#ifndef HTM_CUSTOM_FASTPATH
> >    if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
> >      {
> >        for (uint32_t t = htm_fastpath; t; t--)
> > @@ -237,6 +238,49 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
> >  	    }
> >  	}
> >      }
> > +#else
> > +  // If we have a custom HTM fastpath in ITM_beginTransaction, we implement
> > +  // just the retry policy here.  We communicate with the custom fastpath
> 
> Don't you want this logic arranged as
> 
> #ifdef HTM_CUSTOM_FASTPATH
>  ... your new code
> #elif defined(USE_HTM_FASTPATH)
>  ... existing htm code
> #endif

It seems you didn't feel strongly about it (and neither do I), so I kept
this as is.  The current arrangement has the advantage that the general
HTM fastpath comes first, followed by the code for the custom, more
complex fastpath variants.

> > -	subq	$56, %rsp
> > -	cfi_def_cfa_offset(64)
> > +	subq	$64, %rsp
> > +	cfi_def_cfa_offset(72)
> 
> You now have an abi-misaligned stack.  Since the return address is pushed by
> the call, an aligned stack frame allocation is (N + 8) % 16 == 0.

Fixed, and incorporated more of Richard's suggestions.  We also
discussed splitting the xbegin into two paths, one for the first HTM
abort -- where we'd save the jmpbuf -- and one for subsequent aborts --
where we wouldn't need to save the jmpbuf.  But given that we only save
the jmpbuf when either going to non-HTM or after an HTM abort, the
overheads didn't seem to matter enough to justify the added complexity.

> > +  // Accessed from assembly language, thus the "asm" specifier on
> > +  // the name, avoiding complex name mangling.
> > +#ifdef __USER_LABEL_PREFIX__
> > +#define UPFX1(t) UPFX(t)
> > +#define UPFX(t) #t
> > +  static gtm_rwlock serial_lock
> > +    __asm__(UPFX1(__USER_LABEL_PREFIX__) "gtm_serial_lock");
> > +#else
> > +  static gtm_rwlock serial_lock
> > +    __asm__("gtm_serial_lock");
> > +#endif
> 
> Now that we have 3 copies of this, we should simplify things.  E.g.
> 
> #ifdef __USER_LABEL_PREFIX__
> # define UPFX1(X) #X
> # define UPFX     UPFX1(__USER_LABEL_PREFIX__)
> #else
> # define UPFX
> #endif
> 
> static gtm_rwlock serial_lock __asm__(UPFX "gtm_serial_lock");

Fixed (including the one additional indirection that was necessary ;) ).

OK for trunk?

Attachment: patch
Description: Text document

