This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH] add self-tuning to x86 hardware fast path in libitm

From: Andi Kleen <andi at firstfloor dot org>
To: Nuno Diegues <nmld at ist dot utl dot pt>
Cc: Andi Kleen <andi at firstfloor dot org>, gcc-patches at gcc dot gnu dot org
Date: Wed, 8 Apr 2015 19:54:57 +0200
Subject: Re: [PATCH] add self-tuning to x86 hardware fast path in libitm
Authentication-results: sourceware.org; auth=none
References: <CAALS4mout4o-yVi8WgdL=EKHFLV+o2RSHaicRB7mHraQnQJhwQ at mail dot gmail dot com> <87y4m2ogvi dot fsf at tassilo dot jf dot intel dot com> <CAALS4mrVbO=TNZ7ynteXpayQYeHagODQy97qP0LL0NSy50X2ug at mail dot gmail dot com>

> On the STAMP suite of benchmarks for transactional memory (described here [1]).
> I have ran an unmodified GCC 5.0.0 against the patched GCC with these
> modifications and obtain the following speedups in STAMP with 4
> threads (on a Haswell with 4 cores, average 10 runs):

I expect you'll need different tunings on larger systems.

> That is a good point. While I haven't ever used fixed point
> arithmetic, a cursory inspection reveals that it does make sense and
> seems applicable to this case.
> Are you aware of some place where this is being done already within
> GCC that I could use as inspiration, or should I craft some macros
> from scratch for this?

I believe the inliner uses fixed point. Your own macros should be fine too.
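
As a rough sketch of what such helpers could look like (all names here are hypothetical, not existing GCC or libitm macros), the floating-point test `change_for_worse > 1.40` can be expressed with 16.16 fixed-point integers. This assumes change_for_worse is the ratio of the last throughput to the current one, which the quoted hunk does not show:

```cpp
#include <cstdint>

// Hypothetical 16.16 fixed-point helpers; not taken from GCC or libitm.
typedef uint64_t fixed_t;
static const unsigned FIXED_SHIFT = 16;
static const fixed_t FIXED_ONE = (fixed_t)1 << FIXED_SHIFT;

// Convert an integer to fixed point.
static inline fixed_t fixed_from_int(uint32_t v)
{
  return (fixed_t)v << FIXED_SHIFT;
}

// Convert a percentage (140 means 1.40) to fixed point, rounding down.
static inline fixed_t fixed_from_percent(uint32_t pct)
{
  return (pct * FIXED_ONE) / 100;
}

// Divide two fixed-point values; b must be nonzero.
// With 32-bit inputs to fixed_from_int the shifted dividend fits in 64 bits.
static inline fixed_t fixed_div(fixed_t a, fixed_t b)
{
  return (a << FIXED_SHIFT) / b;
}

// Assumption: "change for worse" means last/current throughput.
// Returns true if throughput dropped by more than a factor of 1.40.
static inline bool changed_for_worse(uint32_t last, uint32_t current)
{
  if (current == 0)
    return last > 0;
  fixed_t ratio = fixed_div(fixed_from_int(last), fixed_from_int(current));
  return ratio > fixed_from_percent(140);
}
```

With these, the check in the patch would become something like `if (unlikely(changed_for_worse(last_throughput, current_throughput)))`, with no floating point involved.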

> > > +  int32_t last_attempts = optimizer.last_attempts;
> > > +  int32_t current_attempts = optimizer.optimized_attempts;
> > > +  int32_t new_attempts = current_attempts;
> > > +  if (unlikely(change_for_worse > 1.40))
> > > +    {
> > > +      optimizer.optimized_attempts = optimizer.best_ever_attempts;
> > > +      optimizer.last_throughput = current_throughput;
> > > +      optimizer.last_attempts = current_attempts;
> > > +      return;
> > > +    }
> > > +
> > > +  if (unlikely(random() % 100 < 1))
> > > +    {
> >
> > So where is the seed for that random stored? Could you corrupt some
> > user's random state? Is the state per thread or global?
> > If it's per thread how do you initialize so that they threads do
> > start with different seeds.
> > If it's global what synchronizes it?
> 
> As I do not specify any seed, I was under the impression that there
> would be a default initialization. Furthermore, the posix
> documentation specifies random() to be MT-safe, so I assumed its
> internal state to be per-thread.
> Did I mis-interpret this?

Yes, that's right. But it's very nasty to change the user's RNG state.
A common pattern for repeatable benchmarks is to start with srand(1)
and then use the random numbers to run the benchmark, so it always does
the same thing. If you change the random state nondeterministically
(transaction aborts are not deterministic), the benchmark is no longer
repeatable. You'll need to use your own RNG state that is independent.
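
One possible shape for such an independent state, as a minimal sketch: a small xorshift32 generator kept in thread-local storage, seeded from a private counter so that threads start with different seeds. The names (private_rng, get_thread_rng, and so on) are made up for illustration, and xorshift is just one simple choice; nothing here touches the state behind rand() or random().

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical private PRNG state, independent of the user's rand()/random().
struct private_rng
{
  uint32_t state;  // 0 means "not seeded yet"; xorshift never yields 0
};

// Seed the generator; xorshift32 requires a nonzero state.
static inline void private_rng_seed(private_rng *r, uint32_t seed)
{
  r->state = seed ? seed : 0x9e3779b9u;
}

// One step of the classic xorshift32 generator (shifts 13, 17, 5).
static inline uint32_t private_rng_next(private_rng *r)
{
  uint32_t x = r->state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  r->state = x;
  return x;
}

// Returns true roughly pct times out of 100.
static inline bool private_rng_percent(private_rng *r, uint32_t pct)
{
  return private_rng_next(r) % 100 < pct;
}

// Per-thread state; each thread takes a distinct seed from a shared counter.
static std::atomic<uint32_t> next_seed(1);
static thread_local private_rng tls_rng = { 0 };

static inline private_rng *get_thread_rng()
{
  if (tls_rng.state == 0)
    private_rng_seed(&tls_rng, next_seed.fetch_add(0x9e3779b9u));
  return &tls_rng;
}
```

The `random() % 100 < 1` test in the patch would then read `private_rng_percent(get_thread_rng(), 1)`, leaving a benchmark's own srand(1) sequence undisturbed.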

It would be good to see if any parts of the algorithm can be
simplified. In general, in production software, the goal is to have
the simplest algorithm that does the job.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
