This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Benchmarking theory


Toon Moene wrote:

> "Joseph S. Myers" wrote:
>
> > Benchmark results seem to get posted to the gcc list as single figures for
> > a test and old and new compilers, with assertions that results seem
> > significant or are consistent between runs.  Why are benchmarks done on
> > this basis rather than using actual statistical significance tests?
>
> Perhaps because we haven't included specific benchmark tests in our
> release criteria?
>
> > Could someone point me to appropriate references on the theory of
> > benchmarking that explain this?
>
> Tsk.  My theory of benchmarking is:
>
> 1. Take your own application.
>
> 2. Construct a self-contained sample application out of it.
>
> 3. Ship it to prospective hardware sellers.
>
> 4. Rank results.
>
> 5. Buy.
>
> OK - simplistic, but it works.

I think Joe's point is that people aren't doing real statistics on the
results.  For example, with just two data points (one time on each
compiler) you don't have what is known as statistical "power" to tell
whether the difference is actually significant.  I certainly wouldn't
trust the hardware vendors to do real statistics either; they'd just
run it 100 times and give you back their best result, even if they only
got it once.  I'd like, at the very least, to see the median, mode, and
standard deviation of the scores.  Is your distribution of scores even
normal?
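
For concreteness, here is a minimal sketch of the kind of test I mean,
in Python.  The timing numbers are invented, and picking SciPy's Welch
t-test and a Shapiro-Wilk normality check is my own choice of tools,
not something proposed elsewhere in this thread:

    import statistics
    from scipy import stats  # assumed available; any stats package would do

    # Hypothetical wall-clock timings (seconds) from repeated runs of the
    # same benchmark built with two compilers.  These numbers are made up.
    old_cc = [41.2, 40.8, 41.5, 40.9, 41.1, 41.3, 40.7, 41.0]
    new_cc = [40.1, 40.4, 39.9, 40.2, 40.6, 40.0, 40.3, 40.5]

    # Summary statistics per compiler, rather than one "best" figure.
    for name, runs in (("old", old_cc), ("new", new_cc)):
        print(name, "median =", statistics.median(runs),
              "stdev =", round(statistics.stdev(runs), 3))

    # Welch's t-test: is the difference in mean run time larger than
    # run-to-run noise would explain?
    t, p = stats.ttest_ind(old_cc, new_cc, equal_var=False)
    print("t = %.2f, p = %.4f" % (t, p))

    # The t-test assumes roughly normal samples, which is exactly the
    # "is your distribution normal?" question; Shapiro-Wilk checks it.
    for name, runs in (("old", old_cc), ("new", new_cc)):
        stat_w, p_norm = stats.shapiro(runs)
        print(name, "Shapiro-Wilk p =", round(p_norm, 3))

The point is simply that a p-value computed from repeated runs supports
a claim like "the new compiler is faster" in a way that a single pair
of timings never can, and with only a handful of runs per compiler the
test may still lack the power to detect a small but real difference.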

