This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



speed of simple loops on x86_64 using opencc vs gcc


Hi,

I run some tests of simple number-crunching loops whenever new
architectures and compilers arise.

These tests on recent Intel architectures show similar performance
between gcc and icc compilers, at full optimization.

However, a recent test on x86_64 showed the Open64 compiler
outstripping gcc by a factor of 2 to 3.  I tried all the obvious
flags; nothing helped.

Versions: gcc 4.5.2, Open64 4.2.5.  AMD Phenom(tm) II X4 840 Processor.

A peek at the assembler makes the cause clear, though.  Even with -O3,
gcc does not unroll the loops in this code, whereas opencc does, and profits.
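
For readers unfamiliar with the transformation, here is a minimal sketch of
what unrolling the inner loop amounts to, written by hand in C.  The function
names and the unroll factor of 4 are illustrative assumptions; they are not
taken from the output of either compiler.

```c
#include <stddef.h>

/* Plain form, equivalent to the inner loop of the test program. */
static void scale_simple(double *restrict v, size_t n, double c)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= c;
}

/* Hand-unrolled by 4 (hypothetical factor): one loop test and branch
 * per four multiplies, plus a remainder loop for the last n % 4 items. */
static void scale_unrolled4(double *restrict v, size_t n, double c)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v[i]     *= c;
        v[i + 1] *= c;
        v[i + 2] *= c;
        v[i + 3] *= c;
    }
    for (; i < n; i++)   /* leftover elements */
        v[i] *= c;
}
```

Both functions perform the same multiplication on each element, so the
results are identical; only the loop overhead per element differs.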

Attached find the C file. It's not pretty but the guts are in the
small routine double_array_mults_by_const().

For your convenience, also attached is the assembler for the innermost
loop, generated by the two compilers with the -S flag.
-----------------------------------------------------------------------
Building and running:

$ gcc --std=c99 -O3 -Wall -pedantic mults_by_const.c
$ ./a.out
double array mults by const             450 ms [  1.013193]

$ opencc -std=c99 -O3 -Wall mults_by_const.c
$ ./a.out
double array mults by const             170 ms [  1.013193]
-----------------------------------------------------------------------
Now, I expected gcc -O3 to have turned on loop unrolling. I tried turning
it on explicitly, without success.
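
The post does not say how unrolling was requested.  As far as I know, GCC's
documentation lists -funroll-loops as an option separate from the -O levels,
and the same request can be attached to a single function with the
GCC-specific optimize attribute (which the documentation recommends mainly
for debugging).  A minimal sketch, in which scale_by_const and UNROLL_HINT
are hypothetical names standing in for the test's inner loop:

```c
#include <stddef.h>

/* The optimize attribute is a GCC extension; elsewhere the macro is empty. */
#if defined(__GNUC__) && !defined(__clang__)
#  define UNROLL_HINT __attribute__((optimize("unroll-loops")))
#else
#  define UNROLL_HINT
#endif

/* Hypothetical stand-in for the inner loop of the test program. */
UNROLL_HINT
static void scale_by_const(double *restrict v, size_t n, double c)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= c;
}
```

Whether the compiler actually unrolls the loop is still its own decision, so
the generated assembler (-S) remains the only reliable check.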

By the way, I also tried
	-march=native
and
	-ffast-math
Neither affected the time at all.

Cheers!
#ifdef __ICC
  #include <mathimf.h>
#else
  #include <math.h>
#endif

/* timer stuff ------------------------------------------------ */
#include <sys/time.h>
#include <stdio.h>
/* hack: makes glibc's <stdlib.h> declare posix_memalign() under -std=c99 */
#define __USE_XOPEN2K	1
#include <stdlib.h>
#include <sys/resource.h>

static const int who = RUSAGE_SELF;
static struct rusage local;
static time_t tv_sec;
#define START_CLOCK() getrusage(who, &local)
#define MS_SINCE( ) ( tv_usec = local.ru_utime.tv_usec, tv_sec = local.ru_utime.tv_sec, \
			getrusage( who, &local), \
			(long)( ( local.ru_utime.tv_sec - tv_sec ) * 1000 \
				+ ( local.ru_utime.tv_usec - tv_usec ) / 1000 ) )

#ifdef __suseconds_t_defined
static suseconds_t tv_usec;
#else
static long tv_usec;
#endif

/* test parameters ------------------------------------------------ */
enum {
	ITERATIONS = 131072,
	size = 8192
};

static void double_array_mults_by_const( double dvec[] );

int
main( int argc, char *argv[] )
{
	double	* restrict dvec = 0;
	void	**dvecptr = (void **)&dvec;

	if( 0 == posix_memalign( dvecptr, 16, size * sizeof(double) ) )
	{
		double_array_mults_by_const( dvec );
	}

	return 0;
}

void
double_array_mults_by_const( double * restrict dvec )
{
	long		i, j;
	const double	dval = 1.0000001;

	for( i = 0; i < size; i++ )
		dvec[i] = 1.0;

	START_CLOCK();

	for( j = 0; j < ITERATIONS; j++ )
		for( i = 0; i < size; i++ )
			dvec[i] *= dval;
	
	printf( "%-38s %4ld ms [%10.6f]\n",
			"double array mults by const", MS_SINCE(), dvec[0] );
}

Attachment: gcc.asm
Description: Binary data

Attachment: opencc.asm
Description: Binary data

