This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Re: Measuring gcc optimizations: are they well balanced?

To: root at jacob dot remcomp dot fr
Subject: Re: Measuring gcc optimizations: are they well balanced?
From: "Martin v. Loewis" <martin at loewis dot home dot cs dot tu-berlin dot de>
Date: Sun, 9 Jan 2000 14:42:42 +0100
CC: bug-gcc at gnu dot org, friedrich dot dominicus at inka dot de
References: <m127HE4-000Y2gC@jacob.remcomp.fr>

> So, I decided to measure the speed of the generated code, to see if it was
> really faster.

I had a hard time reproducing your results. You did not report the
exact source code for the function gcc, nor the version of GCC you've
been using to obtain this assembler output.

Based on the assembler output you gave, I assumed the code

#define TOP 100000
int Table[TOP];
int counter;
void gcc()
{
  for(;counter;counter--)
    {
      int i=1;
      while (i <= TOP-1){
	Table[i-1] = Table[i];
	i++;
      }
    }
}

(Using the Table definition from your "main" driver program seemed
illogical, as it a) was spelled Table, not table, b) apparently had
the wrong size (1000000 instead of 100000), and c) both lcc and gcc
decided to word-address the field, instead of using byte addressing).

Using gcc 2.95.2 -O2, I got the assembler fragment attached
below. The loops become

.L6:
	leal 4(%esi),%edx
	movl $99998,%ecx
	.p2align 4,,7
.L9:
	movl (%edx),%eax
	movl %eax,-4(%edx)
	addl $4,%edx
	decl %ecx
	jns .L9
	decl %ebx
	jnz .L6
	movl $0,counter

You'll notice a number of improvements in this code compared to the
one you've reported:
- counter is not stored-away at all, but kept in a register until
  the end of the loop
- the actual copying uses much less instructions than you had observed

I couldn't directly compare that to your results, as I don't have lcc
to compare (also, I did all this on Linux). However, I'd like to know
what exact code you've been measuring, and what the gcc 2.95.2 results
were (in case you didn't use gcc 2.95.2).

> The C code is
> 	i=0
> 	while (i < TOP-1)
> 		table[i] = table[i+1];
> gcc transforms this into
> 	i=1;
> 	while (i <= TOP-1)
> 		table[i-1] = table[i];
> 
> I pasted into the test procedure "gcc", the code generated by gcc.

Are you sure you've been pasting *this* code? Where is the
modification of counter in that code? What is TOP? and where is 'i'
incremented?

> lcc: 12.9 seconds
> gcc: 20.0 seconds
> optimized assembly: 12.9 seconds
> 
> This means that in this "optimization", gcc's optimizer is SLOWING DOWN THE
> GENERATED CODE BY ALMOST 50%!!!!

How did you get this result? To see whether *optimization* is the
cause of the slow-down, you would need to compare 'gcc -O0' and 'gcc
-O9', not 'gcc' and 'lcc'...

Whether gcc produces worse code under optimization than lcc under
optimization (or even lcc without optimization) is a different matter.

> The source of egcs-2.7.1 (approximately 1-12 MB of C) was compiled with the 
> current version of gcc under windows: 2.95.2. 

Sorry to be picky, but it seems that here is another source of
confusion?  What exactly is "egcs-2.7.1"? Is that the
'experimental/enhanced GCC' project? In that case, it only had
versions 1.0.x and 1.1.x, not 2.7.x...

> To measure the speed of the resulting compiler the same file was
> presented to gcc1: a preprocessed file of approx 300K where the is a
> HUGE case statement that takes the compiler a long time to figure
> out so that the measurements are significative. The results of the
> first run were discarded to eliminate cache effects.

Is this file available somewhere, so people can try to reproduce your
results?

> Gcc: no optimizations	15.662 seconds
> Gcc -O1                 12.548
> Gcc -O2                 12.347
> Gcc -O9                 12.217
> 
> lcc-win32 -O            13.619
> MSVC++ 4.2 -Ox          11.526
> Intel icl -Ox           12.107
> What is obvious here is:

> 	1) The difference between gcc fully optimized and lcc-win32
>           (the slowest compiler of all) is approx 10%.

That is indeed obvious from the numbers.

> 	2) The difference between gcc -O1 and gcc -O9 is approx 1% only!

That is also obvious.

> This confirms my suspicion that the poor results of gcc's optimizer
> are due to many optimizations SLOWING DOWN THE CODE INSTEAD OF
> IMPROVING IT!!!

Again, I cannot see how you arrive at this conclusion. What I see is
that -O9 generates faster code than -O1? Why are you saying that
optimization slows down the compiler, when the resulting program gets
faster????? That is not logical.

> Now the compiler has grown so complex that the maintainers are very afraid of
> taking anything out of it for fear of introducing new bugs. This makes the
> problem even worse: additions continue to go in, and nobody ever takes a
> single line out...
> 
> I present this thesis here for discussion.

Even if your analysis is true - what would be your conclusion?
I.e. what are your personal consequences of that analysis (if any),
and what do you expect to happen now that people read your message?

Regards,
Martin

	.file	"a.c"
	.version	"01.01"
gcc2_compiled.:
.text
	.align 4
.globl gcc
	.type	 gcc,@function
gcc:
	pushl %ebp
	movl %esp,%ebp
	pushl %esi
	pushl %ebx
	movl counter,%eax
	testl %eax,%eax
	je .L4
	movl $Table,%esi
	movl %eax,%ebx
	.p2align 4,,7
.L6:
	leal 4(%esi),%edx
	movl $99998,%ecx
	.p2align 4,,7
.L9:
	movl (%edx),%eax
	movl %eax,-4(%edx)
	addl $4,%edx
	decl %ecx
	jns .L9
	decl %ebx
	jnz .L6
	movl $0,counter
.L4:
	popl %ebx
	popl %esi
	movl %ebp,%esp
	popl %ebp
	ret
.Lfe1:
	.size	 gcc,.Lfe1-gcc
	.comm	Table,400000,32
	.comm	counter,4,4
	.ident	"GCC: (GNU) 2.95.2 19991024 (release)"

References:
- Measuring gcc optimizations: are they well balanced?
  - From: root

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]