This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: gcc compile-time performance

From: Will Cohen <wcohen at redhat dot com>
To: Jan Hubicka <jh at suse dot cz>
Cc: "David S. Miller" <davem at redhat dot com>, shebs at apple dot com, dberlin at dberlin dot org, dhazeghi at pacbell dot net, neil at daikokuya dot demon dot co dot uk, ak at suse dot de, gcc at gcc dot gnu dot org
Date: Fri, 17 May 2002 15:25:10 -0400
Subject: Re: gcc compile-time performance
Organization: Red Hat, Inc.
References: <Pine.LNX.4.44.0205171159420.5015-100000@dberlin.org> <3CE52EFB.68C0809D@apple.com> <20020517.095753.17350203.davem@redhat.com> <20020517171624.GP22447@atrey.karlin.mff.cuni.cz>

I am not so sure divides are what kill it. -Will

Jan Hubicka wrote:

>>   From: Stan Shebs <shebs@apple.com>
>>   Date: Fri, 17 May 2002 09:25:32 -0700
>>   
>>   That's my personal suspicion too, but no, I don't have any real
>>   evidence.  The lack of hot spots in profiling is a strong hint.
>>   One oddball idea I've thought about is to functionize all the
>>   tree and rtl macros, and run a profile on that to see what are
>>   the most used/abused macros.
>>   
>>I know that the subreg-byte changes added a lot of overhead
>>particularly via the subreg_regno_offset() function (which was
>>an inline macro in my original diffs).
>>
> 
> Do you have some data?  Perhaps we can replace the division by simple lookup
> table...
> 
> Honza
> 
>>The divisions are what kill it.  That overhead could be eliminated
>>if all the mode sizes were powers of 2 and we had some
>>GET_MODE_SIZE_LOG2() interface.  Then we just transform all the
>>divides there into shifts.
>>
>>   Then there's the extreme approach of having maintainers only
>>   accept patches that either remove code or make the compiler run
>>   faster... :-)
>>
>>There is a better way, have maintainers work on approval of such
>>changes faster than approval of other changes :-)
>>

Given the comment that divides were taking a significant amount of
time, I decided to collect some data to see how much of a problem
divides really are. I checked out gcc_3_1_release from gcc.gnu.org and
configured it to build a native compiler on Red Hat Linux 7.2, running
on an Inspiron 4100 with a 1GHz mobile Pentium III processor, 256MB
DRAM, and a 40GB hard drive. I built the tool, then went into the
build/gcc directory and ran "make clean; time make bootstrap" while
oprofile took measurements using CPU_CLK_UNHALTED (counter 0) and DIV
(counter 1).

Result of the "time make bootstrap" command:
real	25m10.672s
user	23m51.240s
sys	0m32.500s

The real time of 25m10s is 1510 seconds. There were 32108 samples for
divides in /home/wcohen/gcc31/native/gcc/stage1/cc1, with each sample
representing 4980 divide operations (both floating point and
integer). Since that is 43.39% of all divide samples, there were
73,991 samples in total, or about 368x10^6 divides on the entire
system.  Assuming the worst-case figure of 70 cycles per divide from
the Pentium 4 software optimization manual and a 1GHz clock, this
would be 25.8 seconds of runtime, about 1.7% of the run time for the
entire system.  The 1.7% is a pretty pessimistic estimate: 70 cycles
is the latency of the idiv, while the throughput on the Pentium 4 is
23 cycles.
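
As a rough cross-check of the arithmetic above, here is a minimal
Python sketch. The constant and function names are made up for
illustration. The figures come from this message, and the 43.3945%
share is the stage1/cc1 row of the DIV listing. The 70 and 23 cycle
costs are the Pentium 4 numbers quoted above, not measurements on the
Pentium III used for the run.

```python
# Back-of-the-envelope estimate of time spent in divides during bootstrap.
STAGE1_DIV_SAMPLES = 32108    # DIV samples attributed to stage1/cc1
STAGE1_DIV_SHARE = 0.433945   # stage1/cc1 share of all DIV samples
DIVIDES_PER_SAMPLE = 4980     # divide operations represented by one sample
CLOCK_HZ = 1_000_000_000      # 1GHz processor
WALL_SECONDS = 25 * 60 + 10   # "real 25m10.672s", rounded down to 1510


def divide_seconds(cycles_per_divide):
    """Seconds spent in divides system-wide at a given cost per divide."""
    total_samples = STAGE1_DIV_SAMPLES / STAGE1_DIV_SHARE
    total_divides = total_samples * DIVIDES_PER_SAMPLE
    return total_divides * cycles_per_divide / CLOCK_HZ


total_samples = STAGE1_DIV_SAMPLES / STAGE1_DIV_SHARE
latency_bound = divide_seconds(70)     # assumes every divide costs full latency
throughput_bound = divide_seconds(23)  # assumes divides are fully pipelined

print(f"total DIV samples: {total_samples:.0f}")
print(f"latency bound:     {latency_bound:.1f}s "
      f"({100 * latency_bound / WALL_SECONDS:.1f}% of wall time)")
print(f"throughput bound:  {throughput_bound:.1f}s "
      f"({100 * throughput_bound / WALL_SECONDS:.1f}% of wall time)")
```

The real cost on this machine presumably falls somewhere between the
two bounds, since neither figure was measured on a Pentium III.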

Of course this data assumes that the bootstrap process represents
typical program behavior and is exercising the same parts of gcc that
other programs are.

-Will

[root@litespeed root]# op_time -r -c 1|more /* DIV 4980 per sample */
32108     43.3945 0.0000 /home/wcohen/gcc31/native/gcc/stage1/cc1
20852     28.1818 0.0000 /home/wcohen/gcc31/native/gcc/stage2/cc1
10893     14.7221 0.0000 /home/wcohen/gcc31/native.install/lib/gcc-lib/i686-pc-linux-gnu/3.1/cc1
5886       7.9550 0.0000 /usr/bin/as
2366       3.1977 0.0000 /lib/modules/2.4.9-21custom/build/vmlinux
731        0.9880 0.0000 /usr/bin/ld
259        0.3500 0.0000 /lib/ext3.o
233        0.3149 0.0000 /usr/bin/make
123        0.1662 0.0000 /home/wcohen/gcc31/native/gcc/cc1
...

[root@litespeed root]# op_time -r -c 0 |more /* CPU_CLK_UNHALTED 498000 per */
1287676   51.8822 0.0000 /home/wcohen/gcc31/native/gcc/stage1/cc1
680858    27.4327 0.0000 /home/wcohen/gcc31/native/gcc/stage2/cc1
239265     9.6403 0.0000 /home/wcohen/gcc31/native.install/lib/gcc-lib/i686-pc-linux-gnu/3.1/cc1
92502      3.7270 0.0000 /usr/bin/as
63091      2.5420 0.0000 /lib/modules/2.4.9-21custom/build/vmlinux
34034      1.3713 0.0000 /home/wcohen/gcc31/native/gcc/genattrtab
28366      1.1429 0.0000 /home/wcohen/gcc31/native/gcc/fixinc/fixincl
9777       0.3939 0.0000 /usr/bin/ld
9127       0.3677 0.0000 /home/wcohen/gcc31/native/gcc/cc1
7339       0.2957 0.0000 /usr/lib/mozilla/mozilla-bin
....
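
The two listings can also be combined per binary. The sketch below is
not part of the original measurement. It assumes the same worst-case
70 cycles per divide and that both counters attribute samples to
binaries in the same way. All names are illustrative.

```python
# Upper bound on the share of each binary's cycles spent in divides.
DIVIDES_PER_SAMPLE = 4980       # DIV events per sample (counter 1)
CYCLES_PER_SAMPLE = 498000      # CPU_CLK_UNHALTED events per sample (counter 0)
ASSUMED_CYCLES_PER_DIVIDE = 70  # worst-case Pentium 4 latency, an assumption

# (DIV samples, CPU_CLK_UNHALTED samples), copied from the two listings.
SAMPLES = {
    "stage1/cc1": (32108, 1287676),
    "stage2/cc1": (20852, 680858),
    "installed cc1": (10893, 239265),
    "/usr/bin/as": (5886, 92502),
}


def divide_cycle_share(div_samples, clk_samples):
    """Fraction of unhalted cycles that divides could account for."""
    divide_cycles = div_samples * DIVIDES_PER_SAMPLE * ASSUMED_CYCLES_PER_DIVIDE
    total_cycles = clk_samples * CYCLES_PER_SAMPLE
    return divide_cycles / total_cycles


shares = {name: divide_cycle_share(d, c) for name, (d, c) in SAMPLES.items()}
for name, share in shares.items():
    print(f"{name:15s} {100 * share:5.2f}% of cycles (upper bound)")
```

With these inputs the upper bound stays in the low single digits of
percent for each binary listed.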
