[RFC] Old school parallelization of WPA streaming
Richard Biener
rguenther@suse.de
Wed Aug 21 15:58:00 GMT 2013
Andi Kleen <ak@linux.intel.com> wrote:
>On Wed, Aug 21, 2013 at 04:17:48PM +0200, Jan Hubicka wrote:
>> Hi,
>> this is my attempt to bring GCC into the wonderful era of multicore CPUs :)
>> It is a hack, but it seems to help quite a lot. About 50% of WPA time is spent
>> streaming the individual ltrans .o files. This can easily be parallelized
>> by fork: we do nothing afterwards, just exit and pass the list to the linker.
>
>One risk is that if someone streams to a spinning disk, the parallel I/O may
>add more seeks. But I think it's a reasonable tradeoff.
It'll also wreck all WPA dump files.
>We should also use a faster compressor.
And we should avoid uncompressing the function sections...
That said, the patch is enough of a hack that I never want to debug a bug in it...
I also fail to see why threads should not work here. Maybe simply annotate GCC with OpenMP?
Richard.
>> For -flto=jobserver I simply fork all 32 processes. It may not be a disaster,
>> but perhaps we should figure out how to communicate with the jobserver. At first
>> glance at the documentation on how it works, it seems easy to add. Perhaps we can
>> even convince the GNU Make folks to put simple helpers into libiberty?
>
>-flto=jobserver is still broken and confuses tokens on large builds (it ends
>with a read() returning 0). I did some debugging recently, and I now suspect a
>Linux kernel bug. I still haven't tracked it down.
>
>Any workaround would unfortunately need changes to make.
>
>>
>> We may also figure out the number of CPUs (is it available, e.g., from
>> libgomp?)
>
>sysconf(_SC_NPROCESSORS_ONLN) ?
>
>> and use it by default even if the user does not bother to pass the number of
>> processes. Naturally these streaming forks should be cheap memory-wise. I hope
>> Martin will get me some actual numbers.
>>
>> With the patch the WPA time of Firefox goes down to 2 minutes (4.8 needs
>> about 30 minutes, and without the hack one needs about 5 minutes).
>
>Cool!
>
>I'll try it on my builds.
>>
>> +fparallelism=
>> +LTO Joined
>> +Run the link-time optimizer in whole program analysis (WPA) mode.
>
>The description does not make sense.
>
>The rest of the patch looks good from a quick read, although I would prefer to
>do the waiting for children in the "parent", not in the "last one".
>
>-Andi