This is the mail archive of the
mailing list for the GCC project.
Re: Failure to dlopen libgomp due to static TLS data
- From: Andrew Haley <aph at redhat dot com>
- To: Rich Felker <dalias at libc dot org>
- Cc: Jakub Jelinek <jakub at redhat dot com>, Ulrich Weigand <uweigand at de dot ibm dot com>, libc-alpha at sourceware dot org, gcc at gcc dot gnu dot org, rth at redhat dot com
- Date: Fri, 13 Feb 2015 09:12:41 +0000
- Subject: Re: Failure to dlopen libgomp due to static TLS data
- References: <201502121519 dot t1CFJMAe018776 at d03av02 dot boulder dot ibm dot com> <20150212160959 dot GS23507 at brightrain dot aerifal dot cx> <20150212161145 dot GD1746 at tucnak dot redhat dot com> <20150212161617 dot GU23507 at brightrain dot aerifal dot cx> <54DCEF90 dot 6090700 at redhat dot com> <20150212232756 dot GZ23507 at brightrain dot aerifal dot cx>
On 12/02/15 23:27, Rich Felker wrote:
> On Thu, Feb 12, 2015 at 06:23:12PM +0000, Andrew Haley wrote:
>> On 02/12/2015 04:16 PM, Rich Felker wrote:
>>> On Thu, Feb 12, 2015 at 05:11:45PM +0100, Jakub Jelinek wrote:
>>>> On Thu, Feb 12, 2015 at 11:09:59AM -0500, Rich Felker wrote:
>>>>> This usage is supposed to be deprecated. Why isn't libgomp using
>>>>> TLSDESC/gnu2 model?
>>>> Because it is significantly slower.
>>> Seems very unlikely. If storage is allocated in static TLS, TLSDESC is
>>> almost indistinguishable from IE in performance, even when you run
>>> artificial benchmarks that do nothing but hammer TLS access. When it
>>> gets allocated in dynamic TLS, it's somewhat slower, but still
>>> unlikely to matter for most usage IMO.
>> The problem I'm seeing is that dynamic TLS is always used even when not
>> necessary, and that hurts Java (which accesses TLS 128k times in the first
>> 500ms or so of execution). According to lxo, his patch fixes that.
> Given those numbers, each access would need to be taking 38ns to
> consume even 1% of the cpu time being spent. I would guess accesses
> are closer to 5ns for TLSDESC in static area and 10-15ns for dynamic.
> So I don't think this is a bottleneck.
I'm totally unconvinced by this style of argument. An efficient system
is composed of many small optimizations, each apparently insignificant
in itself. Your figures indicate that this slowdown may be about 0.5%.
0.5% is not small. I put in a lot of work to gain 0.5%.