This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug libgomp/43706] scheduling two threads on one core leads to starvation


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706

--- Comment #26 from Johannes Singler <singler at kit dot edu> 2010-11-15 08:53:12 UTC ---
(In reply to comment #25)
> You might have misread what I wrote.  I did not mention "35 tests"; I 
> mentioned
> that a test became slower by 35%.  The total number of different tests was 4
> (and each was invoked multiple times per spincount setting, indeed).  One out
> of four stayed 35% slower until I increased GOMP_SPINCOUNT to 200000.

Sorry, I got that wrong.  

> This makes some sense, but the job of an optimizing compiler and runtime
> libraries is to deliver the best performance they can even with somewhat
> non-optimal source code.  

I agree with that in principle.  But please keep in mind that, as things
stand, the very simple testcase posted here takes a serious performance hit.
And repeated parallel loops like the one in the test program certainly appear
very often in real applications.
BTW:  How does the testcase react to this change on your machine?

> There are plenty of real-world cases where spending
> time on application redesign for speed is unreasonable or can only be 
> completed
> at a later time - yet it is desirable to squeeze a little bit of extra
> performance out of the existing code.  There are also cases where more
> efficient parallelization - implemented at a higher level to avoid frequent
> switches between parallel and sequential execution - makes the application
> harder to use.  To me, one of the very reasons to use OpenMP was to
> avoid/postpone that redesign and the user-visible complication for now.  If I
> went for a more efficient higher-level solution, I would not need OpenMP in 
> the
> first place.

OpenMP should not be regarded as "only good for loop parallelization".  With
the new task construct, it is a fully-fledged parallelization substrate.

> > So I would suggest a threshold of 100000 for now.
> 
> My suggestion is 250000.

Well, that's already much better than staying with 20,000,000, so I agree.

> > IMHO, something should really happen to this problem before the 4.6 release.
> 
> Agreed.  It'd be best to have a code fix, though.

IMHO, there is no obvious way to fix this in principle.  There will always be
a trade-off between busy waiting and yielding control back to the OS.

Jakub, what do you plan to do about this problem?

