This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug libgomp/43706] scheduling two threads on one core leads to starvation
- From: "singler at kit dot edu" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 15 Nov 2010 08:55:48 +0000
- Subject: [Bug libgomp/43706] scheduling two threads on one core leads to starvation
- Auto-submitted: auto-generated
- References: <bug-43706-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706
--- Comment #26 from Johannes Singler <singler at kit dot edu> 2010-11-15 08:53:12 UTC ---
(In reply to comment #25)
> You might have misread what I wrote. I did not mention "35 tests"; I mentioned
> that a test became slower by 35%. The total number of different tests was 4
> (and each was invoked multiple times per spincount setting, indeed). One out
> of four stayed 35% slower until I increased GOMP_SPINCOUNT to 200000.
Sorry, I got that wrong.
> This makes some sense, but the job of an optimizing compiler and runtime
> libraries is to deliver the best performance they can even with somewhat
> non-optimal source code.
I agree with that in principle. But please keep in mind that, as things stand,
the very simple testcase posted here takes a serious performance hit, and
repeated parallel loops like the one in the test program certainly appear very
often in real applications.
BTW: How does the testcase react to this change on your machine?
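For readers without the attachment at hand, the pattern under discussion can be sketched roughly as follows. This is only a hypothetical stand-in, not the testcase attached to the bug: the function name `repeated_parallel_loops` and its parameters are made up for illustration. It shows many short parallel loops separated by sequential code, so the worker threads wait between rounds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration (not the bug's actual testcase): a sequential
   outer loop that opens a short parallel region in every round. Between
   rounds the worker threads have nothing to do and must wait. */
double repeated_parallel_loops(double *data, int n, int rounds)
{
    assert(data != NULL || n == 0);

    double total = 0.0;
    for (int r = 0; r < rounds; ++r) {
        double sum = 0.0;

        /* Short parallel loop; without -fopenmp the pragma is ignored
           and the loop simply runs sequentially. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; ++i) {
            data[i] += 1.0;
            sum += data[i];
        }

        /* Sequential part between two parallel loops. */
        total += sum;
    }
    return total;
}
```

Built with `gcc -fopenmp`, a program like this can be run under different settings of the `GOMP_SPINCOUNT` environment variable (for example `GOMP_SPINCOUNT=250000`) to compare timings.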
> There are plenty of real-world cases where spending time on application
> redesign for speed is unreasonable or can only be completed at a later time -
> yet it is desirable to squeeze a little bit of extra performance out of the
> existing code. There are also cases where more efficient parallelization -
> implemented at a higher level to avoid frequent switches between parallel and
> sequential execution - makes the application harder to use. To me, one of the
> very reasons to use OpenMP was to avoid/postpone that redesign and the
> user-visible complication for now. If I went for a more efficient higher-level
> solution, I would not need OpenMP in the first place.
OpenMP should not be regarded as "only good for loop parallelization". With
the new task construct, it is a fully-fledged parallelization substrate.
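As a rough illustration of what the task construct allows beyond plain loops, here is a minimal sketch of a recursive divide-and-conquer sum. The names `tree_sum` and `parallel_tree_sum` and the cutoff of 1024 elements are assumptions chosen for the example, not anything taken from the bug report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums a[lo..hi) recursively, spawning one OpenMP
   task per half until the range is small enough to sum directly. */
static long tree_sum(const int *a, int lo, int hi)
{
    if (hi - lo <= 1024) {            /* assumed cutoff, not tuned */
        long s = 0;
        for (int i = lo; i < hi; ++i)
            s += a[i];
        return s;
    }

    int mid = lo + (hi - lo) / 2;
    long left = 0, right = 0;

    #pragma omp task shared(left)
    left = tree_sum(a, lo, mid);

    #pragma omp task shared(right)
    right = tree_sum(a, mid, hi);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return left + right;
}

/* One thread creates the root call; the whole team executes the tasks. */
long parallel_tree_sum(const int *a, int n)
{
    assert(a != NULL || n == 0);

    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = tree_sum(a, 0, n);
    return result;
}
```

Without `-fopenmp` the pragmas are ignored and the code degrades to an ordinary sequential recursion.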
> > So I would suggest a threshold of 100000 for now.
>
> My suggestion is 250000.
Well, that's already much better than staying with 20,000,000, so I agree.
> > IMHO, something should really happen to this problem before the 4.6 release.
>
> Agreed. It'd be best to have a code fix, though.
IMHO, there is no obvious way to fix this in principle: there will always be a
compromise between busy waiting and giving control back to the OS.
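To make the trade-off concrete, here is a minimal sketch of the general spin-then-block idea. It is not libgomp's actual implementation; the function name `spin_then_give_up` and the parameter `spin_budget` are hypothetical. The waiter polls a flag a bounded number of times and, if the budget runs out, reports that the caller should hand control back to the OS (for example by blocking on a futex or condition variable).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch, not libgomp code: busy-wait on *flag for at most
   spin_budget polls. Returns true if the flag was seen set, false if the
   budget ran out and the caller should fall back to a blocking wait. */
bool spin_then_give_up(atomic_int *flag, unsigned long spin_budget)
{
    assert(flag != NULL);

    for (unsigned long i = 0; i < spin_budget; ++i) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return true;              /* cheap: no system call needed */
    }

    /* One last check so that a zero budget still notices a set flag. */
    return atomic_load_explicit(flag, memory_order_acquire) != 0;
}
```

A large budget keeps wake-up latency low when work arrives quickly, but burns CPU time that another thread on the same core could have used; a small budget does the opposite. The spin count thresholds discussed above are different choices for such a budget.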
Jakub, what do you plan to do about this problem?