This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug libgomp/49490] New: suboptimal load balancing in loops
- From: "dennis.jespersen at nasa dot gov" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 21 Jun 2011 16:48:37 +0000
- Subject: [Bug libgomp/49490] New: suboptimal load balancing in loops
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49490
Summary: suboptimal load balancing in loops
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: libgomp
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: dennis.jespersen@nasa.gov
Created attachment 24573
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24573
test code to show how a compiler/runtime splits an OpenMP loop
The OpenMP runtime library produces a correct but suboptimal load balance
in parallel loops.
For example, a loop of 33 iterations run with 8 OpenMP threads gives the
threads 5, 5, 5, 5, 5, 5, 3, and 0 iterations respectively. This is logically
correct, but consider a dual-socket machine with 4 cores per socket: the
"left" socket gets 20 units of work while the "right" socket gets 13.
This could put undue pressure on the left socket's cache(s) and/or
memory connection. It would be better to spread the work as evenly
as possible, so that in this example the threads would get
5, 4, 4, 4, 4, 4, 4, and 4 iterations.
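
The arithmetic behind the two distributions can be sketched as follows.
This is only an illustration, not code taken from libgomp: the function
names are hypothetical, the "ceiling chunk" rule is an assumption that
happens to reproduce the reported 5, 5, 5, 5, 5, 5, 3, 0 split, and the
mapping of threads 0-3 and 4-7 to the two sockets is likewise assumed.

```python
def ceil_chunk_split(n, nthreads):
    """Assumed rule: each thread takes ceil(n / nthreads) iterations
    until the loop is exhausted; later threads get whatever is left."""
    chunk = -(-n // nthreads)  # ceiling division
    counts = []
    remaining = n
    for _ in range(nthreads):
        take = min(chunk, remaining)
        counts.append(take)
        remaining -= take
    return counts


def balanced_split(n, nthreads):
    """Every thread takes floor(n / nthreads) iterations, and the first
    n % nthreads threads take one extra, so counts differ by at most 1."""
    base, extra = divmod(n, nthreads)
    return [base + (1 if t < extra else 0) for t in range(nthreads)]


if __name__ == "__main__":
    for name, split in (("ceil-chunk", ceil_chunk_split),
                        ("balanced", balanced_split)):
        counts = split(33, 8)
        # Assumption: threads 0-3 run on one socket, threads 4-7 on the other.
        print(name, counts, "per-socket:", sum(counts[:4]), sum(counts[4:]))
```

With 33 iterations and 8 threads, the first rule leaves the two assumed
sockets with 20 and 13 iterations, while the second leaves them with 17
and 16.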
It should be fairly easy to modify libgomp/iter.c to produce the better
load balancing (at least I think that's where the modification would go).
The attached Fortran code shows the load balance; the Portland Group and
Intel compilers give the desired even balance.