This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/68718] New: libgomp c.exp with -fipa-pta hang in sort-1.c
- From: "vries at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 05 Dec 2015 14:26:22 +0000
- Subject: [Bug tree-optimization/68718] New: libgomp c.exp with -fipa-pta hang in sort-1.c
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68718
Bug ID: 68718
Summary: libgomp c.exp with -fipa-pta hang in sort-1.c
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---
Created attachment 36932
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36932&action=edit
minimal sort-1.c
The test hangs, because in this loop, we optimize away the test on busy:
...
while (size_int_pair_stack (&global_stack) == 0
&& busy)
busy_wait ();
...
Before the PR46032 fix, the loop looks like this just before fre2:
...
;; basic block 27, loop depth 3, count 0, freq 10000, maybe hot
;; prev block 26, next block 28, flags: (NEW, REACHABLE)
;; pred: 26 [100.0%] (FALLTHRU,EXECUTABLE)
;; 29 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE)
# .MEM_10 = PHI <.MEM_88(26), .MEM_92(29)>
# VUSE <.MEM_10>
# PT = nonlocal unit-escaped
_89 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base
1].global_stackD.3576;
# VUSE <.MEM_10>
# USE = nonlocal unit-escaped { D.3656 } (escaped)
_90 = size_int_pair_stackD.3436 (_89);
if (_90 == 0)
goto <bb 28>;
else
goto <bb 20>;
;; succ: 28 [95.5%] (TRUE_VALUE,EXECUTABLE)
;; 20 [4.5%] (FALSE_VALUE,EXECUTABLE)
;; basic block 28, loop depth 3, count 0, freq 9550, maybe hot
;; prev block 27, next block 29, flags: (NEW, REACHABLE)
;; pred: 27 [95.5%] (TRUE_VALUE,EXECUTABLE)
# VUSE <.MEM_10>
_91 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base
1].busyD.3574;
if (_91 != 0)
goto <bb 29>;
else
goto <bb 20>;
;; succ: 29 [95.5%] (TRUE_VALUE,EXECUTABLE)
;; 20 [4.5%] (FALSE_VALUE,EXECUTABLE)
;; basic block 29, loop depth 3, count 0, freq 9120, maybe hot
;; prev block 28, next block 30, flags: (NEW, REACHABLE)
;; pred: 28 [95.5%] (TRUE_VALUE,EXECUTABLE)
# .MEM_92 = VDEF <.MEM_10>
# USE = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650
D.3653 D.3654 D.3656 D.3807 } (escaped)
# CLB = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650
D.3653 D.3654 D.3656 D.3807 } (escaped)
busy_waitD.3439 ();
goto <bb 27>;
;; succ: 27 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE)
...
And the fix for PR46032 has the following effect:
...
;; basic block 27, loop depth 3, count 0, freq 10000, maybe hot
;; prev block 26, next block 28, flags: (NEW, REACHABLE)
;; pred: 26 [100.0%] (FALLTHRU,EXECUTABLE)
;; 29 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE)
# .MEM_10 = PHI <.MEM_88(26), .MEM_92(29)>
# VUSE <.MEM_10>
- # PT = nonlocal unit-escaped
+ # PT = { D.3474 } (escaped)
_89 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base
1].global_stackD.3576;
# VUSE <.MEM_10>
- # USE = nonlocal unit-escaped { D.3656 } (escaped)
+ # USE = { D.3474 D.3656 } (escaped)
_90 = size_int_pair_stackD.3436 (_89);
if (_90 == 0)
goto <bb 28>;
else
goto <bb 20>;
;; succ: 28 [95.5%] (TRUE_VALUE,EXECUTABLE)
;; 20 [4.5%] (FALSE_VALUE,EXECUTABLE)
;; basic block 28, loop depth 3, count 0, freq 9550, maybe hot
;; prev block 27, next block 29, flags: (NEW, REACHABLE)
;; pred: 27 [95.5%] (TRUE_VALUE,EXECUTABLE)
# VUSE <.MEM_10>
_91 = MEM[(struct .omp_data_s.5D.3564 &).omp_data_i_16(D) clique 1 base
1].busyD.3574;
if (_91 != 0)
goto <bb 29>;
else
goto <bb 20>;
;; succ: 29 [95.5%] (TRUE_VALUE,EXECUTABLE)
;; 20 [4.5%] (FALSE_VALUE,EXECUTABLE)
;; basic block 29, loop depth 3, count 0, freq 9120, maybe hot
;; prev block 28, next block 30, flags: (NEW, REACHABLE)
;; pred: 28 [95.5%] (TRUE_VALUE,EXECUTABLE)
# .MEM_92 = VDEF <.MEM_10>
- # USE = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650
D.3653 D.3654 D.3656 D.3807 } (escaped)
- # CLB = nonlocal unit-escaped null { D.3473 D.3474 D.3566 D.3567 D.3650
D.3653 D.3654 D.3656 D.3807 } (escaped)
+ # USE = nonlocal unit-escaped null { D.3473 D.3474 D.3656 } (escaped)
+ # CLB = nonlocal unit-escaped null { D.3473 D.3474 D.3656 } (escaped)
busy_waitD.3439 ();
goto <bb 27>;
;; succ: 27 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE)
...
AFAICT, the new alias information is correct.
And the optimization by fre looks reasonable: the variable busy does not seem
to be modified by anything in the loop, so it removes the test on busy from the
loop.
But in fact, busy is a shared variable, so it can be modified by one of the
other threads:
...
int busy = 1;
int num_threads;
omp_init_lock (&lock);
init_int_pair_stack (&global_stack);
#pragma omp parallel firstprivate (array, count)
{
....
In other words, inside the parallel region, busy behaves as a volatile
variable (it may return different values when read twice, while nothing seems
to be writing to it in between the two reads).
So AFAIU, we need to mark the uses of shared variables in OpenMP regions as
volatile.
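
To illustrate the idea at the source level, here is a minimal sketch, assuming that a volatile-qualified read is an acceptable stand-in for what the compiler would do internally. The names wait_while_busy and run_demo are hypothetical and do not appear in sort-1.c. The poll count is bounded only so that the sketch always terminates. Portable OpenMP code would more likely pair the write with an atomic read or a flush; the volatile read is used here only because it mirrors the proposal above.

```c
/* Illustrative sketch only; wait_while_busy and run_demo are hypothetical
   names, not part of sort-1.c or libgomp.  */

/* Poll *flag until it reads as zero or max_polls reads have been done.
   The volatile qualifier forces a fresh load on every iteration, so the
   test on the flag cannot be hoisted out of the loop.  Returns the number
   of polls that saw a nonzero flag.  */
long
wait_while_busy (volatile int *flag, long max_polls)
{
  long polls = 0;
  while (polls < max_polls && *flag != 0)
    polls++;
  return polls;
}

/* Run one writer and one waiter on a shared flag and return the final
   value of the flag.  */
int
run_demo (void)
{
  int busy = 1;

#pragma omp parallel sections shared (busy)
  {
#pragma omp section
    {
      /* Stand-in for "one of the other threads" clearing the flag.  */
#pragma omp atomic write
      busy = 0;
    }
#pragma omp section
    {
      /* Bounded so the sketch terminates even when run on one thread.  */
      (void) wait_while_busy (&busy, 1000000L);
    }
  }
  return busy;
}
```

Built with gcc -fopenmp, the two sections may run on different threads; without that flag the pragmas are ignored and the sections simply run one after the other. In both cases run_demo () returns 0.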