This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH, middle-end - RTL]: scheduling, UNSPEC_VOLATILEs should also represent insn blockage for insns with mem only operands
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 26 Dec 2008 15:37:38 +0100
- Subject: [PATCH, middle-end - RTL]: scheduling, UNSPEC_VOLATILEs should also represent insn blockage for insns with mem only operands
Hello!
There is a generic problem with insn scheduling uncovered by the
combination of UNSPEC_VOLATILE RTX and memory_barriers, as implemented
on alphaev56-unknown-linux-gnu:
(define_expand "memory_barrier"
[(set (match_dup 0)
(unspec:BLK [(match_dup 0)] UNSPEC_MB))]
""
{
operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
MEM_VOLATILE_P (operands[0]) = 1;
})
(define_insn "*mb_internal"
[(set (match_operand:BLK 0 "" "")
(unspec:BLK [(match_dup 0)] UNSPEC_MB))]
""
"mb"
[(set_attr "type" "mb")])
The problem is, that UNSPEC_VOLATILEs do not represent insn blockage for
patterns having only memory operands, such as "*mb_internal" pattern
above. The core of the problem can be traced to sched-deps.c, around
line 2137, where UNSPEC_VOLATILEs don't flush pending list of memory
loads and stores.
This problem manifest itself in wrong insn scheduling on alpha target.
For following testcase:
--cut here--
static int icount;
void test (void)
{
#pragma omp atomic
icount++;
}
--cut here--
gcc -O2 -fopenmp generates:
test:
.frame $30,0,$26,0
mb # 16 *mb_internal [length = 4]
ldgp $29,0($27) # 25 *prologue_ldgp_1 [length = 4]
$test..ng:
.prologue 1
lda $1,icount # 5 *movdi_nofix/5 [length = 4]
$L3:
ldl_l $2,0($1) # 18 load_locked_si [length = 4]
addl $2,1,$2 # 19 *addsi_internal/1 [length = 4]
stl_c $2,0($1) # 20 store_conditional_si [length = 4]
beq $2,$L3 # 21 *bcc_normal [length = 4]
mb # 22 *mb_internal [length = 4]
ret $31,($26),1 # 29 *return_internal [length = 4]
Please note, how "mb" passed "ldgp" instruction, defined as:
(define_insn "*prologue_ldgp_1"
[(set (match_operand:DI 0 "register_operand" "=r")
(unspec_volatile:DI [(match_operand:DI 1 "register_operand" "r")
(match_operand 2 "const_int_operand" "")]
UNSPECV_LDGP1))]
""
"ldgp %0,0(%1)\n$%~..ng:"
[(set_attr "cannot_copy" "true")])
"mb" was also scheduled above alternate function entry point, resulting
in all sorts of "interesting" effects for certain optimization levels.
In addition, moving insn before ldgp confuses alpha linker to load wrong
GP value, leading to SEGV.
Attached patch teaches scheduler to flush pending lists of memory loads
and stores for UNSPEC_VOLATILEs, preventing insns that have only memory
operands to pass UNSPEC_VOLATILE insns. Patched gcc then generates
correct code:
test:
.frame $30,0,$26,0
ldgp $29,0($27) # 25 *prologue_ldgp_1 [length = 4]
$test..ng:
.prologue 1
mb # 16 *mb_internal [length = 4]
lda $1,icount # 5 *movdi_nofix/5 [length = 4]
$L3:
ldl_l $2,0($1) # 18 load_locked_si [length = 4]
addl $2,1,$2 # 19 *addsi_internal/1 [length = 4]
stl_c $2,0($1) # 20 store_conditional_si [length = 4]
beq $2,$L3 # 21 *bcc_normal [length = 4]
mb # 22 *mb_internal [length = 4]
ret $31,($26),1 # 29 *return_internal [length = 4]
The effect of attached patch is, that all but one libgomp failures are
gone for alpha (the one that remains is due to missing -mieee):
Running target unix
-FAIL: libgomp.c/atomic-10.c execution test
FAIL: libgomp.c/atomic-6.c execution test
-FAIL: libgomp.c/loop-6.c execution test
-FAIL: libgomp.c++/ctor-1.C -O2 execution test
-FAIL: libgomp.c++/ctor-1.C -Os execution test
-FAIL: libgomp.c++/ctor-11.C -O2 execution test
-FAIL: libgomp.c++/ctor-11.C -Os execution test
-FAIL: libgomp.c++/ctor-2.C -O2 execution test
-FAIL: libgomp.c++/ctor-2.C -Os execution test
-FAIL: libgomp.c++/ctor-3.C -O2 execution test
-FAIL: libgomp.c++/ctor-3.C -Os execution test
-FAIL: libgomp.c++/ctor-4.C -O2 execution test
-FAIL: libgomp.c++/ctor-4.C -Os execution test
-FAIL: libgomp.c++/ctor-7.C -O2 execution test
-FAIL: libgomp.c++/ctor-7.C -Os execution test
-FAIL: libgomp.c++/loop-9.C -O2 execution test
-FAIL: libgomp.c++/loop-9.C -Os execution test
-FAIL: libgomp.c++/pr30703.C -O2 execution test
-FAIL: libgomp.c++/pr30703.C -Os execution test
=== libgomp Summary ===
-# of expected passes 2308
-# of unexpected failures 19
+# of expected passes 2326
+# of unexpected failures 1
2008-12-26 Uros Bizjak <ubizjak@gmail.com>
* sched-deps.c (sched_analyze_2)[UNSPEC_VOLATILE]: Flush pending
memory loads and stores.
Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32}, i686-pc-linux-gnu and alphaev56-unknown-linux-gnu. OK for 4.4
and 4.3?
Also, some targets define memory_barrier in kind of strange way,
creating double-mem references, etc... Also, there is no need to define
memory_barrier insn as UNSPEC_VOLATILE, since it is its operand (BLKmode
mem) that should be volatile, not the instruction itself. I plan to
change these to the definition, similar to the one above, in a
followup-patch.
Thanks,
Uros.
Index: sched-deps.c
===================================================================
--- sched-deps.c (revision 142913)
+++ sched-deps.c (working copy)
@@ -2132,9 +2132,12 @@ sched_analyze_2 (struct deps *deps, rtx
flush_pending_lists (deps, insn, true, false);
break;
+ case UNSPEC_VOLATILE:
+ flush_pending_lists (deps, insn, true, true);
+ /* FALLTHRU */
+
case ASM_OPERANDS:
case ASM_INPUT:
- case UNSPEC_VOLATILE:
{
/* Traditional and volatile asm instructions must be considered to use
and clobber all hard registers, all pseudo-registers and all of