This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH, middle-end - RTL]: scheduling, UNSPEC_VOLATILEs should also represent insn blockage for insns with mem only operands


Hello!

There is a generic problem with insn scheduling uncovered by the combination of UNSPEC_VOLATILE RTX and memory_barriers, as implemented on alphaev56-unknown-linux-gnu:

(define_expand "memory_barrier"
 [(set (match_dup 0)
   (unspec:BLK [(match_dup 0)] UNSPEC_MB))]
 ""
{
 operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
 MEM_VOLATILE_P (operands[0]) = 1;
})

(define_insn "*mb_internal"
 [(set (match_operand:BLK 0 "" "")
   (unspec:BLK [(match_dup 0)] UNSPEC_MB))]
 ""
 "mb"
 [(set_attr "type" "mb")])

The problem is, that UNSPEC_VOLATILEs do not represent insn blockage for patterns having only memory operands, such as "*mb_internal" pattern above. The core of the problem can be traced to sched-deps.c, around line 2137, where UNSPEC_VOLATILEs don't flush pending list of memory loads and stores.

This problem manifest itself in wrong insn scheduling on alpha target. For following testcase:

--cut here--
static int icount;

void test (void)
{
#pragma omp atomic
   icount++;
}
--cut here--

gcc -O2 -fopenmp generates:

test:
   .frame $30,0,$26,0
   mb     # 16    *mb_internal    [length = 4]
   ldgp $29,0($27)     # 25    *prologue_ldgp_1    [length = 4]
$test..ng:
   .prologue 1
   lda $1,icount     # 5    *movdi_nofix/5    [length = 4]
$L3:
   ldl_l $2,0($1)     # 18    load_locked_si    [length = 4]
   addl $2,1,$2     # 19    *addsi_internal/1    [length = 4]
   stl_c $2,0($1)     # 20    store_conditional_si    [length = 4]
   beq $2,$L3     # 21    *bcc_normal    [length = 4]
   mb     # 22    *mb_internal    [length = 4]
   ret $31,($26),1     # 29    *return_internal    [length = 4]

Please note, how "mb" passed "ldgp" instruction, defined as:

(define_insn "*prologue_ldgp_1"
 [(set (match_operand:DI 0 "register_operand" "=r")
   (unspec_volatile:DI [(match_operand:DI 1 "register_operand" "r")
                (match_operand 2 "const_int_operand" "")]
               UNSPECV_LDGP1))]
 ""
 "ldgp %0,0(%1)\n$%~..ng:"
 [(set_attr "cannot_copy" "true")])

"mb" was also scheduled above alternate function entry point, resulting in all sorts of "interesting" effects for certain optimization levels. In addition, moving insn before ldgp confuses alpha linker to load wrong GP value, leading to SEGV.

Attached patch teaches scheduler to flush pending lists of memory loads and stores for UNSPEC_VOLATILEs, preventing insns that have only memory operands to pass UNSPEC_VOLATILE insns. Patched gcc then generates correct code:

test:
   .frame $30,0,$26,0
   ldgp $29,0($27)     # 25    *prologue_ldgp_1    [length = 4]
$test..ng:
   .prologue 1
   mb     # 16    *mb_internal    [length = 4]
   lda $1,icount     # 5    *movdi_nofix/5    [length = 4]
$L3:
   ldl_l $2,0($1)     # 18    load_locked_si    [length = 4]
   addl $2,1,$2     # 19    *addsi_internal/1    [length = 4]
   stl_c $2,0($1)     # 20    store_conditional_si    [length = 4]
   beq $2,$L3     # 21    *bcc_normal    [length = 4]
   mb     # 22    *mb_internal    [length = 4]
   ret $31,($26),1     # 29    *return_internal    [length = 4]

The effect of attached patch is, that all but one libgomp failures are gone for alpha (the one that remains is due to missing -mieee):

Running target unix
-FAIL: libgomp.c/atomic-10.c execution test
FAIL: libgomp.c/atomic-6.c execution test
-FAIL: libgomp.c/loop-6.c execution test
-FAIL: libgomp.c++/ctor-1.C  -O2  execution test
-FAIL: libgomp.c++/ctor-1.C  -Os  execution test
-FAIL: libgomp.c++/ctor-11.C  -O2  execution test
-FAIL: libgomp.c++/ctor-11.C  -Os  execution test
-FAIL: libgomp.c++/ctor-2.C  -O2  execution test
-FAIL: libgomp.c++/ctor-2.C  -Os  execution test
-FAIL: libgomp.c++/ctor-3.C  -O2  execution test
-FAIL: libgomp.c++/ctor-3.C  -Os  execution test
-FAIL: libgomp.c++/ctor-4.C  -O2  execution test
-FAIL: libgomp.c++/ctor-4.C  -Os  execution test
-FAIL: libgomp.c++/ctor-7.C  -O2  execution test
-FAIL: libgomp.c++/ctor-7.C  -Os  execution test
-FAIL: libgomp.c++/loop-9.C  -O2  execution test
-FAIL: libgomp.c++/loop-9.C  -Os  execution test
-FAIL: libgomp.c++/pr30703.C  -O2  execution test
-FAIL: libgomp.c++/pr30703.C  -Os  execution test

=== libgomp Summary ===

-# of expected passes        2308
-# of unexpected failures    19
+# of expected passes        2326
+# of unexpected failures    1


2008-12-26 Uros Bizjak <ubizjak@gmail.com>


   * sched-deps.c (sched_analyze_2)[UNSPEC_VOLATILE]: Flush pending
   memory loads and stores.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, i686-pc-linux-gnu and alphaev56-unknown-linux-gnu. OK for 4.4 and 4.3?

Also, some targets define memory_barrier in kind of strange way, creating double-mem references, etc... Also, there is no need to define memory_barrier insn as UNSPEC_VOLATILE, since it is its operand (BLKmode mem) that should be volatile, not the instruction itself. I plan to change these to the definition, similar to the one above, in a followup-patch.

Thanks,
Uros.
Index: sched-deps.c
===================================================================
--- sched-deps.c	(revision 142913)
+++ sched-deps.c	(working copy)
@@ -2132,9 +2132,12 @@ sched_analyze_2 (struct deps *deps, rtx 
       flush_pending_lists (deps, insn, true, false);
       break;
 
+    case UNSPEC_VOLATILE:
+      flush_pending_lists (deps, insn, true, true);
+      /* FALLTHRU */
+
     case ASM_OPERANDS:
     case ASM_INPUT:
-    case UNSPEC_VOLATILE:
       {
 	/* Traditional and volatile asm instructions must be considered to use
 	   and clobber all hard registers, all pseudo-registers and all of

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]