[PATCH, middle-end - RTL]: scheduling, UNSPEC_VOLATILEs should also represent insn blockage for insns with mem only operands

Uros Bizjak ubizjak@gmail.com
Sat Dec 27 06:50:00 GMT 2008


Hello!

There is a generic problem with insn scheduling uncovered by the 
combination of UNSPEC_VOLATILE RTX and memory_barriers, as implemented 
on alphaev56-unknown-linux-gnu:

(define_expand "memory_barrier"
  [(set (match_dup 0)
    (unspec:BLK [(match_dup 0)] UNSPEC_MB))]
  ""
{
  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
  MEM_VOLATILE_P (operands[0]) = 1;
})

(define_insn "*mb_internal"
  [(set (match_operand:BLK 0 "" "")
    (unspec:BLK [(match_dup 0)] UNSPEC_MB))]
  ""
  "mb"
  [(set_attr "type" "mb")])

The problem is, that UNSPEC_VOLATILEs do not represent insn blockage for 
patterns having only memory operands, such as "*mb_internal" pattern 
above. The core of the problem can be traced to sched-deps.c, around 
line 2137, where UNSPEC_VOLATILEs don't flush pending list of memory 
loads and stores.

This problem manifest itself in wrong insn scheduling on alpha target. 
For following testcase:

--cut here--
static int icount;

void test (void)
{
#pragma omp atomic
    icount++;
}
--cut here--

gcc -O2 -fopenmp generates:

test:
    .frame $30,0,$26,0
    mb     # 16    *mb_internal    [length = 4]
    ldgp $29,0($27)     # 25    *prologue_ldgp_1    [length = 4]
$test..ng:
    .prologue 1
    lda $1,icount     # 5    *movdi_nofix/5    [length = 4]
$L3:
    ldl_l $2,0($1)     # 18    load_locked_si    [length = 4]
    addl $2,1,$2     # 19    *addsi_internal/1    [length = 4]
    stl_c $2,0($1)     # 20    store_conditional_si    [length = 4]
    beq $2,$L3     # 21    *bcc_normal    [length = 4]
    mb     # 22    *mb_internal    [length = 4]
    ret $31,($26),1     # 29    *return_internal    [length = 4]

Please note, how "mb" passed "ldgp" instruction, defined as:

(define_insn "*prologue_ldgp_1"
  [(set (match_operand:DI 0 "register_operand" "=r")
    (unspec_volatile:DI [(match_operand:DI 1 "register_operand" "r")
                 (match_operand 2 "const_int_operand" "")]
                UNSPECV_LDGP1))]
  ""
  "ldgp %0,0(%1)\n$%~..ng:"
  [(set_attr "cannot_copy" "true")])

"mb" was also scheduled above alternate function entry point, resulting 
in all sorts of "interesting" effects for certain optimization levels. 
In addition, moving insn before ldgp confuses alpha linker to load wrong 
GP value, leading to SEGV.

Attached patch teaches scheduler to flush pending lists of memory loads 
and stores for UNSPEC_VOLATILEs, preventing insns that have only memory 
operands to pass UNSPEC_VOLATILE insns. Patched gcc then generates 
correct code:

test:
    .frame $30,0,$26,0
    ldgp $29,0($27)     # 25    *prologue_ldgp_1    [length = 4]
$test..ng:
    .prologue 1
    mb     # 16    *mb_internal    [length = 4]
    lda $1,icount     # 5    *movdi_nofix/5    [length = 4]
$L3:
    ldl_l $2,0($1)     # 18    load_locked_si    [length = 4]
    addl $2,1,$2     # 19    *addsi_internal/1    [length = 4]
    stl_c $2,0($1)     # 20    store_conditional_si    [length = 4]
    beq $2,$L3     # 21    *bcc_normal    [length = 4]
    mb     # 22    *mb_internal    [length = 4]
    ret $31,($26),1     # 29    *return_internal    [length = 4]

The effect of attached patch is, that all but one libgomp failures are 
gone for alpha (the one that remains is due to missing -mieee):

 Running target unix
-FAIL: libgomp.c/atomic-10.c execution test
 FAIL: libgomp.c/atomic-6.c execution test
-FAIL: libgomp.c/loop-6.c execution test
-FAIL: libgomp.c++/ctor-1.C  -O2  execution test
-FAIL: libgomp.c++/ctor-1.C  -Os  execution test
-FAIL: libgomp.c++/ctor-11.C  -O2  execution test
-FAIL: libgomp.c++/ctor-11.C  -Os  execution test
-FAIL: libgomp.c++/ctor-2.C  -O2  execution test
-FAIL: libgomp.c++/ctor-2.C  -Os  execution test
-FAIL: libgomp.c++/ctor-3.C  -O2  execution test
-FAIL: libgomp.c++/ctor-3.C  -Os  execution test
-FAIL: libgomp.c++/ctor-4.C  -O2  execution test
-FAIL: libgomp.c++/ctor-4.C  -Os  execution test
-FAIL: libgomp.c++/ctor-7.C  -O2  execution test
-FAIL: libgomp.c++/ctor-7.C  -Os  execution test
-FAIL: libgomp.c++/loop-9.C  -O2  execution test
-FAIL: libgomp.c++/loop-9.C  -Os  execution test
-FAIL: libgomp.c++/pr30703.C  -O2  execution test
-FAIL: libgomp.c++/pr30703.C  -Os  execution test
 
         === libgomp Summary ===
 
-# of expected passes        2308
-# of unexpected failures    19
+# of expected passes        2326
+# of unexpected failures    1


2008-12-26  Uros Bizjak  <ubizjak@gmail.com>

    * sched-deps.c (sched_analyze_2)[UNSPEC_VOLATILE]: Flush pending
    memory loads and stores.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu 
{,-m32}, i686-pc-linux-gnu and alphaev56-unknown-linux-gnu. OK for 4.4 
and 4.3?

Also, some targets define memory_barrier in kind of strange way, 
creating double-mem references, etc... Also, there is no need to define 
memory_barrier insn as UNSPEC_VOLATILE, since it is its operand (BLKmode 
mem) that should be volatile, not the instruction itself. I plan to 
change these to the definition, similar to the one above, in a 
followup-patch.

Thanks,
Uros.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: p.diff.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20081227/1304d642/attachment.txt>


More information about the Gcc-patches mailing list