This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



GCC47 movmem breaks RA, GCC46 RA is fine


Hi,

I am having a problem with GCC47 register allocation and my movmemqi pattern. GCC46 handled it very well, but GCC47 keeps failing with register spill failures.

My backend has very few registers: 3 chip registers in total (class CHIP_REGS). One of them (XL) is used for memory references (class ADDR_REGS) and the other two (AL, AH) are for normal use (class DATA_REGS), so CHIP_REGS = ADDR_REGS U DATA_REGS.

There are a couple of other memory mapped registers, but all loads and stores go through CHIP_REGS.
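
As a rough illustration of the class layout just described (a sketch, not code from the backend), the three classes can be modelled as bit masks over hard register numbers. The register numbers are taken from the RTL dumps in this message (AH = 0, AL = 1, X = 3); the mask names and helper functions are made up for the example.

```c
#include <stdint.h>

/* Hard register numbers as printed in the RTL dumps (AH = 0, AL = 1, X = 3).
   All other names in this sketch are hypothetical. */
enum { REG_AH = 0, REG_AL = 1, REG_XL = 3 };

/* One bit per hard register; CHIP_REGS is the union of the other two. */
enum {
    ADDR_REGS_MASK = 1u << REG_XL,
    DATA_REGS_MASK = (1u << REG_AH) | (1u << REG_AL),
    CHIP_REGS_MASK = ADDR_REGS_MASK | DATA_REGS_MASK
};

/* Nonzero if hard register `regno` is a member of the class `mask`. */
static int reg_in_class(unsigned mask, unsigned regno)
{
    return (int)((mask >> regno) & 1u);
}

/* Number of hard registers in the class `mask`. */
static int class_size(unsigned mask)
{
    int n = 0;
    while (mask != 0) {
        n += (int)(mask & 1u);
        mask >>= 1;
    }
    return n;
}
```

With this layout ADDR_REGS contains a single register, so any pseudo that must live in ADDR_REGS competes for X with every address reload.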

My chip has a block copy instruction which needs the source address in XL, the destination address in AH and the count in AL. My movmemqi is similar to movmemsi in the rx backend.

(define_expand "movmemqi"
  [(use (match_operand:BLK 0 "memory_operand"))
   (use (match_operand:BLK 1 "memory_operand"))
   (use (match_operand:QI 2 "general_operand"))
   (use (match_operand:QI 3 "general_operand"))]
  ""
{
    rtx dst_addr = XEXP(operands[0], 0);
    rtx src_addr = XEXP(operands[1], 0);
    rtx dst_reg = gen_rtx_REG(QImode, RAH);
    rtx src_reg = gen_rtx_REG(QImode, RXL);
    rtx cnt_reg = gen_rtx_REG(QImode, RAL);

    emit_move_insn(cnt_reg, operands[2]);

    if(GET_CODE(dst_addr) == PLUS)
    {
        emit_move_insn(dst_reg, XEXP(dst_addr, 0));
        emit_insn(gen_addqi3(dst_reg, dst_reg, XEXP(dst_addr, 1)));
    }
    else
        emit_move_insn(dst_reg, dst_addr);

    if(GET_CODE(src_addr) == PLUS)
    {
        emit_move_insn(src_reg, XEXP(src_addr, 0));
        emit_insn(gen_addqi3(src_reg, src_reg, XEXP(src_addr, 1)));
    }
    else
        emit_move_insn(src_reg, src_addr);

    emit_insn(gen_bc2());

    DONE;
})

(define_insn "bc2"
  [(set (reg:QI RAL) (const_int 0))
   (set (mem:BLK (reg:QI RAH)) (mem:BLK (reg:QI RXL)))
   (set (reg:QI RXL) (plus:QI (reg:QI RXL) (reg:QI RAL)))
   (set (reg:QI RAH) (plus:QI (reg:QI RAH) (reg:QI RAL)))]
  ""
  "bc2")

The parallel in bc2 describes what the bc2 chip instruction modifies: it copies the block at XL to AH, moves XL to point to the end of the source block and AH to the end of the destination block, and sets AL to 0.
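
To make the intended effect concrete, here is a small C model of that behaviour. It is only a sketch under assumptions: the struct, the 256-byte address space and the byte-at-a-time forward copy are invented for illustration and say nothing about how the hardware actually performs the copy. Note that in an RTL parallel all input values are computed before any of the side effects take place, so the two plus expressions see the incoming count in AL, not the 0 stored by the first set. The model reaches the same final state by counting AL down.

```c
#include <stdint.h>

/* Hypothetical machine state: three 8-bit chip registers and a
   256-byte address space reachable through QImode addresses. */
struct chip {
    uint8_t al;        /* byte count */
    uint8_t ah;        /* destination address */
    uint8_t xl;        /* source address */
    uint8_t mem[256];
};

/* Model of bc2: copy AL bytes from [XL] to [AH].  On exit XL and AH
   have each advanced by the original count and AL is 0.
   Assumption: bytes are copied in ascending address order. */
static void bc2(struct chip *c)
{
    while (c->al != 0) {
        c->mem[c->ah] = c->mem[c->xl];
        c->ah++;       /* 8-bit arithmetic, wraps at 256 */
        c->xl++;
        c->al--;
    }
}
```

For the memcpy (d, *s, 16) call in the test case that follows, the expander loads 16 into AL, d into AH and *s into XL before emitting bc2, so all three chip registers are live at that instruction.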

The C code
int **
t25 (int *d, int **s)
{
  memcpy (d, *s, 16);
  return s;
}

turns into the following after asmcons (-Os passed in):
(note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn 2 5 3 2 (parallel [
            (set (reg/v/f:QI 22 [ d ])
                (reg:QI 1 AL [ d ]))
            (clobber (reg:CC 13 CC))
        ]) memcpy.i:3 6 {*movqi}
     (expr_list:REG_DEAD (reg:QI 1 AL [ d ])
        (expr_list:REG_UNUSED (reg:CC 13 CC)
            (nil))))

(insn 3 2 4 2 (parallel [
            (set (reg/v/f:QI 23 [ s ])
                (reg:QI 0 AH [ s ]))
            (clobber (reg:CC 13 CC))
        ]) memcpy.i:3 6 {*movqi}
     (expr_list:REG_DEAD (reg:QI 0 AH [ s ])
        (expr_list:REG_UNUSED (reg:CC 13 CC)
            (nil))))

(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)

(insn 7 4 8 2 (parallel [
            (set (reg/f:QI 24 [ *s_1(D) ])
                (mem/f:QI (reg/v/f:QI 23 [ s ]) [2 *s_1(D)+0 S1 A16]))
            (clobber (reg:CC 13 CC))
        ]) memcpy.i:4 6 {*movqi}
     (expr_list:REG_UNUSED (reg:CC 13 CC)
        (nil)))

(insn 8 7 9 2 (parallel [
            (set (reg:QI 1 AL)
                (const_int 16 [0x10]))
            (clobber (reg:CC 13 CC))
        ]) memcpy.i:4 6 {*movqi}
     (expr_list:REG_UNUSED (reg:CC 13 CC)
        (nil)))

(insn 9 8 10 2 (parallel [
            (set (reg:QI 0 AH)
                (reg/v/f:QI 22 [ d ]))
            (clobber (reg:CC 13 CC))
        ]) memcpy.i:4 6 {*movqi}
     (expr_list:REG_DEAD (reg/v/f:QI 22 [ d ])
        (expr_list:REG_UNUSED (reg:CC 13 CC)
            (nil))))

(insn 10 9 11 2 (parallel [
            (set (reg:QI 3 X)
                (reg/f:QI 24 [ *s_1(D) ]))
            (clobber (reg:CC 13 CC))
        ]) memcpy.i:4 6 {*movqi}
     (expr_list:REG_DEAD (reg/f:QI 24 [ *s_1(D) ])
        (expr_list:REG_UNUSED (reg:CC 13 CC)
            (nil))))

(insn 11 10 16 2 (parallel [
            (set (reg:QI 1 AL)
                (const_int 0 [0]))
            (set (mem:BLK (reg:QI 0 AH) [0 A16])
                (mem:BLK (reg:QI 3 X) [0 A16]))
            (set (reg:QI 3 X)
                (plus:QI (reg:QI 3 X)
                    (reg:QI 1 AL)))
            (set (reg:QI 0 AH)
                (plus:QI (reg:QI 0 AH)
                    (reg:QI 1 AL)))
        ]) memcpy.i:4 21 {bc2}
     (expr_list:REG_UNUSED (reg:QI 3 X)
        (expr_list:REG_UNUSED (reg:QI 1 AL)
            (expr_list:REG_UNUSED (reg:QI 0 AH)
                (nil)))))

(insn 16 11 19 2 (parallel [
            (set (reg/i:QI 1 AL)
                (reg/v/f:QI 23 [ s ]))
            (clobber (reg:CC 13 CC))
        ]) memcpy.i:6 6 {*movqi}
     (expr_list:REG_DEAD (reg/v/f:QI 23 [ s ])
        (expr_list:REG_UNUSED (reg:CC 13 CC)
            (nil))))

(insn 19 16 0 2 (use (reg/i:QI 1 AL)) memcpy.i:6 -1
     (nil))

Pass ira starts by reporting:
;; Function t25 (t25, funcdef_no=0, decl_uid=1309, cgraph_uid=0)

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
Building IRA IR
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
init_insns for 24: (insn_list:REG_DEP_TRUE 7 (nil))

Pass 0 for finding pseudo/allocno costs

    a1 (r24,l0) best ADDR_REGS, allocno ADDR_REGS
    a0 (r23,l0) best ADDR_REGS, allocno ADDR_REGS
    a2 (r22,l0) best GENERAL_REGS, allocno GENERAL_REGS

a0(r23,l0) costs: ADDR_REGS:0 DATA_REGS:4000 STACK_REGS:4000 CHIP_REGS:4000 FAKE_REGS:4000 MEM_REGS:4000 GENERAL_REGS:6000 ALL_REGS:6000 MEM:13000
a1(r24,l0) costs: ADDR_REGS:-1000 DATA_REGS:0 STACK_REGS:2000 CHIP_REGS:0 FAKE_REGS:2000 MEM_REGS:2000 GENERAL_REGS:2000 ALL_REGS:2000 MEM:-1000
a2(r22,l0) costs: ADDR_REGS:0 DATA_REGS:0 STACK_REGS:0 CHIP_REGS:0 FAKE_REGS:0 MEM_REGS:0 GENERAL_REGS:2000 ALL_REGS:2000 MEM:7000



Pass 1 for finding pseudo/allocno costs


r24: preferred ADDR_REGS, alternative NO_REGS, allocno ADDR_REGS
r23: preferred ADDR_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS
r22: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS


a0(r23,l0) costs: ADDR_REGS:0 DATA_REGS:4000 STACK_REGS:6000 CHIP_REGS:4000 FAKE_REGS:6000 MEM_REGS:6000 GENERAL_REGS:6000 ALL_REGS:6000 MEM:13000
a1(r24,l0) costs: ADDR_REGS:-1000 DATA_REGS:0 STACK_REGS:2000 CHIP_REGS:0 FAKE_REGS:2000 MEM_REGS:2000 GENERAL_REGS:2000 ALL_REGS:2000 MEM:-1000
a2(r22,l0) costs: GENERAL_REGS:2000 MEM:7000


   Insn 19(l0): point = 0
   Insn 16(l0): point = 2
   Insn 11(l0): point = 4
   Insn 10(l0): point = 6
   Insn 9(l0): point = 8
   Insn 8(l0): point = 10
   Insn 7(l0): point = 12
   Insn 3(l0): point = 14
   Insn 2(l0): point = 16
 a0(r23): [3..14]
 a1(r24): [7..12]
 a2(r22): [9..16]
Compressing live ranges: from 19 to 2 - 10%
Ranges after the compression:
 a0(r23): [0..1]
 a1(r24): [0..1]
 a2(r22): [0..1]
+++Allocating 12 bytes for conflict table (uncompressed size 12)
;; a0(r23,l0) conflicts: a1(r24,l0) a2(r22,l0)
;;     total conflict hard regs: 0 1 3
;;     conflict hard regs: 0 1 3

;; a1(r24,l0) conflicts: a0(r23,l0) a2(r22,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a2(r22,l0) conflicts: a0(r23,l0) a1(r24,l0)
;;     total conflict hard regs: 0 1
;;     conflict hard regs: 0 1

  regions=1, blocks=3, points=2
    allocnos=3 (big 0), copies=0, conflicts=0, ranges=3

**** Allocnos coloring:


Loop 0 (parent -1, header bb2, depth 0)
bbs: 2
all: 0r23 1r24 2r22
modified regnos: 22 23 24
border:
Pressure: GENERAL_REGS=4
Hard reg set forest:
0:( 0 1 3 7-12)@0
1:( 3 7-12)@18000
2:( 7-12)@26000
3:( 3)@0
Allocno a0r23 of GENERAL_REGS(9) has 6 avail. regs 7-12, node: 7-12 (confl regs = 0-6 13-12)
Allocno a1r24 of ADDR_REGS(1) has 1 avail. regs 3, node: 3 (confl regs = 0-2 4-12)
Allocno a2r22 of GENERAL_REGS(9) has 7 avail. regs 3 7-12, node: 3 7-12 (confl regs = 0-2 4-6 13-12)
Pushing a2(r22,l0)(cost 0)
Making a1(r24,l0) colorable
Pushing a1(r24,l0)(cost 0)
Pushing a0(r23,l0)(cost 0)
Popping a0(r23,l0) -- assign reg 7
Popping a1(r24,l0) -- assign reg 3
Popping a2(r22,l0) -- assign reg 8
Disposition:
2:r22 l0 8 0:r23 l0 7 1:r24 l0 3
New iteration of spill/restore move
+++Costs: overall 7000, reg 7000, mem 0, ld 0, st 0, move 0
+++ move loops 0, new jumps 0



This is followed by reload failing on a spill it cannot handle:
;; Function t25 (t25, funcdef_no=0, decl_uid=1309, cgraph_uid=0)

insn=2, live_throughout: 0, 5, dead_or_set: 1, 22
insn=3, live_throughout: 5, 22, dead_or_set: 0, 23
insn=7, live_throughout: 5, 22, 23, dead_or_set: 24
insn=8, live_throughout: 5, 22, 23, 24, dead_or_set: 1
insn=9, live_throughout: 1, 5, 23, 24, dead_or_set: 0, 22
insn=10, live_throughout: 0, 1, 5, 23, dead_or_set: 3, 24
insn=11, live_throughout: 5, 23, dead_or_set: 0, 1, 3
insn=16, live_throughout: 5, dead_or_set: 1, 23
insn=19, live_throughout: 1, 5, dead_or_set:
init_insns for 24: (insn_list:REG_DEP_TRUE 7 (nil))
changing reg in insn 2
changing reg in insn 9
changing reg in insn 3
changing reg in insn 16
changing reg in insn 7
changing reg in insn 7
changing reg in insn 7
changing reg in insn 10
Spilling for insn 7.
Using reg 3 for reload 0
      Try Assign 24(a1), cost=0
changing reg in insn 7
changing reg in insn 10
 Register 24 now on stack.

Spilling for insn 7.
Using reg 3 for reload 0
Using reg 3 for reload 1
Using reg 1 for reload 4
Spilling for insn 10.
reload failure for reload 0

Reloads for insn # 10
Reload 0: reload_in (QI) = (reg/v/f:QI 7 @H'fff8 [orig:23 s ] [23])
ADDR_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
reload_in_reg: (reg/v/f:QI 7 @H'fff8 [orig:23 s ] [23])
Reload 1: CHIP_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1), optional, can't combine, secondary_reload_p
Reload 2: reload_in (QI) = (mem/f:QI (reg/v/f:QI 7 @H'fff8 [orig:23 s ] [23]) [2 *s_1(D)+0 S1 A16])
GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), optional, can't combine
reload_in_reg: (reg/f:QI 24 [ *s_1(D) ])
secondary_in_reload = 1




Interestingly, GCC46 handles this fine; during IRA it reports:
Pass 0 for finding pseudo/allocno costs


    a1 (r24,l0) best ADDR_REGS, cover GENERAL_REGS
    a0 (r23,l0) best ADDR_REGS, cover GENERAL_REGS
    a2 (r22,l0) best GENERAL_REGS, cover GENERAL_REGS

This differs from GCC47, which gives r24 the allocno class ADDR_REGS rather than a GENERAL_REGS cover class, and the GCC46 behaviour seems to work better.
I would like help on how best to handle this situation under GCC47.

Cheers,

--
PMatos

