Re: GCC47 movmem breaks RA, GCC46 RA is fine

On 27/04/12 11:49, Richard Guenther wrote:

Yes, it inlines it. You may want to look at s390 which I believe has a similar block-copy operation.


I looked at s390. Although its block copy instruction looks similar, ours is much more restrictive: it expects its values in specific registers, whereas the s390 mvcle insn lets the register numbers be passed as operands.

I decided to try not to hardcode the registers in the instruction. Since the instruction requires specific registers as operands, I had to create one class per register (each containing a single register) and then a register constraint for each class. This turned out not to work: RA breaks even earlier than before. Here's what I did:
(define_expand "movmemqi"
  [(set (match_operand:BLK 0 "memory_operand")   ; destination
        (match_operand:BLK 1 "memory_operand"))  ; source
   (use (match_operand:QI 2 "general_operand"))] ; count
  "!TARGET_NO_BLOCK_COPY && !reload_completed"
{
    rtx dst_addr = XEXP(operands[0], 0);
    rtx src_addr = XEXP(operands[1], 0);
    rtx dst_reg = gen_reg_rtx(QImode); /* will be forced into AH */
    rtx src_reg = gen_reg_rtx(QImode); /* will be forced into XL */
    rtx cnt_reg = gen_reg_rtx(QImode); /* will be forced into AL */

    emit_move_insn(cnt_reg, operands[2]);

    /* Compute the destination address into a single register. */
    if(GET_CODE(dst_addr) == PLUS)
    {
        emit_move_insn(dst_reg, XEXP(dst_addr, 0));
        emit_insn(gen_addqi3(dst_reg, dst_reg, XEXP(dst_addr, 1)));
    }
    else
        emit_move_insn(dst_reg, dst_addr);

    /* Likewise for the source address. */
    if(GET_CODE(src_addr) == PLUS)
    {
        emit_move_insn(src_reg, XEXP(src_addr, 0));
        emit_insn(gen_addqi3(src_reg, src_reg, XEXP(src_addr, 1)));
    }
    else
        emit_move_insn(src_reg, src_addr);

    emit_insn(gen_bc2(dst_reg, src_reg, cnt_reg));
    DONE;
})


(define_insn "bc2"
  [(set (match_operand:QI 0 "register_operand" "=l")
        (const_int 0))
   (set (mem:BLK (match_operand:QI 1 "register_operand" "=h"))
        (mem:BLK (match_operand:QI 2 "register_operand" "=x")))
   (set (match_dup 2)
        (plus:QI (match_dup 2) (match_dup 0)))
   (set (match_dup 1) (plus:QI (match_dup 1) (match_dup 0)))]

Constraints l, h and x correspond to singleton classes for registers AL, AH and XL respectively. I think the problem here is that RA cannot deal with such a constrained register set. I would like to use our block copy instruction rather than disable movmemqi and setmemqi and fall back to calls to memcpy. Is there anything I can try to tune the RA?
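
To make the class setup concrete, here is a minimal standalone C sketch of what "one class per register" amounts to. It is only a model, not code from the port: the register numbers, the extra XH register, and the helper names regno_reg_class, class_size and check_singleton_classes are assumptions for illustration, and reg_class_contents merely mimics the one-bit-per-register layout of REG_CLASS_CONTENTS.

```c
#include <assert.h>

/* Hypothetical hard register numbers; the real numbering is not given. */
enum { REG_AL = 0, REG_AH = 1, REG_XL = 2, REG_XH = 3, NUM_HARD_REGS = 4 };

/* One singleton class per register needed by the block copy insn. */
enum reg_class {
  NO_REGS,
  AL_REGS,      /* constraint "l" */
  AH_REGS,      /* constraint "h" */
  XL_REGS,      /* constraint "x" */
  GENERAL_REGS,
  ALL_REGS,
  LIM_REG_CLASSES
};

/* One bit per hard register, modelled on REG_CLASS_CONTENTS. */
static const unsigned reg_class_contents[LIM_REG_CLASSES] = {
  0x0, /* NO_REGS */
  0x1, /* AL_REGS: AL only */
  0x2, /* AH_REGS: AH only */
  0x4, /* XL_REGS: XL only */
  0xf, /* GENERAL_REGS */
  0xf  /* ALL_REGS */
};

/* Smallest class containing REGNO, modelled on REGNO_REG_CLASS. */
static enum reg_class regno_reg_class(int regno)
{
  switch (regno) {
  case REG_AL: return AL_REGS;
  case REG_AH: return AH_REGS;
  case REG_XL: return XL_REGS;
  default:     return GENERAL_REGS;
  }
}

/* Number of hard registers in class C. */
static int class_size(enum reg_class c)
{
  int n = 0;
  for (unsigned m = reg_class_contents[c]; m != 0; m >>= 1)
    n += (int)(m & 1u);
  return n;
}

/* Each constraint class must hold exactly one register. */
static void check_singleton_classes(void)
{
  assert(class_size(AL_REGS) == 1);
  assert(class_size(AH_REGS) == 1);
  assert(class_size(XL_REGS) == 1);
}
```

Under these assumed masks, check_singleton_classes() passes. Each of l, h and x can then be satisfied by exactly one hard register, so RA has no freedom at all when it assigns the three bc2 operands.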


