This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Modifying ARM code generator for elimination of 8bit writes - need help


On Tue, Jun 06, 2006 at 08:27:10PM +0200, Rask Ingemann Lambertsen wrote:
> On Tue, Jun 06, 2006 at 10:39:46AM +0100, Richard Sandiford wrote:

> > This is just a guess, but the insn above might be an output reload.
> 
> It is, in a peculiar (and not useful) way. Diffing the greg dump against
> the lreg dump shows (using the example code I posted):
> 
> +(insn:HI 25 17 38 2 (set (reg:QI 3 r3)
> +        (reg:QI 3 r3 [110])) 158 {*arm_movqi_insn_swp} (nil)
> +    (nil))
> 
> -(insn:HI 25 17 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 101 [ x ])
> +(insn 38 25 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 0 r0 [orig:101 x ]
> + [101])
>                  (const_int 5 [0x5])) [0 <variable>.c2+0 S1 A8])
> -        (subreg:QI (reg:SI 110) 0)) 158 {*arm_movqi_insn_swp} (nil)
> -    (expr_list:REG_DEAD (reg:SI 110)
> -        (expr_list:REG_DEAD (reg/v/f:SI 101 [ x ])
> -            (nil))))
> +        (reg:QI 3 r3)) 158 {*arm_movqi_insn_swp} (nil)
> +    (nil))
> 
> I.e. change insn 25 to a nop and then add insn 38 as essentially a duplicate
> of the original insn 25. I don't think reload was supposed to do that.

The reason this happens is because reload decides that the cheapest
alternative is the first one, which is reg->reg. Then of course it has to
make an output reload to copy the value to memory...

I have not yet looked into why reload thinks the reg->reg alternative is the
cheapest one. That'll be tomorrow at the earliest.

Meanwhile, to ensure that reload decides upon the alternative we want,
delete the *arm_movqi_insn_swp pattern and use these two instead:

(define_insn "*arm_movqi_insn_to_reg"
  [(set (match_operand:QI 0 "register_operand" "=r,r,r")
        (match_operand:QI 1 "general_operand" "rI,K,m"))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
   && (   register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "@
   mov%?\\t%0, %1
   mvn%?\\t%0, #%B1
   ldr%?b\\t%0, %1"
  [(set_attr "type" "*,*,load1")
   (set_attr "predicable" "yes")]
)

(define_insn "*arm_movqi_insn_swp"
  [(set (match_operand:QI 0 "memory_operand" "=Q")
        (match_operand:QI 1 "nonmemory_operand" "r"))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
   && (   register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"
  [(set_attr "type" "store1")
   (set_attr "predicable" "yes")]
)

Also, undo the change to arm_legitimate_address_p() arm.c.

Btw, why are there calls to register_operand() in the insn conditions?

With the changes above, I now get this (-O2 -swp-byte-writes):

bytewritetest:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	ldrb	r3, [r0, #5]	@ zero_extendqisi2
	ldrb	r2, [r0, #4]	@ zero_extendqisi2
	eor	r1, r3, r2
	add	r3, r3, r2
	add	r2, r0, #5
	ldr	ip, [r0, #0]
	swpb	r1, r1, [r2, #0]
	@ lr needed for prologue
	str	ip, [r0, #8]
	str	r3, [r0, #0]
	bx	lr

That's a lot better. It is now only one instruction longer than without
-swp-byte-writes.

-- 
Rask Ingemann Lambertsen


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]