This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



ARM/getting rid of superfluous zero extension


Hi,

I have recently added ARM support for __builtin_bswap16, which uses the
rev16 instruction when dealing with an unsigned argument.

Considering:
unsigned short myfunc(unsigned short x) {
  return __builtin_bswap16(x);
}

gcc -O2 generates:
myfunc:
	rev16	r0, r0
	uxth	r0, r0
	bx	lr

I'd like to get rid of the zero extension, which is not needed since
the upper 16 bits of r0 are zero on input.

Note that rev16 actually operates on a 32-bit value and swaps the
bytes in each halfword of a 32-bit register.
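
To make the halfword behaviour concrete, here is a minimal C sketch of what
rev16 computes on a 32-bit register. The function name rev16_model is
hypothetical and only for illustration; the masks are the usual ones for
swapping the bytes within each halfword.

```c
#include <stdint.h>

/* Illustrative model of rev16 on a 32-bit register (name is hypothetical):
   swap the two bytes inside each 16-bit halfword independently. */
uint32_t rev16_model(uint32_t x)
{
    /* Low byte of each halfword moves up, high byte moves down. */
    return ((x << 8) & 0xff00ff00u) | ((x >> 8) & 0x00ff00ffu);
}
```

With an input whose upper halfword is zero, such as 0x00001234, this model
returns 0x00003412, so the upper halfword of the result is zero as well.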

After discussions with Ulrich, I have changed the machine description
of bswaphi2 to:
(define_insn "arm_rev16_new"
  [(set (match_operand:SI 0 "s_register_operand" "=l,l,r")
	(ior:SI (and:SI (ashift:SI (match_operand:SI 1 "s_register_operand" "l,l,r")
                                  (const_int 8))
                       (const_int 4278255360))
               (and:SI (lshiftrt:SI (match_dup 1) (const_int 8))
                       (const_int 16711935))))]
  "arm_arch6"
  "@
   rev16\t%0, %1
   rev16%?\t%0, %1
   rev16%?\t%0, %1"
  [(set_attr "arch" "t1,t2,32")
   (set_attr "length" "2,2,4")]
)

(define_expand "bswaphi2"
  [(set (match_operand:HI 0 "s_register_operand" "")
       (bswap:HI (match_operand:HI 1 "s_register_operand" "")))]
  "arm_arch6"
  {
    rtx in = gen_lowpart (SImode, operands[1]);
    rtx out = gen_lowpart (SImode, operands[0]);

    emit_insn (gen_arm_rev16_new (out, in));

    DONE;
  }
 )

Now, this exposes the fact that rev16 also changes the upper 16 bits,
but the generated code is still the same.

I have been trying to understand why combine does not manage to infer
that the zero extension is superfluous.
Before expansion to RTL, the GIMPLE IR contains:
myfunc (short unsigned int x)
{
  short unsigned int _2;
;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _2 = __builtin_bswap16 (x_1(D)); [tail call]
  return _2;
;;    succ:       EXIT
}

Before combine, the RTL is:

(note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v:SI 112 [ x ])
        (reg:SI 0 r0 [ x ])) rev16.c:11 636 {*arm_movsi_vfp}
     (expr_list:REG_DEAD (reg:SI 0 r0 [ x ])
        (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (subreg:SI (reg:HI 113) 0)
        (ior:SI (and:SI (ashift:SI (reg/v:SI 112 [ x ])
                    (const_int 8 [0x8]))
                (const_int 4278255360 [0xff00ff00]))
            (and:SI (lshiftrt:SI (reg/v:SI 112 [ x ])
                    (const_int 8 [0x8]))
                (const_int 16711935 [0xff00ff])))) rev16.c:17 354 {arm_rev16_new}
     (expr_list:REG_DEAD (reg/v:SI 112 [ x ])
        (nil)))
(insn 7 6 12 2 (set (reg:SI 110 [ D.4971 ])
        (zero_extend:SI (reg:HI 113))) rev16.c:17 166 {*arm_zero_extendhisi2_v6}
     (expr_list:REG_DEAD (reg:HI 113)
        (nil)))
(insn 12 7 15 2 (set (reg/i:SI 0 r0)
        (reg:SI 110 [ D.4971 ])) rev16.c:19 636 {*arm_movsi_vfp}
     (expr_list:REG_DEAD (reg:SI 110 [ D.4971 ])
        (nil)))
(insn 15 12 0 2 (use (reg/i:SI 0 r0)) rev16.c:19 -1
     (nil))

Stepping inside set_nonzero_bits_and_sign_copies() indicates that:
- insn 2 has nonzero_bits = 65535, and sign_bit_copies = 16
- insn 6 has nonzero_bits = 65535 and sign_bit_copies = 1
- insn 7 has nonzero_bits = 65535 and sign_bit_copies = 16
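
For reference, the value 65535 reported for insn 6 can be reproduced by hand
by pushing a "possibly nonzero bits" mask through the pattern's expression.
The sketch below is only an illustration of that arithmetic, not combine's
actual implementation; the helper name rev16_nonzero_bits is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: given a mask of the bits that may be nonzero in the
   input, return a conservative mask of the bits that may be nonzero in
   ((x << 8) & 0xff00ff00) | ((x >> 8) & 0x00ff00ff). */
uint32_t rev16_nonzero_bits(uint32_t in_mask)
{
    uint32_t hi = (in_mask << 8) & 0xff00ff00u; /* bits reachable via the left shift */
    uint32_t lo = (in_mask >> 8) & 0x00ff00ffu; /* bits reachable via the right shift */
    return hi | lo;                             /* union of both halves of the ior */
}
```

Calling rev16_nonzero_bits(0xffff) gives 0xffff, i.e. 65535: if the upper 16
bits of the source are known to be zero, the upper 16 bits of the SImode
result are zero too.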

Any suggestions on how I could avoid generating this zero_extend?

Thanks,

Christophe.

