This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Review microblaze.md, suggestions
- From: Richard Henderson <rth at redhat dot com>
- To: Michael Eager <eager at eagercon dot com>
- Cc: gcc-patches at gcc dot gnu dot org, "Joseph S. Myers" <joseph at codesourcery dot com>
- Date: Thu, 04 Feb 2010 15:38:44 -0800
- Subject: Review microblaze.md, suggestions
- References: <4B6B0699.6070702@eagercon.com>
I think you've made a mistake by adding the R constraint. You've done
it in order to split memory alternatives into those appropriate for a
delay slot, and those that aren't. However, there is a better way than
replicating alternatives like that. Instead, use computed rather than
literal attributes.
First, I'd drop all the "no_delay_*" members of attr "type". In
addition, icmp isn't used and darith doesn't seem to be useful. We will
need to add arith1 for arith operations with only a single rtx input.
That of course means that define_delay has to change. That's fine:
(define_delay (eq_attr "type" "branch,call,jump")
[(cond [(eq_attr "type" "branch,call,jump,multi")
(const_int 0)
(eq_attr "type" "fadd,frsub,fmul,fdiv,fcmp,store,load")
(and (eq_attr "length" "4")
(symbol_ref "microblaze_no_unsafe_delay")))
]
(eq_attr "length" "4"))
(nil) (nil)])
Aside from the more readable type filtering, we require that any
instruction to be put in a delay slot be 4 bytes long.
Next, adjust length computation:
(define_attr "length" ""
(cond [(eq_attr "type" "load")
(if_then_else (match_operand 1 "double_memory_operand" "")
(const_int 8)
(const_int 4))
(eq_attr "type" "store")
(if_then_else (match_operand 0 "double_memory_operand" "")
(const_int 8)
(const_int 4))
(eq_attr "type" "arith")
(if_then_else (match_operand 2 "long_constant_operand" "")
(const_int 8)
(const_int 4))
;; Require "multi" insns to have a length specified.
(eq_attr "type" "multi")
(symbol_ref "(gcc_unreachable(), 0)")
(eq_attr "type" "call")
(something)
;; ... any other special types I've forgotten
]
(const_int 4)))
This is, of course, the *default*, which can be overridden on a
case-by-case basis by the individual instructions. However, I think
that I've just covered 95% of the patterns that aren't obviously
multiple insns.
Which means that, for example,
(define_insn "zero_extendhisi2"
[(set (match_operand:SI 0 "register_operand" "=d,d,d")
(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "d,R,m")))]
""
"@
andi\t%0,%1,0xffff
lhu%i1\t%0,%1
lhu%i1\t%0,%1"
[(set_attr "type" "no_delay_arith,load,no_delay_load")
(set_attr "mode" "SI,SI,SI")
(set_attr "length" "8,4,8")])
reduces to
(define_insn "zero_extendhisi2"
[(set (match_operand:SI 0 "register_operand" "=d,d")
(zero_extend:SI
(match_operand:HI 1 "nonimmediate_operand" "d,m")))
""
"@
andi\t%0,%1,0xffff
lhu%i1\t%0,%1"
[(set_attr "type" "arith,load")
(set_attr "length" "8,*")])
Note that the length of the first alternative is specified explicitly
since there's nothing in the rtl itself that could determine that this
is an AND with a long immediate. The length of the second alternative
is explicitly set to default, so the above computation will come into
effect.
> (define_insn "*one_cmpldi2"
> (define_insn "anddi3"
> (define_insn "iordi3"
> (define_insn "xordi3"
Drop all of your double-word logical operations and associated
splitters; they do just what the generic expander code would do.
> (define_insn "extendsidi2"
> [(set (match_operand:DI 0 "register_operand" "=d,d,d")
> (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand"
"d,R,m")))]
> ""
> {
> if (which_alternative == 0)
> output_asm_insn ("addk\t%D0,r0,%1", operands);
> else
> output_asm_insn ("lw%i1\t%D0,%1", operands);
>
> output_asm_insn ("add\t%0,%D0,%D0", operands);
> output_asm_insn ("addc\t%0,r0,r0", operands);
> output_asm_insn ("beqi\t%0,.+8", operands);
> return "addi\t%0,r0,0xffffffff";
> }
> [(set_attr "type" "multi,multi,multi")
> (set_attr "mode" "DI")
> (set_attr "length" "20,20,20")])
Ug. Surely we can do better than this.
I suspect the best way to approach this problem is to let the generic
code do most of the work. If we don't implement extendsidi2, then
convert_move will call emit_store_flag, which will try (x >> 31). So
all we have to do is make sure that's optimized well.
If TARGET_BARREL_SHIFT, then we're done. Otherwise,
(define_insn "*ashrsi3_by31"
[(set (match_operand:SI 0 "register_operand" "=d")
(ashiftrt:SI (match_operand:SI 1 "register_operand" "d")
(const_int 31)))]
""
"add\tr0,%1,%1\;addkc\t%0,r0,r0\;rsubk\t%0,%0,r0")
That's already one instruction shorter than your sequence above, and
it'll also be used for other things.
I think you can implement both arithmetic and logical right shift via
rotate, similar to ashlsi3_with_rotate. E.g.
(define_insn "*ashrsi3_with_rotate"
[(set (match_operand:SI 0 "register_operand" "=d")
(ashiftrt:SI (match_operand:SI 1 "register_operand" "d")
(match_operand:SI 2 "const_int_operand" "n")))]
"INTVAL (operands[2]) >= 17"
{
int i, nshift = INTVAL (operands[2]);
operands[3] = gen_rtx_REG (SImode, MB_ABI_ASM_TEMP_REGNUM);
/* The first bit is special; we want to replicate the sign bit to
all bits of the destination. All other bits shift in normally. */
output_asm_insn ("add\t%3,%1,%1", operands);
output_asm_insn ("addkc\t%0,r0,r0", operands);
output_asm_insn ("rsubk\t%0,%0,r0", operands);
for (i = 31; i > nshift; --i)
{
output_asm_insn ("add\t%3,%3,%3", operands);
output_asm_insn ("addkc\t%0,%0,r0, operands);
}
}
[...])
Similarly for logical right shift.
> (define_insn "*ashlsi3_with_mul_delay"
> [(set (match_operand:SI 0 "register_operand" "=d")
> (ashift:SI (match_operand:SI 1 "register_operand" "d")
> (match_operand:SI 2 "immediate_operand" "I")))]
const_int_operand
> (define_insn "*ashlsi3_with_mul_nodelay"
> [(set (match_operand:SI 0 "register_operand" "=d")
> (ashift:SI (match_operand:SI 1 "register_operand" "d")
> (match_operand:SI 2 "immediate_operand" "I")))]
Err, this either needs to be merged with the previous, or you need to
use more specific predicates. So, either
const_0_14_operand -- for ashlsi3_with_mul_delay
const_int_operand -- for ashlsi3_with_mul_nodelay
or choose a letter constraint for 0..14 and then you can use 2 alternatives.
> (define_insn "*ashlsi3_with_rotate"
You can remove the final ANDI from by using MB_ABI_ASM_TEMP_REGNUM as an
intermediate.
More to come...
r~