This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH MIPS][LS3A] Define peephole2 for Loongson3A 128 bits load/store
- From: Richard Sandiford <rsandifo at nildram dot co dot uk>
- To: Mingjie Xing <mingjie dot xing at gmail dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Tue, 28 Dec 2010 11:08:56 +0000
- Subject: Re: [PATCH MIPS][LS3A] Define peephole2 for Loongson3A 128 bits load/store
- References: <AANLkTinruzzQR0GiOQfVbHA+d-H6P7+1oB3j=H9eYYkK@mail.gmail.com>
Thanks for the patch. Given that we're deep in stage 3, I think this
will have to wait until 4.7 stage 1 opens.
Mingjie Xing <mingjie.xing@gmail.com> writes:
> +;; Loongson 128-bits load/store peephole optimization. Try to combine two
> +;; 64 bits load/store insns into a 128 bits load/store insn.
> +
> +(define_peephole2
> + [(set (mem:DI (plus (match_operand:SI 0 "register_operand")
> + (match_operand:SI 1 "const_int_operand")))
> + (match_operand:DI 2 "register_operand"))
> + (set (mem:DI (plus (match_dup 0)
> + (match_operand:SI 3 "const_int_operand")))
> + (match_operand:DI 4 "register_operand"))]
> + "loongson_128bits_load_store_p (operands[0], operands[1], operands[3],
> + operands[2], operands[4], 0)"
> + [(parallel
> + [(set (mem:DI (plus (match_operand:SI 0 "register_operand")
> + (match_operand:SI 1 "const_int_operand")))
> + (match_operand:DI 2 "register_operand"))
> + (set (mem:DI (plus (match_dup 0)
> + (match_operand:SI 3 "const_int_operand")))
> + (match_operand:DI 4 "register_operand"))])])
There are a couple of problems here:
1. By hard-coding the "plus" in the pattern, you stop combinations
that involve a zero offset.
2. You're not capturing the original MEM rtxes themselves. You're
regenerating them from the addresses instead. This drops any
MEM_ATTRs that might have been attached to the original MEMs.
You could avoid both problems by matching the memory operands:
(define_peephole2
[(set (match_operand:DI 0 "memory_operand")
(match_operand:DI 1 "register_operand"))
(set (match_operand:DI 2 "memory_operand")
(match_operand:DI 3 "register_operand"))]
and using C code to check whether operands 0 and 2 are 8 bytes apart.
The C code should use mips_classify_address to decompose the address.
> + if (INTVAL (operands[1]) < INTVAL (operands[3]))
> + {
> + if (which_alternative == 0)
> + return "gssq\t%4,%2,%1>>4(%0)";
> + else
> + return "gssqc1\t%4,%2,%1>>4(%0)";
> + }
So the assembly syntax represents "$29+16" as "1($29)"? That seems like
a very unfortunate choice. Is there still a chance that we could change it,
or would it break too much existing code?
(I realise the idea was that only multiples of 16 are acceptable,
but I would have expected the assembler to take an unadjusted offset
and check that it was a multiple of 16. It's certainly easier to write
assembly by hand if addresses have the same meaning in all instructions.
It also helps when reading disassembly.)
> +/* Return 1, if it's valid to do 128-bits load/store peephole optimization,
> + otherwise return 0.
> +
> + Loongson 128-bits load/store requires the address (BASE + OFFSET) should
> + be 128 bits aligned.
> +
> + We currently just conside the case that BASE_REG is SP register and the
> + STACK_BOUNDARY is 128, which can guarantee that the BASE is 128 bits
> + aligned. Thus the OFFSET, which is the minimun of OFFSET1 and OFFSET2,
> + should also be 128 bits aligned. */
That seems overly restrictive. If you pass the memory operands (above),
you can use MEM_ALIGN to check the alignment of the lower address.
You should reject volatile memories.
Richard