This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH MIPS][LS3A] Define peephole2 for Loongson3A 128 bits load/store


Thanks for the patch.  Given that we're deep in stage 3, I think this
will have to wait until 4.7 stage 1 opens.

Mingjie Xing <mingjie.xing@gmail.com> writes:
> +;; Loongson 128-bit load/store peephole optimization.  Try to combine two
> +;; 64-bit load/store insns into one 128-bit load/store insn.
> +
> +(define_peephole2
> +  [(set (mem:DI (plus (match_operand:SI 0 "register_operand")
> +                      (match_operand:SI 1 "const_int_operand")))
> +        (match_operand:DI 2 "register_operand"))
> +   (set (mem:DI (plus (match_dup 0)
> +                      (match_operand:SI 3 "const_int_operand")))
> +        (match_operand:DI 4 "register_operand"))]
> +  "loongson_128bits_load_store_p (operands[0], operands[1], operands[3],
> +                                  operands[2], operands[4], 0)"
> +  [(parallel
> +    [(set (mem:DI (plus (match_operand:SI 0 "register_operand")
> +                        (match_operand:SI 1 "const_int_operand")))
> +          (match_operand:DI 2 "register_operand"))
> +     (set (mem:DI (plus (match_dup 0)
> +                        (match_operand:SI 3 "const_int_operand")))
> +          (match_operand:DI 4 "register_operand"))])])

There are a couple of problems here:

1. By hard-coding the "plus" in the pattern, you stop combinations
   that involve a zero offset.

2. You're not capturing the original MEM rtxes themselves.  You're
   regenerating them from the addresses instead.  This drops any
   MEM_ATTRs that might have been attached to the original MEMs.

You could avoid both problems by matching the memory operands:

(define_peephole2
  [(set (match_operand:DI 0 "memory_operand")
        (match_operand:DI 1 "register_operand"))
   (set (match_operand:DI 2 "memory_operand")
        (match_operand:DI 3 "register_operand"))]

and using C code to check whether operands 0 and 2 are 8 bytes apart.
The C code should use mips_classify_address to decompose the address.
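The arithmetic behind that check can be sketched in plain C.  This is only an illustration with a made-up helper name; the real peephole condition would call mips_classify_address on each MEM's address and compare the decomposed base/offset pairs:

```c
#include <stdbool.h>

/* Illustrative sketch: given the base-relative byte offsets of two
   64-bit accesses off the same base register, decide whether they
   form one 128-bit access.  The two accesses must be exactly 8 bytes
   apart and the lower address must be 16-byte aligned.  */
static bool
adjacent_128bit_pair_p (long offset1, long offset2)
{
  long lo = offset1 < offset2 ? offset1 : offset2;
  long hi = offset1 < offset2 ? offset2 : offset1;

  return hi - lo == 8 && lo % 16 == 0;
}
```

Note that the order of the two offsets does not matter here, which mirrors the INTVAL comparison in the output template below.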

> +  if (INTVAL (operands[1]) < INTVAL (operands[3]))
> +    {
> +       if (which_alternative == 0)
> +         return "gssq\t%4,%2,%1>>4(%0)";
> +       else
> +         return "gssqc1\t%4,%2,%1>>4(%0)";
> +    }

So the assembly syntax represents "$29+16" as "1($29)"?  That seems like
a very unfortunate choice.  Is there still a chance that we could change it,
or would it break too much existing code?

(I realise the idea was that only multiples of 16 are acceptable,
but I would have expected the assembler to take an unadjusted offset
and check that it was a multiple of 16.  It's certainly easier to write
assembly by hand if addresses have the same meaning in all instructions.
It also helps when reading disassembly.)
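For reference, the shift that the quoted template performs with "%1>>4" amounts to the following (a sketch; the encoding is inferred from the "%1>>4(%0)" in the patch above):

```c
#include <assert.h>

/* Sketch of the gssq/gssqc1 offset encoding implied by "%1>>4": the
   byte offset must be a multiple of 16, and the assembly field holds
   the offset divided by 16, so $29+16 prints as "1($29)".  */
static int
encode_gssq_offset (int byte_offset)
{
  assert (byte_offset % 16 == 0);
  return byte_offset >> 4;
}
```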

> +/* Return 1 if it is valid to do the 128-bit load/store peephole
> +   optimization, otherwise return 0.
> +
> +   The Loongson 128-bit load/store requires that the address (BASE + OFFSET)
> +   be 128-bit aligned.
> +
> +   We currently only consider the case where BASE_REG is the SP register and
> +   STACK_BOUNDARY is 128, which guarantees that BASE is 128-bit aligned.
> +   Thus the OFFSET, which is the minimum of OFFSET1 and OFFSET2, should
> +   also be 128-bit aligned.  */

That seems overly restrictive.  If you pass the memory operands (above),
you can use MEM_ALIGN to check the alignment of the lower address.

You should reject volatile memories.
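Taking these two points together, the shape of the condition is roughly the following.  This is a standalone model, not GCC internals: mem_rec stands in for an rtx, and the align/is_volatile fields stand in for what MEM_ALIGN and MEM_VOLATILE_P would report:

```c
#include <stdbool.h>

/* Standalone model of the suggested condition: accept the pair only
   if the lower MEM is known to be at least 128-bit aligned and
   neither access is volatile.  In GCC itself these properties would
   be read with MEM_ALIGN and MEM_VOLATILE_P on the matched MEMs.  */
struct mem_rec
{
  unsigned int align;   /* known alignment in bits */
  bool is_volatile;
};

static bool
pair_ok_p (const struct mem_rec *lower, const struct mem_rec *upper)
{
  return lower->align >= 128
         && !lower->is_volatile
         && !upper->is_volatile;
}
```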

Richard

