This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Complex move by parts (PR rtl-optimization/20306)


On Thu, 10 Mar 2005, Richard Henderson wrote:
> I'm not thrilled with how you've rearranged the code.
> Nor am I especially thrilled with the target hook.
>
> I guess I'd be just as happy to simply change the default
> back to moving by parts.  Only use block move if try_int
> is true and we've got memories.

The issue is that different processors have different preferences,
even for the mem->mem case.  On PowerPC, apparently it's faster to
block move an array of doubles via FP registers than via integer
registers, as the peak FPU<->MEM bandwidth is higher than the peak
CPU<->MEM bandwidth.  Clearly, this isn't the case on IA-32 for
memory-to-memory moves, which are most efficiently implemented by
using integer loads/stores and/or IA-32's block move instructions.

Apparently on PowerPC:

.L8:    stfd 13,0(3)
        stfd 0,8(3)
        addi 3,3,16
        bdnz .L8

is faster than

.L2:    stw 8,0(3)
        stw 11,4(3)
        stw 0,8(3)
        stw 10,12(3)
        addi 3,3,16
        bdnz .L2
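
For readers who prefer C to assembly, the two loops above can be sketched
roughly as follows. This is only an illustration, not the PR's testcase:
the `cplx` type and both function names are hypothetical, and it assumes
an 8-byte `double` so that one element is 16 bytes, matching the
`addi 3,3,16` stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a complex double: two 8-byte parts. */
typedef struct { double re, im; } cplx;

/* Assumption: no padding, so each element is 16 bytes. */
static_assert(sizeof(cplx) == 16, "expected a 16-byte element");

/* Move by parts: two 8-byte FP stores per element (compare .L8). */
static void fill_by_parts(cplx *dst, size_t n, cplx v)
{
    for (size_t i = 0; i < n; i++) {
        dst[i].re = v.re;
        dst[i].im = v.im;
    }
}

/* Integer move: four 32-bit word stores per element (compare .L2). */
static void fill_by_words(cplx *dst, size_t n, cplx v)
{
    uint32_t w[4];
    memcpy(w, &v, sizeof w);            /* view the value as four words */
    for (size_t i = 0; i < n; i++) {
        unsigned char *p = (unsigned char *)&dst[i];
        for (int k = 0; k < 4; k++)
            memcpy(p + 4 * k, &w[k], sizeof w[k]);
    }
}
```

Both functions leave identical bytes in memory; the only difference is
which register file, and how many stores, the copy goes through.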


Given that the correct optimization choice is target-dependent, and
that rtx_cost isn't fine-grained enough to distinguish float vs. int
load/store costs, it seems that a target hook is the correct way to go.
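
To make the shape of such a hook concrete, here is a minimal sketch in
plain C. None of these names exist in GCC; `move_query`,
`complex_move_hook` and the two example policies are hypothetical. The
default policy follows the rule suggested in the quoted text, and reading
"we've got memories" as "both operands are memory" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum move_strategy { MOVE_BY_PARTS, MOVE_AS_BLOCK };

/* Hypothetical description of one complex move. */
struct move_query {
    bool src_is_mem;
    bool dst_is_mem;
    bool try_int;    /* an integer-mode move is possible at all */
};

/* Hypothetical hook type: each target supplies its own policy. */
typedef enum move_strategy (*complex_move_hook)(const struct move_query *);

/* Default: block move only if try_int is set and both sides are memory. */
static enum move_strategy default_complex_move(const struct move_query *q)
{
    if (q->try_int && q->src_is_mem && q->dst_is_mem)
        return MOVE_AS_BLOCK;
    return MOVE_BY_PARTS;
}

/* A target where FP loads/stores are faster, even for mem->mem. */
static enum move_strategy fp_preferring_complex_move(const struct move_query *q)
{
    (void)q;    /* this policy ignores the operands */
    return MOVE_BY_PARTS;
}

/* The generic code asks the hook instead of hard-coding one choice. */
static enum move_strategy choose_move(complex_move_hook hook,
                                      const struct move_query *q)
{
    assert(hook != NULL);
    return hook(q);
}
```

The point of the indirection is that the generic code stays the same
while each target answers the question that rtx_cost cannot.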

Thoughts?

Roger
--

