This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: PowerPC register and memory cost update
- From: Segher Boessenkool <segher at koffie dot nl>
- To: David Edelsohn <dje at watson dot ibm dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Thu, 24 Oct 2002 22:42:59 +0200
- Subject: Re: PowerPC register and memory cost update
- References: <200210231723.NAA30872@makai.watson.ibm.com>
> +
> + /* A C expression returning the cost of moving data from a register of class
> +    CLASS1 to one of CLASS2. */
> +
> + int
> + rs6000_register_move_cost (mode, from, to)
> +      enum machine_mode mode;
> +      enum reg_class from, to;
> + {
> +   /* Moves from/to GENERAL_REGS. */
> +   if (reg_classes_intersect_p (to, GENERAL_REGS)
> +       || reg_classes_intersect_p (from, GENERAL_REGS))
> +     {
> +       if (! reg_classes_intersect_p (to, GENERAL_REGS))
> +         from = to;
> +
> +       if (from == FLOAT_REGS || from == ALTIVEC_REGS)
> +         return (rs6000_memory_move_cost (mode, from, 0)
> +                 + rs6000_memory_move_cost (mode, GENERAL_REGS, 0));
> +
> +       /* It's more expensive to move CR_REGS than CR0_REGS because of the shift...*/
> +       else if (from == CR_REGS)
> +         return 4;
> +
> +       else
> +         /* A move will cost one instruction per GPR moved. */
> +         return 2 * HARD_REGNO_NREGS (0, mode);
> +     }
> +
> +   /* Moving between two similar registers is just one instruction. */
> +   else if (reg_classes_intersect_p (to, from))
> +     return mode == TFmode ? 4 : 2;
> +
> +   /* Everything else has to go through GENERAL_REGS. */
> +   else
> +     return (rs6000_register_move_cost (mode, GENERAL_REGS, to)
> +             + rs6000_register_move_cost (mode, from, GENERAL_REGS));
> + }
This doesn't handle special registers (lr, ctr) specially -- moves between
general and special registers get the same cost as general<->general moves,
and special<->special moves get twice that cost. The old version didn't
make special<->general more expensive than general<->general either, but at
least it made special<->special really expensive.
I can imagine this hurts a lot on register-starved routines, but I have no
benchmarks to back this up. So please test or ignore :)
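
To make the cost comparison above concrete, here is a minimal standalone
sketch of the quoted function, restricted to a mode that fits in a single
register (so HARD_REGNO_NREGS is taken to be 1 and the TFmode case never
triggers). The bitmask enum, classes_intersect and the memory_move_cost
placeholder are assumptions made for illustration; they are not the real
GCC definitions, and the real rs6000_memory_move_cost is not shown in the
quoted patch.

```c
/* Simplified stand-ins for GCC's register classes: each class is one bit,
   so "intersect" is a bitwise AND.  The real reg_classes_intersect_p
   compares sets of hard registers.  */
enum reg_class {
  GENERAL_REGS = 1 << 0,
  FLOAT_REGS   = 1 << 1,
  ALTIVEC_REGS = 1 << 2,
  CR_REGS      = 1 << 3,
  LINK_REGS    = 1 << 4,
  CTR_REGS     = 1 << 5
};

int
classes_intersect (enum reg_class a, enum reg_class b)
{
  return (a & b) != 0;
}

/* ASSUMPTION: placeholder value.  The real memory cost function is not
   part of the quoted text, so 4 is only a stand-in.  */
int
memory_move_cost (enum reg_class rclass)
{
  (void) rclass;
  return 4;
}

/* Follows the structure of the quoted rs6000_register_move_cost for a
   single-register, non-TFmode mode.  */
int
register_move_cost (enum reg_class from, enum reg_class to)
{
  if (classes_intersect (to, GENERAL_REGS)
      || classes_intersect (from, GENERAL_REGS))
    {
      /* Look at whichever side is not a general register.  */
      if (!classes_intersect (to, GENERAL_REGS))
        from = to;

      if (from == FLOAT_REGS || from == ALTIVEC_REGS)
        return memory_move_cost (from) + memory_move_cost (GENERAL_REGS);
      else if (from == CR_REGS)
        return 4;
      else
        return 2;   /* 2 * HARD_REGNO_NREGS, with NREGS assumed to be 1.  */
    }
  else if (classes_intersect (to, from))
    return 2;
  else
    /* Everything else is costed as two moves through GENERAL_REGS.  */
    return (register_move_cost (GENERAL_REGS, to)
            + register_move_cost (from, GENERAL_REGS));
}
```

Under these assumptions, register_move_cost (GENERAL_REGS, GENERAL_REGS)
and register_move_cost (LINK_REGS, GENERAL_REGS) both evaluate to 2, while
register_move_cost (LINK_REGS, CTR_REGS) evaluates to 4.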
On a related note, looking at the scheduling dumps (from -da), it seems to me
that GCC thinks loads have a latency of 2 cycles (correct) and a throughput of
1 per 2 cycles (incorrect for most CPUs, and certainly for the 7400 I had
it optimize for: it can issue one load per cycle). This hurts my indirect
threaded code interpreter a lot.
Could you point me at the "guilty" part of GCC? I just don't seem to be
able to find where the issue rates are described.
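
As a rough illustration of why the modelled throughput matters, the sketch
below counts the cycles needed for a run of independent loads under a simple
pipeline model. The function name cycles_for_loads and the model itself (one
load unit, fixed latency, fixed issue interval, nothing else in the way) are
assumptions for illustration only; this is not how GCC's scheduler computes
anything.

```c
/* Cycles until the last of N independent loads has delivered its result,
   assuming a new load can start every ISSUE_INTERVAL cycles and each
   result arrives LATENCY cycles after its load starts.  */
int
cycles_for_loads (int n, int latency, int issue_interval)
{
  if (n <= 0)
    return 0;
  /* The last load starts after (n - 1) issue intervals, then needs
     LATENCY more cycles to complete.  */
  return (n - 1) * issue_interval + latency;
}
```

With a latency of 2, four back-to-back loads take cycles_for_loads (4, 2, 2),
that is 8 cycles, when the scheduler assumes one load per 2 cycles, versus
cycles_for_loads (4, 2, 1), that is 5 cycles, at one load per cycle.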
Segher
> ! On the RS/6000, copying between floating-point and fixed-point
> ! registers is expensive. */
> !
> ! #define REGISTER_MOVE_COST(MODE, CLASS1, CLASS2) \
> !   ((CLASS1) == FLOAT_REGS && (CLASS2) == FLOAT_REGS ? 2 \
> !    : (CLASS1) == FLOAT_REGS && (CLASS2) != FLOAT_REGS ? 10 \
> !    : (CLASS1) != FLOAT_REGS && (CLASS2) == FLOAT_REGS ? 10 \
> !    : (CLASS1) == ALTIVEC_REGS && (CLASS2) != ALTIVEC_REGS ? 20 \
> !    : (CLASS1) != ALTIVEC_REGS && (CLASS2) == ALTIVEC_REGS ? 20 \
> !    : (((CLASS1) == SPECIAL_REGS || (CLASS1) == MQ_REGS \
> !        || (CLASS1) == LINK_REGS || (CLASS1) == CTR_REGS \
> !        || (CLASS1) == LINK_OR_CTR_REGS) \
> !       && ((CLASS2) == SPECIAL_REGS || (CLASS2) == MQ_REGS \
> !           || (CLASS2) == LINK_REGS || (CLASS2) == CTR_REGS \
> !           || (CLASS2) == LINK_OR_CTR_REGS)) ? 10 \
> !    : 2)
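
For comparison, the old macro quoted above can be evaluated on its own. The
sketch below copies its body under the hypothetical name
OLD_REGISTER_MOVE_COST and pairs it with a stand-in enum; the enumerator
values are arbitrary and are not GCC's real reg_class numbering.

```c
/* Stand-in register classes; only distinctness matters here.  */
enum reg_class {
  GENERAL_REGS,
  FLOAT_REGS,
  ALTIVEC_REGS,
  SPECIAL_REGS,
  MQ_REGS,
  LINK_REGS,
  CTR_REGS,
  LINK_OR_CTR_REGS
};

/* Body copied from the old REGISTER_MOVE_COST shown above.  MODE is
   accepted but unused, as in the original.  */
#define OLD_REGISTER_MOVE_COST(MODE, CLASS1, CLASS2) \
  ((CLASS1) == FLOAT_REGS && (CLASS2) == FLOAT_REGS ? 2 \
   : (CLASS1) == FLOAT_REGS && (CLASS2) != FLOAT_REGS ? 10 \
   : (CLASS1) != FLOAT_REGS && (CLASS2) == FLOAT_REGS ? 10 \
   : (CLASS1) == ALTIVEC_REGS && (CLASS2) != ALTIVEC_REGS ? 20 \
   : (CLASS1) != ALTIVEC_REGS && (CLASS2) == ALTIVEC_REGS ? 20 \
   : (((CLASS1) == SPECIAL_REGS || (CLASS1) == MQ_REGS \
       || (CLASS1) == LINK_REGS || (CLASS1) == CTR_REGS \
       || (CLASS1) == LINK_OR_CTR_REGS) \
      && ((CLASS2) == SPECIAL_REGS || (CLASS2) == MQ_REGS \
          || (CLASS2) == LINK_REGS || (CLASS2) == CTR_REGS \
          || (CLASS2) == LINK_OR_CTR_REGS)) ? 10 \
   : 2)

/* Function wrapper so the macro can be called like ordinary code.  */
int
old_register_move_cost (enum reg_class from, enum reg_class to)
{
  return OLD_REGISTER_MOVE_COST (0, from, to);
}
```

Here old_register_move_cost (LINK_REGS, GENERAL_REGS) gives 2, the same as
a move between two general registers, while
old_register_move_cost (LINK_REGS, CTR_REGS) gives 10.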