This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Picking between alternative ways of expanding a section
- From: Michael Hope <michaelh at juju dot net dot nz>
- To: gcc at gcc dot gnu dot org
- Date: Fri, 24 Jul 2009 16:22:31 +1200
- Subject: Picking between alternative ways of expanding a section
Hi there. This is a follow-up to my email of the 24th of May.
The short version is: how can I track down why GCC picks one of two
alternatives for implementing a function over the other? In a memcpy()
where Pmode == SImode, I get a near-ideal implementation. If Pmode ==
PSImode (due to limitations of the pointer registers), I get something
much worse.
The difference appears early on. In the .128r.expand dump with Pmode ==
SImode I get:
;; MEM[base: to] = MEM[base: p];
With PSImode I get offset addressing instead:
;; MEM[base: pto + ivtmp.25] = MEM[base: pfrom + ivtmp.25];
This flows through into the actual code.
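In C terms, the two expansions correspond roughly to the two loops below. This is only an illustrative sketch: the function names copy_ptr and copy_indexed are made up for this example, and the variable ivtmp simply mirrors the ivtmp.25 temporary in the dump. It is not output from the compiler, just a source-level picture of pointer-increment addressing versus a shared byte offset added to both base pointers.

```c
#include <stddef.h>

/* Pointer-increment form: each pointer is its own induction variable,
   which is the shape of  MEM[base: to] = MEM[base: p].
   (copy_ptr is a hypothetical name used only for illustration.)  */
void copy_ptr(const int *pfrom, int *pto, int count)
{
    while (count != 0) {
        *pto = *pfrom;
        pto++;
        pfrom++;
        count--;
    }
}

/* Offset form: the base pointers stay fixed and one shared byte offset
   is added to each on every access, which is the shape of
   MEM[base: pto + ivtmp.25] = MEM[base: pfrom + ivtmp.25].
   (copy_indexed and ivtmp are hypothetical names.)  */
void copy_indexed(const int *pfrom, int *pto, int count)
{
    size_t ivtmp = 0;   /* byte offset shared by both accesses */

    while (count != 0) {
        const int *src = (const int *)((const char *)pfrom + ivtmp);
        int *dst = (int *)((char *)pto + ivtmp);

        *dst = *src;
        ivtmp += sizeof(int);
        count--;
    }
}
```

Both loops copy the same data. The difference is that the second form has to form two addresses (base plus offset) on every iteration, which is expensive on a machine where only a couple of registers can be used for memory access.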
I assume this is because GCC treats PSImode as working differently from
SImode, and the cast/translation cost is enough to make offset
addressing cheaper overall.
The m32c port is the only other one using PSImode, but it doesn't
generate offset addresses. The same thing happens with and without
a basic TARGET_ADDRESS_COST and TARGET_RTX_COSTS.
I guess I want a way of telling the compiler that PSImode and SImode
are equivalent.
The longer version is:
The machine I'm working on has two special registers for memory access
that are backed by caches. Any change to these registers can cause an
expensive cache load cycle so while they're great for memory access
they're terrible for general use.
The problem is that with Pmode == SImode the register allocator will now
and again use these registers for general operations. I've
implemented a partial integer mode, PSImode, as suggested by Michael
Meissner, and set Pmode to PSImode. This correctly separates things, but
the compiler now generates significantly worse code.
The example is a simple memcpy():
void copy(int *pfrom, int *pto, int count)
{
    while (count != 0)
    {
        *pto = *pfrom;
        pto++;
        pfrom++;
        count--;
    }
}
If I have #define Pmode SImode then I get the near-best code:
copy:
LOADACC, R12 ;# 133 loadaccsi_insn/1
STOREACC, R13 ;# 134 storeaccsi_insn
LOADLONG, #0 ;# 139 loadaccsi_insn/2
XOR, R13 ;# 140 cmpccsi_insn/3
LOADLONG, #.L4 ;# 43 *bCCeq
SKIP_IF
STOREACC, PC
LOADACC, R11 ;# 121 loadaccsi_insn/1
STOREACC, Y ;# 122 storeaccsi_insn
LOADACC, R10 ;# 127 loadaccsi_insn/1
STOREACC, X ;# 128 storeaccsi_insn
.L3:
LOADACC, (X) ;# 79 loadaccsi_insn/1
STOREACC, (Y) ;# 86 storeaccsi_insn
LOADLONG, #4 ;# 149 loadaccsi_insn/2
ADD, Y ;# 150 addsi3_acc
ADD, X ;# 151 addsi3_acc
LOADLONG, #-1 ;# 103 loadaccsi_insn/2
ADD, R12 ;# 104 addsi3_acc
LOADACC, R12 ;# 109 loadaccsi_insn/1
STOREACC, R10 ;# 110 storeaccsi_insn
LOADLONG, #0 ;# 115 loadaccsi_insn/2
XOR, R10 ;# 116 cmpccsi_insn/3
LOADLONG, #.L3 ;# 57 *bCCne
STOREACC, PC_IF
.L4:
POP ;# 147 *expanded_return
STOREACC, PC
Note the good sequence
LOADACC, (X) ;# 79 loadaccsi_insn/1
STOREACC, (Y) ;# 86 storeaccsi_insn
LOADLONG, #4 ;# 149 loadaccsi_insn/2
ADD, Y ;# 150 addsi3_acc
ADD, X ;# 151 addsi3_acc
in the middle.
If I instead have #define Pmode PSImode, I get:
copy:
LOADACC, R14 ;# 186 loadaccsi_insn/1
PUSH ;# 187 pushsi_acc
LOADACC, R12 ;# 163 loadaccsi_insn/1
STOREACC, R13 ;# 164 storeaccsi_insn
LOADLONG, #0 ;# 169 loadaccsi_insn/2
XOR, R13 ;# 170 cmpccsi_insn/3
LOADLONG, #.L4 ;# 43 *bCCeq
SKIP_IF
STOREACC, PC
LOADLONG, #0 ;# 157 loadaccsi_insn/2
STOREACC, R13 ;# 158 storeaccsi_insn
.L3:
LOADACC, R13 ;# 85 loadaccsi_insn/1
STOREACC, X ;# 86 storeaccsi_insn
; No-op truncate on X = X ;# 47 truncsipsi2/1
LOADACC, R11 ;# 91 loadaccpsi_insn/1
STOREACC, Y ;# 92 storeaccpsi_insn
LOADACC, X ;# 97 loadaccpsi_insn/1
ADD, Y ;# 98 addpsi3_acc
LOADACC, R10 ;# 103 loadaccpsi_insn/1
STOREACC, R14 ;# 104 storeaccpsi_insn
LOADACC, X ;# 109 loadaccpsi_insn/1
ADD, R14 ;# 110 addpsi3_acc
LOADACC, R14 ;# 115 loadaccpsi_insn/1
STOREACC, X ;# 116 storeaccpsi_insn
LOADACC, (X) ;# 121 loadaccsi_insn/1
STOREACC, (Y) ;# 128 storeaccsi_insn
LOADLONG, #-1 ;# 133 loadaccsi_insn/2
ADD, R12 ;# 134 addsi3_acc
LOADLONG, #4 ;# 139 loadaccsi_insn/2
ADD, R13 ;# 140 addsi3_acc
LOADACC, R12 ;# 145 loadaccsi_insn/1
STOREACC, X ;# 146 storeaccsi_insn
LOADLONG, #0 ;# 151 loadaccsi_insn/2
XOR, X ;# 152 cmpccsi_insn/3
LOADLONG, #.L3 ;# 59 *bCCne
STOREACC, PC_IF
.L4:
POP ;# 178 popsi_insn
STOREACC, R14
POP ;# 179 *expanded_return
STOREACC, PC
This is equivalent to:
R13 = 0
L:
X = R13
X = truncate(X)
Y = R11
Y += X
R14 = R10
R14 += X
X = R14
(Y) = (X)
R12 -= 1
R13 += 4
X = R12
CMP X, 0
BCCNE L
Thank you for your time,
-- Michael