This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Picking between alternative ways of expanding a section

From: Michael Hope <michaelh at juju dot net dot nz>
To: gcc at gcc dot gnu dot org
Date: Fri, 24 Jul 2009 16:22:31 +1200
Subject: Picking between alternative ways of expanding a section

Hi there.  This is a follow-up to my email of 24 May.

The short version is: how can I track down why GCC picks one of two
alternatives for implementing a function?  In a memcpy()-style loop where
Pmode == SImode, I get a near-ideal implementation.  If Pmode ==
PSImode (due to limitations of the pointer registers), I get something
much worse.

The difference appears early on.  In the .128r.expand dump with Pmode ==
SImode I get:
 ;; MEM[base: to] = MEM[base: p];

With PSImode I get offset addressing instead:
 ;; MEM[base: pto + ivtmp.25] = MEM[base: pfrom + ivtmp.25];

This carries through to the final code.
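To make the two dump lines concrete, here is a source-level C sketch of the two loop shapes. The function names copy_bump, copy_offset and check_shapes_agree are hypothetical, and this only illustrates what the dump lines describe; it is not GCC output, and compiling these functions is not guaranteed to reproduce either choice. The first keeps the pointers themselves as induction variables (MEM[base: p]); the second keeps both bases fixed and steps a separate byte offset, standing in for ivtmp.25 (presumably created by the induction-variable optimization pass).

```c
#include <assert.h>
#include <stddef.h>

/* Shape seen with Pmode == SImode: the pointers are the induction
   variables, so each access is a plain MEM[base: p]. */
void copy_bump(int *pfrom, int *pto, int count)
{
  while (count != 0)
    {
      *pto = *pfrom;
      pto++;
      pfrom++;
      count--;
    }
}

/* Shape seen with Pmode == PSImode: both bases stay fixed and a
   separate byte offset (standing in for ivtmp.25) is added to each
   base on every iteration, i.e.
   MEM[base: pto + ivtmp] = MEM[base: pfrom + ivtmp]. */
void copy_offset(int *pfrom, int *pto, int count)
{
  size_t ivtmp = 0;  /* byte offset, stepped by sizeof (int) */

  while (count != 0)
    {
      *(int *) ((char *) pto + ivtmp) = *(int *) ((char *) pfrom + ivtmp);
      ivtmp += sizeof (int);
      count--;
    }
}

/* Usage check: both shapes must leave identical results. */
void check_shapes_agree(void)
{
  int src[5] = {1, 2, 3, 4, 5};
  int a[5] = {0};
  int b[5] = {0};

  copy_bump(src, a, 5);
  copy_offset(src, b, 5);
  for (int i = 0; i < 5; i++)
    {
      assert(a[i] == src[i]);
      assert(b[i] == src[i]);
    }
}
```

The second shape carries one more live value per iteration (the offset) and needs two address additions per copy, which matches the extra work visible in the PSImode listing.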

I assume this is because GCC treats PSImode as working differently
from SImode and decides that the cost of converting between the two is
enough to make offset addressing cheaper overall.

The m32c port is the only other one using PSImode, but it doesn't
generate offset addresses.  The same thing happens with and without
basic TARGET_ADDRESS_COST and TARGET_RTX_COSTS hooks.

I guess I want a way of telling the compiler that PSImode and SImode
are equivalent.

The longer version is:
The machine I'm working on has two special registers for memory access
that are backed by caches.  Any change to these registers can trigger an
expensive cache load cycle, so while they're great for memory access,
they're terrible for general use.

The problem is that with Pmode == SImode the register allocator will now
and again use these registers for general operations.  I've
implemented a partial integer mode, PSImode, as suggested by Michael
Meissner, and set Pmode to PSImode.  This correctly separates things,
but the compiler now generates significantly worse code.

The example is a simple memcpy():

void copy(int *pfrom, int *pto, int count)
{
  while (count != 0)
    {
      *pto = *pfrom;
      pto++;
      pfrom++;
      count--;
    }
}

With #define Pmode SImode I get near-optimal code:
copy:
	LOADACC, R12	;# 133	loadaccsi_insn/1
	STOREACC, R13	;# 134	storeaccsi_insn
	LOADLONG, #0	;# 139	loadaccsi_insn/2
	XOR, R13	;# 140	cmpccsi_insn/3
	LOADLONG, #.L4	;# 43	*bCCeq
	SKIP_IF
	STOREACC, PC
	LOADACC, R11	;# 121	loadaccsi_insn/1
	STOREACC, Y	;# 122	storeaccsi_insn
	LOADACC, R10	;# 127	loadaccsi_insn/1
	STOREACC, X	;# 128	storeaccsi_insn
.L3:
	LOADACC, (X)	;# 79	loadaccsi_insn/1
	STOREACC, (Y)	;# 86	storeaccsi_insn
	LOADLONG, #4	;# 149	loadaccsi_insn/2
	ADD, Y	;# 150	addsi3_acc
	ADD, X	;# 151	addsi3_acc
	LOADLONG, #-1	;# 103	loadaccsi_insn/2
	ADD, R12	;# 104	addsi3_acc
	LOADACC, R12	;# 109	loadaccsi_insn/1
	STOREACC, R10	;# 110	storeaccsi_insn
	LOADLONG, #0	;# 115	loadaccsi_insn/2
	XOR, R10	;# 116	cmpccsi_insn/3
	LOADLONG, #.L3	;# 57	*bCCne
	STOREACC, PC_IF
.L4:
	POP	;# 147	*expanded_return
	STOREACC, PC

Note the good sequence in the middle of the loop:
	LOADACC, (X)	;# 79	loadaccsi_insn/1
	STOREACC, (Y)	;# 86	storeaccsi_insn
	LOADLONG, #4	;# 149	loadaccsi_insn/2
	ADD, Y	;# 150	addsi3_acc
	ADD, X	;# 151	addsi3_acc

which copies one word and then advances both pointers by 4.

If instead I have #define Pmode PSImode, I get:
copy:
	LOADACC, R14	;# 186	loadaccsi_insn/1
	PUSH	;# 187	pushsi_acc
	LOADACC, R12	;# 163	loadaccsi_insn/1
	STOREACC, R13	;# 164	storeaccsi_insn
	LOADLONG, #0	;# 169	loadaccsi_insn/2
	XOR, R13	;# 170	cmpccsi_insn/3
	LOADLONG, #.L4	;# 43	*bCCeq
	SKIP_IF
	STOREACC, PC
	LOADLONG, #0	;# 157	loadaccsi_insn/2
	STOREACC, R13	;# 158	storeaccsi_insn
.L3:
	LOADACC, R13	;# 85	loadaccsi_insn/1
	STOREACC, X	;# 86	storeaccsi_insn
	; No-op truncate on X = X	;# 47	truncsipsi2/1
	LOADACC, R11	;# 91	loadaccpsi_insn/1
	STOREACC, Y	;# 92	storeaccpsi_insn
	LOADACC, X	;# 97	loadaccpsi_insn/1
	ADD, Y	;# 98	addpsi3_acc
	LOADACC, R10	;# 103	loadaccpsi_insn/1
	STOREACC, R14	;# 104	storeaccpsi_insn
	LOADACC, X	;# 109	loadaccpsi_insn/1
	ADD, R14	;# 110	addpsi3_acc
	LOADACC, R14	;# 115	loadaccpsi_insn/1
	STOREACC, X	;# 116	storeaccpsi_insn
	LOADACC, (X)	;# 121	loadaccsi_insn/1
	STOREACC, (Y)	;# 128	storeaccsi_insn
	LOADLONG, #-1	;# 133	loadaccsi_insn/2
	ADD, R12	;# 134	addsi3_acc
	LOADLONG, #4	;# 139	loadaccsi_insn/2
	ADD, R13	;# 140	addsi3_acc
	LOADACC, R12	;# 145	loadaccsi_insn/1
	STOREACC, X	;# 146	storeaccsi_insn
	LOADLONG, #0	;# 151	loadaccsi_insn/2
	XOR, X	;# 152	cmpccsi_insn/3
	LOADLONG, #.L3	;# 59	*bCCne
	STOREACC, PC_IF
.L4:
	POP	;# 178	popsi_insn
	STOREACC, R14
	POP	;# 179	*expanded_return
	STOREACC, PC

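The loop body grows from 13 to 24 instructions, and R14 now has to be
saved and restored in the prologue and epilogue.  This is equivalent to:
 R13 = 0
L:
 X = R13
 X = truncate(X)
 Y = R11
 Y += X
 R14 = R10
 R14 += X
 X = R14
 (Y) = (X)
 R12 -= 1
 R13 += 4
 X = R12
 CMP X, 0
 BCCNE L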

Thank you for your time,

-- Michael
