This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Strange R3000 MIPS code generation
- To: Dave Brown <dave at snsys dot com>, egcs at cygnus dot com
- Subject: Re: Strange R3000 MIPS code generation
- From: Martin Knoblauch <knobi at rocketmail dot com>
- Date: Thu, 7 Jan 1999 07:24:48 -0800 (PST)
- Reply-To: knobi at knobisoft dot de
Dave,
after looking at the assembler it seems that egcs has
recognized that:
a) the shift ops are not necessary in this case and
b) the masking with "0xff" is also redundant.
The question is which version is faster.
If the gcc code is faster, it could be that egcs is a bit overaggressive :-)
The lbu to $2 is the assignment to ucVertex0, which is then
used to accumulate the final sum.
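
To make a) and b) concrete, here is a small C sketch (written for
illustration, not taken from either compiler) of why the two forms
compute the same result. The names sum_by_shift, sum_by_bytes and
check_agreement are made up, and uint32_t stands in for the 32-bit
unsigned long of the R3000 target. Which byte lands at which address
depends on byte order; the lbu offsets in the egcs output suggest a
little-endian target. The sum of all four bytes is the same either way.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Shift-and-mask extraction, as written in the C test case
   (roughly what the gcc 2.8.1 output does after the lw). */
unsigned sum_by_shift(uint32_t v)
{
    unsigned char b0 = (v >> 0) & 0xff;
    unsigned char b1 = (v >> 8) & 0xff;
    unsigned char b2 = (v >> 16) & 0xff;
    unsigned char b3 = (v >> 24) & 0xff;
    return b0 + b1 + b2 + b3;
}

/* Byte-wise extraction: read each byte of the word directly from
   memory, roughly what the four lbu instructions do. No shift or
   mask is needed because a byte load already yields 0..255. */
unsigned sum_by_bytes(uint32_t v)
{
    unsigned char b[4];
    memcpy(b, &v, sizeof b);
    return b[0] + b[1] + b[2] + b[3];
}

/* Hypothetical helper: both forms must agree for any input. */
void check_agreement(uint32_t v)
{
    assert(sum_by_shift(v) == sum_by_bytes(v));
}
```

For the value in the test case, 0x04030201, both functions return 10.
Whether one lw plus shifts or four lbu loads is quicker on real
hardware is a separate question that this sketch does not answer.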
Martin
===
------------------------------------------------------
Martin Knoblauch
email: knobi@knobisoft.de or knobi@rocketmail.com
www: http://www.knobisoft.de
---Dave Brown <dave@snsys.com> wrote:
>
> Hi,
>
> I have been trying egcs C & C++ builds for producing mips R3000 code for
> the Sony Playstation, and on the whole the results have been rather good,
> especially for C++.
>
> Just one problem I've noticed is where you perform a load, as in this C
> example:
>
> egcs generates 4 single byte loads (lbu) instead of one 4-byte load (lw),
> as GCC does.
>
> ¦
> ¦unsigned long ulVertex = 0x04030201;
> ¦
> ¦extern int TestCompiler(void);
> ¦
> ¦int TestCompiler(void)
> ¦{
> ¦ /* We need to keep the extracted vertex indices */
> ¦ /* around, as we use them for normal lookup later */
> ¦ unsigned char ucVertex0;
> ¦ unsigned char ucVertex1;
> ¦ unsigned char ucVertex2;
> ¦ unsigned char ucVertex3;
> ¦
> ¦ /* In this example I'm just splitting a 32-bit global */
> ¦ /* but in our rendering code this read/split operation */
> ¦ /* takes place for _every_ polygon. */
> ¦ ucVertex0 = ((ulVertex)>>0)&0xff;
> ¦ ucVertex1 = ((ulVertex)>>8)&0xff;
> ¦ ucVertex2 = ((ulVertex)>>16)&0xff;
> ¦ ucVertex3 = ((ulVertex)>>24)&0xff;
> ¦
> ¦ /* I've just put this here to use the results.. */
> ¦ /* otherwise the optimiser removes all the code in */
> ¦ /* this function. */
> ¦ return (ucVertex0 + ucVertex1 + ucVertex2 + ucVertex3);
> ¦}
> ¦
> ¦main()
> ¦{
> ¦ return (0);
> ¦}
> ¦
>
> egcs 1.1 & 1.1.1 produces :
>
> TestCompiler:
> .frame $sp,0,$31
> .mask 0x00000000,0
> .fmask 0x00000000,0
> lbu $2,ulVertex
> lbu $3,ulVertex+1
> lbu $4,ulVertex+2
> lbu $5,ulVertex+3
> addu $2,$2,$3
> addu $2,$2,$4
> .set noreorder
> .set nomacro
> j $31
> addu $2,$2,$5
> .set macro
> .set reorder
>
> whereas gcc 2.8.1 gives the better code:
>
> TestCompiler:
> .frame $sp,0,$31
> .mask 0x00000000,0
> .fmask 0x00000000,0
> lw $5,ulVertex
> lbu $2,ulVertex
> srl $3,$5,8
> srl $4,$5,16
> andi $3,$3,0x00ff
> addu $2,$2,$3
> andi $4,$4,0x00ff
> addu $2,$2,$4
> srl $5,$5,24
> .set noreorder
> .set nomacro
> j $31
> addu $2,$2,$5
> .set macro
> .set reorder
>
> Does anyone have any idea why egcs is using 4 single byte loads ? Or even
> why gcc puts the extra (as far as I can tell) redundant lbu after the lw ?
> Both sets of asm were generated using -O2.
>
> I've tried meddling with the settings in mips.h but to no avail. Can
> anyone help ?
>
> Many thanks,
>
> Dave Brown
>