This is the mail archive of the gcc at gcc dot gnu dot org
mailing list for the GCC project.
Re: gcc 3.1 is still very slow, compared to 2.95.3
- From: law at redhat dot com
- To: jseward at acm dot org
- Cc: "David S. Miller" <davem at redhat dot com>, neil at daikokuya dot demon dot co dot uk, gcc at gcc dot gnu dot org, njn25 at cam dot ac dot uk
- Date: Sun, 19 May 2002 10:38:19 -0600
- Subject: Re: gcc 3.1 is still very slow, compared to 2.95.3
- Reply-to: law at redhat dot com
In message <3CE78FCF.5F0F7249@acm.org>, Julian Seward writes:
> "David S. Miller" wrote:
> >
> >    From: Neil Booth <firstname.lastname@example.org>
> >    Date: Sun, 19 May 2002 08:07:03 +0100
> >
> >    Results John posted had memset very high on the list, so I suspect
> >    someone is. I think every tree and rtx allocated is memset to
> >    zero.
> >
> > What overkill, it's clearing out one word.
> > When optimizing, GCC should turn that into an inline
> > store into the first word of the rtx though....
> > (Dave checks...)
> > Yes, it does optimize this, but into 3 byte stores. One of
> > which overlaps with the PUT_CODE (rt, code) rtx_alloc does.
> > :-(
> > On Sparc this is:
> >
> >    stb  %g0, [%rt + 1]
> >    sth  code, [%rt]
> >    stb  %g0, [%rt + 2]
> >    stb  %g0, [%rt + 3]
> >
> > When it should be optimized into:
> >
> >    sth  code, [%rt]
> >    sth  %g0, [%rt + 2]
>
> Sure, fixing this might reduce your insn count, but my point was
> it's killing you with L1 w misses. Can you skip the initialisation?
I don't think it can be easily skipped, though we may be able to reduce
the amount of memory traffic if we expose more of the internals of the
rtx_def structure to rtx_alloc.
Right now we have:

  memset (rt, 0, sizeof (struct rtx_def) - sizeof (rtunion));

We could do something like this, but unfortunately you can't take the
address of a bitfield:

  memset (&rt->mode, 0, &rt->fld - &rt->mode);

We could do something like:

  memset (((char *)rt)+2, 0, 2);

But that really exposes too much detail about the structure as well as the
host system. The trick is to find something that is reasonably portable
which allows us to initialize just the mode and various flags (all encoded
as bitfields) within that structure.
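
For illustration only, here is a rough sketch of two ways the header
could be initialized without hard-coding byte offsets. The struct
toy_rtx below is a hypothetical, simplified stand-in and not the real
rtx_def layout; the names toy_rtunion, toy_init_memset and
toy_init_fields are likewise invented for this example. The first
variant still clears the whole header, but uses offsetof on the
non-bitfield member fld, which is legal where taking the address of a
bitfield is not. The second assigns each bitfield by name and leaves it
to the compiler to combine the stores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for rtunion; the real union has more members.  */
union toy_rtunion { long i; void *p; };

/* Hypothetical, simplified stand-in for struct rtx_def.  The field
   names and widths are assumptions made for this sketch.  */
struct toy_rtx
{
  unsigned int code : 16;
  unsigned int mode : 8;
  unsigned int jump : 1;
  unsigned int call : 1;
  unsigned int used : 1;
  union toy_rtunion fld[1];
};

/* Variant 1: clear every byte in front of fld.  offsetof is allowed
   here because fld is an ordinary member, not a bitfield.  */
static void
toy_init_memset (struct toy_rtx *rt, unsigned int code)
{
  memset (rt, 0, offsetof (struct toy_rtx, fld));
  rt->code = code;
}

/* Variant 2: assign each bitfield by name.  Nothing here depends on
   byte offsets or on the host's bitfield layout.  */
static void
toy_init_fields (struct toy_rtx *rt, unsigned int code)
{
  rt->code = code;
  rt->mode = 0;
  rt->jump = 0;
  rt->call = 0;
  rt->used = 0;
}
```

Whether the second variant actually produces fewer or wider stores
than the memset depends on the compiler and target, so it would need
to be measured rather than assumed.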
> How many of those words will get dragged into L1 D and then never
> read or written again?
Probably none. We're basically initializing the mode and a bunch of flag
bits, both of which are highly likely to be read/written again later.
> Is this something that 2.95.3 also does, but 3.1's multiple allocation
> areas makes much worse?
The 2.95.3 code for the initialization is basically the same. The 2.8 code
is somewhat different, but it violates ANSI/ISO aliasing rules and can
result in incorrect code due to those violations.
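
As a rough sketch of the general kind of problem, and not a
reproduction of the 2.8 sources, the example below contrasts an
aliasing-safe clear with a cast-based one. The struct toy_hdr and the
function toy_clear_safe are hypothetical names made up for this
example. The unsafe form appears only inside a comment, since
executing it would be undefined behavior.

```c
#include <string.h>

/* Hypothetical header made of plain members; not the rtx_def layout
   of any GCC release.  */
struct toy_hdr
{
  unsigned short code;
  unsigned char mode;
  unsigned char flags;
};

/* Aliasing-safe: memset writes the object as bytes, which the
   ANSI/ISO rules always permit.  */
static void
toy_clear_safe (struct toy_hdr *h)
{
  memset (h, 0, sizeof *h);
}

/* NOT aliasing-safe (deliberately left as a comment):

     *(int *) h = 0;

   This stores through an int lvalue into an object whose type has no
   int member.  A compiler that relies on the aliasing rules may
   assume the store cannot affect h->code, h->mode or h->flags and
   reorder it against accesses to them.  It also silently assumes that
   int is exactly as wide as the header.  */
```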