This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: gcc 3.1 is still very slow, compared to 2.95.3
> From: Jan Hubicka <jh@suse.cz>
> Date: Sun, 19 May 2002 13:52:06 +0200
>
> > Yes, it does optimize this, but into 3 byte stores. One of
> > which overlaps with the PUT_CODE (rt, code) rtx_alloc does.
> > :-(
>
> That is probably because GCC is unable to detect the alignment for some
> reason. I don't see why :(
>
> I promise to look more deeply into this. It may be a Sparc-specific
> problem because x86 outputs:
>
> movl $0, (%eax)
> movw code, (%eax)
>
> But I think GCC should really give us:
>
> movw $0, 2(%eax)
> movw code, (%eax)
>
> Well, better yet:
>
> movw code, (%eax)
> movw $0, 2(%eax)
I don't think there is any framework for partially dead stores.
I also believe that at least the Athlon will combine stores in any order,
and that the majority of recent chips (P3/P4) do as well.
On the other hand, I think it can be better to construct the value
in a register and store it once.
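
To make the "construct in a register, store once" idea concrete, here is a
minimal sketch in C. The struct hdr layout and both function names are
hypothetical, made up for illustration; this is not the real rtx header.
Whether the second form actually ends up as a single 32-bit store depends on
the compiler and the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit object header: a 16-bit code followed by
   16 bits of flags.  Not the actual rtx layout.  */
struct hdr {
    uint16_t code;
    uint16_t flags;
};

/* Two separate halfword stores into the object.  */
static void init_two_stores(struct hdr *h, uint16_t code)
{
    h->code = code;
    h->flags = 0;
}

/* Build the complete header in a local (ideally a register) and
   write it out with one full-width copy.  */
static void init_one_store(struct hdr *h, uint16_t code)
{
    struct hdr tmp = { code, 0 };
    assert(sizeof tmp == 4);        /* assumption: no padding */
    memcpy(h, &tmp, sizeof tmp);
}
```

Both functions leave the same bytes in memory; only the number and width of
the stores may differ.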
>
> so it actually combines in the store buffer of the processor. (This
> is one area GCC really needs to improve, ordering consecutive stores
> to the same area)
>
> The Sparc output is very perplexing because GCC eliminated one of
> the byte stores that overlapped the store of "code" but not both
> of them!
:) I think this comes from the dead store elimination in flow.c - it checks
the addresses for equivalence.
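
A toy model of why an address-equivalence check would kill only one of the
two overlapping byte stores. This is only a sketch of the idea, not the code
in flow.c; struct store and both predicates are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy store: `size` bytes written at byte `offset` from a common base.  */
struct store {
    unsigned offset;
    unsigned size;
};

/* Address-equivalence test: the earlier store counts as dead only if
   the later one starts at the same address and is at least as wide.  */
static bool dead_by_equal_address(struct store earlier, struct store later)
{
    return earlier.offset == later.offset && earlier.size <= later.size;
}

/* Coverage test: the earlier store is dead whenever the later one
   overwrites every byte it wrote.  */
static bool dead_by_coverage(struct store earlier, struct store later)
{
    return later.offset <= earlier.offset
        && earlier.offset + earlier.size <= later.offset + later.size;
}

/* Byte stores at offsets 0 and 1, followed by a halfword store at 0.  */
static void self_check(void)
{
    struct store b0 = { 0, 1 }, b1 = { 1, 1 }, code = { 0, 2 };

    assert(dead_by_equal_address(b0, code));   /* same address: removed */
    assert(!dead_by_equal_address(b1, code));  /* different address: kept */
    assert(dead_by_coverage(b0, code));        /* both are really dead */
    assert(dead_by_coverage(b1, code));
}
```

Under the equality test only the byte store at offset 0 is removed, which
would match the Sparc output described above; a coverage test removes both.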
Honza