This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
bzero optimization rarely does
- From: Paul Koning <pkoning at equallogic dot com>
- To: gcc at gcc dot gnu dot org
- Date: Wed, 10 Jul 2002 17:51:20 -0400
- Subject: bzero optimization rarely does
I asked this question a little while ago but got no reaction, so I'll
condense it to its essence and ask it again.
The 3.x series compilers do a pretty nice job of optimizing memcpy
with a constant length argument into straight loads/stores.
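
For illustration, here is a minimal sketch of the kind of call this refers to. The struct and function names are hypothetical and not from any real code base. The point is only that the length passed to memcpy is a compile-time constant (sizeof), which is what lets the compiler consider expanding the copy inline. Whether it actually does so depends on the target, compiler version and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte structure used only for this example. */
struct header {
    uint32_t a;
    uint32_t b;
    uint32_t c;
    uint32_t d;
};

/*
 * The length argument is sizeof *dst, a constant known at compile time.
 * An optimizing compiler may replace this call with a handful of
 * load/store pairs instead of calling the library memcpy.
 */
void copy_header(struct header *dst, const struct header *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}
```

Compiling a file like this with "gcc -O2 -S" and reading the generated assembly shows whether the call was expanded for a given target.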
They also have machinery to optimize bzero and memset in a similar
way. But that code is far more limited in what it handles, so much so
that most cases are missed.
It may be target dependent, but what I found for MIPS at least is
surprising.
It optimizes ONLY the case where a bzero can be turned into a SINGLE
store instruction. Anything that takes more than one store is turned
into a library call.
That doesn't make sense, because a function call is way more expensive
than a single store, so a short run of stores should still beat the call.
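
The two cases described above can be sketched as follows. All names here are hypothetical and chosen only for the example. clear_one covers a region that fits in one 32-bit store, clear_four covers a region needing four such stores, and clear_four_by_hand spells out the "straight code" one would hope the compiler emits for the second case. This is a sketch for inspecting compiler output (for example with "gcc -O2 -S"), and it makes no claim about what any particular compiler version produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>   /* bzero (legacy BSD/POSIX interface) */

/* Hypothetical types: one word, and four words, of 32 bits each. */
struct one_word   { uint32_t w; };
struct four_words { uint32_t w[4]; };

/* Constant length of 4 bytes: can be done with a single 32-bit store. */
void clear_one(struct one_word *p)
{
    assert(p != NULL);
    bzero(p, sizeof *p);
}

/* Constant length of 16 bytes: needs four 32-bit stores if expanded. */
void clear_four(struct four_words *p)
{
    assert(p != NULL);
    bzero(p, sizeof *p);
}

/* The same effect written out by hand as four individual stores. */
void clear_four_by_hand(struct four_words *p)
{
    assert(p != NULL);
    p->w[0] = 0;
    p->w[1] = 0;
    p->w[2] = 0;
    p->w[3] = 0;
}
```

Comparing the assembly for clear_four and clear_four_by_hand on a given target shows whether the multi-store case was expanded inline or left as a call.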
I looked at the code a bit and it isn't all that obvious why this is
happening. I think the culprit is the "can_store_by_pieces" logic.
Would people agree that bzero should become straight code for cases
that are more than just a single store instruction? If yes, how does
one go about changing that? Who might do it?
paul