This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


bzero optimization rarely does


I asked this question a little while ago but got no reaction, so I'll
condense it to its essence and ask it again.

The 3.x series compilers do a pretty nice job of optimizing memcpy
with a constant length argument into straight loads/stores.

They also have machinery to optimize bzero and memset in a similar
way.  But that code is far more limited in what it handles, so much so
that most cases are missed.

It may be target dependent, but what I found for MIPS at least is
surprising.  

It optimizes ONLY the case where a bzero can be turned into a SINGLE
store instruction.  Anything that takes more than one store is turned
into a library call.

That doesn't make sense, because a function call is way more expensive
than a few stores.
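
To make the cases concrete, here is a small sketch in C.  The struct
and function names are made up for illustration, and memset(p, 0, n)
stands in for bzero(p, n), which has the same effect.  The comments
describe what I see on MIPS, not a guarantee for other targets or
compiler versions; looking at the assembly from "gcc -O2 -S" is the
way to check what a given compiler does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type for illustration: four 32-bit words, 16 bytes. */
struct four_words {
    uint32_t w[4];
};

/* Constant length of one word: the case reported as being expanded
   inline into a single store. */
void clear_one_word(uint32_t *p)
{
    memset(p, 0, sizeof *p);
}

/* Constant length of four words: the case reported as being turned
   into a library call instead of four stores. */
void clear_four_words(struct four_words *s)
{
    memset(s, 0, sizeof *s);
}

/* Hand-written straight-line code with the same effect, i.e. what
   one would hope the compiler generates for clear_four_words(). */
void clear_four_words_inline(struct four_words *s)
{
    s->w[0] = 0;
    s->w[1] = 0;
    s->w[2] = 0;
    s->w[3] = 0;
}

/* Returns 1 if both versions leave the struct in the same state. */
int same_result(void)
{
    struct four_words a, b;

    assert(sizeof(struct four_words) == 16);
    memset(&a, 0xff, sizeof a);   /* fill with a nonzero pattern first */
    memset(&b, 0xff, sizeof b);
    clear_four_words(&a);
    clear_four_words_inline(&b);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

The two four-word functions are interchangeable as far as the result
goes, so the only question is which code the compiler emits.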

I looked at the code a bit and it isn't all that obvious why this is
happening.  I think the culprit is the "can_store_by_pieces" logic. 

Would people agree that bzero should become straight code for cases
that are more than just a single store instruction?  If yes, how does
one go about changing that?  Who might do it?

    paul

