This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: i386 GCC-3.4 & Mandrake-9.2 GCC-3.3.1 non-inline memset/memcpy
- From: Etienne Lorrain <etienne_lorrain at yahoo dot fr>
- To: gcc at gcc dot gnu dot org
- Cc: Jan Hubicka <hubicka at ucw dot cz>
- Date: Tue, 4 Nov 2003 11:43:02 +0100 (CET)
- Subject: Re: i386 GCC-3.4 & Mandrake-9.2 GCC-3.3.1 non-inline memset/memcpy
Honza wrote:
> Etienne wrote:
> > For a project of mine (gujin at sourceforge) I cannot have non-inline
> > memset and memcpy because I am generating i80386 real mode code in two
> > different code segments and have only one link (so the names of functions
> > in each segment have to be different).
> > Most versions of GCC up to GCC-3.4 do not insert implicit memset/memcpy
> > in my code and I do not often use these functions explicitly. I
> > mostly write things like:
> >
> > struct my_big_struct_type my_struct = (struct my_big_struct_type){};
> >
> > and so on ia32 a "rep stosl" is usually generated (the size of the
> > structure is constant!).
> >
> > But with GCC-3.4 and Mandrake-9.2 GCC-3.3.1 this is no longer the case:
> > a real call to those functions is emitted. I could generate non-inline
> > memset and memcpy for each code segment I am using by renaming and
> > prefixing memset and memcpy with the name of the segment - but it will
> > not be very efficient and I wonder if there is an option to go back
> > to the standard inline "rep stosl" and "rep movsl".
>
> Perhaps -minline-all-stringops will help you?
> GCC prefers a call for very large blocks as it expects the library
> implementation to be smarter than a pure rep movsl.
Well, in this case -minline-all-stringops does not change anything;
it even increases the number of memset/memcpy calls because my own
replacement functions are not used (I then need to include
/usr/include/string.h).
The problem I have is rather that if I write my own memset/memcpy, it
will be a rep movsl / rep stosl because I am generating for i386 and
optimising for size. In fact I would rather use stosb/movsb because I
really do not care about the 1/1000 s I may gain if (and only if) the base
addresses are aligned. After all, on newer processors, write combining
is enabled on main memory, and a rep stos[bwl] does not even hit the
CODE L1 cache, leaving the processor to saturate the DATA bus bandwidth.

But I do understand that this is different for other applications:
after checking the assembler, memset/memcpy is called with a fairly high
number of bytes to copy (> 50), so using 64/128 bit registers
is usually faster.
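
A minimal sketch of the byte-wise routines described above, written with
GCC extended asm. The names seg1_memset, seg1_memcpy and seg1_selfcheck are
hypothetical (the prefix stands for one code segment). The sketch assumes a
flat memory model with the direction flag clear, so real-mode segment
handling would still have to be added; a plain C loop is kept as a fallback
for non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_REP_STRINGOPS 1
#else
#define HAVE_REP_STRINGOPS 0
#endif

/* Fill n bytes at dst with the low byte of c; returns dst like memset. */
void *seg1_memset(void *dst, int c, size_t n)
{
#if HAVE_REP_STRINGOPS
    void *d = dst;
    /* rep stosb stores AL at the destination pointer, count times.
       Assumption: the direction flag is clear, as the usual ABIs require. */
    __asm__ volatile ("rep stosb"
                      : "+D" (d), "+c" (n)
                      : "a" (c)
                      : "memory");
#else
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char) c;
#endif
    return dst;
}

/* Copy n bytes from src to dst (no overlap handling); returns dst. */
void *seg1_memcpy(void *dst, const void *src, size_t n)
{
#if HAVE_REP_STRINGOPS
    void *d = dst;
    const void *s = src;
    /* rep movsb copies one byte per iteration from source to destination. */
    __asm__ volatile ("rep movsb"
                      : "+D" (d), "+S" (s), "+c" (n)
                      :
                      : "memory");
#else
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return dst;
}

/* Small usage check: fill one buffer, copy it, compare every byte. */
void seg1_selfcheck(void)
{
    unsigned char a[16], b[16];
    size_t i;

    assert(seg1_memset(a, 0x5A, sizeof a) == a);
    assert(seg1_memcpy(b, a, sizeof b) == b);
    for (i = 0; i < sizeof b; i++)
        assert(b[i] == 0x5A);
}
```

Both routines return the destination pointer as the standard functions do,
so they stay usable whether or not a caller looks at the return value.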
Anyway, it seems that I will have to rewrite memset/memcpy for
constructs such as:
struct my_struct st1 = (struct my_struct) {}, st2 = st1;
and rename memset/memcpy, with an assembler macro inserted by asm(""),
depending on the current code segment.
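
To see whether a given compiler turns these constructs into calls, they can
be isolated in a small test file. This is only a sketch: the structure
layout and the names init_pair and check_init_pair are hypothetical, and
whether a call or an inline "rep" sequence is emitted depends on the
compiler version and options.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout, large enough that a compiler may prefer a call. */
struct my_struct {
    char name[64];
    unsigned int values[32];
};

void init_pair(struct my_struct *out1, struct my_struct *out2)
{
    /* The constructs from the message: zero-fill, then structure copy.
       The empty initializer {} is a GNU C extension (standard in C23). */
    struct my_struct st1 = (struct my_struct) {}, st2 = st1;

    *out1 = st1;
    *out2 = st2;
}

/* Checks that both results are all-zero and identical. */
void check_init_pair(void)
{
    static const struct my_struct zero;  /* static storage: all zero */
    struct my_struct a, b;

    memset(&a, 0xFF, sizeof a);
    memset(&b, 0xFF, sizeof b);
    init_pair(&a, &b);
    assert(memcmp(&a, &zero, sizeof a) == 0);
    assert(memcmp(&a, &b, sizeof a) == 0);
}
```

On an x86 target, compiling such a file with "gcc -Os -m32 -S" and
searching the generated assembler for "call memset" or "call memcpy" shows
which form was chosen for init_pair.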
Can I safely assume that when the compiler decides to insert
memset/memcpy, it will ignore the return value of those functions,
and that the standard set of registers is preserved / spilled?
Etienne.