This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: i386 GCC-3.4 & Mandrake-9.2 GCC-3.3.1 non-inline memset/memcpy


Honza wrote:
> Etienne wrote:
> >  For a project of mine (gujin at sourceforge) I cannot have non-inline
> >  memset and memcpy, because I am generating i80386 real-mode code in two
> >  different code segments with only one link (so the function names in
> >  each segment have to be different).
> >  Most versions of GCC before GCC-3.4 do not insert implicit memset/memcpy
> >  calls in my code, and I rarely use these functions explicitly. I mostly
> >  write things like:
> > 
> >  struct my_big_struct_type my_struct = (struct my_big_struct_type){};
> > 
> >  and on ia32 a "rep stosl" is usually generated (the size of the
> >  structure is constant!).
> > 
> >  But with GCC-3.4 and Mandrake-9.2 GCC-3.3.1 this is no longer the case:
> >  real calls to those functions are emitted. I could generate non-inline
> >  memset and memcpy for each code segment I am using, by renaming and
> >  prefixing memset and memcpy with the name of the segment, but that would
> >  not be very efficient, and I wonder whether there is an option to go back
> >  to the usual inline "rep stosl" and "rep movsl".
> 
> Perhaps -minline-all-stringops will help you?
> GCC prefers a call for very large blocks, as it expects the library
> implementation to be smarter than a plain rep movsl.

  Well, in this case -minline-all-stringops does not change anything;
 it even increases the number of memset/memcpy calls, because my own
 replacement functions are not used (I then need to include
 /usr/include/string.h).

  My problem is rather that if I write my own memset/memcpy, they
 will be a plain rep movsl / rep stosl, because I am generating for i386
 and optimising for size. In fact I would rather use stosb/movsb, because I
 really do not care about the 1/1000 s I might gain if (and only if) the
 base addresses are aligned. After all, on newer processors write combining
 is enabled on main memory, and a rep stos[bwl] does not even hit the
 CODE L1 cache, letting the processor saturate the DATA bus bandwidth.
 But I do understand that it is different for other applications:
 after checking the assembler output, memset/memcpy is called with a
 fairly high number of bytes to copy (> 50), so using 64/128-bit registers
 is usually faster.
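
 A minimal sketch of the byte-wise, size-first replacements described
 above, written in portable C rather than with the actual rep stosb /
 rep movsb instructions. The names bytewise_memset and bytewise_memcpy
 are hypothetical and appear nowhere in the project; this is only one
 way such helpers might look, not the code actually used:

```c
#include <stddef.h>

/* Hypothetical helper: byte-wise fill, the C analogue of "rep stosb".
   It makes no alignment assumption and returns dst, as memset does. */
void *bytewise_memset(void *dst, int value, size_t n)
{
    unsigned char *d = dst;

    while (n--)
        *d++ = (unsigned char)value;
    return dst;
}

/* Hypothetical helper: byte-wise forward copy, the C analogue of
   "rep movsb". Like memcpy, it assumes the two regions do not overlap. */
void *bytewise_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

 Whether the compiler reduces these loops to a single string instruction
 depends on the compiler version and options, so the generated assembler
 would still need checking.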

  Anyway, it seems that I will have to rewrite memset/memcpy for these
 constructs:
struct my_struct st1 = (struct my_struct) {}, st2 = st1;
 and, with an assembler macro inserted by asm(""), rename
 memset/memcpy depending on the current code segment.
 Can I safely assume that when the compiler decides to insert
 memset/memcpy, it will ignore the return value of those functions,
 and that the standard set of registers is preserved / spilled?
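
 A rough sketch of the per-segment naming idea, assuming a C macro is
 used to stamp out one prefixed pair of functions per code segment.
 The macro DEFINE_SEGMENT_STRINGOPS, the segment names seg1 and seg2,
 and demo_zero_and_copy are all hypothetical. The sketch does not place
 anything in a real segment and does not redirect the calls the compiler
 emits by itself; it only shows the prefixed functions. They return dst
 like the standard functions, so they stay correct whether or not the
 caller looks at the return value:

```c
#include <stddef.h>

/* Hypothetical macro: defines <seg>_memset and <seg>_memcpy with the
   standard prototypes, so each code segment gets its own copy under a
   distinct symbol name. */
#define DEFINE_SEGMENT_STRINGOPS(seg)                             \
    void *seg##_memset(void *dst, int value, size_t n)            \
    {                                                             \
        unsigned char *d = dst;                                   \
        while (n--)                                               \
            *d++ = (unsigned char)value;                          \
        return dst;                                               \
    }                                                             \
    void *seg##_memcpy(void *dst, const void *src, size_t n)      \
    {                                                             \
        unsigned char *d = dst;                                   \
        const unsigned char *s = src;                             \
        while (n--)                                               \
            *d++ = *s++;                                          \
        return dst;                                               \
    }

DEFINE_SEGMENT_STRINGOPS(seg1) /* assumed name of the first segment */
DEFINE_SEGMENT_STRINGOPS(seg2) /* assumed name of the second segment */

struct my_struct {
    unsigned words[64]; /* large enough that a compiler may prefer a call */
};

/* Explicit equivalent of
       struct my_struct st1 = (struct my_struct) {}, st2 = st1;
   routed through the seg1 helpers. */
unsigned demo_zero_and_copy(void)
{
    struct my_struct st1, st2;

    seg1_memset(&st1, 0, sizeof st1);
    st1.words[63] = 7;
    seg1_memcpy(&st2, &st1, sizeof st2);
    return st2.words[0] + st2.words[63];
}
```

 Keeping the standard prototype and return value costs one register move
 per function, and avoids relying on how the compiler uses the result.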

  Etienne.
