This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: help interpreting gcc 4.1.1 optimisation bug


andrew@walrond.org writes:
 > On Tue, Jun 13, 2006 at 10:37:29AM +0000, andrew@walrond.org wrote:
 > > On Mon, Jun 12, 2006 at 04:59:04PM -0700, Ian Lance Taylor wrote:
 > > > 
 > > > Probably better to say that these are read-write operands, using the
 > > > '+' constraint.
 > > > 
 > > > > Now everything works fine at -O3. However, I really don't understand
 > > > > the '&' early clobber constraint modifier. What use is it?
 > > > 
 > > > It is needed for assembly code which has both outputs and inputs, and
 > > > which includes more than one instruction, such that at least one of
 > > > the outputs is generated by an instruction which runs before another
 > > > instruction which requires one of the inputs.  The '&' constraint
 > > > tells gcc that some of the output operands are produced before some of
 > > > the input operands are used.  gcc will then avoid allocating the input
 > > > and output operands to the same register.
 > > > 
 > > 
 > > Ian, thanks for the reply.
 > > 
 > > So, in conclusion, a correct longcpy() would look like this:
 > > 
 > > void longcpy(long* _dst, long* _src, unsigned _numwords)
 > >  {
 > >      asm volatile (
 > >          "cld         \n\t"
 > >          "rep         \n\t"
 > >          "movsl       \n\t"
 > >          // Outputs (read/write)
 > >          : "+S" (_src), "+D" (_dst), "+c" (_numwords)
 > >          // Inputs - specify same registers as outputs
 > >          : "0"  (_src), "1"  (_dst), "2"  (_numwords)
 > >          // Clobbers: direction flag, so "cc", and "memory"
 > >          : "cc", "memory"
 > >          );
 > >  }
 > > 
 > 
 > Which doesn't compile ;(
 > 
 > The correct version is, I think:
 > 
 > void longcpy(long* _dst, long* _src, unsigned _numwords)
 >  {
 >      asm volatile (
 >          "cld         \n\t"
 >          "rep         \n\t"
 >          "movsl       \n\t"
 >          // Outputs (read/write)
 >          : "=S" (_src), "=D" (_dst), "=c" (_numwords)
 >          // Inputs - specify same registers as outputs
 >          : "0"  (_src), "1"  (_dst), "2"  (_numwords)
 >          // Clobbers: direction flag, so "cc", and "memory"
 >          : "cc", "memory"
 >          );
 >  }

All you've got here is an inline asm version of 

inline void longcpy(long* _dst, long* _src, unsigned _numwords)
{
  __builtin_memcpy (_dst, _src, _numwords * sizeof (long));
}

which gcc will optimize if it can.  

These days, "rep movs" is not as advantageous as it once was, and you
may get better performance by allowing gcc to choose how to do memory
copies.
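
If the hand-written asm is kept anyway, the read-write '+' form that
Ian suggested might look like the sketch below.  A '+' operand is
already both an input and an output, so the input list stays empty.
The name copy_words32, the fixed-width uint32_t element type and the
size_t count are assumptions made for this illustration (so that movsl
matches the element size and the count fills the whole count register
on x86-64); the plain C loop is only a fallback for other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): copy numwords 32-bit words. */
static void copy_words32(uint32_t *dst, const uint32_t *src, size_t numwords)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile (
        "cld\n\t"        /* copy upwards */
        "rep movsl"      /* copy (count register) 32-bit words */
        /* Read-write operands: all three registers are read and modified. */
        : "+S" (src), "+D" (dst), "+c" (numwords)
        /* No separate inputs: '+' already covers them. */
        :
        /* Flags change and memory is written behind the compiler's back. */
        : "cc", "memory");
#else
    /* Portable fallback so the sketch builds on non-x86 targets. */
    while (numwords--)
        *dst++ = *src++;
#endif
}

/* Small self-check a caller could run. */
static void copy_words32_selftest(void)
{
    uint32_t from[3] = { 7, 8, 9 };
    uint32_t to[3] = { 0, 0, 0 };

    copy_words32(to, from, 3);
    assert(to[0] == 7 && to[1] == 8 && to[2] == 9);
}
```

The pointers and the count are passed by value, so the caller's
variables are unchanged even though the asm advances the local copies.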

Andrew.
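
Ian's description of the '&' early-clobber modifier, quoted above, can
be illustrated with a two-instruction asm in which the output is
written before the last input is read.  This is a minimal sketch, not
code from the thread: add_pair is a hypothetical name, and the plain C
branch is only a fallback for non-x86 targets.

```c
#include <assert.h>

/* Hypothetical helper (not from the thread): returns a + b. */
static unsigned add_pair(unsigned a, unsigned b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned sum;

    __asm__ (
        "movl %1, %0\n\t"   /* the output is written here ...          */
        "addl %2, %0"       /* ... before the input b is read here     */
        /* '&' keeps sum out of the registers chosen for a and b.      */
        : "=&r" (sum)
        : "r" (a), "r" (b)
        : "cc");
    return sum;
#else
    /* Portable fallback so the sketch builds on non-x86 targets. */
    return a + b;
#endif
}

/* Small self-check a caller could run. */
static void add_pair_selftest(void)
{
    assert(add_pair(2, 3) == 5);
}
```

With a plain "=r" output, gcc would be free to place sum and b in the
same register, and the movl would then overwrite b before the addl
reads it.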

