This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, i386]: Do not emit "cld" instructions


> On 12/5/06, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> >>>According to the guide, it applies to pentium4.
> >>>
> >>>
> >>
> >>This is pretty high.  Would it be possible for you to rerun the
> >>test_stringops script on a P4 machine after removing the CLD?  If it
> >>really is 48 cycles, it should show a difference in the preferred memcpy
> >>codegen.
> >>
> >>
> >>
> >Sure! But I think that this is an error in the optimizing guide.
> 
> ... NOT.

Funny, it is great you noticed!
> 
> --cut here--
> #include <stdio.h>
> 
> /* "=A" returns the 64-bit result in edx:eax; this form is i386-only. */
> #define rdtsc(value) \
> asm volatile("rdtsc":"=A" (value))
> 
> int main(int argc, char ** argv)
> {
>  unsigned long long a,b;
> 
>  rdtsc(a);
>  rdtsc(b);
>  printf("%llu\n", b-a);
> 
>  rdtsc(a);
>  asm volatile ("std; cld;");

The P4 might simply drop redundant CLDs, so there is still hope that the
benchmark would fare well if std were not included.  But it would
definitely be great if you could rerun the test_stringop script I sent
you on a P4 machine.  Dropping the CLD ought to make rep;mov sequences a
lot more profitable, which should in turn reduce code size.
I will re-check Athlons/K8s/Centrinos, but there cld is supposed to be
cheap.

Honza
>  rdtsc(b);
>  printf("%llu\n", b-a);
> 
>  return 0;
> }
> 
> --cut here--
> 
> gcc -O2
> ./a.out
> 
> ./a.out
> 84
> 172
> ./a.out
> 84
> 172
> ./a.out
> 84
> 188
> ./a.out
> 84
> 188
> ./a.out
> 84
> 172
> 
> Uros.
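
The quoted test times std and cld together, so it cannot tell whether a
redundant cld on its own is expensive.  One way to separate the two is
to time an empty region, cld alone, and std; cld.  This is only a
minimal sketch, not the test_stringops script discussed in this thread:
the names read_tsc, probe and min_cycles are hypothetical, the split
"=a"/"=d" outputs are chosen so that it also builds on x86-64, and
taking the minimum over many runs is an assumption about how to filter
out interrupts and cache misses.

```c
#include <stdint.h>

/* Read the time-stamp counter.  Separate eax/edx outputs are used
   instead of "=A" so the same code works on i386 and x86-64. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
}

/* Hypothetical selector for the instruction sequence being timed. */
enum probe { PROBE_EMPTY, PROBE_CLD, PROBE_STD_CLD };

/* Return the smallest TSC delta seen over `runs` timed regions.
   The switch is inside the timed region for every probe, including
   PROBE_EMPTY, so its overhead is part of the baseline as well. */
static uint64_t min_cycles(enum probe p, int runs)
{
    uint64_t best = UINT64_MAX;

    for (int i = 0; i < runs; i++) {
        uint64_t start = read_tsc();

        switch (p) {
        case PROBE_CLD:
            __asm__ volatile ("cld" ::: "cc");
            break;
        case PROBE_STD_CLD:
            /* std and cld stay in one asm statement so that the
               direction flag is never left set for compiled code. */
            __asm__ volatile ("std; cld" ::: "cc");
            break;
        default:
            break;
        }

        uint64_t stop = read_tsc();
        if (stop - start < best)
            best = stop - start;
    }
    return best;
}
```

Calling min_cycles with PROBE_EMPTY, PROBE_CLD and PROBE_STD_CLD from
main and printing the three results shows how much of the difference
comes from cld alone.  The numbers depend on the CPU and are not
serialized against out-of-order execution, so they are only a rough
indication.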

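For context on why a redundant cld matters for code size: the x86 psABI
requires the direction flag to be clear on function entry and return, so
an inline string copy can consist of rep movsb alone.  The following is
a minimal sketch, assuming GCC-style inline asm on i386 or x86-64.  The
name copy_rep_movsb is hypothetical, and this is not the code that the
patch generates.

```c
#include <stddef.h>

/* Copy n bytes from src to dst with "rep movsb" and no preceding cld.
   This relies on the ABI guarantee that DF is clear when the function
   is entered. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *d = dst;

    /* edi/rdi = destination, esi/rsi = source, ecx/rcx = byte count;
       all three are updated by the instruction, hence "+". */
    __asm__ volatile ("rep movsb"
                      : "+D" (d), "+S" (src), "+c" (n)
                      :
                      : "memory");
    return dst;
}
```

If some earlier asm statement had left the direction flag set, this copy
would run backwards, which is why the std in the timing test has to be
followed by cld.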
