This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Improve -ftree-loop-distribution


Hi,
I have been thinking about optimizing memcpy and have an idea for
transforming copy patterns without having to resolve aliasing. When we
are not sure about aliasing, we can still replace the loop with a call
to this function (provided that we know that n is large):

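#include <string.h>

static void
__memcpy_loop(char *to, char *from, size_t n, int diff)
{
  /* Assumption: diff == to - from and diff > 0, so the regions
     overlap exactly when diff < n.  */
  int overlap = (size_t) diff < n;
  size_t i;
  if (!overlap)
    memcpy(to, from, n);
  else
    for (i = 0; i < n; i += diff)
      {
        /* Copy at most diff bytes per step; such a chunk cannot
           overlap its own source.  */
        size_t len = n - i < (size_t) diff ? n - i : (size_t) diff;
        memmove(to, from, len);
        from += diff;
        to += diff;
      }
}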
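
To illustrate why copying in chunks of diff bytes preserves the
behaviour of the original loop, here is a minimal sketch. It assumes
the destination lies diff bytes above the source (diff > 0); the names
forward_loop_copy and chunked_forward_copy are hypothetical and are not
part of GCC or glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference semantics: the byte-by-byte forward loop that the
   transformation would replace.  */
static void
forward_loop_copy (char *to, const char *from, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    to[i] = from[i];
}

/* Hypothetical chunked variant.  Assumes to == from + diff with
   diff > 0.  Each chunk is at most diff bytes long, so its source and
   destination never overlap and plain memcpy is valid for it.  */
static void
chunked_forward_copy (char *to, const char *from, size_t n, size_t diff)
{
  size_t done = 0;
  assert (diff > 0 && to == from + diff);
  while (done < n)
    {
      size_t len = n - done < diff ? n - done : diff;
      memcpy (to + done, from + done, len);
      done += len;
    }
}
```

On overlapping buffers both functions should leave the same bytes
behind: the first diff bytes of the source repeated over the copied
range.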

We could extract a bit of performance by making the function non-static
so that it is resolved at link time. GCC would then provide its own
version, glibc could add a tuned version, and symbol resolution would
select the glibc version when it is present.
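
One possible mechanism for this kind of override with GNU toolchains is
a weak definition. The sketch below is only an assumption about how it
could look; the name copy_loop_fallback is hypothetical, and the message
does not say that weak symbols are the intended mechanism.

```c
#include <stddef.h>

/* Hypothetical compiler-supplied fallback.  It is marked weak so that
   a strong definition of the same symbol in another object file
   linked into the program replaces it at link time.  */
__attribute__ ((weak)) void
copy_loop_fallback (char *to, const char *from, size_t n)
{
  size_t i;
  /* Generic slow path: the plain forward loop.  */
  for (i = 0; i < n; i++)
    to[i] = from[i];
}
```

When no other definition is linked in, callers simply get this generic
version.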

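A second improvement is that patterns such as
short x[n]; // or int x[n];
for (i=0;i<n;i++)
  x[i]=c;
could be replaced with a call to wmemset when the element size matches
sizeof(wchar_t) (4 bytes with glibc, so the int case applies there).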
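
As a rough sketch of the int case, the fill loop could be expressed
through wmemset like this. The helper name fill_ints is hypothetical,
and the code assumes that wchar_t is int or unsigned int whenever the
two sizes match, as is the case with glibc.

```c
#include <stddef.h>
#include <wchar.h>

/* Fill n ints with c.  Uses wmemset only when wchar_t has the same
   size as int; otherwise keeps the plain loop.  */
static void
fill_ints (int *x, int c, size_t n)
{
  if (sizeof (wchar_t) == sizeof (int))
    wmemset ((wchar_t *) x, (wchar_t) c, n);
  else
    {
      size_t i;
      for (i = 0; i < n; i++)
        x[i] = c;
    }
}
```

The size check is a compile-time constant, so a compiler can drop the
unused branch.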
For initializing blocks of 8/16 bytes it would be easy to add
memset8/memset16 functions that take suitable arguments. We could apply
the same symbol-resolution trick for compatibility.
Performance would be nearly identical to memset, as they could be
implemented as a short prologue followed by a jump into memset.
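
A portable C sketch of what a memset8 interface might look like is
given below. The name, signature and 8-byte element type are
assumptions; no such function exists in glibc. C cannot express a jump
into the middle of memset, so this version only delegates to memset
when the pattern is a single repeated byte and otherwise uses a plain
loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memset8: store the 8-byte value c into n elements.  */
static uint64_t *
memset8 (uint64_t *s, uint64_t c, size_t n)
{
  size_t i;
  /* If all eight bytes of c are equal, this is an ordinary byte fill
     and memset can do all of the work.  */
  if (c == (c & 0xff) * UINT64_C (0x0101010101010101))
    return memset (s, (int) (c & 0xff), n * sizeof *s);
  for (i = 0; i < n; i++)
    s[i] = c;
  return s;
}
```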

Comments?

