This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Improve -ftree-loop-distribution
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: gcc at gcc dot gnu dot org, libc-alpha at sourceware dot org
- Date: Tue, 30 Jul 2013 17:15:09 +0200
- Subject: Improve -ftree-loop-distribution
Hi,
I have been thinking about optimizing memcpy and have an idea for
transforming copy patterns without having to resolve aliasing. When we
are not sure about aliasing, we can still replace the loop with a call
to this function (provided that we know that n is large):
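  #include <string.h>

  static void
  __memcpy_loop(char *to, char *from, size_t n, int diff)
  {
    /* diff is to - from; step is its magnitude.  */
    size_t i, len, step = diff < 0 ? -(size_t) diff : (size_t) diff;
    if (step >= n)        /* no overlap: plain memcpy is safe */
      memcpy(to, from, n);
    else if (step != 0)   /* overlap: copy in blocks of step bytes */
      for (i = 0; i < n; i += step)
        {
          len = n - i < step ? n - i : step;   /* last block may be short */
          memmove(to + i, from + i, len);
        }
  }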
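
The block-wise idea can be checked against the byte loop it is meant to
replace. The following is only a minimal sketch, assuming that diff is
the distance to - from and that it is non-negative; byte_loop,
block_copy, same_result and self_check are names made up for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The loop the compiler sees: a forward, byte-by-byte copy.  */
static void
byte_loop(char *to, const char *from, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    to[i] = from[i];
}

/* Block-wise replacement; step is the distance between to and from.  */
static void
block_copy(char *to, const char *from, size_t n, size_t step)
{
  size_t i, len;
  if (step >= n)
    memcpy(to, from, n);              /* regions are disjoint */
  else if (step != 0)
    for (i = 0; i < n; i += step)
      {
        len = n - i < step ? n - i : step;
        memcpy(to + i, from + i, len);  /* each block pair is disjoint */
      }
}

/* Returns 1 if both strategies leave identical buffers for to = from + step.  */
static int
same_result(size_t n, size_t step)
{
  char a[64], b[64];
  size_t i;
  if (n + step > sizeof a)
    return 0;
  for (i = 0; i < sizeof a; i++)
    a[i] = b[i] = (char) ('a' + i % 26);
  byte_loop(a + step, a, n);
  block_copy(b + step, b, n, step);
  return memcmp(a, b, sizeof a) == 0;
}

/* A few overlapping and non-overlapping cases.  */
static void
self_check(void)
{
  assert(same_result(20, 3));
  assert(same_result(20, 7));
  assert(same_result(10, 30));
}
```

Within one block the source and destination are exactly step bytes
apart and at most step bytes long, so they cannot overlap; that is why
the sketch can use memcpy per block.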
We could extract a bit more performance by making the function
non-static, so that it is resolved at link time. GCC would then provide
its own version, glibc could add an optimized one, and symbol resolution
would pick the glibc version when it is present.
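
One way such an overridable fallback might look is a weak definition.
This is a sketch under assumptions, not the mechanism proposed here:
memcpy_loop_hook and hook_copies are hypothetical names, and
__attribute__((weak)) is a GCC extension. Lookup order between shared
libraries is a separate matter that this does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback.  Being weak, it is replaced if another object
   file in the link defines a non-weak symbol with the same name.  */
__attribute__((weak)) void
memcpy_loop_hook(char *to, const char *from, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)   /* plain forward byte copy */
    to[i] = from[i];
}

/* Copies src through the hook into a scratch buffer; returns 1 on match.  */
int
hook_copies(const char *src, size_t n)
{
  char buf[32];
  assert(src != NULL);
  if (n > sizeof buf)
    return 0;
  memcpy_loop_hook(buf, src, n);
  return memcmp(buf, src, n) == 0;
}
```

With no other definition linked in, the call goes to the weak fallback.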
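A second improvement is that patterns such as
  short x[n]; // or int x[n];
  for (i=0;i<n;i++)
    x[i]=c;
could be replaced with a call to wmemset, provided the element size
matches wchar_t (4 bytes on glibc, so this covers the int case there).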
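
As an illustration of that replacement, here is a minimal sketch for the
int case. It assumes the transformation is only valid when int and
wchar_t have the same size, and falls back to the loop otherwise;
fill_int is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Stores c into x[0..n).  Uses wmemset only when the sizes agree.  */
static void
fill_int(int *x, size_t n, int c)
{
  size_t i;
  assert(x != NULL || n == 0);
  if (sizeof(int) == sizeof(wchar_t))
    wmemset((wchar_t *) x, (wchar_t) c, n);   /* n counts elements, not bytes */
  else
    for (i = 0; i < n; i++)
      x[i] = c;
}
```

Unlike memset, wmemset takes its length in wide characters, so n can be
passed through unchanged.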
For initializing blocks of 8/16 bytes it would be easy to add
memset8/memset16 variants that take suitable arguments. We could apply
the same trick for compatibility.
Performance would be nearly identical to memset, as they could be
implemented as a short prolog followed by a jump to memset.
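
To make the intended semantics concrete, here is a portable reference
sketch of the 8-byte variant. The name memset8_ref and its signature are
assumptions; no such interface is specified above. A real version would
presumably be written in assembly as a prolog that enters memset, which
plain C cannot express, so this only shows what the call would store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical interface: store the 8-byte pattern count times at dst.  */
static void *
memset8_ref(void *dst, uint64_t pattern, size_t count)
{
  unsigned char *p = dst;
  size_t i;
  assert(dst != NULL || count == 0);
  /* All eight bytes equal: a plain memset gives the same result.  */
  if (pattern == (pattern & 0xff) * UINT64_C(0x0101010101010101))
    return memset(dst, (int) (pattern & 0xff), 8 * count);
  for (i = 0; i < count; i++)
    memcpy(p + 8 * i, &pattern, sizeof pattern);   /* one block per step */
  return dst;
}
```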
Comments?