This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Builtin expansion versus headers optimization: Reductions
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: gcc at gcc dot gnu dot org, libc-alpha at sourceware dot org
- Date: Thu, 4 Jun 2015 16:35:30 +0200
- Subject: Re: Builtin expansion versus headers optimization: Reductions
- Authentication-results: sourceware.org; auth=none
- References: <20150604105929 dot GA19141 at domone> <alpine dot DEB dot 2 dot 10 dot 1506041206200 dot 31862 at digraph dot polyomino dot org dot uk>
On Thu, Jun 04, 2015 at 12:26:03PM +0000, Joseph Myers wrote:
> On Thu, 4 Jun 2015, Ondřej Bílka wrote:
> > As I commented on the libc-alpha list, for string functions a header
> > expansion is better than builtins from a maintenance perspective, and
> > a header is a lot easier to write and review than doing it in gcc.
> > Jeff said that it belongs in gcc. When I asked about the benefits he
> > couldn't name any, so I ask again: what are the benefits of that?
> One might equally ask where the benefits are of doing it in glibc headers
> and so not having optimizations in kernel space, not having optimizations
> on existing systems where using a newer GCC is much easier and more common
> than using a newer glibc, not having information from optimizers about
> alignment, etc.
I wanted an answer like that the first time, not handwaving about adding
everything into gcc builtins no matter what.
For the kernel it's a bit off topic, as these optimizations wouldn't be
enough since it uses its own functions. I could submit generic string
routines there to improve performance.
The kernel needs to be special-cased to avoid missteps like transforming
memchr(x,0,n) => x+strnlen(x,n), which would be a regression when a kernel
implementation has an assembly memchr but not strnlen, which is rarely used.
For alignment etc. the answer would be that gcc doesn't use alignment
information well and should improve its generic handling of it. For
example, third-party macros could result in the following:

void foo (char *bar)
{
  char *x = malloc (52);
  x = x - ((uintptr_t) x) % 16;
}

Here, while gcc changes the adjustment into x & -16, it doesn't use the
alignment information about malloc's result to eliminate it entirely. In
the same way, users would benefit from being able to check alignments
with __builtin_constant_p (((uintptr_t) x) & 15);
The point about a newer gcc being easier to deploy than a newer glibc is
a good one that I wasn't aware of.
> The aim should be to make the GNU system as a whole as good as possible
> rather than to argue for the merits of implementing things in the piece of
> the system you're most familiar with. That means working collaboratively
> to understand the best place for each optimization to go and get it
I don't do that. It's just that there will be plenty of special cases
where you need to do a different transformation on each architecture.
> > Here I cover the problem of reductions: that you should use foo
> > instead of bar. The problem is that these depend on the quality of the
> > implementation, so it's questionable how to check for these in gcc.
> I don't think either GCC or glibc should generally be working around
> suboptimality of the other. So glibc should generally expect GCC
> optimizations to be effective, and GCC should generally expect libc
> functions to be optimally implemented (so calling a more specific function
> is generally better than calling a less specific one - and only if any
> overhead would be constant, as in the memcpy / mempcpy case, are icache
> issues from calling less common functions relevant).
I miswrote this one. I meant that it depends on the architecture's
performance profile. Depending on the architecture, different
transformations would be optimal simply because the performance ratios of
some functions changed. And you cannot assume that functions are
implemented optimally, as there are plenty of possible performance
improvements for most architectures, which could change every time new
instructions become available.
There is also the problem that function running time is practically
constant: most inputs are short (less than 64 bytes), so these constant
overheads matter most, and performance at large sizes is mostly irrelevant.
> > Then there is missed optimization by gcc for following reduction chain
> > strchr (x, 0) -> strchrnul (x, 0) -> rawmemchr (x, 0) -> x + strlen (x)
> strchr -> strlen optimization is meaningful even when the nonstandard
> functions strchrnul and rawmemchr are not available.
> > Next missed reduction is
> > strdup (x) => s = strlen (x); memcpy (malloc (s+1), x, s+1)
> That's not valid when malloc returns NULL; you need to insert a check for
> NULL there.
Also correct; the main point was to not let a strdup call block further
optimizations.
> > Again is this worth a gcc pass?
> This isn't a matter of compiler passes; it's additional checks in existing
> built-in function handling. Maybe that built-in function handling should
> move to the match-and-simplify infrastructure (some, for libm functions
> and bswap, already has) to make this even simpler to implement.
Still, there would likely be a lot of architecture-specific #ifdef
special-casing there.