This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/80479] [7/8 Regression] strcmp() produces valgrind errors on ppc64le


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80479

--- Comment #15 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to jreiser from comment #14)
> Here's how to retain the increased speed (and save around 300 bytes per
> call) while enabling valgrind happiness.

It won't be as fast; how much slower, only profiling can tell.  One
important point is that the inlined version gets better branch prediction,
since each call site trains the predictor on its own data.

> Put this subroutine in archive library libgcc_s.a only, and not in shared
> library libgcc_s.so.  Then the linkage for 'bl' is direct, avoiding PLT
> (Procedure Linkage Table), and the time for 'bl' is hidden by cache latency for
> 'ldbrx'.  The return 'blr' often is free, but may cost 1 cycle if it
> immediately follows a conditional branch that tests for termination.

blr is never free (and neither is bl).  Its execution cost may be hidden
in happy cases, sure, but it ends the dispatch group (and causes fetch
to redirect as well).  This matters for small functions.

> Valgrind(memcheck) can be happy because it can intercept and re-direct the
> entire routine by name, thus avoiding having to analyze 'cmpb'.

We'll just need to find some good way to make valgrind behave; some way
that does not slow down the code.
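One pragmatic knob on the valgrind side (not a fix in GCC) is memcheck's --partial-loads-ok option, which tolerates word-sized loads that only partially overlap addressable memory; alternatively the reports can be silenced with a suppression.  A hypothetical suppression entry, shown only to illustrate the shape (the error kind and frame would have to be copied from an actual report generated with --gen-suppressions=yes):

```
{
   ppc64le-inline-strcmp-overread
   Memcheck:Addr8
   fun:some_caller_of_strcmp
}
```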

> Notes: ldbrx and lwbrx are functional for non-aligned addresses.

Unaligned l*brx is quite slow on many CPUs: it can take an alignment
interrupt, for example, and a trip to the kernel is not free at all; even
when it doesn't cause an interrupt, unaligned accesses are slower.
