Summary: | For x86 PIC code, ebx should be spillable | ||
---|---|---|---|
Product: | gcc | Reporter: | Rich Felker <bugdal> |
Component: | target | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | CC: | evstupac, gcc, pageexec |
Priority: | P3 | ||
Version: | 4.8.0 | ||
Target Milestone: | 5.0 | ||
Host: | | Target: | x86_64-*-*, i?86-*-* |
Build: | | Known to work: | |
Known to fail: | | Last reconfirmed: | 2012-08-13 00:00:00 |
Description
Rich Felker
2012-08-12 04:51:01 UTC
By the way, the code that inspired this report is crypt_blowfish.c and the corresponding asm by Solar Designer. We've been experimenting with performance characteristics while integrating it into musl libc. On the machine I tested, the C code is just as fast as the hand-optimized asm when using static libraries without -fPIC, but takes over 30% more runtime when built with -fPIC because it runs out of registers.

I think the GOT is introduced too late to do any fancy analysis on whether we need it or not. I also think that for outgoing function calls the ABI relies on a properly set up GOT, even for those that bind locally and thus do not go through the PLT.

> I think the GOT is introduced too late to do any fancy analysis
> on whether we need it or not.

This may be true, but if so, it's a highly suboptimal design that's hurting performance badly: 30% on the cryptographic code I looked at, and from working on FFmpeg in the past, I remember quite a few cases where PIC hurt performance by similarly significant, measurable amounts. If there's any way the changes I describe could be targeted, even just in the long term, I think it would make a big difference for a lot of software.

> I also think that for outgoing function calls the ABI
> relies on a properly set up GOT, even for those that bind
> locally and thus do not go through the PLT.

The extern function call ABI on x86 does not allow the callee to depend on EBX containing the GOT address. This is because the callee has no way of knowing whether it was called from the same DSO it resides in. If it was not, the GOT address in EBX will be invalid for it.

For static functions whose addresses never leak out of the translation unit they're defined in, the calling convention is up to GCC. Ideally it would assume the GOT register is already loaded in such functions (as long as all the callers use the GOT), but in reality it rarely does.
This is a separate code generation quality-of-implementation (QoI) issue that should perhaps be addressed as its own bug.

Fixed in 5.0.
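
For illustration, the two situations discussed in this report can be sketched in C as follows. This is a hypothetical reduction, not code from crypt_blowfish.c; the names `round_count`, `mix`, `scaled` and `scale_twice` are made up for the example. The first function needs the GOT address at most once, before a register-hungry loop; the second pair shows an extern entry point calling a static helper whose address never escapes the translation unit.

```c
#include <stdint.h>

/* Hypothetical global with default visibility. With -fPIC on 32-bit x86
 * it is typically reached through the GOT, so reading it needs the GOT
 * address in a register. */
uint32_t round_count = 16;

/* Register-hungry loop in the style of a Feistel round. The only
 * GOT-relative access is the read of round_count before the loop; the
 * loop body touches only arguments and locals, so the register holding
 * the GOT address could in principle be spilled or reused there. */
uint32_t mix(const uint32_t sbox[1024], uint32_t l, uint32_t r)
{
    uint32_t n = round_count;
    for (uint32_t i = 0; i < n; i++) {
        uint32_t t = ((sbox[l >> 24] + sbox[256 + ((l >> 16) & 0xff)])
                      ^ sbox[512 + ((l >> 8) & 0xff)])
                     + sbox[768 + (l & 0xff)];
        r ^= t;
        /* Swap the two halves for the next round. */
        uint32_t tmp = l;
        l = r;
        r = tmp;
    }
    return l ^ r;
}

/* Static helper whose address never leaves this translation unit, so the
 * compiler is free to choose its calling convention, for example by
 * assuming the caller has already loaded the GOT register. The noinline
 * attribute (a GCC extension) only keeps the call visible in the output. */
static __attribute__((noinline)) uint32_t scaled(uint32_t x)
{
    return x * round_count;
}

/* Extern entry point: it may be called from another DSO, so it cannot
 * assume that EBX holds a valid GOT address on entry. */
uint32_t scale_twice(uint32_t x)
{
    return scaled(x) + scaled(x + 1);
}
```

One way to see the effect is to compare the assembly from `gcc -m32 -O2 -S` with that from `gcc -m32 -O2 -fPIC -S` (assuming a compiler with 32-bit support installed). The exact register allocation depends on the GCC version, so the output is not reproduced here.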