This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/82260] [x86] Unnecessary use of 8-bit registers with -Os. slightly slower and larger code
- From: "peter at cordes dot ca" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 20 Sep 2017 15:36:08 +0000
- Subject: [Bug target/82260] [x86] Unnecessary use of 8-bit registers with -Os. slightly slower and larger code
- Auto-submitted: auto-generated
- References: <bug-82260-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82260
--- Comment #4 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Jakub Jelinek from comment #2)
> From pure instruction size POV, for the first 2 alternatives as can be seen
> say on:
> ...
> movb $0x15, %al
> movl $0x15, %eax
> movb $-0x78, %bl
> movl $-0x78, %ebx
There are ways to save code-size when setting up constants. If you already
have one constant in a register, you can get other nearby constants in 3 bytes
with LEA:
    xor  %edi, %edi           # you often need a zero for something
    lea  -0x78(%rdi), %ebx    # 3 bytes vs. 5 for mov $imm32, %r32
Or a 4-byte LEA with a 64-bit destination to replace a 7-byte mov $imm32, %r64.
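To make the byte counts above easy to check, here is a small Python sketch that tabulates the encodings. The byte sequences are my own hand-assembly from the opcode tables (B8+rd, 8D /r, REX.W C7 /0), not something taken from the bug report, and the `ENCODINGS` table and `size` helper are hypothetical names used only for illustration. Double-check against an assembler before relying on it.

```python
# Hand-assembled x86-64 encodings. Assumption: derived by hand from the
# opcode tables, not produced by an assembler, so verify before reuse.
ENCODINGS = {
    "mov $-0x78, %ebx":      bytes.fromhex("bb88ffffff"),      # B8+rd, imm32
    "lea -0x78(%rdi), %ebx": bytes.fromhex("8d5f88"),          # 8D /r, disp8
    "mov $-0x78, %rbx":      bytes.fromhex("48c7c388ffffff"),  # REX.W C7 /0, imm32
    "lea -0x78(%rdi), %rbx": bytes.fromhex("488d5f88"),        # REX.W 8D /r, disp8
}


def size(insn: str) -> int:
    """Return the encoded length in bytes of one instruction from the table."""
    return len(ENCODINGS[insn])


if __name__ == "__main__":
    for insn, code in ENCODINGS.items():
        print(f"{len(code)} bytes  {code.hex(' '):<22} {insn}")
```

The LEA forms only win when the displacement fits in a signed 8-bit field and a register holding a known constant (here zero in %rdi) is already available.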
Modern CPUs have pretty good LEA throughput (2 per clock on Intel SnB-family +
KNL and AMD K8/K10/BD-family/Zen), especially for 2-component LEA (base + disp,
no index). Others manage 1 per clock, still with 1c latency. With efficient
xor-zeroing support, the LEA can execute without any extra delay even if it
issues in the same cycle as the xor-zeroing. If the LEA is relative to some
other constant instead, it still costs just 1c of extra latency.
If gcc had a -Oz mode like clang does (optimize for size even more), you could
consider stuff like 3-byte push+pop (clobbering the top of the red zone):
    push $-0x78    # imm8 sign-extended to 64-bit
    pop  %rbx
https://stackoverflow.com/questions/45105164/set-all-bits-in-cpu-register-to-1-efficiently
https://stackoverflow.com/questions/33825546/shortest-intel-x86-64-opcode-for-rax=1
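As a rough illustration of the push+pop trick, the Python sketch below models what ends up in %rbx and counts the bytes. Again the encodings (6A ib, 58+rd, B8+rd) are hand-assembled by me and the helper names are hypothetical; this only models the register value, it does not execute any machine code.

```python
MASK64 = (1 << 64) - 1

# Hand-assembled encodings (assumption: check with an assembler).
PUSH_IMM8 = bytes.fromhex("6a88")           # push $-0x78  (6A ib)
POP_RBX = bytes.fromhex("5b")               # pop  %rbx    (58+rd)
MOV_EBX_IMM32 = bytes.fromhex("bb88ffffff") # mov $-0x78, %ebx (B8+rd, imm32)


def rbx_after_push_pop(push_code: bytes) -> int:
    """push imm8 sign-extends the immediate to 64 bits; pop loads all 64."""
    imm8 = int.from_bytes(push_code[1:2], "little", signed=True)
    return imm8 & MASK64


def rbx_after_mov_r32(mov_code: bytes) -> int:
    """Writing a 32-bit register zero-extends into the full 64-bit register."""
    return int.from_bytes(mov_code[1:5], "little")


if __name__ == "__main__":
    print(len(PUSH_IMM8 + POP_RBX), "bytes for push+pop")
    print(f"rbx after push+pop:  {rbx_after_push_pop(PUSH_IMM8):#018x}")
    print(f"rbx after mov r32:   {rbx_after_mov_r32(MOV_EBX_IMM32):#018x}")
```

Note that the two sequences are not interchangeable when the upper half of the register matters: push+pop leaves the sign-extended 64-bit value, while the 5-byte mov to %ebx leaves the upper 32 bits zero.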