This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/82260] [x86] Unnecessary use of 8-bit registers with -Os. slightly slower and larger code

From: "peter at cordes dot ca" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Wed, 20 Sep 2017 15:36:08 +0000
Subject: [Bug target/82260] [x86] Unnecessary use of 8-bit registers with -Os. slightly slower and larger code
Auto-submitted: auto-generated
References: <bug-82260-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82260

--- Comment #4 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Jakub Jelinek from comment #2)
> From pure instruction size POV, for the first 2 alternatives as can be seen
> say on:
> ...
> movb $0x15, %al
> movl $0x15, %eax
> movb $-0x78, %bl
> movl $-0x78, %ebx
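The size gap in the quoted listing can be reproduced with a tiny hand-rolled encoder. This is a sketch, not GNU as output: the helper names and register tables are made up for illustration, and only the two quoted `mov`-immediate forms are handled. It shows the 2-byte `movb` vs. 5-byte `movl` difference, but not the partial-register cost of writing only the low byte, which is what the bug title is about.

```python
# Hand-rolled encodings for the quoted instructions only (not a general assembler).
# Register numbers follow x86 encoding order: AL/EAX = 0, CL/ECX = 1, DL/EDX = 2, BL/EBX = 3.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3}
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}


def mov_r8_imm8(reg: str, imm: int) -> bytes:
    """movb $imm, %r8: opcode B0+r, then a 1-byte immediate."""
    return bytes([0xB0 + REG8[reg], imm & 0xFF])


def mov_r32_imm32(reg: str, imm: int) -> bytes:
    """movl $imm, %r32: opcode B8+r, then a 4-byte little-endian immediate."""
    return bytes([0xB8 + REG32[reg]]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")


for name, enc in [("movb $0x15, %al", mov_r8_imm8("al", 0x15)),
                  ("movl $0x15, %eax", mov_r32_imm32("eax", 0x15)),
                  ("movb $-0x78, %bl", mov_r8_imm8("bl", -0x78)),
                  ("movl $-0x78, %ebx", mov_r32_imm32("ebx", -0x78))]:
    print(f"{name:20} {len(enc)} bytes: {enc.hex(' ')}")
```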

There are ways to save code size when setting up constants.  If you already
have one constant in a register, you can get other nearby constants in 3 bytes
with LEA:

  xor  %edi, %edi         # you often need a zero for something
  lea -0x78(%rdi), %ebx   # 3 bytes vs. 5 for mov $imm32, %r32
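The byte counts in those comments can be checked with a small hand-rolled encoder. This is a sketch under stated assumptions, not GNU as: the function names are hypothetical, only the legacy eight registers are supported (no REX.R/REX.B), and `lea` is limited to an 8-bit displacement off a 64-bit base other than `%rsp` (which would need a SIB byte).

```python
# Sketch: encode `xor %edi,%edi; lea -0x78(%rdi),%ebx` and compare it with
# `mov $-0x78, %ebx`. Only the legacy 8 registers are supported (no REX prefix).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def modrm(mod: int, reg: int, rm: int) -> int:
    return (mod << 6) | (reg << 3) | rm


def xor_r32_r32(dst: str, src: str) -> bytes:
    # 31 /r: xor r/m32, r32 with a register operand (mod = 11)
    return bytes([0x31, modrm(0b11, REG32[src], REG32[dst])])


def lea_r32_disp8(dst: str, base: str, disp: int) -> bytes:
    # 8D /r with mod = 01: [base + disp8]; 64-bit base, 32-bit destination
    if base == "rsp" or not -128 <= disp <= 127:
        raise ValueError("needs a SIB byte or a disp32")
    return bytes([0x8D, modrm(0b01, REG32[dst], REG64[base]), disp & 0xFF])


def mov_r32_imm32(dst: str, imm: int) -> bytes:
    # B8+r id: 4-byte little-endian immediate
    return bytes([0xB8 + REG32[dst]]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")


zero = xor_r32_r32("edi", "edi")          # 31 ff
lea = lea_r32_disp8("ebx", "rdi", -0x78)  # 8d 5f 88
mov = mov_r32_imm32("ebx", -0x78)         # bb 88 ff ff ff
print(zero.hex(" "), "|", lea.hex(" "), "vs", mov.hex(" "))
```

Once a zeroed register exists, each further nearby constant costs 3 bytes instead of 5; the 2-byte `xor` is paid only once, and the zero is often needed anyway.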

Or use a 4-byte LEA with a 64-bit destination to replace a 7-byte
mov $imm32, %r64.  (That mov form is only needed for negative constants: writing
a 32-bit register zero-extends rather than sign-extends, and the same applies
to LEA.)  Modern CPUs have pretty good LEA throughput (2 per clock on Intel
SnB-family + KNL and AMD K8/K10/BD-family/Zen), especially for 2-component LEA
(base + disp, no index).  Others manage 1 per clock, still with 1-cycle latency.
With efficient xor-zeroing support, the LEA can execute without any extra delay
even if it issues in the same cycle as the xor-zeroing.  If the LEA is relative
to some other constant instead, it's still only 1 cycle of extra latency.
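A sketch of why the 64-bit destination matters and what it costs. The helper names are hypothetical and only the low eight registers are modeled. A 32-bit destination keeps the low 32 bits of the computed address and zero-extends them, so only the REX.W form leaves a negative 64-bit constant in the register, at 4 bytes versus 7 for `mov $imm32, %r64`.

```python
# Sketch: value and size of LEA with a 32-bit vs 64-bit destination,
# compared with the 7-byte sign-extending mov. Low 8 registers only.
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}
MASK64 = (1 << 64) - 1
REX_W = 0x48


def modrm(mod: int, reg: int, rm: int) -> int:
    return (mod << 6) | (reg << 3) | rm


def lea_result(base_value: int, disp: int, dest_bits: int) -> int:
    """Full 64-bit register contents after `lea disp(base), dest`."""
    addr = (base_value + disp) & MASK64  # address arithmetic is 64-bit
    if dest_bits == 32:
        return addr & 0xFFFFFFFF         # writing a 32-bit register zero-extends
    return addr


def lea_r64_disp8(dst: str, base: str, disp: int) -> bytes:
    # REX.W 8D /r, mod = 01: [base + disp8] (an rsp base would need a SIB byte)
    assert base != "rsp" and -128 <= disp <= 127
    return bytes([REX_W, 0x8D, modrm(0b01, REG64[dst], REG64[base]), disp & 0xFF])


def mov_r64_simm32(dst: str, imm: int) -> bytes:
    # REX.W C7 /0 id: imm32 sign-extended to 64 bits
    opcode = bytes([REX_W, 0xC7, modrm(0b11, 0, REG64[dst])])
    return opcode + (imm & 0xFFFFFFFF).to_bytes(4, "little")


print(hex(lea_result(0, -0x78, 32)))                # 0xffffff88, not -0x78 in 64 bits
print(hex(lea_result(0, -0x78, 64)))                # 0xffffffffffffff88, i.e. -0x78
print(lea_r64_disp8("rbx", "rdi", -0x78).hex(" "))  # 48 8d 5f 88
print(mov_r64_simm32("rbx", -0x78).hex(" "))        # 48 c7 c3 88 ff ff ff
```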

If gcc had a -Oz mode like clang does (optimize even harder for size), you could
consider tricks like a 3-byte push+pop (which clobbers the top of the red zone):

  push $-0x78       # 2 bytes: imm8, sign-extended to 64 bits
  pop  %rbx         # 1 byte
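A minimal model of the push/pop trick, assuming hypothetical helper names, only the low eight registers, and a plain dict standing in for the stack. `push $imm8` encodes as 6A plus a sign-extended 8-bit immediate and `pop %r64` as a single 58+r byte, so the pair is 3 bytes. The model also makes the side effect visible: the push stores to the qword just below the incoming `%rsp`, which is the top of the red zone that the surrounding function may be using.

```python
# Sketch of `push $imm8; pop %r64`: encoding plus a toy model of its effect.
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}
MASK64 = (1 << 64) - 1


def push_imm8(imm: int) -> bytes:
    # 6A ib: in 64-bit mode the imm8 is sign-extended and 8 bytes are pushed
    assert -128 <= imm <= 127
    return bytes([0x6A, imm & 0xFF])


def pop_r64(dst: str) -> bytes:
    # 58+r: pop r64 (no REX prefix needed for the low 8 registers)
    return bytes([0x58 + REG64[dst]])


def run_push_pop(imm: int, rsp: int, memory: dict) -> tuple:
    """Return (value popped, final rsp); `memory` maps address -> qword."""
    rsp -= 8
    memory[rsp] = imm & MASK64  # overwrites the qword at old_rsp - 8 (red zone)
    value = memory[rsp]
    rsp += 8
    return value, rsp


code = push_imm8(-0x78) + pop_r64("rbx")
print(len(code), code.hex(" "))  # 3 6a 88 5b
```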

https://stackoverflow.com/questions/45105164/set-all-bits-in-cpu-register-to-1-efficiently
https://stackoverflow.com/questions/33825546/shortest-intel-x86-64-opcode-for-rax=1

