Bug 46942 - x86_64 parameter passing unnecessary sign/zero extends
Summary: x86_64 parameter passing unnecessary sign/zero extends
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.6.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on: 42324
Blocks:
Reported: 2010-12-14 16:49 UTC by Jakub Jelinek
Modified: 2023-09-28 00:44 UTC (History)
CC List: 8 users

See Also:
Host:
Target: x86_64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-12-16 15:12:22

Description Jakub Jelinek 2010-12-14 16:49:53 UTC
It seems we zero/sign extend integral parameters narrower than 64 bits into 64-bit registers on both the caller and the callee side on x86_64; one of those extensions should be redundant.
Example 4 in http://blog.regehr.org/archives/320 suggests that LLVM probably zero/sign extends only in the caller, not in the callee. I'm not sure what ICC does.

__attribute__((noinline, noclone))
unsigned long f1 (unsigned int a, int b, unsigned short c, short d, unsigned char e, signed char f)
{
  return (unsigned long) a + b + c + d + e + f;
}

unsigned long l;

unsigned long f2 (void)
{
  return f1 (l + 41, l + 41, l + 41, l + 41, l + 41, l + 41);
}

unsigned long f3 (unsigned int a, int b, unsigned short c, short d, unsigned char e, signed char f)
{
  return f1 (a, b, c, d, e, f);
}
Comment 1 Jakub Jelinek 2010-12-14 17:13:19 UTC
__attribute__((noinline, noclone))
unsigned long f1 (unsigned int a, int b, unsigned short c, short d, unsigned char e, signed char f)
{
  return (unsigned long) a + b + c + d + e + f;
}

unsigned long l;

unsigned long f2 (void)
{
  return f1 (l + 41, l + 41, l + 41, l + 41, l + 41, l + 41) + 1;
}

unsigned long f3 (unsigned int a, int b, unsigned short c, short d, unsigned char e, signed char f)
{
  return f1 (a, b, c, d, e, f);
}

unsigned long f4 (int a, unsigned int b, short c, unsigned short d, signed char e, unsigned char f)
{
  return f1 (a, b, c, d, e, f);
}

At -O2, f4 shows that we can't trust that the sign/zero extension is done on the caller side, or at least we can't trust that the value is sign/zero extended to the full 64 bits.
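
To make the f4 observation concrete, here is a minimal sketch that models a 64-bit argument register as a plain uint64_t. The names reg64, pass_int_sext64, pass_uint_zext64, callee_zext32 and forwarding_demo are hypothetical and are not part of GCC or of the testcase. The sketch only assumes that some caller extended an int to 64 bits before f4 forwards the same register, unchanged, to a callee that declares the parameter as unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit argument register as the callee sees it. */
typedef uint64_t reg64;

/* What a caller leaves in the register if it sign-extends an int to 64 bits. */
static reg64 pass_int_sext64(int32_t v)
{
    return (reg64)(int64_t)v;
}

/* What a callee taking unsigned int would need in order to trust a
   64-bit zero extension done by its caller. */
static reg64 pass_uint_zext64(uint32_t v)
{
    return (reg64)v;
}

/* Callee-side re-extension: look only at the low 32 bits. */
static uint64_t callee_zext32(reg64 r)
{
    return (uint64_t)(uint32_t)r;
}

/* Models f4 receiving int a = -1 and forwarding the register untouched
   to a callee whose parameter is unsigned int. */
static void forwarding_demo(void)
{
    reg64 r = pass_int_sext64(-1);

    /* The forwarded register is not the 64-bit zero extension the callee
       would have to assume if it skipped its own extension. */
    assert(r != pass_uint_zext64(UINT32_MAX));

    /* Extending again in the callee gives the right value regardless. */
    assert(callee_zext32(r) == 0xffffffffULL);
}
```

In this model the mismatch appears only in the upper 32 bits, which is consistent with the weaker reading above: the low 32 bits may be trustworthy even when the full 64 bits are not.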
Comment 2 jsm-csl@polyomino.org.uk 2010-12-14 17:26:11 UTC
If the conclusion is that the callee can rely on the caller having done 
the extension then you need to watch out for security issues in the kernel 
syscall ABI when building with a compiler that generates code relying on 
this.

http://lkml.org/lkml/2007/6/4/376
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-0029

If there were target-specific aspects to the fix to that security issue, 
they may not have included x86_64 changes.
Comment 3 Richard Biener 2010-12-16 15:12:22 UTC
I think the ABI guarantees that the caller did the sign/zero extension, so
we don't have to repeat it in the callee.  Of course we don't model this
at the tree level, so probably RTL has to figure this out and optimize.

(Yeah, that's one of the areas where lowering call ABI related promotions
earlier would be nice).
Comment 4 H.J. Lu 2011-01-02 17:58:18 UTC
I proposed to update x86-64 psABI with

---
When a value of type _Bool is returned in a register, bit 0 contains the truth
value and bits 1 to 7 shall be zero. When an argument of type _Bool is passed
in a register or on the stack, bit 0 contains the truth value and bits 1 to 31
shall be zero.

When a value of type signed/unsigned char or short is returned in a register,
bits 0 to 7 for char and bits 0 to 15 for short contain the value and other
bits are left unspecified. When an argument of signed/unsigned type char or
short is passed in a register or on the stack, it shall be sign/zero extended to
signed/unsigned int.
---
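
A rough way to read the proposed wording is sketched below. This is only an illustrative model, not psABI text and not compiler code: the helper names (arg_schar, arg_uchar, arg_bool, callee_low32, proposal_demo) and the junk_hi parameter, which stands in for whatever the upper 32 bits of the register happen to contain, are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the proposed rule: the caller defines bits 0..31,
   while bits 32..63 hold whatever was in the register before (junk_hi). */
static uint64_t arg_schar(signed char v, uint32_t junk_hi)
{
    /* signed char is sign extended to int */
    return ((uint64_t)junk_hi << 32) | (uint32_t)(int32_t)v;
}

static uint64_t arg_uchar(unsigned char v, uint32_t junk_hi)
{
    /* unsigned char is zero extended to unsigned int */
    return ((uint64_t)junk_hi << 32) | (uint32_t)v;
}

static uint64_t arg_bool(_Bool v, uint32_t junk_hi)
{
    /* bit 0 holds the truth value, bits 1..31 are zero */
    return ((uint64_t)junk_hi << 32) | (uint32_t)v;
}

/* Callee side: only the low 32 bits may be relied on in this model. */
static uint32_t callee_low32(uint64_t reg)
{
    return (uint32_t)reg;
}

static void proposal_demo(void)
{
    assert(callee_low32(arg_schar(-5, 0xdeadbeefu)) == 0xfffffffbu);
    assert(callee_low32(arg_uchar(200, 0xdeadbeefu)) == 200u);
    assert(callee_low32(arg_bool(1, 0xdeadbeefu)) == 1u);
}
```

Under this reading a callee can use the low 32 bits of a char, short or _Bool argument without extending it again, but nothing is promised about bits 32 to 63.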
Comment 5 Jakub Jelinek 2011-01-02 19:01:58 UTC
And upper 32 bits are undefined if the argument is 8/16/32 bit (i.e. callee must sign/zero extend, instead of caller)?
Comment 6 H.J. Lu 2011-01-02 20:53:12 UTC
(In reply to comment #5)
> And upper 32 bits are undefined if the argument is 8/16/32 bit (i.e. callee
> must sign/zero extend, instead of caller)?

If callee wants 64bit, it has to sign/zero extend it to 64bit.
Comment 7 H.J. Lu 2011-01-03 14:23:05 UTC
For

void f1(char c, char d, char e, char f, char g, char h, char i);

char x;

void f2()
{
    f1(x, x, x, x, x, x, x);
}

ICC generates this assembly, where we only store 8 bits to the stack
for the final parameter.

f2:
        pushq     %rsi
        addq      $-16, %rsp
        movsbl    x(%rip), %edi
        movb      %dil, (%rsp)
        movl      %edi, %esi
        movl      %edi, %edx
        movl      %edi, %ecx
        movl      %edi, %r8d
        movl      %edi, %r9d
        call      f1
        addq      $16, %rsp
        popq      %rcx
        ret
Comment 8 Jakub Jelinek 2011-03-25 19:52:58 UTC
GCC 4.6.0 is being released, adjusting target milestone.
Comment 9 Meador Inge 2012-03-08 03:41:27 UTC
What is the status of this issue?  Did the psABI ever get updated?  Is the intent of this issue to modify GCC to remove the sign extension from the callee?
Comment 10 James Y Knight 2015-09-16 17:12:43 UTC
Well, the ABI doc still doesn't appear to say anything about this.

On the callee side, GCC as of 5.1 still seems to sign/zero-extend 8/16-bit arguments to 32 bits. Clang does not. Both extend 32-bit arguments to 64 bits on the callee side.

On the caller side, both sign/zero-extend 8/16-bit values to 32 bits, and do /not/ truncate 64-bit values to 32 bits.

So it looks like GCC could still generate more optimal code by taking advantage of the "de-facto" ABI that lets you assume 32-bit sign/zero-extension has happened on arguments.

But it'd also be real nice for this all to be actually documented, so there's something to point people to. :)

BTW: This undocumentedness came up recently with an optimizer change in clang: libjpeg-turbo has some assembly code which was using the full 64-bit value of an argument register, assuming the upper bits would be zeroed, while on the C side, the function was declared as taking an "int". The upper bits are thus left undefined (as is correct, per the unwritten ABI rules), which broke the asm.

https://github.com/libjpeg-turbo/libjpeg-turbo/pull/20
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20150907/138253.html
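
The kind of breakage described for the libjpeg-turbo assembly can be modelled in C roughly as follows. This is a sketch under stated assumptions and is not the actual libjpeg-turbo code or its fix: count_as_asm_assumed, count_repaired and asm_assumption_demo are hypothetical names, and the register is again modelled as a uint64_t whose upper 32 bits are arbitrary when the C prototype says "int".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for hand-written assembly that uses the whole
   64-bit register as the value of an "int" argument. */
static uint64_t count_as_asm_assumed(uint64_t reg)
{
    return reg;
}

/* Repaired version: ignore bits 32..63 and sign extend the low 32 bits,
   written with arithmetic so it does not depend on
   implementation-defined conversions. */
static int64_t count_repaired(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;
    return (low & 0x80000000u) ? (int64_t)low - 0x100000000LL
                               : (int64_t)low;
}

static void asm_assumption_demo(void)
{
    /* The C caller passed int 16; the upper half is leftover junk. */
    uint64_t reg = 0xdeadbeef00000010ULL;

    assert(count_as_asm_assumed(reg) != 16);  /* junk leaks into the value */
    assert(count_repaired(reg) == 16);        /* low 32 bits are correct   */
}
```

Whether a real routine should sign or zero extend depends on how it uses the value; the sketch picks sign extension only because the parameter in the example is declared as int.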