It seems we zero/sign extend < 64-bit integral parameters into 64-bit registers on both the caller and the callee side on x86_64; one of those should be redundant. In Example 4 of http://blog.regehr.org/archives/320 it seems that LLVM probably zero/sign extends only in the caller, not in the callee; I'm not sure what ICC does.

    __attribute__((noinline, noclone)) unsigned long
    f1 (unsigned int a, int b, unsigned short c, short d, unsigned char e, signed char f)
    {
      return (unsigned long) a + b + c + d + e + f;
    }

    unsigned long l;

    unsigned long
    f2 (void)
    {
      return f1 (l + 41, l + 41, l + 41, l + 41, l + 41, l + 41);
    }

    unsigned long
    f3 (unsigned int a, int b, unsigned short c, short d, unsigned char e, signed char f)
    {
      return f1 (a, b, c, d, e, f);
    }
    __attribute__((noinline, noclone)) unsigned long
    f1 (unsigned int a, int b, unsigned short c, short d, unsigned char e, signed char f)
    {
      return (unsigned long) a + b + c + d + e + f;
    }

    unsigned long l;

    unsigned long
    f2 (void)
    {
      return f1 (l + 41, l + 41, l + 41, l + 41, l + 41, l + 41) + 1;
    }

    unsigned long
    f3 (unsigned int a, int b, unsigned short c, short d, unsigned char e, signed char f)
    {
      return f1 (a, b, c, d, e, f);
    }

    unsigned long
    f4 (int a, unsigned int b, short c, unsigned short d, signed char e, unsigned char f)
    {
      return f1 (a, b, c, d, e, f);
    }

Compiling this at -O2 shows, in f4, that we can't trust the caller to do the sign/zero extension; at the very least, we can't trust that the arguments are sign/zero extended to 64 bits.
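A quick way to see why f1 has to do its own extension is to look at the values C requires it to see. The sketch below reuses f1 and f4 from the testcase (with the GCC-specific noclone dropped so it also builds with Clang) and assumes an LP64 target such as x86_64 Linux, where unsigned long is 64 bits. When f4 forwards an int -1 as f1's unsigned int a, f1 must compute with 4294967295, not with a 64-bit -1, whatever f4 left in the upper half of the register.

```c
#include <assert.h> /* for the usage checks */

/* Assumption: LP64 target, so unsigned long is 64 bits.  */
_Static_assert (sizeof (unsigned long) == 8, "LP64 target assumed");

/* f1 and f4 from the testcase; noclone dropped for portability.  */
__attribute__((noinline)) unsigned long
f1 (unsigned int a, int b, unsigned short c, short d,
    unsigned char e, signed char f)
{
  return (unsigned long) a + b + c + d + e + f;
}

unsigned long
f4 (int a, unsigned int b, short c, unsigned short d,
    signed char e, unsigned char f)
{
  /* Each argument is converted to f1's parameter type at this call;
     f1 must see the converted value, not f4's original bit pattern.  */
  return f1 (a, b, c, d, e, f);
}
```

For example, f4 (-1, 1, -1, 1, -1, 1) has to return 4295033088: f1 sees a == 4294967295, c == 65535 and e == 255. Summing the sign-extended 64-bit values instead would give 0.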
If the conclusion is that the callee can rely on the caller having done the extension, then you need to watch out for security issues in the kernel syscall ABI when building with a compiler that generates code relying on this:
http://lkml.org/lkml/2007/6/4/376
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-0029
If there were target-specific aspects to the fix for that security issue, they may not have included x86_64 changes.
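A rough illustration of the hazard, assuming a hypothetical handler whose C prototype takes an unsigned int index while user space controls all 64 bits of the register: if compiled code checks the 32-bit value but then uses the full register, as it might if it assumed the caller had already zero-extended, the bounds check can be bypassed. The function names and the table size are invented for this sketch, which models register contents as uint64_t rather than reproducing any kernel code.

```c
#include <assert.h> /* for the usage checks */
#include <stdint.h>

/* Hypothetical table size for the sketch.  */
enum { TABLE_SIZE = 16 };

/* Unsafe: validates the 32-bit value, then returns the whole register
   as the index, as code trusting the upper 32 bits might do.  */
uint64_t
effective_index_unsafe (uint64_t reg)
{
  uint32_t idx = (uint32_t) reg;
  if (idx >= TABLE_SIZE)
    return UINT64_MAX; /* rejected */
  return reg;          /* may lie far outside the table */
}

/* Safe: narrow once, then use only the narrowed value.  */
uint64_t
effective_index_safe (uint64_t reg)
{
  uint32_t idx = (uint32_t) reg;
  if (idx >= TABLE_SIZE)
    return UINT64_MAX; /* rejected */
  return idx;
}
```

For a register value of 0x100000003 both versions accept the index (its low 32 bits are 3), but only the safe version returns 3. Narrowing once at the entry point and never looking at the raw register again is, as I understand it, the idea behind the wrapper-based fix for CVE-2009-0029.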
I think the ABI guarantees that the caller did the sign/zero extension, so we don't have to repeat it in the callee. Of course, we don't model this at the tree level, so RTL would probably have to figure this out and optimize. (Yeah, that's one of the areas where lowering call-ABI-related promotions earlier would be nice.)
I proposed to update the x86-64 psABI with:

---
When a value of type _Bool is returned in a register, bit 0 contains the truth value and bits 1 to 7 shall be zero.

When an argument of type _Bool is passed in a register or on the stack, bit 0 contains the truth value and bits 1 to 31 shall be zero.

When a value of type signed/unsigned char or short is returned in a register, bits 0 to 7 for char and bits 0 to 15 for short contain the value and the other bits are left unspecified.

When an argument of type signed/unsigned char or short is passed in a register or on the stack, it shall be sign/zero extended to signed/unsigned int.
---
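The proposed register rule can be illustrated with a small model that treats an argument register as a uint64_t; the helper names are hypothetical and the model is not compiler output. A callee that trusts nothing re-extends a char from bit 7 (what movsbl does), while a callee relying on the proposal could take the low 32 bits as an int that the caller already extended.

```c
#include <assert.h> /* for the usage checks */
#include <stdint.h>

/* Hypothetical model: an argument register as a 64-bit integer.  */

/* Trust nothing: sign-extend the char from bits 0-7.  */
int32_t
char_arg_from_low8 (uint64_t reg)
{
  int32_t v = (int32_t) (reg & 0xFF);
  return (v & 0x80) ? v - 0x100 : v;
}

/* Rely on the proposal: bits 0-31 already hold the char extended to
   int; only bits 32-63 are ignored.  */
int32_t
char_arg_from_low32 (uint64_t reg)
{
  int64_t v = (int64_t) (reg & 0xFFFFFFFFu);
  return (int32_t) ((v & 0x80000000) ? v - 0x100000000 : v);
}
```

With a conforming caller both readings agree: a register holding 0xDEADBEEFFFFFFF80 gives -128 from either helper. If a caller set only the low 8 bits, say leaving 0x12345680, the first helper still yields -128 but the second yields 305419904, so the second reading is only safe if every caller follows the rule.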
And the upper 32 bits are undefined if the argument is 8/16/32-bit (i.e. the callee, not the caller, must sign/zero extend)?
(In reply to comment #5)
> And upper 32 bits are undefined if the argument is 8/16/32 bit (i.e. callee
> must sign/zero extend, instead of caller)?

If the callee wants 64 bits, it has to sign/zero extend the value to 64 bits.
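In the same spirit, with a register modeled as a uint64_t (hypothetical helpers, not compiler output), a callee that needs a 64-bit value from a 32-bit argument must compute it from bits 0 to 31 alone: sign-extending an int, as movslq does, or zero-extending an unsigned int, as a 32-bit movl does.

```c
#include <assert.h> /* for the usage checks */
#include <stdint.h>

/* Widen an int argument: bits 32-63 of the register are unspecified,
   so sign-extend from bit 31.  */
int64_t
widen_int_arg (uint64_t reg)
{
  int64_t v = (int64_t) (reg & 0xFFFFFFFFu);
  return (v & 0x80000000) ? v - 0x100000000 : v;
}

/* Widen an unsigned int argument: discard bits 32-63.  */
uint64_t
widen_uint_arg (uint64_t reg)
{
  return reg & 0xFFFFFFFFu;
}
```

For example, widen_int_arg (0xFFFFFFFF00000005) is 5 and widen_int_arg (0xFFFFFFFF) is -1, regardless of what the upper half held.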
For

    void f1(char c, char d, char e, char f, char g, char h, char i);
    char x;
    void f2() { f1(x, x, x, x, x, x, x); }

ICC generates the following assembly, which stores only 8 bits to the stack for the final (seventh) parameter:

    f2:
            pushq   %rsi
            addq    $-16, %rsp
            movsbl  x(%rip), %edi
            movb    %dil, (%rsp)
            movl    %edi, %esi
            movl    %edi, %edx
            movl    %edi, %ecx
            movl    %edi, %r8d
            movl    %edi, %r9d
            call    f1
            addq    $16, %rsp
            popq    %rcx
            ret
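ICC's movb writes only the low byte of the seventh argument's stack slot. The sketch below models that slot as 8 bytes (hypothetical names; byte 0 is the least significant, as on little-endian x86-64) to show that a callee reading the argument as a char gets the right value, while a callee relying on the proposed "extended to int on the stack" rule would read stale bytes.

```c
#include <assert.h> /* for the usage checks */
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 8-byte stack argument slot.  */
typedef struct { unsigned char bytes[8]; } stack_slot;

/* Simulate stale data left in the slot by earlier code.  */
void
fill_slot (stack_slot *slot, unsigned char junk)
{
  memset (slot->bytes, junk, sizeof slot->bytes);
}

/* Like ICC's "movb %dil, (%rsp)": store only the low byte.  */
void
store_char_byte (stack_slot *slot, signed char c)
{
  slot->bytes[0] = (unsigned char) c;
}

/* Callee reading the argument as a char (sign-extend byte 0).  */
int32_t
load_char_arg (const stack_slot *slot)
{
  int32_t v = slot->bytes[0];
  return (v & 0x80) ? v - 0x100 : v;
}

/* Callee assuming the slot holds a full little-endian int.  */
int32_t
load_int_arg (const stack_slot *slot)
{
  uint32_t u = (uint32_t) slot->bytes[0]
               | (uint32_t) slot->bytes[1] << 8
               | (uint32_t) slot->bytes[2] << 16
               | (uint32_t) slot->bytes[3] << 24;
  int64_t v = u;
  return (int32_t) ((v & 0x80000000) ? v - 0x100000000 : v);
}
```

After fill_slot (&s, 0xAA) and store_char_byte (&s, -1), load_char_arg returns -1 while load_int_arg picks up the stale 0xAA bytes, which is exactly where ICC's output and the proposed stack rule disagree.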
GCC 4.6.0 is being released, adjusting target milestone.
What is the status of this issue? Did the psABI ever get updated? Is the intent of this issue to modify GCC to remove the sign extension from the callee?
Well, the ABI document still doesn't appear to say anything about this. As of 5.1, GCC still seems to sign/zero extend 8/16-bit arguments to 32 bits on the callee side; Clang does not. Both extend 32-bit arguments to 64 bits on the callee side. On the caller side, both sign/zero-extend 8/16-bit values to 32 bits, and do /not/ truncate 64-bit values to 32 bits.

So it looks like GCC could still generate better code by taking advantage of the "de facto" ABI that lets you assume 32-bit sign/zero extension has happened on arguments. But it would also be really nice for all of this to be actually documented, so there's something to point people to. :)

BTW: this lack of documentation came up recently with an optimizer change in Clang: libjpeg-turbo has some assembly code which used the full 64-bit value of an argument register, assuming the upper bits would be zeroed, while on the C side the function was declared as taking an "int". The upper bits are thus left undefined (as is correct per the unwritten ABI rules), which broke the asm.
https://github.com/libjpeg-turbo/libjpeg-turbo/pull/20
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20150907/138253.html
The ABI document was updated:

Clarify the unspecified nature of excess bits in INTEGER type arguments
<https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/e1ce098331da5dbd66e1ffc74162380bcc213236>

The ABI did not previously say that the caller needs to sign-extend or zero-extend, which means that the callee has to extend. As far as I can see, both GCC and Clang generate code in some cases that assumes the callee extends.
(In reply to Florian Weimer from comment #11)
> The ABI document was updated:
>
> Clarify the unspecified nature of excess bits in INTEGER type arguments
> <https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/e1ce098331da5dbd66e1ffc74162380bcc213236>
>
> The ABI did not previously say that the caller needs to sign-extended or
> zero-extend, which means that the callee has to extend. As far as I can see,
> both GCC and Clang generate code in some cases that assumes the callee
> extends.

I'm not aware of any cases where GCC does assume that on x86_64. As mentioned in #c0, we often extend on both the caller and the callee side; on the caller side it is just our problem, doing something that doesn't have to be done (but I guess it is often faster that way).
(In reply to Jakub Jelinek from comment #12)
> (In reply to Florian Weimer from comment #11)
> > [...] As far as I can see,
> > both GCC and Clang generate code in some cases that assumes the callee
> > extends.
>
> I'm not aware of any cases where GCC does assume that on x86_64.
> As mentioned in #c0, often we extend both on the caller and callee side, on
> the caller side it is just our problem doing something that doesn't have to
> be done (but guess often is faster that way).

Not sure if I understand. What about this?

    void f (int);
    void g (long int x)
    {
      return f (x);
    }

It gives me (with -fno-asynchronous-unwind-tables -O2):

            .file   "u.c"
            .text
            .p2align 4
            .globl  g
            .type   g, @function
    g:
            jmp     f
            .size   g, .-g
            .ident  "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
            .section        .note.GNU-stack,"",@progbits

I think this means that the implementation of f must extend (if needed) because g did not sign-extend or zero-extend.
(In reply to Florian Weimer from comment #13)
> Not sure if I understand. What about this?
>
> void f (int);
> void g (long int x)
> {
>   return f (x);
> }
>
> It gives me (with -fno-asynchronous-unwind-tables -O2):
> [...]
> g:
>         jmp     f
> [...]
>
> I think this means that the implementation of f must extend (if needed)
> because g did not sign-extend or zero-extend.

If I read your psABI change right, this case is just fine: the upper 32 bits of %rdi upon entry to f are unspecified, so no extension is needed. Where you do need to extend is

    void h (long int x);
    void i (int x) { return h (x); }

and gcc does that.
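Jakub's h/i case can also be checked at the C level: converting an int argument to long must preserve its value, so i has to hand h a properly sign-extended 64-bit value, whereas g/f needs nothing because f only looks at bits 0 to 31. In the sketch below the global seen is a hypothetical addition used only to observe what h receives, and the call is written as a plain statement since standard C does not allow a void function to return an expression.

```c
#include <assert.h> /* for the usage checks */

/* Hypothetical global, used only to observe what h receives.  */
long seen;

__attribute__((noinline)) void
h (long int x)
{
  seen = x;
}

void
i (int x)
{
  /* The int is converted to long here; on x86_64 that means bits
     32-63 of the argument register must be filled with the sign.  */
  h (x);
}
```

After i (-5), seen must be -5; a caller that passed the register through unchanged could hand h any value whose low 32 bits are 0xFFFFFFFB.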
(In reply to Jakub Jelinek from comment #14)
> > I think this means that the implementation of f must extend (if needed)
> > because g did not sign-extend or zero-extend.
>
> If I read your psABI change right, this case is just fine, the upper 32 bits
> of %rdi upon entry to f are unspecified, so no extension is needed.

Agreed. I think I just misunderstood the “GCC does [not] assume that” part in comment 12. The psABI wording change should not require any GCC code generation changes.