Bug 53110 - GCC-4.7 generates stupid x86_64 asm
Summary: GCC-4.7 generates stupid x86_64 asm
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.7.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-25 10:12 UTC by peterz
Modified: 2017-12-11 17:41 UTC (History)
7 users (show)

See Also:
Host:
Target: x86_64
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
small C file reproducing the issue (365 bytes, text/x-csrc)
2012-04-25 10:12 UTC, peterz
Details
gcc48-pr53110.patch (530 bytes, patch)
2012-04-25 13:02 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description peterz 2012-04-25 10:12:48 UTC
Created attachment 27233 [details]
small C file reproducing the issue

The below output of: gcc -O2 -S gcc-bug.c, shows gcc generating
pointless mask instructions, an and with all bits set.

	.file	"gcc-bug.c"
	.text
	.p2align 4,,15
	.globl	mult_u128
	.type	mult_u128, @function
mult_u128:
.LFB1:
	.cfi_startproc
	movq	%rdi, %r8
	movq	%rsi, %rcx
	andl	$4294967295, %edi
	shrq	$32, %r8
	shrq	$32, %rcx
	andl	$4294967295, %esi
	movq	%rcx, %rax
	movq	%rsi, %rdx
	imulq	%rdi, %rcx
	imulq	%r8, %rsi
	imulq	%r8, %rax
	salq	$32, %rcx
	salq	$32, %rsi
	imulq	%rdi, %rdx
	addq	%rsi, %rcx
	adcq	$0, %rax
	addq	%rcx, %rdx
	adcq	$0, %rax
	ret
	.cfi_endproc
.LFE1:
	.size	mult_u128, .-mult_u128
	.ident	"GCC: (GNU) 4.7.1 20120425 (prerelease)"
	.section	.note.GNU-stack,"",@progbits
Comment 1 Richard Biener 2012-04-25 11:03:55 UTC
    movq    %rdi, %r8
    movq    %rsi, %rcx
    andl    $4294967295, %edi

I'm not sure it's pointless - %rdi contains a 64bit value and we want
to access the zero-extended %rdi later:

    imulq    %rdi, %rcx

it could have used a move to another register which implicitely zero-extends
or maybe a zero-extension on the same register (if that exists).  The
and is only 3 bytes in size.  A mov %edi, %edi is 2 bytes but I'm not sure
that implicitely zero-extends.
Comment 2 peterz 2012-04-25 11:11:19 UTC
I'll have to let Linus and Peter Anvin argue this (they're on CC), this report is
the direct result of their complaints:

"If you can *ever* get gcc to generate those andl instructions on x86,
then please file a gcc bug report: the constant is 0xffffffff and those
are plain zero-extension instructions which is much better done with mov."

 https://lkml.org/lkml/2012/4/24/524

"What insane version of gcc is that? Can you please make a gcc bug-report?"

 https://lkml.org/lkml/2012/4/24/549
 
I'm just the sorry sod who wrote the code... :-)
Comment 3 Markus Trippelsdorf 2012-04-25 11:23:59 UTC
For comparison this is what clang generates:

        .file   "gcc-bug.c"
        .text
        .globl  mult_u128
        .align  16, 0x90
        .type   mult_u128,@function
mult_u128:                              # @mult_u128
        .cfi_startproc
# BB#0:
        movabsq $-4294967296, %rdx      # imm = 0xFFFFFFFF00000000
        andq    %rdi, %rdx
        movl    %esi, %ecx
        movq    %rdi, %rax
        shrq    $32, %rax
        shrq    $32, %rsi
        imulq   %rsi, %rax
        imulq   %rcx, %rdx
        imull   %edi, %esi
        shlq    $32, %rsi
        addq    %rdx, %rsi
        adcq    $0, %rax
        movl    %edi, %edx
        imulq   %rcx, %rdx
        addq    %rsi, %rdx
        adcq    $0, %rax
        ret
.Ltmp0:
        .size   mult_u128, .Ltmp0-mult_u128
        .cfi_endproc


        .section        ".note.GNU-stack","",@progbits
Comment 4 Richard Biener 2012-04-25 12:35:08 UTC
I'm not sure what the function computes, but for

u128 mult_u128(u64 a, u64 b)
{
  u128 t1;
  __uint128_t r = (__uint128_t)a * (__uint128_t)b;
  memcpy (&t1, &r, sizeof (u128));
  return t1;
}

we generate

mult_u128:
.LFB1:
        .cfi_startproc
        movq    %rsi, %rax
        mulq    %rdi
        movq    %rax, -56(%rsp)
        movq    %rdx, -48(%rsp)
        movq    -48(%rsp), %rdx
        ret
Comment 5 peterz 2012-04-25 12:47:31 UTC
Yes that's what it computes.. and no the function won't ever get used on x86_64, but I ran it through the compiler anyway :-)

Thing is we need u128 to work on all archs linux supports on all gcc version we support, so we can't use __int128. Anyway, if you're interested in what we're doing and want to see the current patches click:

  https://lkml.org/lkml/2012/4/25/149
Comment 6 peterz 2012-04-25 13:00:47 UTC
OK rectification, that's what it _should_ compute, I just noticed
add_u128() is missing: a.hi += b.hi; Anyway, that's all besides the point, the issue is that gcc shouldn't generate those andl 0xFFFFFFFF, %r thingies, regardless if the C makes sense or not.
Comment 7 Jakub Jelinek 2012-04-25 13:02:26 UTC
Created attachment 27235 [details]
gcc48-pr53110.patch

Totally untested patch.  We already have a splitter to handle and by 0xffffffff,
0xffff and 0xff, but it only triggers if the register is different and IMHO it is quite late anyway.  This attempts to optimize this already during expansion, perhaps giving the register allocator more freedom (unfortunately it doesn't seem to improve the code in this case and the movl %edi, %edi (and for %esi too) stays.  Ideally if RA swapped the two, it could movl %edi, %r8d and shrl $32, %rdi instead of movq %rdi, %r8; shrl $32, %r8; movl %edi, %edi).
Comment 8 peterz 2012-04-25 13:15:56 UTC
Jakub's patch seems to improve the situation:

--- gcc-bug-4.7.s       2012-04-25 14:58:21.494815266 +0200
+++ gcc-bug-4.7+.s      2012-04-25 15:14:13.784243427 +0200
@@ -22,12 +22,12 @@
        .cfi_startproc
        movq    %rdi, %r8
        movq    %rsi, %r9
-       andl    $4294967295, %esi
+       movl    %esi, %esi
        shrq    $32, %r9
        shrq    $32, %r8
        movq    %rsi, %rdx
        imulq   %r8, %rdx
-       andl    $4294967295, %edi
+       movl    %edi, %edi
        movq    %r9, %rax
        imulq   %rdi, %rax
        imulq   %r9, %r8
Comment 9 Jakub Jelinek 2012-04-25 19:40:39 UTC
Author: jakub
Date: Wed Apr 25 19:40:31 2012
New Revision: 186839

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186839
Log:
	PR target/53110
	* config/i386/i386.md (and<mode>3): For andq $0xffffffff, reg
	instead expand it as zero extension.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
Comment 10 H. Peter Anvin 2012-04-25 20:32:29 UTC
There still seems to be a redundant copy in there, but that's pretty common in gcc-generated code; the movl %esi, %esi could get completely elided and the zero extend folded into "movq %rsi, %rdx" by changing it to movl %esi, %edx.

This is a second-order issue though...
Comment 11 Jakub Jelinek 2012-04-26 05:36:00 UTC
CCing Vlad for the RA decision.
Comment 12 uros 2012-07-15 08:13:57 UTC
Author: uros
Date: Sun Jul 15 08:13:47 2012
New Revision: 189491

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189491
Log:
	PR target/53961

	Backport from mainline
	2012-04-25  Jakub Jelinek  <jakub@redhat.com>

	PR target/53110
	* config/i386/i386.md (and<mode>3): For andq $0xffffffff, reg
	instead expand it as zero extension.


Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/config/i386/i386.md
Comment 13 tejohnson 2012-07-25 20:11:23 UTC
Author: tejohnson
Date: Wed Jul 25 20:11:13 2012
New Revision: 189866

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189866
Log:
Backport the following patches from trunk to ensure that
"andw $0xff, $reg" is always converted to a zero extend
to avoid LCP stalls on core2/corei7 (b/6615073):
r184891, r186839, r186979, r186993, r188630, r188634, r188648

    r184891:
        2012-03-04  Uros Bizjak  <ubizjak@gmail.com>
        * config/i386/constraints.md (Ya): New internal constraint.
        * config/i386/i386.md (zero_extendsidi2): Remove expansion.
        (*zero_extendsidi2_rex64): Add x,x alternative.
        (*zero_extendsidi2): Ditto.  Add o,0 alternative.
        Remove flags reg clobber.  Adjust corresponding splits.
        (zero_extend<mode>si2): Macroize expander from zero_extendhisi2 and
        zero_extendqisi2 expanders using SWI12 mode iterator.
        (zero_extend<mode>si2_and): Macroize insn from
        zero_extendhisi2_and and zero_extendqisi2_and.  Merge corresponding
        splitters.
        (*zero_extend<mode>si2):  Macroize insn from
        *zero_extendhisi2_movzbl and *zero_extendqisi2_movzbl.
        (*zero_extend*2_movzbl_and): Remove insn patterns.
        (zero_extendqihi2_and): Merge corresponding splitter.
        (*zero_extendqihi2): Rename from *zero_extendqihi2_movzbl.
        (*zero_extend*2_movzbl_and): Remove insn patterns.
        (*anddi_1): Split TYPE_IMOVX instructions.
        (*andsi_1): Use Ya for alternative 2.  Split TYPE_IMOVX instructions.
        (*andhi_1): Ditto.
        (and->zext splitter): Add splitter pattern.
        (zero extend with andsi3 splitter): Adjust zero_extend pattern.

    r186839:
        2012-04-25  Jakub Jelinek  <jakub@redhat.com>
        PR target/53110
        * config/i386/i386.md (and<mode>3): For andq $0xffffffff, reg
        instead expand it as zero extension.

    r186979:
        2012-04-30  Uros Bizjak  <ubizjak@gmail.com>
        * config/i386/i386.md (and<mode>3): Expand masking operations with
        0xff, 0xffff or 0xffffffff immediates to corresponding zero_extend RTX.
        (and splitter): Split to DImode zero_extend RTX for DImode operand[0].

    r186993:
        2012-04-30  Uros Bizjak  <ubizjak@gmail.com>
        * config/i386/i386.md (and<mode>3): Change runtime operand mode checks
        to compile-time "mode == <MODE>mode" checks.
        (and splitter): Ditto.

    r188630:
        2012-06-14  Uros Bizjak  <ubizjak@gmail.com>
        * config/i386/i386.md (*zero_extendsidi2): Remove x,x alternative.
        (*zero_extendsidi2_rex64): Ditto.  Remove isa attribute.

    r188634:
        2012-06-14  Uros Bizjak  <ubizjak@gmail.com>
        Fix my previous commit to:
        * config/i386/i386.md (*zero_extendsidi2): Remove x,x alternative.
        (*zero_extendsidi2_rex64): Ditto.  Remove isa attribute.

    r188648:
        2012-06-14  Uros Bizjak  <ubizjak@gmail.com>
       (*zero_extendsidi2_rex64): Remove isa attribute.


Modified:
    branches/google/gcc-4_7/gcc/ChangeLog.google-4_7
    branches/google/gcc-4_7/gcc/config/i386/constraints.md
    branches/google/gcc-4_7/gcc/config/i386/i386.md
Comment 14 Jeffrey A. Law 2017-12-11 17:41:23 UTC
The trunk no longer generates masking with andl for this testcase.  I didn't try to track it down any further than that.