This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: X86_64 insns combination is not working well
- From: Jakub Jelinek <jakub at redhat dot com>
- To: lin zuojian <manjian2006 at gmail dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Mon, 3 Mar 2014 09:40:15 +0100
- Subject: Re: X86_64 insns combination is not working well
- References: <20140303030214 dot GB9144 at ubuntu>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Mon, Mar 03, 2014 at 11:02:14AM +0800, lin zuojian wrote:
> I wrote some test code like this:
> void foo(int *a)
> {
>     a[0] = 0xfafafafb;
>     a[1] = 0xfafafafc;
>     a[2] = 0xfafafafe;
>     a[3] = 0xfafafaff;
>     a[4] = 0xfafafaf0;
>     a[5] = 0xfafafaf1;
>     a[6] = 0xfafafaf2;
>     a[7] = 0xfafafaf3;
>     a[8] = 0xfafafaf4;
>     a[9] = 0xfafafaf5;
>     a[10] = 0xfafafaf6;
>     a[11] = 0xfafafaf7;
>     a[12] = 0xfafafaf8;
>     a[13] = 0xfafafaf9;
>     a[14] = 0xfafafafa;
>     a[15] = 0xfafaf0fa;
> }
> This is what gcc generated:
> movl $-84215045, (%rdi)
> movl $-84215044, 4(%rdi)
> movl $-84215042, 8(%rdi)
> movl $-84215041, 12(%rdi)
> movl $-84215056, 16(%rdi)
> ...
> This is what LLVM/clang generated:
> movabsq $-361700855600448773, %rax # imm = 0xFAFAFAFCFAFAFAFB
> movq %rax, (%rdi)
> movabsq $-361700842715546882, %rax # imm = 0xFAFAFAFFFAFAFAFE
> movq %rax, 8(%rdi)
> movabsq $-361700902845089040, %rax # imm = 0xFAFAFAF1FAFAFAF0
> movq %rax, 16(%rdi)
> movabsq $-361700894255154446, %rax # imm = 0xFAFAFAF3FAFAFAF2
> ...
> I ran the code on my i7 machine 10000000000 times. Here is the result:
> gcc:
> real 0m50.613s
> user 0m50.559s
> sys 0m0.000s
>
> LLVM/clang:
> real 0m32.036s
> user 0m32.001s
> sys 0m0.000s
>
> That means movabsq did a better job!
> Should gcc's peephole pass add such a combination?
This sounds like PR22141, but a microbenchmark isn't the right way to
decide this. From what I remember from playing with the patches,
movabsq was mostly bad for performance, at least on the CPUs I tried
it on back then. Besides the question of whether movabsq + movq is more
beneficial than two movl instructions, alignment also plays a role here:
say this is in an inner loop and the stores are not aligned to 64 bits,
the question is whether that won't slow things down too much.
Jakub
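
To make the two points in this exchange concrete, here is a minimal C
sketch. It is not code from GCC, clang, or either poster. The helper
names merge_imm32, is_aligned8 and store_pair are hypothetical and
introduced only for illustration. The sketch assumes a little-endian
x86-64 target, where the lower-addressed int ends up in the low half
of the 64-bit value, and it shows how the immediate in the clang output
for a[0] and a[1] relates to the two 32-bit constants, plus the
alignment condition mentioned in the reply.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the 64-bit immediate that a single 64-bit store
   would write in place of two adjacent 32-bit stores. Assumes a
   little-endian target, so the lower-addressed element is the low half. */
uint64_t merge_imm32(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: nonzero if p is aligned to 8 bytes (64 bits). */
int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical helper: write two adjacent ints as one 64-bit value.
   memcpy keeps this well defined in C regardless of alignment; whether
   the compiler emits one 64-bit store for it is up to the compiler. */
void store_pair(int *a, uint32_t lo, uint32_t hi)
{
    uint64_t v = merge_imm32(lo, hi);
    memcpy(a, &v, sizeof v);
}

/* Checks the first pair from the test case against the immediate shown
   in the clang listing (imm = 0xFAFAFAFCFAFAFAFB). */
void check_first_pair(void)
{
    assert(merge_imm32(0xfafafafbu, 0xfafafafcu) == 0xFAFAFAFCFAFAFAFBull);
}
```

An int array is only guaranteed 4-byte alignment, so is_aligned8(a) can
be false for a valid argument to foo. In that case every merged 64-bit
store is misaligned, which is the situation the reply is concerned
about.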