[PATCH] combine: Allow combining two insns to two insns

H.J. Lu hjl.tools@gmail.com
Tue Jul 31 12:39:00 GMT 2018


On Wed, Jul 25, 2018 at 1:28 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Jul 24, 2018 at 7:18 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
>>
>> This patch allows combine to combine two insns into two.  This helps
>> in many cases, by reducing instruction path length, and also allowing
>> further combinations to happen.  PR85160 is a typical example of code
>> that it can improve.
>>
>> This patch does not allow such combinations if either of the original
>> instructions was a simple move instruction.  In those cases combining
>> the two instructions increases register pressure without improving the
>> code.  With this move test register pressure no longer increases
>> noticeably as far as I can tell.
>>
>> (At first I also didn't allow either of the resulting insns to be a
>> move instruction.  But that is actually a very good thing to have, as
>> should have been obvious).
>>
>> Tested for many months; tested on about 30 targets.
>>
>> I'll commit this later this week if there are no objections.
>
> Sounds good - but, _any_ testcase?  Please! ;)
>

Here is a testcase:

For

---
#define N 16
float f[N];
double d[N];
int n[N];

__attribute__((noinline)) void
f3 (void)
{
  int i;
  for (i = 0; i < N; i++)
    d[i] = f[i];
}
---

r263067 improved the code generated with -O3 -mavx2 -mtune=generic -m64 from

.cfi_startproc
vmovaps f(%rip), %xmm2
vmovaps f+32(%rip), %xmm3
vinsertf128 $0x1, f+16(%rip), %ymm2, %ymm0
vcvtps2pd %xmm0, %ymm1
vextractf128 $0x1, %ymm0, %xmm0
vmovaps %xmm1, d(%rip)
vextractf128 $0x1, %ymm1, d+16(%rip)
vcvtps2pd %xmm0, %ymm0
vmovaps %xmm0, d+32(%rip)
vextractf128 $0x1, %ymm0, d+48(%rip)
vinsertf128 $0x1, f+48(%rip), %ymm3, %ymm0
vcvtps2pd %xmm0, %ymm1
vextractf128 $0x1, %ymm0, %xmm0
vmovaps %xmm1, d+64(%rip)
vextractf128 $0x1, %ymm1, d+80(%rip)
vcvtps2pd %xmm0, %ymm0
vmovaps %xmm0, d+96(%rip)
vextractf128 $0x1, %ymm0, d+112(%rip)
vzeroupper
ret
.cfi_endproc

to

.cfi_startproc
vcvtps2pd f(%rip), %ymm0
vmovaps %xmm0, d(%rip)
vextractf128 $0x1, %ymm0, d+16(%rip)
vcvtps2pd f+16(%rip), %ymm0
vmovaps %xmm0, d+32(%rip)
vextractf128 $0x1, %ymm0, d+48(%rip)
vcvtps2pd f+32(%rip), %ymm0
vextractf128 $0x1, %ymm0, d+80(%rip)
vmovaps %xmm0, d+64(%rip)
vcvtps2pd f+48(%rip), %ymm0
vextractf128 $0x1, %ymm0, d+112(%rip)
vmovaps %xmm0, d+96(%rip)
vzeroupper
ret
.cfi_endproc

This is:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86752
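
Both assembly sequences above should store the same values into d[]; only
the instruction count differs.  A minimal sketch of a runtime check for that,
assuming one wants an executable test in addition to looking at the assembly:
check_f3 is a hypothetical helper that is not part of the original testcase,
and the unused n[] array is dropped.

```c
#define N 16
float f[N];
double d[N];

/* The loop from the testcase above: widen each float to double.  */
__attribute__((noinline)) void
f3 (void)
{
  int i;
  for (i = 0; i < N; i++)
    d[i] = f[i];
}

/* Hypothetical helper (not in the original testcase): fill f[] with
   values that are exactly representable in float, run f3, and return
   the number of elements of d[] that differ from the expected value.  */
int
check_f3 (void)
{
  int i, bad = 0;

  for (i = 0; i < N; i++)
    f[i] = i * 0.5f;

  f3 ();

  for (i = 0; i < N; i++)
    if (d[i] != (double) i * 0.5)
      bad++;

  return bad;
}
```

Calling check_f3 () from main and expecting 0 gives a result that does not
depend on which instruction sequence the compiler picks.  Compiling the same
file with -O3 -mavx2 -mtune=generic -m64 -S shows which of the two sequences
was generated.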

H.J.


