This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.


Re: Missed optimization opportunity wrt load chains

From: Jeff Law <law at redhat dot com>
To: Mason <slash dot tmp at free dot fr>, GCC help <gcc-help at gcc dot gnu dot org>
Date: Wed, 20 Sep 2017 11:33:20 -0600
Subject: Re: Missed optimization opportunity wrt load chains
References: <c1d088e4-b0a4-03d7-c844-f0d05a1c533b@free.fr>

On 09/20/2017 09:54 AM, Mason wrote:
> Hello,
> 
> Consider the following test case.
> 
> typedef unsigned int u32;
> u32 foo(const u32 *u, const u32 *v)
> {
> 	u32 t0 = u[0] + u[3] + u[6] + u[9];
> 	u32 t1 = v[1] + v[3] + v[5] + v[7];
> 	return t0 + t1;
> }
> 
> AFAIU, for several years, x86 implementations have been able
> to issue two loads per cycle, and I expected gcc to compute
> t0 and t1 in parallel. But instead, it creates a single
> dependency chain.
> 
> $ gcc-7 -march=skylake -O3 -S testcase.c
> 
> foo:
> 	movl	12(%rsi), %eax
> 	addl	4(%rsi), %eax
> 	addl	20(%rsi), %eax
> 	addl	28(%rsi), %eax
> 	addl	(%rdi), %eax
> 	addl	12(%rdi), %eax
> 	addl	24(%rdi), %eax
> 	addl	36(%rdi), %eax
> 	ret
> 
> I don't think this code would benefit from SSE or auto-vectorization.
> But computing t0 and t1 in parallel might give a non-trivial speedup,
> especially for longer chains. What do you think?
It should.  However, the reassociation pass has comments that indicate
that these situations are fairly rare in practice.  As a result, given
the cost in complexity to get them right (particularly when you include
the interactions with CSE), it just punts on these chains.

jeff
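
For illustration, here is a minimal sketch of what "computing t0 and t1 in
parallel" could look like when written out by hand in the source.  The
name foo_tree is hypothetical and does not appear in the thread.  Because
unsigned addition is associative and wraps, both functions return the same
value for all inputs, and the compiler remains free to reassociate either
form, so this is not guaranteed to change the generated code.

```c
typedef unsigned int u32;

/* Test case from the thread, unchanged. */
u32 foo(const u32 *u, const u32 *v)
{
	u32 t0 = u[0] + u[3] + u[6] + u[9];
	u32 t1 = v[1] + v[3] + v[5] + v[7];
	return t0 + t1;
}

/*
 * Hypothetical hand-balanced variant: four independent partial sums
 * combined as a tree, so no addition depends on more than two earlier
 * additions.  Whether the optimizer preserves this shape is an
 * assumption to check in the -S output, not something the source
 * form guarantees.
 */
u32 foo_tree(const u32 *u, const u32 *v)
{
	u32 a = u[0] + u[3];
	u32 b = u[6] + u[9];
	u32 c = v[1] + v[3];
	u32 d = v[5] + v[7];
	return (a + b) + (c + d);
}
```

Comparing the assembly for both functions with the same command line as
in the original message shows whether the tree shape survives
optimization on a given GCC version.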

Follow-Ups:
- Re: Missed optimization opportunity wrt load chains
  - From: Mikhail Maltsev

References:
- Missed optimization opportunity wrt load chains
  - From: Mason
