This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: Vectorization regression on s390x GCC6 vs GCC5

From: Richard Biener <richard dot guenther at gmail dot com>
To: "Bin.Cheng" <amker dot cheng at gmail dot com>
Cc: Robin Dapp <rdapp at linux dot vnet dot ibm dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
Date: Thu, 26 Jan 2017 12:01:09 +0100
Subject: Re: Vectorization regression on s390x GCC6 vs GCC5
References: <f4e2933b-5aa1-f900-eb5f-f8230e1cf171@linux.vnet.ibm.com> <CAHFci29L-CxxEHgQki6xcCviMeybY8+C23MSd1MxHCE9wh6eAQ@mail.gmail.com>

On Thu, Jan 26, 2017 at 11:36 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Jan 26, 2017 at 10:18 AM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> Hi,
>>
>> while analyzing a test case with a lot of nested loops (>7) and double
>> floating point operations I noticed a performance regression of GCC 6/7
>> vs GCC 5 on s390x. It seems to be due to GCC 6 vectorizing something
>> GCC 5 couldn't.
>> Basically, each loop iterates over three dimensions; we fully unroll
>> some of the inner loops until we have straight-line code of roughly 2000
>> insns that are being executed three times in GCC 5. GCC 6 vectorizes two
>> iterations and adds a scalar epilogue for the third iteration. The
>> epilogue code is so bad that it slows down the execution by at least
>> 50%, using only two hard registers and lots of spill slots.
>> Although my analysis is not yet complete, I believe this is because
>> register pressure is high in the epilogue and the live ranges span the
>> vectorized code as well as the epilogue.
>>
>> Even reduced, the test case is huge, therefore I didn't include it. Some
>> high-level questions instead:
>>
>> - Has anybody else observed similar problems and got around them?
> Yes, I think so.  We also have cases where GCC vectorizes with a larger
> vect_factor, which causes regressions too.
>
>>
>> - Is there some way around the register pressure/long live ranges?
> I am running some experiments that calculate coarse-grained register
> pressure for GIMPLE loops, but the motivation is not the vectorizer;
> it is predcom/PRE, as in PR77498.
>
>> Perhaps something we could/should fix in the s390 backend? (Probably
>> hard to tell without source)
>>
>> - Would it make sense to allow a backend to specify the minimum number
>> of loop iterations considered for vectorization? Is this
>> perhaps already possible somehow? I added a check to disable
>> vectorization for loops with <= 3 iterations that shows no regressions
>> and improves two SPEC benchmarks noticeably. I'm even considering <=5,
>> since a vectorization factor of 4 should exhibit the same problematic
>> pattern.
> Is the niter number known at compile time?  If so, I am surprised by
> GCC's behavior on loops with so few iterations.  Cost model?
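
To make the iteration counts in the exchange above concrete, here is a
minimal sketch (not code from GCC or from this thread) of how a known trip
count splits into vector iterations and scalar epilogue iterations, plus a
threshold check in the spirit of Robin's experiment. The names
`split_loop`, `worth_vectorizing` and the `min_niter` parameter are
hypothetical; GCC's actual decision is made by the vectorizer cost model.

```c
#include <stdbool.h>

/* How a loop with a known trip count is divided when vectorized with
   vectorization factor vf (hypothetical helper, not a GCC API).  */
struct loop_split {
    unsigned vector_iters;   /* iterations of the vectorized body */
    unsigned epilogue_iters; /* leftover scalar iterations */
};

struct loop_split split_loop(unsigned niter, unsigned vf)
{
    struct loop_split s = { niter / vf, niter % vf };
    return s;
}

/* Illustrative threshold only: refuse to vectorize loops whose known
   trip count is at most min_niter.  */
bool worth_vectorizing(unsigned niter, unsigned min_niter)
{
    return niter > min_niter;
}
```

With niter = 3 and vf = 2 this gives one vector iteration plus one epilogue
iteration, the case described in the report. With niter = 5 and vf = 4 it
likewise leaves a single epilogue iteration, which matches the reasoning
for considering a threshold of 5.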

Yes, looking at the cost model decision makes sense here.  Note that you
might be running into https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69873
if the cost model's decision looks sensible.
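
For illustration only (the real test case was not posted, so this is an
assumed, much smaller stand-in): a loop over three doubles shows the shape
under discussion. With two doubles per vector, a vectorizer can cover
iterations 0 and 1 with one vector operation and must handle iteration 2
in a scalar epilogue. The function names are made up, and the second
function merely spells out that split by hand in scalar code.

```c
#define N 3  /* trip count known at compile time */

/* Original form: three iterations over doubles.  */
void axpy3(double *restrict y, const double *restrict x, double a)
{
    for (int i = 0; i < N; i++)
        y[i] += a * x[i];
}

/* Hand-written picture of the split: a two-lane "vector" part followed
   by a one-iteration scalar epilogue.  */
void axpy3_split(double *restrict y, const double *restrict x, double a)
{
    /* lanes 0 and 1: what a single vector iteration would cover */
    y[0] += a * x[0];
    y[1] += a * x[1];
    /* scalar epilogue for the remaining iteration */
    for (int i = 2; i < N; i++)
        y[i] += a * x[i];
}
```

Compiling with `-O3 -fopt-info-vec-all` or `-fdump-tree-vect-details`
reports what the vectorizer and its cost model decided for each loop, and
`-fvect-cost-model=cheap` or `-fno-tree-vectorize` allow a comparison
against a more conservative or non-vectorized build. Whether a toy loop
like this one is vectorized at all depends on the GCC version and target.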

Richard.

> Thanks,
> bin
>>
>> Regards
>>  Robin
>>