[Bug tree-optimization/83651] [7/8 regression] 20% slowdown of linux kernel AES cipher

Wed Jan 17 19:36:00 GMT 2018

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651

--- Comment #4 from Arnd Bergmann <arnd at linaro dot org> ---
(In reply to Aldy Hernandez from comment #3)
> (In reply to Arnd Bergmann from comment #0)
> 
> > If there is enough interest in addressing the slowdown, it should be
> > possible to create a version of the kernel AES implementation that can be
> > run in user space, as the current method of reproducing the results is
> > fairly tedious.
> 
> I would say that a 20% slowdown is significant enough that we should
> definitely look into this.  A user space version would help immensely here.

The 20% figure came from comparing 7.1.1 with 7.2.1, but I can no longer
reproduce the 7.1.1 performance, so the 7.1.1 result may really have been
15.3 cycles rather than 13.5 cycles. Even so, we'd still have a 13%
regression with the kernel implementation and a 9% regression with libressl,
which is probably still significant.

> > The source code is apparently derived from a common source, but has evolved
> > in different ways, and the version from the kernel appears to be much faster
> > overall. 
> 
> It looks like you have various benchmarks based on different code bases. 
> This is not good for reproducibility and diagnosing the problem.  Could we
> settle on one, and ideally a (simple) user space version?  This will
> drastically increase the likelihood of finding a solution :).

I'd suggest sticking with the libressl test case from comment 1, and ignoring
the kernel version until the libressl one is fully understood. It seems very
likely that fixing one will also address the other.

Are you able to start with the test procedure from comment 1, or do you need
something that can be scripted better, e.g. in a single C file?

> Also, is this a GCC 8 regression?  It looks like in most of the benchmarks
> you post, GCC 8 performs pretty close to 4.x.  Again, settling on one
> benchmark, preferably in user space, would really help.

I had originally classified it as a "7.2 regression"; Richard changed it to
"7/8 regression", which I think is correct. The problem is almost certainly
the "-fcode-hoisting" optimization: both gcc-7 and gcc-8 show about a 10%
difference between plain "-O2" and "-O2 -fno-code-hoisting". It's just that
gcc-8 is faster overall.
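
For readers unfamiliar with the option: code hoisting moves an expression
that is evaluated on every path after a branch to a point before the branch.
The following is a minimal hand-written sketch of that kind of rewrite. The
function names and the table lookup are hypothetical; this is not the kernel
or libressl AES code, and it is not GCC's actual output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only: not taken from the kernel or libressl
 * AES sources, and not what GCC actually emits. */

/* As written: both branches evaluate tab[i & 0xff] ^ key. */
uint32_t lookup_as_written(const uint32_t *tab, uint32_t i, uint32_t key,
                           int flip)
{
    uint32_t r;

    if (flip)
        r = (tab[i & 0xff] ^ key) + 1;
    else
        r = (tab[i & 0xff] ^ key) - 1;
    return r;
}

/* Hand-hoisted form: the shared expression is evaluated once, before the
 * branch, which is the kind of rewrite code hoisting aims for. */
uint32_t lookup_hoisted(const uint32_t *tab, uint32_t i, uint32_t key,
                        int flip)
{
    uint32_t t = tab[i & 0xff] ^ key;

    return flip ? t + 1 : t - 1;
}

/* Self-check: both forms agree for wrapped indices and both branches. */
void check_hoisting_equivalence(void)
{
    uint32_t tab[256];
    uint32_t i;

    for (i = 0; i < 256; i++)
        tab[i] = i * 2654435761u;   /* arbitrary filler values */

    for (i = 0; i < 1024; i++) {
        assert(lookup_as_written(tab, i, 0x1234u, 0) ==
               lookup_hoisted(tab, i, 0x1234u, 0));
        assert(lookup_as_written(tab, i, 0x1234u, 1) ==
               lookup_hoisted(tab, i, 0x1234u, 1));
    }
}
```

Compiling such a file once with "gcc -O2 -S" and once with
"gcc -O2 -fno-code-hoisting -S" and comparing the assembly is one way to see
whether the option changes the generated code. On an example this small,
other passes may merge the common expression regardless, so the two outputs
can be identical.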