Bug 81828 - Cilkplus performance regression on ARM...
Summary: Cilkplus performance regression on ARM...
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 7.1.0
Importance: P5 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2017-08-11 18:12 UTC by Eric
Modified: 2017-12-04 11:05 UTC
CC List: 1 user

See Also:
Host:
Target: arm
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-09-20 00:00:00


Attachments
Graph showing performance regression... (14.28 KB, image/png)
2017-08-11 18:12 UTC, Eric

Description Eric 2017-08-11 18:12:53 UTC
Created attachment 41979
Graph showing performance regression...

Code compiled with gcc version 7.1 using the Cilkplus parallel programming extensions on ARM runs much slower than the same code compiled with version 6.2.  Details may be viewed graphically at

    http://fractal.math.unr.edu/~ejolson/bench/dotprod/gcc71-8.png

which consistently shows a loss of performance using any combination of 1 to 8 cores on a Samsung/Nexell S5P6818 based SBC.  More information and example code is available at

    https://www.raspberrypi.org/forums/viewtopic.php?p=711196#p1197225

My impression is that this regression affects almost all Cilkplus code on ARM and is possibly the result of an unaligned cactus stack or of additional overhead in switching tasks that was not present in the 6.2 version.  Performance-based tests for ARM Cilkplus are likely needed to ensure that such regressions do not happen in the future.  Note that the performance of serial code is not affected.

The test code was compiled for 32-bit mode using options

    -fcilkplus -O3 -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard -ffast-math

and run under identical circumstances in both cases.
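
For readers without access to the linked thread, the kind of kernel being measured can be sketched roughly as follows. This is a hypothetical illustration only, not the benchmark code itself: the function name `dotprod` and the `CUTOFF` grain size are assumptions. It relies on the `__cilk` macro that Cilk Plus compilers predefine, and falls back to a serial elision when built without -fcilkplus.

```c
/* Hypothetical sketch; NOT the benchmark from the linked forum thread. */
#include <assert.h>
#include <stddef.h>

#ifdef __cilk                 /* predefined when -fcilkplus is in effect */
#include <cilk/cilk.h>
#else                         /* serial elision: the keywords compile away */
#define cilk_spawn
#define cilk_sync
#endif

#define CUTOFF 4096           /* assumed grain size; tune per machine */

/* Recursive divide-and-conquer dot product of x[0..n) and y[0..n). */
double dotprod(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    if (n <= CUTOFF) {        /* small enough: plain serial loop */
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += x[i] * y[i];
        return s;
    }

    size_t half = n / 2;
    double left, right;
    left = cilk_spawn dotprod(x, y, half);          /* may run on another worker */
    right = dotprod(x + half, y + half, n - half);  /* continues on this worker */
    cilk_sync;                                      /* wait for the spawned half */
    return left + right;
}
```

Each `cilk_spawn` and `cilk_sync` pair goes through the runtime's task-switching path, so a kernel of this shape would be sensitive to any extra per-task overhead while the serial elision would not.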
Comment 1 Andrew Pinski 2017-08-11 18:37:09 UTC
Note that Cilk+ has been deprecated:
https://gcc.gnu.org/gcc-7/changes.html
Comment 2 Eric 2017-08-12 02:47:18 UTC
(In reply to Andrew Pinski from comment #1)
> Note that Cilk+ has been deprecated:
> https://gcc.gnu.org/gcc-7/changes.html

As 48-core ARM chips have just been announced by Qualcomm, now seems like the wrong time to be deprecating built-in support for parallel processing in gcc.  I'd like to think that gcc is continuing to evolve to support new hardware and not turning into a retro-computing project.

Presumably OpenMP is still supported and I know new versions of OpenMP support task barriers which suspend until child tasks are complete in a way similar to Cilkplus.  Still, the syntax of Cilkplus is easier to read and the vector notation would be useful if it worked.
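
As a rough illustration of that similarity, a spawn/sync style recursion might be written with OpenMP tasks as below. This is a minimal sketch under stated assumptions: the names `dot_tasks`, `dot_omp` and `TASK_CUTOFF` are hypothetical and not taken from the benchmark, and no claim is made about how it performs relative to Cilkplus.

```c
/* Hypothetical sketch of the OpenMP task equivalent of spawn/sync. */
#include <assert.h>
#include <stddef.h>

#define TASK_CUTOFF 4096      /* assumed grain size, not from the benchmark */

/* Recursive dot product; must be called from inside a parallel region
   for the tasks to be distributed across threads. */
double dot_tasks(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    if (n <= TASK_CUTOFF) {   /* small enough: plain serial loop */
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += x[i] * y[i];
        return s;
    }

    size_t half = n / 2;
    double left = 0.0, right = 0.0;

    /* Roughly plays the role of cilk_spawn: the child task may run on
       another thread and writes its result into the shared variable. */
    #pragma omp task shared(left)
    left = dot_tasks(x, y, half);

    right = dot_tasks(x + half, y + half, n - half);

    /* Roughly plays the role of cilk_sync: wait for child tasks. */
    #pragma omp taskwait
    return left + right;
}

/* Entry point: one thread creates the root of the task tree while the
   rest of the team picks up the generated tasks. */
double dot_omp(const double *x, const double *y, size_t n)
{
    double result = 0.0;
    #pragma omp parallel
    #pragma omp single
    result = dot_tasks(x, y, n);
    return result;
}
```

Built with -fopenmp the tasks are scheduled across the thread team; built without it the pragmas are ignored and the same code runs serially.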

While there doesn't appear to be much interest or expertise in maintaining Cilkplus, my opinion is that parallel programming techniques are essential to make efficient use of modern hardware and deprecating a convenient way of programming multi-core hardware at this point is a mistake.  To this end, maybe it would be a good idea to figure out what is causing the slowdown in gcc version 7.1 ARM Cilkplus even though it is deprecated.
Comment 4 Andrew Pinski 2017-08-12 02:59:13 UTC
>As 48-core ARM chips have just been announced by Qualcomm,

I have been using a 48 core ThunderX which is an ARMv8-a for almost 3 years now :)  So don't bring this up really.

Cilk+ is deprecated because nobody is using it, and it seems that Intel added it to GCC and then disappeared.

See https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01209.html

> I didn't want to look into cilkplus too deeply as to why we have different
> types, because (a) I don't care (b) we're probably going to deprecate
> Cilk Plus, no?

https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01211.html

> And the more important question is if Intel is willing to maintain Cilk+ in
> GCC, or if we should deprecate it (and, if the latter, if already in GCC7
> deprecate, remove in GCC8, or deprecate in GCC8, remove in GCC9).
> There are various Cilk+ related PRs around on which nothing has been done
> for many months.

https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01220.html

> As discussed on IRC, we will probably deprecate CilkPlus for GCC7 and remove it
> for GCC8 unless someone is interested in maintaining it. So...committing as is.

And then nobody from Intel stepped up.
Comment 5 Ramana Radhakrishnan 2017-09-20 13:19:38 UTC
Assuming this to be true, but setting this at P5 as I'm not sure where Cilkplus support stands in GCC...
Comment 6 Paolo Carlini 2017-12-04 11:05:43 UTC
Cilk Plus, deprecated for 7.x, will not be in 8.x.