Created attachment 41979: Graph showing performance regression
Code built with gcc version 7.1 using the Cilkplus parallel programming extensions on ARM runs much slower than the same code built with version 6.2. Details may be viewed graphically at http://fractal.math.unr.edu/~ejolson/bench/dotprod/gcc71-8.png which consistently shows a loss of performance using any combination of 1 to 8 cores on a Samsung/Nexell S5P6818 based SBC. More information and example code is available at https://www.raspberrypi.org/forums/viewtopic.php?p=711196#p1197225
My impression is that this regression affects almost all Cilkplus code on ARM and is possibly the result of an unaligned cactus stack or additional overhead in switching tasks that was not present in the 6.2 version. It is likely that performance-based tests for ARM Cilkplus are needed to ensure such regressions do not happen in the future. Note that the performance of serial code is not affected.
The test code was compiled for 32-bit mode using the options -fcilkplus -O3 -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard -ffast-math and run under identical circumstances in both cases.
Note that Cilk+ has been deprecated: https://gcc.gnu.org/gcc-7/changes.html
(In reply to Andrew Pinski from comment #1)
> Note that Cilk+ has been deprecated:
> https://gcc.gnu.org/gcc-7/changes.html
As 48-core ARM chips have just been announced by Qualcomm, now seems like the wrong time to be deprecating built-in support for parallel processing in gcc. I'd like to think that gcc is continuing to evolve to support new hardware and not turning into a retro-computing project.
Presumably OpenMP is still supported, and I know new versions of OpenMP support task barriers which suspend until child tasks are complete in a way similar to Cilkplus. Still, the syntax of Cilkplus is easier to read and the vector notation would be useful if it worked.
While there doesn't appear to be much interest or expertise in maintaining Cilkplus, my opinion is that parallel programming techniques are essential to make efficient use of modern hardware, and deprecating a convenient way of programming multi-core hardware at this point is a mistake. To this end, maybe it would be a good idea to figure out what is causing the slowdown in gcc version 7.1 ARM Cilkplus even though it is deprecated.
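For comparison with the Cilkplus spawn/sync style, here is a minimal sketch of the OpenMP task approach mentioned above, applied to a dot product. This is not the benchmark code from the report: the names dot_rec, dot_parallel and DOT_CUTOFF are hypothetical, and the cutoff value is an arbitrary assumption. The `taskwait` directive plays roughly the role of cilk_sync. It is meant to be built with -fopenmp; without that flag the pragmas are ignored and the code simply runs serially.

```c
#include <stddef.h>

/* Hypothetical cutoff: at or below this size the work is done serially. */
#define DOT_CUTOFF 1024

/* Recursive dot product. The left half is spawned as an OpenMP task, the
   right half is computed by the current task, and taskwait suspends until
   the child task has finished (similar in spirit to cilk_sync). */
static double dot_rec(const double *a, const double *b, size_t n)
{
    if (n <= DOT_CUTOFF) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }

    size_t half = n / 2;
    double left = 0.0, right;

    /* left must be shared so the child task's result is visible here. */
    #pragma omp task shared(left)
    left = dot_rec(a, b, half);

    right = dot_rec(a + half, b + half, n - half);

    #pragma omp taskwait
    return left + right;
}

/* Entry point: open a parallel region and let one thread start the
   recursion; the other threads in the team pick up the spawned tasks. */
double dot_parallel(const double *a, const double *b, size_t n)
{
    double result = 0.0;

    #pragma omp parallel
    #pragma omp single
    result = dot_rec(a, b, n);

    return result;
}
```

Calling dot_parallel(a, b, n) on two arrays of length n returns their dot product. The serial cutoff keeps per-task overhead from dominating on small subproblems.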
> As 48-core ARM chips have just been announced by Qualcomm,
I have been using a 48-core ThunderX, which is an ARMv8-A, for almost 3 years now :) So don't really bring this up.
Cilk+ is deprecated as nobody is using it, and Intel seems to have added it to GCC and then disappeared. See
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01209.html
> I didn't want to look into cilkplus too deeply as to why we have different
> types, because (a) I don't care (b) we're probably going to deprecate
> Cilk Plus, no?
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01211.html
> And the more important question is if Intel is willing to maintain Cilk+ in
> GCC, or if we should deprecate it (and, if the latter, if already in GCC7
> deprecate, remove in GCC8, or deprecate in GCC8, remove in GCC9).
> There are various Cilk+ related PRs around on which nothing has been done
> for many months.
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01220.html
> As discussed on IRC, we will probably deprecate CilkPlus for GCC7 and remove it
> for GCC8 unless someone is interested in maintaining it. So...committing as is.
And then nobody from Intel stepped up.
Assuming this to be true, but setting this at P5 as I'm not sure where Cilkplus support stands in GCC...
Cilk Plus, deprecated for 7.x, will not be in 8.x.