This patch introduces a problem: https://gcc.gnu.org/ml/gcc-patches/2012-09/msg00076.html and should be at least reverted. See https://github.com/mono/mono/pull/3549 and http://www.mono-project.com/news/2016/09/12/arm64-icache/ There is also some work going on to fix this on kernel level: https://gcc.gnu.org/ml/gcc-patches/2012-09/msg00076.html
Confirmed.
really there needs to be no caching at all because the code could migrate between the cores.
That is even reverting the patch can still cause issues. Really this code should just hard code 4 as the smallest size. Or change the loop to get the sizes through the whole loop.
(In reply to Bernhard Urban from comment #0) > This patch introduces a problem: > https://gcc.gnu.org/ml/gcc-patches/2012-09/msg00076.html and should be at > least reverted. > > See https://github.com/mono/mono/pull/3549 and > http://www.mono-project.com/news/2016/09/12/arm64-icache/ > > There is also some work going on to fix this on kernel level: > https://gcc.gnu.org/ml/gcc-patches/2012-09/msg00076.html Neither reverting the patch, nor any of the other suggestions in this report (other than hardcoding the size to 4) would remove the potential race condition between reading the size of the cache line and being migrated to a CPU with a smaller cache line size. The proposed Linux kernel workaround ( http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1227904.html ) will remove the need for a change to GCC in a Linux environment - the value will read as the minimum required for the system. The Kernel is also best placed to enable the workaround only for those systems which require it. Handling this in user-space is extremely fragile, and will penalize a large number of systems. Such a workaround would require serious thought as to how it was designed and implemented, and which subtargets it was enabled for. To my mind, the only appropriate place to implement a workaround is the kernel.
This is something the kernel has to address. If the system has processors with differently sized cache line lengths, then the OS needs to virtualize the relevant system registers to report an appropriate (safe) line length. Any other approach will likely have race conditions.