I am filing this to track the problem of cacheline conflicts, which is also discussed for the LLVM variant in http://lists.llvm.org/pipermail/llvm-dev/2014-April/072172.html
Can you please attach the WIP patch you have?
Created attachment 45703 [details]
patch for tls counters (incomplete - no runtime bits)

Also, I think Google's code to reduce cacheline conflicts is https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00959.html
(In reply to Jan Hubicka from comment #2)
> Created attachment 45703 [details]
> patch for tls counters (incomplete - no runtime bits)

Isn't the patch only a refactoring that eliminates tls_model from tree_decl_with_vis and moves it into cgraph_node?
I'm just looking at the google/gcc-4.9 branch:
https://android.googlesource.com/toolchain/gcc/+/master/gcc-4.9/
and they have a sampling approach:

/* Transform:

   ORIGINAL CODE

   Into:

   __gcov_sample_counter++;
   if (__gcov_sample_counter >= __gcov_sampling_period)
     {
       __gcov_sample_counter = 0;
       ORIGINAL CODE
     }

which effectively updates edge counters just for a limited time. I would expect size increase:

Removing basic block 9
Removing basic block 10
main (int argc)
{
  unsigned int PROF_sample.2;
  unsigned int PROF_sample.1;
  long int PROF_edge_counter_6;
  long int PROF_edge_counter_7;
  long int PROF_edge_counter_8;
  long int PROF_edge_counter_9;

  <bb 2>:
  __gcov_indirect_call_profiler_v2 (1005944783, main);
  __gcov_indirect_call_callee = 0B;
  if (argc_2(D) != 0)
    goto <bb 3>;
  else
    goto <bb 6>;

  <bb 3>:
  a = 123;
  PROF_sample.2_13 = __gcov_sample_counter;
  PROF_sample.2_14 = PROF_sample.2_13 + 1;
  __gcov_sample_counter = PROF_sample.2_14;
  PROF_sample.2_15 = __gcov_sampling_period;
  if (PROF_sample.2_14 >= PROF_sample.2_15)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:
  goto <bb 8>;

  <bb 5>:
  __gcov_sample_counter = 0;
  PROF_edge_counter_6 = __gcov0.main[0];
  PROF_edge_counter_7 = PROF_edge_counter_6 + 1;
  __gcov0.main[0] = PROF_edge_counter_7;
  goto <bb 8>;

  <bb 6>:
  a = 0;
  PROF_sample.1_10 = __gcov_sample_counter;
  PROF_sample.1_11 = PROF_sample.1_10 + 1;
  __gcov_sample_counter = PROF_sample.1_11;
  PROF_sample.1_12 = __gcov_sampling_period;
  if (PROF_sample.1_11 >= PROF_sample.1_12)
    goto <bb 7>;
  else
    goto <bb 4>;

  <bb 7>:
  __gcov_sample_counter = 0;
  PROF_edge_counter_8 = __gcov0.main[1];
  PROF_edge_counter_9 = PROF_edge_counter_8 + 1;
  __gcov0.main[1] = PROF_edge_counter_9;

  <bb 8>:
  return 0;
}
> which effectively updates edge counters just for a limited time. I would
> expect

Ah now, it's really doing sampling. I guess it can lead to quite some profile inconsistencies.
> Ah now, it's really doing sampling. I guess it can lead to quite some profile
> inconsistencies..

Yep, it is not the coolest solution. I would not worry too much about precision loss unless you get some weird interference between the sampling counter and actual program behaviour. Adding conditionals everywhere is not very good, and I am not sure how well the CPU will predict such branches.

Honza
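To make the interference concern concrete, here is a minimal sketch of the sampled update from the transform quoted earlier, applied to a loop that alternates strictly between two edges. The names sample_counter, sampling_period, edge_counter and run_alternating are hypothetical stand-ins for illustration, not the runtime's actual symbols, and the single shared sampling counter is an assumption taken from the dump above.

```c
#include <assert.h>

/* Hypothetical stand-ins for the runtime's sampling state. */
static unsigned sample_counter;
static unsigned sampling_period = 2;
static long edge_counter[2];

/* Sampled counter update: the edge counter is bumped only when the
   shared sampling counter reaches the period, then the latter resets. */
static void sampled_increment(int edge)
{
    sample_counter++;
    if (sample_counter >= sampling_period) {
        sample_counter = 0;
        edge_counter[edge]++;
    }
}

/* Simulate a loop that takes edge 0 and edge 1 in strict alternation,
   so the true profile is an even split between the two edges. */
static void run_alternating(unsigned iterations, unsigned period)
{
    assert(period > 0);
    sample_counter = 0;
    sampling_period = period;
    edge_counter[0] = edge_counter[1] = 0;
    for (unsigned i = 0; i < iterations; i++)
        sampled_increment(i % 2);
}
```

With a period of 2 every sample in this sketch lands on edge 1 and edge 0 is never counted, although both edges execute equally often. With a period of 3 the samples split almost evenly. This is the kind of resonance between sampling period and program behaviour that would distort a profile.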
Btw, use of TLS has
 * size of counters overhead (one could use char sized TLS counters and update the global ones with locking on overflow)
 * tear-down/build-up cost at thread termination/creation

The advantage is of course that it's simple implementation-wise.
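The char-sized TLS counter idea could look roughly like the sketch below. It is only an illustration under assumptions: N_COUNTERS, global_counter, tls_counter, count_edge, count_many and flush_all are hypothetical names, a single pthread mutex stands in for whatever locking the runtime would use, and the tear-down hook is reduced to an explicit flush_all() call.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>

#define N_COUNTERS 2  /* hypothetical number of edge counters */

/* Shared counters, only touched with global_lock held. */
static long global_counter[N_COUNTERS];
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread char-sized counters: no cacheline sharing on the hot path. */
static _Thread_local unsigned char tls_counter[N_COUNTERS];

/* Add one thread-local counter into the global one and reset it. */
static void flush_one(int i)
{
    pthread_mutex_lock(&global_lock);
    global_counter[i] += tls_counter[i];
    pthread_mutex_unlock(&global_lock);
    tls_counter[i] = 0;
}

/* Hot path: bump the small counter, flush just before it would wrap. */
static void count_edge(int i)
{
    if (++tls_counter[i] == UCHAR_MAX)
        flush_one(i);
}

/* Helper for demonstration: count the same edge n times. */
static void count_many(int i, long n)
{
    for (long k = 0; k < n; k++)
        count_edge(i);
}

/* Tear-down cost: each thread must flush its residue before it exits. */
static void flush_all(void)
{
    for (int i = 0; i < N_COUNTERS; i++)
        if (tls_counter[i] != 0)
            flush_one(i);
}
```

In this sketch the lock is taken once per 255 increments per counter, and a thread that exits without calling flush_all() loses up to 254 counts per counter, which is where the tear-down cost mentioned above comes from.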
GCC 10.1 has been released.
GCC 10.2 is released, adjusting target milestone.
GCC 10.3 is being released, retargeting bugs to GCC 10.4.
GCC 10.4 is being released, retargeting bugs to GCC 10.5.