This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

gcov: option to instrument whole basic block graph, no use of spanning tree optimization


Hello, 

The attached patch introduces an option to instrument the whole basic
block graph and not just its spanning tree.

Rationale: gcov is not robust in a threaded environment, the patch
    makes it more robust.

In particular, in threaded environments on can observe two effects: 
(1) underreporting when coverage counters are overwritten by another 
instance of a thread after a thread switch. 
(2) generation of arbitrary (even negative) values by subtraction when 
reconstructing the whole tree from the used by the spanning tree 
optimization mechanism. (Recall that subtraction is messy numerically.) 

With the current gcov framework (I think) one cannot do anything 
really generic about (1). However, (2) can be fixed by optionally 
disabling the spanning tree optimization. 

Question: Why is fixing only (2) useful? 
Answer: One often wants to have the 
    property P: if a counter is non-zero then its
    path has had some (potentially underreported but non-zero) coverage

So fixing (2) is valuable because when (2) is fixed then P holds 
    whereas when (2) is not fixed (I think) due to potential cancellation
    effects P does not necessarily hold. 

Moreover I think fixing (2) also gives 
    property Q: if a path has had some (potentially underreported but 
        non-zero) coverage, then its counter is non-zero (converse of P)
    because in the event of coverage loss by concurrency at least the 
        overwriting counter is never 0. 

The following program is used for demonstration. 

$cat uncontrolled.c
#include <pthread.h>
#include <stdio.h>
#include <sys/io.h>

void *worker (void *args) {
	int i; 
	int j = 0;
	for (i = 0; i< 100000*1000; i++) {
		j++;
		if (j % 100000 == 0) {
			printf("j now is %d.\n", j);
		}	
	}
		
}

int main () 
{	
	iopl(3);
	pthread_t t1, t2;
	pthread_create (&t1, NULL, worker, NULL);
	pthread_create (&t2, NULL, worker, NULL);
	pthread_join (t1, NULL);
	pthread_join (t2, NULL);
	return 0;
}

 ... when run via ...
 
$ gcc -fprofile-arcs -ftest-coverage -O0 -o uncontrolled uncontrolled.c -lpthread
$ ./uncontrolled
$ gcov uncontrolled

 ... we observe effects (1) and (2)

        -:    0:Source:uncontrolled.c
        -:    0:Graph:uncontrolled.gcno
        -:    0:Data:uncontrolled.gcda
        -:    0:Runs:1
        -:    0:Programs:1
        -:    1:#include <pthread.h>
        -:    2:#include <stdio.h>
        -:    3:#include <sys/io.h>
        -:    4:
 -1067519:    5:void *worker (void *args) {
        -:    6:	int i; 
 -1067519:    7:	int j = 0;
182760737:    8:	for (i = 0; i< 100000*1000; i++) {
182760735:    9:		j++;
182760735:   10:		if (j % 100000 == 0) {
     2000:   11:			printf("j now is %d.\n", j);
        -:   12:		}	
        -:   13:	}
        -:   14:		
        2:   15:}
        -:   16:
        1:   17:int main () 
        -:   18:{	
        1:   19:	iopl(3);
        -:   20:	pthread_t t1, t2;
        1:   21:	pthread_create (&t1, NULL, worker, NULL);
        1:   22:	pthread_create (&t2, NULL, worker, NULL);
        1:   23:	pthread_join (t1, NULL);
        1:   24:	pthread_join (t2, NULL);
        1:   25:	return 0;
        -:   26:}


When run via the patched version (with -fprofile-whole-graph) this becomes:
    
$ gcc -fprofile-whole-graph -fprofile-arcs -ftest-coverage -O0 -o uncontrolled uncontrolled.c -lpthread
$ ./uncontrolled
$ gcov uncontrolled

    ... we only observe effect (1)

$ cat uncontrolled.c.gcov
        -:    0:Source:uncontrolled.c
        -:    0:Graph:uncontrolled.gcno
        -:    0:Data:uncontrolled.gcda
        -:    0:Runs:1
        -:    0:Programs:1
        -:    1:#include <pthread.h>
        -:    2:#include <stdio.h>
        -:    3:#include <sys/io.h>
        -:    4:
        2:    5:void *worker (void *args) {
        -:    6:	int i; 
        2:    7:	int j = 0;
187663616:    8:	for (i = 0; i< 100000*1000; i++) {
187879212:    9:		j++;
187879212:   10:		if (j % 100000 == 0) {
     2000:   11:			printf("j now is %d.\n", j);
        -:   12:		}	
        -:   13:	}
        -:   14:		
        2:   15:}
        -:   16:
        1:   17:int main () 
        -:   18:{	
        1:   19:	iopl(3);
        -:   20:	pthread_t t1, t2;
        1:   21:	pthread_create (&t1, NULL, worker, NULL);
        1:   22:	pthread_create (&t2, NULL, worker, NULL);
        1:   23:	pthread_join (t1, NULL);
        1:   24:	pthread_join (t2, NULL);
        1:   25:	return 0;
        -:   26:}

For further discussion see also section 4 of 
http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/gcov-on-an-embedded-system.pdf . Actually I'm submitting here because someone had noticed the paper and then asked the question why this isn't fixed yet :-)

Regards,

-- 
Holger Blasum (SYSGO AG)
Index: gcc/opts.c
===================================================================
--- gcc/opts.c	(revision 166172)
+++ gcc/opts.c	(working copy)
@@ -1973,6 +1973,10 @@
       profile_data_prefix = xstrdup (arg);
       break;
 
+    case OPT_fprofile_whole_graph:
+      flag_profile_whole_graph = 1;
+      break;         
+
     case OPT_fprofile_use_:
       profile_data_prefix = xstrdup (arg);
       flag_profile_use = true;
Index: gcc/profile.c
===================================================================
--- gcc/profile.c	(revision 166172)
+++ gcc/profile.c	(working copy)
@@ -1019,9 +1019,10 @@
   /* Create spanning tree from basic block graph, mark each edge that is
      on the spanning tree.  We insert as many abnormal and critical edges
      as possible to minimize number of edge splits necessary.  */
+    
+  if (!flag_profile_whole_graph)
+    find_spanning_tree (el);
 
-  find_spanning_tree (el);
-
   /* Fake edges that are not on the tree will not be instrumented, so
      mark them ignored.  */
   for (num_instrumented = i = 0; i < num_edges; i++)
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 166172)
+++ gcc/common.opt	(working copy)
@@ -1279,6 +1279,10 @@
 Common Joined RejectNegative
 Enable common options for generating profile info for profile feedback directed optimizations, and set -fprofile-dir=
 
+fprofile-whole-graph
+Common Var(flag_profile_whole_graph, 0) RejectNegative
+Use whole basic block graph for profiling, do not use spanning tree optimization (more memory usage, but better conservative approximation for multithreaded execution).
+
 fprofile-use
 Common Var(flag_profile_use)
 Enable common options for performing profile feedback directed optimizations

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]