This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: GCC Commit Stats [was: [GCC Steering Committee attention] [PING] [PING] [PING] libgomp [...]]


On Fri, Aug 05, 2016 at 04:38:30PM +0100, Manuel López-Ibáñez wrote:
 
> I think those conclusions are debatable:

I won't respond to all your points (I'm busy this evening), but I can
regenerate my table with some of your suggestions.

> * GCC has also grown over the years, there is a lot more code and
> areas, specially more targets, which attract their own temporary
> developers who do not contribute to the rest of the compiler (much
> less review patches for the rest of the compiler).
> 
> * Your analysis includes Ada, Go and Fortran. I'd suggest to exclude
> them, since in terms of developers and reviewing, they seem to be
> doing fine. They also tend to organise themselves mostly independently
> of the rest of the compiler. This is also mostly true for targets.

Excluding these is tricky, but in principle it is just a matter of
tweaking the git shortlog command. If that's something you want to do,
I'd be interested to see the results. I didn't get reasonable results in
time for the full history back to 1998, so I haven't presented those
numbers, but in a few rough tests they didn't look vastly different
(when filtering on gcc/*.[ch]).
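
A rough sketch of that kind of tweak, for anyone who wants to try it. It
relies on git's pathspec syntax; the helper name shortlog_year is made up
for illustration, and the three excluded front-end directories are only an
assumption about what "Ada, Go and Fortran" should cover (the runtime
libraries live elsewhere in the tree and are not handled here).

```shell
# Hypothetical helper: per-committer commit counts for one calendar year,
# optionally limited to the pathspecs given after the year.
# HEAD is passed explicitly because git shortlog reads a log from standard
# input when no revision is given and stdin is not a terminal.
shortlog_year() {
  year=$1
  shift
  git shortlog -s -n \
    --since="$year-01-01 00:00:00" \
    --until="$((year + 1))-01-01 00:00:00" \
    HEAD -- "$@"
}

# Run from the top of the checkout.
# Only .c and .h files directly under gcc/ (the glob magic stops * from
# matching across directory separators):
#   shortlog_year 2015 ':(glob)gcc/*.[ch]'
# Everything except the (assumed) Ada, Go and Fortran front-end directories:
#   shortlog_year 2015 . ':(exclude)gcc/ada' ':(exclude)gcc/go' ':(exclude)gcc/fortran'
```

Note that a plain 'gcc/*.[ch]' pathspec also matches files in
subdirectories such as gcc/config/, so which form is appropriate depends on
what you want to count.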

I've given the 2012-2015 numbers below, just to show that (for the files
in gcc/*.[ch]) your hypothesis doesn't hold. The vast majority of
committers make <20 commits in a year.

Year            | 2012 | 2013 | 2014 | 2015
Commits         | 1816 | 1632 | 2148 | 2362
Committers      |   98 |  110 |  109 |  114 
Average commits |   19 |   15 |   20 |   21
Number of committers achieving N commits by bucket...
1-19            |   78 |   96 |   92 |   94
20-39           |   12 |    4 |    5 |    7
40-59           |    2 |    3 |    7 |    3
60-79           |    1 |    3 |    0 |    0
80-99           |    1 |    1 |    0 |    2
100-199         |    2 |    1 |    2 |    6
200+            |    2 |    2 |    3 |    2
Percentage of committers achieving N commits by bucket...
1-19            |   80 |   87 |   84 |   82
20-39           |   12 |    4 |    5 |    6
40-59           |    2 |    3 |    6 |    3
60-79           |    1 |    3 |    0 |    0
80-99           |    1 |    1 |    0 |    2
100-199         |    2 |    1 |    2 |    5
200+            |    2 |    2 |    3 |    2


> * 100 commits is less than 2%. Quite a low threshold. Perhaps 1%, 25%,
> 50%, 75%, 90% are more informative.

Again, I only picked that threshold to save time. I've changed the last
two buckets to 100-199 and 200+ in this run. If you'd like to try other
thresholds, I'd be happy to see the results.
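
One way the suggested percentage thresholds could be computed is sketched
below. It reads "git shortlog -s" output on standard input and reports how
many of the most active committers are needed to reach 25%, 50%, 75% and
90% of the commits. The function name commit_share_marks is hypothetical,
and skipping gccadmin simply mirrors the filter in my own command.

```shell
# Hypothetical helper: reads "git shortlog -s" output on stdin and prints
# how many of the top committers account for each share of the commits.
commit_share_marks() {
  sort -rn | awk '
    # Leave the gccadmin account out, as in the bucket tables.
    $2 == "gccadmin" { next }
    { n += 1; commits[n] = $1; total += $1 }
    END {
      if (total == 0) exit
      split("25 50 75 90", marks, " ")
      m = 1
      running = 0
      for (i = 1; i <= n && m <= 4; i++) {
        running += commits[i]
        # One committer may cross several marks at once.
        while (m <= 4 && running * 100 >= marks[m] * total) {
          printf "%d%% of commits: top %d of %d committers\n", marks[m], i, n
          m++
        }
      }
    }'
}

# Example, from the top of a checkout:
#   git shortlog -s -n --since=01/01/2015 --until=01/01/2016 HEAD | commit_share_marks
```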

> * https://www.openhub.net/p/taezaza/contributors/summary shows that
> more than 25% of the commits in the last 12 months were made by 6
> people. Note that those people are also the most active reviewers.

True, but as you point out below, few data samples tell us little.

> * If I adjust the numbers by the total number of contributors, then we
> get a different picture:

I've added that to my table.

> that is, most of the commits are done by smaller fraction of the
> total.

For 2015 I found the four "25%" marks (share of committers, by number of
commits made in the year) to be:

  26%    1-4
  25%    5-13
  25%    14-39
  23%    40+

So about 75% of committers made fewer than 40 commits in the year.
Encouragingly, roughly half of the people who committed in 2015
committed at least one patch per month (on average).
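
A breakdown like the one above can be reproduced with a small awk filter
over "git shortlog -s" output. This is only a sketch: the function name
committer_share_by_range is hypothetical, and the range boundaries (1-4,
5-13, 14-39, 40+) are hard-coded from the 2015 figures rather than derived
from the data.

```shell
# Hypothetical helper: reads "git shortlog -s" output on stdin and prints
# the percentage of committers whose commit count falls in each range.
committer_share_by_range() {
  awk '
    # Leave the gccadmin account out of the count.
    $2 == "gccadmin" { next }
    {
      count += 1
      if ($1 < 5) {
        a += 1
      } else if ($1 < 14) {
        b += 1
      } else if ($1 < 40) {
        c += 1
      } else {
        d += 1
      }
    }
    END {
      if (count == 0) exit
      printf "%.0f%%\t%s\n", a / count * 100, "1-4"
      printf "%.0f%%\t%s\n", b / count * 100, "5-13"
      printf "%.0f%%\t%s\n", c / count * 100, "14-39"
      printf "%.0f%%\t%s\n", d / count * 100, "40+"
    }'
}

# Example, from the top of a checkout:
#   git shortlog -s -n --since=01/01/2015 --until=01/01/2016 HEAD | committer_share_by_range
```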

> * Numbers for other years might shed more light. 2010, 2013 and 2015
> might have been especial in one sense or another.

I compressed this for space. The full table is below (and attached - just
in case your mail client gets zealous with the text and re-wraps it).

Year            | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 
Commits         | 4997 | 5531 | 7031 | 6850 | 6704 | 7961 | 9137 | 7646 | 5039 | 6633 | 5667 | 6244 | 7582 | 8181 | 6463 | 5970 | 7497 | 7742 | 
Committers      |   44 |   65 |   89 |  116 |  128 |  153 |  153 |  167 |  163 |  167 |  172 |  174 |  171 |  176 |  181 |  176 |  204 |  190 | 
Average commits |  114 |   85 |   79 |   59 |   52 |   52 |   60 |   46 |   31 |   40 |   33 |   36 |   44 |   46 |   36 |   34 |   37 |   41 | 
Number of committers achieving N commits by bucket...
1-19            |   16 |   29 |   38 |   55 |   64 |   80 |   71 |   91 |   97 |   91 |  111 |  107 |  103 |  114 |  116 |  110 |  131 |  116 | 
20-39           |    8 |   12 |   19 |   19 |   18 |   21 |   26 |   18 |   28 |   26 |   25 |   29 |   28 |   18 |   26 |   28 |   32 |   31 | 
40-59           |    4 |    5 |    4 |    9 |    8 |   13 |   10 |   15 |   13 |   19 |   15 |   15 |   13 |    9 |   15 |   16 |   16 |   12 | 
60-79           |    3 |    4 |    4 |    7 |   11 |    8 |   10 |   13 |    4 |    9 |    6 |    6 |    3 |   12 |    7 |    4 |    6 |    5 | 
80-99           |    2 |    3 |    4 |    6 |    7 |   12 |   10 |    7 |    7 |    5 |    4 |    3 |    4 |    4 |    0 |    4 |    3 |    4 | 
100-199         |    6 |    6 |    8 |   11 |   13 |    8 |   13 |   16 |   12 |   10 |    6 |    6 |    9 |    8 |    8 |    8 |    6 |   11 | 
200+            |    5 |    6 |   12 |   10 |    8 |   12 |   14 |    8 |    3 |    8 |    6 |    9 |   12 |   12 |   10 |    7 |   11 |   12 | 
Percentage of committers achieving N commits by bucket...
1-19            |   36 |   45 |   43 |   47 |   50 |   52 |   46 |   54 |   60 |   54 |   65 |   61 |   60 |   65 |   64 |   62 |   64 |   61 | 
20-39           |   18 |   18 |   21 |   16 |   14 |   14 |   17 |   11 |   17 |   16 |   15 |   17 |   16 |   10 |   14 |   16 |   16 |   16 | 
40-59           |    9 |    8 |    4 |    8 |    6 |    8 |    7 |    9 |    8 |   11 |    9 |    9 |    8 |    5 |    8 |    9 |    8 |    6 | 
60-79           |    7 |    6 |    4 |    6 |    9 |    5 |    7 |    8 |    2 |    5 |    3 |    3 |    2 |    7 |    4 |    2 |    3 |    3 | 
80-99           |    5 |    5 |    4 |    5 |    5 |    8 |    7 |    4 |    4 |    3 |    2 |    2 |    2 |    2 |    0 |    2 |    1 |    2 | 
100-199         |   14 |    9 |    9 |    9 |   10 |    5 |    8 |   10 |    7 |    6 |    3 |    3 |    5 |    5 |    4 |    5 |    3 |    6 | 
200+            |   11 |    9 |   13 |    9 |    6 |    8 |    9 |    5 |    2 |    5 |    3 |    5 |    7 |    7 |    6 |    4 |    5 |    6 | 

Personally, I think that looks like a fairly stable and healthy community,
but you're welcome to draw your own conclusions from the data.

The raw data for these tables can be generated with:

  for i in {1989..2015}; do
    printf "%d\t" $i
    git shortlog -s -n --since=01/01/$i --until=01/01/$((i+1)) | awk '
      # Skip the gccadmin account entirely, so that it is left out of
      # the totals and of the buckets.
      $2 == "gccadmin" { next }
      {
        sum += $1; count += 1
        if ($1 < 20) {bucket1 += 1}
        else if ($1 < 40) {bucket2 += 1}
        else if ($1 < 60) {bucket3 += 1}
        else if ($1 < 80) {bucket4 += 1}
        else if ($1 < 100) {bucket5 += 1}
        else if ($1 < 200) {bucket6 += 1}
        else {bucket7 += 1}
      }
      END {
        # Columns: commits, committers, average, 7 bucket counts, 7 bucket percentages.
        printf "%d\t%d\t%.0f\t%d\t%d\t%d\t%d\t%d\t%d\t%d\t%.0f\t%.0f\t%.0f\t%.0f\t%.0f\t%.0f\t%.0f\n",
          sum, count, (sum/count),
          bucket1, bucket2, bucket3, bucket4, bucket5, bucket6, bucket7,
          (bucket1/count)*100, (bucket2/count)*100, (bucket3/count)*100, (bucket4/count)*100,
          (bucket5/count)*100, (bucket6/count)*100, (bucket7/count)*100
      }'
  done

in your git checkout.

Hope that helps.

Thanks,
James

 

Attachment: commit-stats.txt
Description: Text document

