Breakage varies from bootstrap to bootstrap. Here is an example: ... In file included from ../../gcc/gcc/config/i386/i386.c:40910:0: ./gt-i386.h: In function ‘ix86_set_reg_reg_cost(machine_mode)’: ./gt-i386.h:168:2: error: corrupted profile info: edge from 39 to 40 exceeds maximal count }; ^ ./gt-i386.h:168:2: error: corrupted profile info: edge from 40 to 41 exceeds maximal count ./gt-i386.h:168:2: error: corrupted profile info: edge from 41 to 43 exceeds maximal count ./gt-i386.h:168:2: error: corrupted profile info: edge from 42 to 43 exceeds maximal count ./gt-i386.h:168:2: error: corrupted profile info: profile data is not flow-consistent ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 30-42 thought to be -133660 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 30-31 thought to be 133660 ./gt-i386.h:168:2: error: corrupted profile info: number of iterations for basic block 31 thought to be 267320 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 31-32 thought to be 133660 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 31-34 thought to be 133660 ./gt-i386.h:168:2: error: corrupted profile info: number of iterations for basic block 34 thought to be 244050 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 34-35 thought to be 110390 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 34-38 thought to be 133660 ./gt-i386.h:168:2: error: corrupted profile info: number of iterations for basic block 38 thought to be 226620 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 38-39 thought to be 92960 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 38-43 thought to be 133660 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 39-42 thought to be 92960 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 39-40 thought to be 0 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 40-42 thought to be 0 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 40-41 thought to be 0 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 41-42 thought to be 0 ./gt-i386.h:168:2: error: corrupted profile info: number of executions for edge 41-43 thought to be 0 make[3]: *** [i386.o] Error Happens with (I'm a Gentoo ricer): make -j4 BOOT_CFLAGS="-march=native -O3 -pipe" STAGE1_CFLAGS="-march=native -O3 -pipe" CFLAGS_FOR_TARGET="-march=native -O3 -pipe" CXXFLAGS_FOR_TARGET="-march=native -O3 -pipe" profiledbootstrap (and ../gcc/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.8.0 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.0/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.0 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.0/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.0/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.0/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --disable-fixed-point --without-ppl --without-cloog --enable-lto --enable-nls --without-included-gettext --with-system-zlib --disable-werror --enable-initfini-array --with-gold --enable-secureplt --disable-multilib --enable-libmudflap --disable-libssp --disable-libgomp --enable-cld --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.8.0/python --enable-checking=release --disable-libgcj --enable-languages=c,c++ --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-build-config=bootstrap-lto --with-boot-ldflags="-Wl,-O1,--hash-style=gnu,--as-needed,--gc-sections,--icf=safe,--icf-iterations=3" --enable-version-specific-runtime-libs --disable-libstdcxx-pch --enable-libstdcxx-time=yes )
CCing author of patch.
Same here on x86-64: http://gcc.gnu.org/ml/gcc-regression/2012-09/msg00080.html
Also happens with revision 190982 on Fedora 18/x86-64. I configured GCC with --prefix=/usr/local --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --with-build-config=bootstrap-lto --with-fpmath=sse --enable-languages=c,c++,fortran,java,lto,objc and used make -j8 profiledbootstrap on a 8-core machine.
Created attachment 28133 [details] bad .gcda file I've attached a bad .gcda file. Please note that both H.J. and I use --with-build-config=bootstrap-lto.
I can reproduce it with only --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --enable-languages=c,c++ --prefix=/usr/local --enable-gnu-indirect-function --with-fpmath=sse
I finally got a reproducer for the error that H.J. reported. I will work on fixing that first. Markus, I looked at the gcda file you sent but don't see anything obviously wrong with it. gcov-dump reports that most of the counts in that file are 0 or very small, so I am not sure why there were messages about exceeding the maximal count.
(In reply to comment #6) > Markus, I looked at the gcda file you sent but don't see anything obviously > wrong with it. gcov-dump reports that most of the counts in that file are 0 or > very small, so I am not sure why there were messages about exceeding the > maximal count. Yes, as I wrote in my first message the point of failure always varies (for each bootstrap). The attached cp-demangle.gcda was from a failure similar to the one H.J. reported in Comment 2. I guess fixing H.J.'s issue will fix the other failures, too.
I think I have a solution for the issue that H.J. is encountering. Details below. Markus and H.J., would you be able to try the following patch to see if it addresses the failure you were seeing? Markus, were you only seeing failures when using a parallel make? Index: libgcc/libgcov.c =================================================================== --- libgcc/libgcov.c (revision 191035) +++ libgcc/libgcov.c (working copy) @@ -707,7 +707,9 @@ gcov_exit (void) memcpy (cs_all, cs_prg, sizeof (*cs_all)); else if (!all_prg.checksum && (!GCOV_LOCKED || cs_all->runs == cs_prg->runs) - && memcmp (cs_all, cs_prg, sizeof (*cs_all))) + && memcmp (cs_all, cs_prg, + sizeof (*cs_all) - (sizeof (gcov_bucket_type) + * GCOV_HISTOGRAM_SIZE))) { fprintf (stderr, "profiling:%s:Invocation mismatch - some data files may have been removed%s\n", gi_filename, GCOV_LOCKED After looking at the cp-demangle matching issue for awhile, I finally realized that in my case at least, it was a valid issue with the preprocessed cp-demangle source code not matching the existing cp-demangle.gcda file. I tracked that down to different includes being done due to a difference in the libiberty configure. The libiberty config.log showed that there were some failures in some of the checks which were using the instrumented prev-gcc/xgcc that was giving errors like: profiling:/home/tejohnson/extra/gcc_trunk_4_validate4/gcc/dwarf2out.gcda:Invocation mismatch - some data files may have been removed configure:3427: $? = 0 configure: failed program was: ... When profile merging happens, there are some sanity checks to ensure that the merged summaries for all object files are the same. These tests are failing in some cases due to small differences in the merged histograms, resulting in the above message. The total of the counter values in the histogram is the same, but there are slight differences in the cumulative counter values assigned to consecutive buckets. This could happen due to the way the cumulative counter values are apportioned out to the counters when merging. If the summaries are merged in different orders by the parallel runs, the integer division truncation may result in small differences. Ultimately these differences don't matter much as the sum of all the counter values saved in the histograms is consistent and the differences will be small and not significant. The best solution is to ignore the histogram when doing the sanity check, and just compare the high-level summary info (sum_all, sum_max, run_max, etc). That seems to be addressing the issue I was having. At least, I haven't been able to reproduce it yet.
(In reply to comment #8) > I think I have a solution for the issue that H.J. is encountering. Details > below. Markus and H.J., would you be able to try the following patch to see if > it addresses the failure you were seeing? I just successfully completed a lto/profiledbootstrap with your patch applied. Thanks Teresa. > Markus, were you only seeing failures when using a parallel make? I never use non-parallel make, so I can't tell.
That's good news. I will finish testing the patch and send it for review. Thanks, Teresa
(In reply to comment #8) > I think I have a solution for the issue that H.J. is encountering. Details > below. Markus and H.J., would you be able to try the following patch to see if > it addresses the failure you were seeing? Markus, were you only seeing failures > when using a parallel make? > > Index: libgcc/libgcov.c > =================================================================== > --- libgcc/libgcov.c (revision 191035) > +++ libgcc/libgcov.c (working copy) > @@ -707,7 +707,9 @@ gcov_exit (void) > memcpy (cs_all, cs_prg, sizeof (*cs_all)); > else if (!all_prg.checksum > && (!GCOV_LOCKED || cs_all->runs == cs_prg->runs) > - && memcmp (cs_all, cs_prg, sizeof (*cs_all))) > + && memcmp (cs_all, cs_prg, > + sizeof (*cs_all) - (sizeof (gcov_bucket_type) > + * GCOV_HISTOGRAM_SIZE))) > { > fprintf (stderr, "profiling:%s:Invocation mismatch - some data files > may have been removed%s\n", > gi_filename, GCOV_LOCKED > > I applied it by hand: diff --git a/libgcc/libgcov.c b/libgcc/libgcov.c index fce8587..daf95af 100644 --- a/libgcc/libgcov.c +++ b/libgcc/libgcov.c @@ -707,7 +707,10 @@ gcov_exit (void) memcpy (cs_all, cs_prg, sizeof (*cs_all)); else if (!all_prg.checksum && (!GCOV_LOCKED || cs_all->runs == cs_prg->runs) - && memcmp (cs_all, cs_prg, sizeof (*cs_all))) + && memcmp (cs_all, cs_prg, sizeof (*cs_all)) + && memcmp (cs_all, cs_prg, + sizeof (*cs_all) - (sizeof (gcov_bucket_type) + * GCOV_HISTOGRAM_SIZE))) { fprintf (stderr, "profiling:%s:Invocation mismatch - some data files may have been removed%s\n", gi_filename, GCOV_LOCKED and it still failed for me on Fedora/18 24 core x86-64 with -j 12: /export/gnu/import/git/gcc/gcc/tree-cfg.c: In function ‘bool gimple_can_merge_blocks_p(basic_block, basic_block)’: /export/gnu/import/git/gcc/gcc/tree-cfg.c:7904:1: error: corrupted profile info: edge from 52 to 53 exceeds maximal count } ^ /export/gnu/import/git/gcc/gcc/tree-cfg.c:7904:1: error: corrupted profile info: edge from 54 to 55 exceeds maximal count /export/gnu/import/git/gcc/gcc/tree-cfg.c:7904:1: error: corrupted profile info: edge from 55 to 56 exceeds maximal count /export/gnu/import/git/gcc/gcc/tree-cfg.c:7904:1: error: corrupted profile info: edge from 56 to 57 exceeds maximal count /export/gnu/import/git/gcc/gcc/tree-cfg.c:7904:1: internal compiler error: in cgraph_create_edge_1, at cgraph.c:669 Please submit a full bug report, with preprocessed source if appropriate. See <http://gcc.gnu.org/bugs.html> for instructions. make[4]: *** [tree-cfg.o] Error 1 make[4]: *** Waiting for unfinished jobs.... rm gcov.pod gcc.pod cpp.pod fsf-funding.pod gfdl.pod make[4]: Leaving directory `/export/build/gnu/gcc-fdo/build-x86_64-linux/gcc' make[3]: *** [all-stagefeedback-gcc] Error 2
On Thu, Sep 6, 2012 at 1:06 PM, hjl.tools at gmail dot com <gcc-bugzilla@gcc.gnu.org> wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54487 > > --- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> 2012-09-06 20:06:55 UTC --- > (In reply to comment #8) >> I think I have a solution for the issue that H.J. is encountering. Details >> below. Markus and H.J., would you be able to try the following patch to see if >> it addresses the failure you were seeing? Markus, were you only seeing failures >> when using a parallel make? >> >> Index: libgcc/libgcov.c >> =================================================================== >> --- libgcc/libgcov.c (revision 191035) >> +++ libgcc/libgcov.c (working copy) >> @@ -707,7 +707,9 @@ gcov_exit (void) >> memcpy (cs_all, cs_prg, sizeof (*cs_all)); >> else if (!all_prg.checksum >> && (!GCOV_LOCKED || cs_all->runs == cs_prg->runs) >> - && memcmp (cs_all, cs_prg, sizeof (*cs_all))) >> + && memcmp (cs_all, cs_prg, >> + sizeof (*cs_all) - (sizeof (gcov_bucket_type) >> + * GCOV_HISTOGRAM_SIZE))) >> { >> fprintf (stderr, "profiling:%s:Invocation mismatch - some data files >> may have been removed%s\n", >> gi_filename, GCOV_LOCKED >> >> > > I applied it by hand: > diff --git a/libgcc/libgcov.c b/libgcc/libgcov.c > index fce8587..daf95af 100644 > --- a/libgcc/libgcov.c > +++ b/libgcc/libgcov.c > @@ -707,7 +707,10 @@ gcov_exit (void) > memcpy (cs_all, cs_prg, sizeof (*cs_all)); > else if (!all_prg.checksum > && (!GCOV_LOCKED || cs_all->runs == cs_prg->runs) > - && memcmp (cs_all, cs_prg, sizeof (*cs_all))) > + && memcmp (cs_all, cs_prg, sizeof (*cs_all)) > + && memcmp (cs_all, cs_prg, > + sizeof (*cs_all) - (sizeof (gcov_bucket_type) > + * GCOV_HISTOGRAM_SIZE))) > { > fprintf (stderr, "profiling:%s:Invocation mismatch - some data files > may have been removed%s\n", > gi_filename, GCOV_LOCKED > > and it still failed for me on Fedora/18 24 core x86-64 > with -j 12: > > /export/gnu/import/git/gcc/gcc/tree-cfg.c: In function ‘bool > gimple_can_merge_blocks_p(basic_block, basic_block)’: > /export/gnu/import/git/gcc/gcc/tree-cfg.c:7904:1: error: corrupted profile > info: edge from 52 to 53 exceeds maximal count > } > ^ > /export/gnu/import/git/gcc/gcc/tree-cfg.c:7904:1: error: corrupted profile > info: edge from 54 to 55 exceeds maximal count > /export/gnu/import/git/gcc/gcc/tree-cfg.c:7904:1: error: corrupted profile > info: edge from 55 to 56 exceeds maximal count > /export/gnu/import/git/gcc/gcc/tree-cfg.c:7904:1: error: corrupted profile > info: edge from 56 to 57 exceeds maximal count > /export/gnu/import/git/gcc/gcc/tree-cfg.c:7904:1: internal compiler error: in > cgraph_create_edge_1, at cgraph.c:669 Unfortunately I haven't been able to reproduce this one yet. Can you send me the corrupt gcda file? Thanks, Teresa > Please submit a full bug report, > with preprocessed source if appropriate. > See <http://gcc.gnu.org/bugs.html> for instructions. > make[4]: *** [tree-cfg.o] Error 1 > make[4]: *** Waiting for unfinished jobs.... > rm gcov.pod gcc.pod cpp.pod fsf-funding.pod gfdl.pod > make[4]: Leaving directory `/export/build/gnu/gcc-fdo/build-x86_64-linux/gcc' > make[3]: *** [all-stagefeedback-gcc] Error 2 > > -- > Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are on the CC list for the bug.
It works for me now after syncing with revision 191037.
On Thu, Sep 6, 2012 at 1:49 PM, hjl.tools at gmail dot com <gcc-bugzilla@gcc.gnu.org> wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54487 > > --- Comment #13 from H.J. Lu <hjl.tools at gmail dot com> 2012-09-06 20:49:02 UTC --- > It works for me now after syncing with revision 191037. Unfortunately, I have now hit this myself a couple times in further testing, so I am still digging... Teresa > > -- > Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are on the CC list for the bug.
Author: tejohnson Date: Fri Sep 7 13:49:47 2012 New Revision: 191074 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191074 Log: This fixes part of the issue described in PR gcov-profile/54487 where there were warnings about mismatches due to slight differences in the merged histograms in different object files. This can happen due to the truncating integer division in the merge routine, which could result in slightly different histograms when summaries are merged in different orders. 2012-09-07 Teresa Johnson <tejohnson@google.com> PR gcov-profile/54487 * libgcc/libgcov.c (gcov_exit): Avoid warning on histogram differences. Modified: trunk/libgcc/ChangeLog trunk/libgcc/libgcov.c
On Thu, Sep 6, 2012 at 10:18 PM, Teresa Johnson <tejohnson@google.com> wrote: > On Thu, Sep 6, 2012 at 1:49 PM, hjl.tools at gmail dot com > <gcc-bugzilla@gcc.gnu.org> wrote: >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54487 >> >> --- Comment #13 from H.J. Lu <hjl.tools at gmail dot com> 2012-09-06 20:49:02 UTC --- >> It works for me now after syncing with revision 191037. > > Unfortunately, I have now hit this myself a couple times in further > testing, so I am still digging... After digging into this for several days now, I am convinced that the gcda file is changing out from underneath the profile-use compile during the read. I just don't understand how, since fcntl is being used to lock the file. Details below. I am hoping someone has some ideas or advice on where to go. First, some info on the compile step and gcda file locking, then I will explain why I belive the gcda file is being written mid-read. The failure is occurring sporadically during the profile-use step of a parallel profiledbootstrap. In this step, not only are the gcc profiles being read in by the profile-use compilation, but the gcc binaries being used in the compile are instrumented, so the gcda files are also being written by the parallel build processes. However, the fcntl file locking should prevent interference between the reads and the writes of the same gcda file. According to the fcntl documentation, the lock is lost on any fclose on the file by the same process, even if it uses a different file descriptor. But for a given profile use compile, the read step should be complete before the write step when the gcc exits and updates the gcda files with libgcov. So I don't think the lock should be lost this way. The failure is not reproducible manually. Therefore, I have been adding instrumentation to the compiler to spit out information at both the point of failure ("error: corrupted profile info: edge from 54 to 55 exceeds maximal count" from read_profile_edge_counts()), and then at the point when the gcda counts are being read in by read_counts_file(). This is the loop that looks like: for (ix = 0; ix != n_counts; ix++) entry->counts[ix] += gcov_read_counter (); I modified the above loop to check each count as it is read in against the sum_max, and if it exceeds sum_max to print the summary and counters at that point. I then compared this to the counters obtained for that function from a gcov-dump of the gcda file afterwards. The counters in the gcda file from gcov-dump are slightly higher because of subsequent merges into it by other parallel builds before compilation aborts, but it is still fairly easy to correlate the counter values between them. What I found is that the counter values being read by read_counts_file look good up to a certain point, and then they go bad. Looking at the huge bad values in hex, they are actually valid values from the gcda file, but it looks like we suddenly jumped back or ahead several locations. So for example, in some cases where it looks like we suddenly jumped ahead in the gcda file, I see that some of the last counter values being read into the array as counters are actually the tag values from just after the counter array in the gcda file. Or in some cases, the counter values are being read mis-aligned by one word, so they look huge because instead of having 0x00000000 in the most significant of the 2 counter words, the 0x00000000 is in the least significant of the 2 counter words. In other words, we jumped some odd number of words ahead (or behind) in the gcda file. If another process is writing into the gcda file at the same time, this could happen. Specifically, since the histogram now included in the gcda file summaries with my patch only write non-zero values, after a merge the size of the histogram, and therefore the size of the summaries, could change. This will cause the starting offsets of different tags/sections throughout the rest of the gcda file to shift. This shouldn't be a problem if the file locking is working though. But if the file locking is not working, then that would explain why suddenly during the read we suddenly seem to jump to a different spot. A couple of other notes: - I added some code after each counter read in the above loop to seek back to the offset where we read the tag, re-read it, and compare it to the tag we read earlier (then re-seek back to the current location in the counter array before continuing the read). Normally this check succeeds, but in the cases where I am hitting the above errors, the "tag" at that location has changed to something that doesn't look like a tag. - Jumping to a different spot should corrupt the reads of the rest of the file. But the main loop in read_counts_file will simply ignore any tag it doesn't understand, and exits when it reads a '0' tag. That's why there was no error being flagged during the read when this was happening. - If something like this was happening on trunk before my patch, it most likely would have been hidden because the size of the summaries was static, and therefore so were the offsets within the file, except in the case when a new summary was being added during the merge (which is not very often). So it is possible that this problem existed prior but was hidden. Any suggestions or ideas? Thanks, Teresa > > Teresa > >> >> -- >> Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email >> ------- You are receiving this mail because: ------- >> You are on the CC list for the bug. > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Thanks for looking into it. This is a long standing problem. I have seen random profiledbootstrap failures for a long time.
On Tue, Sep 11, 2012 at 10:29 AM, hjl.tools at gmail dot com <gcc-bugzilla@gcc.gnu.org> wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54487 > > --- Comment #17 from H.J. Lu <hjl.tools at gmail dot com> 2012-09-11 17:29:15 UTC --- > Thanks for looking into it. This is a long standing problem. > I have seen random profiledbootstrap failures for a long time. Thanks for confirming that this has happened prior. Unfortunately the addition of the histogram is likely making this more frequent, due to the changing summary sizes after merging. One way to deal with this for now might be to write all histogram entries (even the 0 ones) into the summary to keep the size static. Honza, what do you think? Teresa > > -- > Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are on the CC list for the bug.
How much saving do we get by not writing out the 0 entries? With the proposed change, how less frequent is the problem occuring? David On Tue, Sep 11, 2012 at 10:38 AM, Teresa Johnson <tejohnson@google.com> wrote: > On Tue, Sep 11, 2012 at 10:29 AM, hjl.tools at gmail dot com > <gcc-bugzilla@gcc.gnu.org> wrote: >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54487 >> >> --- Comment #17 from H.J. Lu <hjl.tools at gmail dot com> 2012-09-11 17:29:15 UTC --- >> Thanks for looking into it. This is a long standing problem. >> I have seen random profiledbootstrap failures for a long time. > > Thanks for confirming that this has happened prior. Unfortunately the > addition of the histogram is likely making this more frequent, due to > the changing summary sizes after merging. One way to deal with this > for now might be to write all histogram entries (even the 0 ones) into > the summary to keep the size static. > > Honza, what do you think? > > Teresa > >> >> -- >> Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email >> ------- You are receiving this mail because: ------- >> You are on the CC list for the bug. > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
On Tue, Sep 11, 2012 at 10:44 AM, Xinliang David Li <davidxl@google.com> wrote: > How much saving do we get by not writing out the 0 entries? With the > proposed change, how less frequent is the problem occuring? Let me get back with some stats. Each histogram entry requires 5 words, and there are a max of 252 entries. In the few cases I checked just now, we were printing about 60 entries per summary, with 3 summaries per gcda file. So printing the whole thing in these cases would require 5*(252-60)*3 = 2880 extra words, or 11520 bytes. Unfortunately, that is a significant increase over the current sizes of those files, which are currently only double or triple that. I also need to verify that changing this would reduce the frequency. A couple other possibilities since this is not frequent: - change the existing error to a warning (as we do under flag_profile_correction) - after finishing reading the counts, re-read the tag as I am doing in my debugging, and if it is no longer valid, throw everything away and re-read the file. - check the counters after reading each one, and if it is > sum_max, ignore it and abort the profile read with a warning but continue compiling. Obviously the best solution would be to figure out how the lock is being lost/ignored and fix that, but that may take some time. Teresa > > David > > On Tue, Sep 11, 2012 at 10:38 AM, Teresa Johnson <tejohnson@google.com> wrote: >> On Tue, Sep 11, 2012 at 10:29 AM, hjl.tools at gmail dot com >> <gcc-bugzilla@gcc.gnu.org> wrote: >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54487 >>> >>> --- Comment #17 from H.J. Lu <hjl.tools at gmail dot com> 2012-09-11 17:29:15 UTC --- >>> Thanks for looking into it. This is a long standing problem. >>> I have seen random profiledbootstrap failures for a long time. >> >> Thanks for confirming that this has happened prior. Unfortunately the >> addition of the histogram is likely making this more frequent, due to >> the changing summary sizes after merging. One way to deal with this >> for now might be to write all histogram entries (even the 0 ones) into >> the summary to keep the size static. >> >> Honza, what do you think? >> >> Teresa >> >>> >>> -- >>> Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email >>> ------- You are receiving this mail because: ------- >>> You are on the CC list for the bug. >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Assuming the size of histogram for the same file does not vary that much, is it better to round the size to the next power of 2 -- 60 entries will need print out 64 etc? David On Tue, Sep 11, 2012 at 11:04 AM, Teresa Johnson <tejohnson@google.com> wrote: > On Tue, Sep 11, 2012 at 10:44 AM, Xinliang David Li <davidxl@google.com> wrote: >> How much saving do we get by not writing out the 0 entries? With the >> proposed change, how less frequent is the problem occuring? > > Let me get back with some stats. Each histogram entry requires 5 > words, and there are a max of 252 entries. In the few cases I checked > just now, we were printing about 60 entries per summary, with 3 > summaries per gcda file. So printing the whole thing in these cases > would require 5*(252-60)*3 = 2880 extra words, or 11520 bytes. > Unfortunately, that is a significant increase over the current sizes > of those files, which are currently only double or triple that. > > I also need to verify that changing this would reduce the frequency. > > A couple other possibilities since this is not frequent: > - change the existing error to a warning (as we do under > flag_profile_correction) > - after finishing reading the counts, re-read the tag as I am doing in > my debugging, and if it is no longer valid, throw everything away and > re-read the file. > - check the counters after reading each one, and if it is > sum_max, > ignore it and abort the profile read with a warning but continue > compiling. > > Obviously the best solution would be to figure out how the lock is > being lost/ignored and fix that, but that may take some time. > > Teresa > >> >> David >> >> On Tue, Sep 11, 2012 at 10:38 AM, Teresa Johnson <tejohnson@google.com> wrote: >>> On Tue, Sep 11, 2012 at 10:29 AM, hjl.tools at gmail dot com >>> <gcc-bugzilla@gcc.gnu.org> wrote: >>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54487 >>>> >>>> --- Comment #17 from H.J. Lu <hjl.tools at gmail dot com> 2012-09-11 17:29:15 UTC --- >>>> Thanks for looking into it. This is a long standing problem. >>>> I have seen random profiledbootstrap failures for a long time. >>> >>> Thanks for confirming that this has happened prior. Unfortunately the >>> addition of the histogram is likely making this more frequent, due to >>> the changing summary sizes after merging. One way to deal with this >>> for now might be to write all histogram entries (even the 0 ones) into >>> the summary to keep the size static. >>> >>> Honza, what do you think? >>> >>> Teresa >>> >>>> >>>> -- >>>> Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email >>>> ------- You are receiving this mail because: ------- >>>> You are on the CC list for the bug. >>> >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
(In reply to comment #20) > > Obviously the best solution would be to figure out how the lock is > being lost/ignored and fix that, but that may take some time. > Can we use a lockfile to verify that fcntl lock is working correctly?
gcc/gcov-io.h has: #if defined (HOST_HAS_F_SETLKW) #define GCOV_LOCKED 1 #else #define GCOV_LOCKED 0 #endif But HOST_HAS_F_SETLKW isn't defined anywhere else AFAICS: gcc % git grep HOST_HAS_F_SETLKW gcc/gcov-io.h:#if defined (HOST_HAS_F_SETLKW) gcc %
On Tue, Sep 11, 2012 at 11:14 AM, markus at trippelsdorf dot de <gcc-bugzilla@gcc.gnu.org> wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54487 > > --- Comment #23 from Markus Trippelsdorf <markus at trippelsdorf dot de> 2012-09-11 18:14:52 UTC --- > gcc/gcov-io.h has: > #if defined (HOST_HAS_F_SETLKW) > #define GCOV_LOCKED 1 > #else > #define GCOV_LOCKED 0 > #endif > > But HOST_HAS_F_SETLKW isn't defined anywhere else AFAICS: > gcc % git grep HOST_HAS_F_SETLKW > gcc/gcov-io.h:#if defined (HOST_HAS_F_SETLKW) > gcc % Maybe it is as simple as that?! I thought I saw that GCOV_LOCKED was set for my compile, but that may have been on the libgcov compile. In fact, just above the code Markus shows from gcov-io.h, when IN_LIBGCOV, GCOV_LOCKED is set based on TARGET_POSIX_IO: #if defined (TARGET_POSIX_IO) #define GCOV_LOCKED 1 #else #define GCOV_LOCKED 0 #endif Indeed, when I look at the preprocessed libgcov.c output from its compile command, the GCOV_LOCKED is clearly set (by looking at the preprocessed gcov_open() code). But when I use the compile command for coverage.c, which includes gcov-io.c but is !IN_LIBGCOV (so GCOV_LOCKED is set based on HOST_HAS_F_SETLKW), the preprocessed gcov_open code is that of a !GCOV_LOCKED compile, without the call to fcntl. So perhaps it is just the case that the libgcov code is that writes the gcda files is doing the locking, but the read on profile-use is not! Anyone know how HOST_HAS_F_SETLKW was supposed to be set? I do see that my configure is setting HAVE_FCNTL_H, perhaps that was intended? Teresa > > -- > Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are on the CC list for the bug.
Indeed, seems http://gcc.gnu.org/ml/gcc-patches/2003-05/msg00571.html has introduced use of that macro, but didn't bother to actually define it anywhere.
For the check, I guess you want to check that you can actually compile on host something like: #include <fcntl.h> int main () { struct flock fl; fl.l_whence = 0; fl.l_start = 0; fl.l_len = 0; fl.l_pid = 0; return fcntl (1, F_SETLKW, &fl); }
Thanks for the pointers, Jakub. I'll work on adding this check. Teresa On Tue, Sep 11, 2012 at 12:04 PM, jakub at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org> wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54487 > > Jakub Jelinek <jakub at gcc dot gnu.org> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |jakub at gcc dot gnu.org > > --- Comment #26 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-09-11 19:04:47 UTC --- > For the check, I guess you want to check that you can actually compile on host > something like: > #include <fcntl.h> > > int > main () > { > struct flock fl; > fl.l_whence = 0; > fl.l_start = 0; > fl.l_len = 0; > fl.l_pid = 0; > return fcntl (1, F_SETLKW, &fl); > } > > -- > Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are on the CC list for the bug.
Author: tejohnson Date: Thu Sep 13 04:59:14 2012 New Revision: 191238 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191238 Log: This fixes PR gcov-profile/54487 where the gcda files were not locked by the profile-use read, enabling writes by other instrumented compiles to change the profile in the middle of the profile use read. The GCOV_LOCKED macro was not set because it was guarded by HOST_HAS_F_SETLKW, which was never set. The fix is to add a compile test in the configure to set it. 2012-09-12 Teresa Johnson <tejohnson@google.com> PR gcov-profile/54487 * configure.ac (HOST_HAS_F_SETLKW): Set based on compile test using F_SETLKW with fcntl. * configure, config.in: Regenerate. Modified: trunk/gcc/ChangeLog trunk/gcc/config.in trunk/gcc/configure trunk/gcc/configure.ac
Fixed. Thanks all.
Author: tejohnson Date: Thu Sep 13 13:32:31 2012 New Revision: 191254 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191254 Log: 2012-09-12 Teresa Johnson <tejohnson@google.com> Backport from mainline. 2012-09-12 Teresa Johnson <tejohnson@google.com> PR gcov-profile/54487 * configure.ac (HOST_HAS_F_SETLKW): Set based on compile test using F_SETLKW with fcntl. * configure, config.in: Regenerate. Modified: branches/gcc-4_7-branch/gcc/ChangeLog branches/gcc-4_7-branch/gcc/config.in branches/gcc-4_7-branch/gcc/configure branches/gcc-4_7-branch/gcc/configure.ac
Huh, 9 years for sucha nonsence is quite impressive. Thanks for tracking it down! :)
Author: tejohnson Date: Fri Sep 14 21:06:49 2012 New Revision: 191312 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191312 Log: Backport from trunk r190952 to add counter histogram to gcov program summary, and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238). 2012-09-14 Teresa Johnson <tejohnson@google.com> * libgcc/libgcov.c (gcov_histogram_insert): New function. (gcov_compute_histogram): Ditto. (sort_by_reverse_gcov_value): Remove function. (gcov_compute_cutoff_values): Ditto. (gcov_merge_gcda_file): Merge histogram while merging summary. (gcov_gcda_file_size): Include histogram in summary size computation. (gcov_write_gcda_file): Remove assert that is no longer valid. (gcov_exit_init): Invoke gcov_compute_histogram. * gcc/gcov-io.c (gcov_write_summary): Write out non-zero histogram entries to function summary along with an occupancy bit vector. (gcov_read_summary): Read in the histogram entries. (gcov_histo_index): New function. (gcov_histogram_merge): Ditto. * gcc/gcov-io.h (gcov_type_unsigned): New type. (struct gcov_bucket_type): Ditto. (struct gcov_ctr_summary): Include histogram. (GCOV_TAG_SUMMARY_LENGTH): Update to include histogram entries. (GCOV_HISTOGRAM_SIZE): New macro. (GCOV_HISTOGRAM_BITVECTOR_SIZE): Ditto. (gcov_gcda_file_size): New parameter. * gcc/profile.c (NUM_GCOV_WORKING_SETS): Ditto. (gcov_working_sets): New global variable. (compute_working_sets): New function. (find_working_set): Ditto. (get_exec_counts): Invoke compute_working_sets. * gcc/loop-unroll.c (code_size_limit_factor): Call new function find_working_set to obtain working set information. * gcc/coverage.c (read_counts_file): Merge histograms, and fix bug with accessing summary info for non-summable counters. * gcc/basic-block.h (gcov_type_unsigned): New type. (struct gcov_working_set_info): Ditto. (find_working_set): Declare. * gcc/gcov-dump.c (tag_summary): Dump out histogram. * gcc/configure.ac (HOST_HAS_F_SETLKW): Set based on compile test using F_SETLKW with fcntl. * gcc/configure, gcc/config.in: Regenerate. Modified: branches/google/gcc-4_7/gcc/ChangeLog.google-4_7 branches/google/gcc-4_7/gcc/basic-block.h branches/google/gcc-4_7/gcc/config.in branches/google/gcc-4_7/gcc/configure branches/google/gcc-4_7/gcc/configure.ac branches/google/gcc-4_7/gcc/coverage.c branches/google/gcc-4_7/gcc/gcov-dump.c branches/google/gcc-4_7/gcc/gcov-io.c branches/google/gcc-4_7/gcc/gcov-io.h branches/google/gcc-4_7/gcc/loop-unroll.c branches/google/gcc-4_7/gcc/profile.c branches/google/gcc-4_7/libgcc/ChangeLog.google-4_7 branches/google/gcc-4_7/libgcc/libgcov.c