This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH 4/4] c-format.c: suggest the correct format string to use (PR c/64955)

From: David Malcolm <dmalcolm at redhat dot com>
To: Jeff Law <law at redhat dot com>, gcc-patches at gcc dot gnu dot org
Cc: Joseph Myers <joseph at codesourcery dot com>
Date: Thu, 04 Aug 2016 17:06:49 -0400
Subject: Re: [PATCH 4/4] c-format.c: suggest the correct format string to use (PR c/64955)
Authentication-results: sourceware.org; auth=none
References: <1469841382.17384.65.camel@redhat.com> <1470239113-42666-1-git-send-email-dmalcolm@redhat.com> <1470239113-42666-4-git-send-email-dmalcolm@redhat.com> <f5319406-64a7-76aa-836d-1121dfbe204e@redhat.com>

On Thu, 2016-08-04 at 13:55 -0600, Jeff Law wrote:
> On 08/03/2016 09:45 AM, David Malcolm wrote:
> > This adds fix-it hints to c-format.c so that it can (sometimes)
> > suggest
> > the format string the user should have used.
> > 
> > The patch adds selftests for the new code in c-format.c.  These
> > selftests are thus lang-specific.  This is the first time we've had
> > lang-specific selftests, and hence the patch also adds a langhook
> > for
> > running them.  (Note that currently the Makefile only invokes the
> > selftests for cc1).
> > 
> > Successfully bootstrapped&regrtested in conjunction with the rest
> > of the
> > patch kit on x86_64-pc-linux-gnu.
> > 
> > (The v2 version of the patch had a successful selftest run for
> > stage 1 on
> > powerpc-ibm-aix7.1.3.0 (gcc111) in conjunction with the rest of the
> > patch
> > kit, and a successful build of stage1 for all targets via config
> > -list.mk;
> > the patch has only been rebased since)
> > 
> > OK for trunk if it passes testing?
> > 
> > gcc/c-family/ChangeLog:
> > 	PR c/64955
> > 	* c-common.h (selftest::c_format_c_tests): New declaration.
> > 	(selftest::run_c_tests): New declaration.
> > 	* c-format.c: Include "selftest.h.
> > 	(format_warning_va): Add param "corrected_substring" and use
> > 	it to add a replacement fix-it hint.
> > 	(format_warning_at_substring): Likewise.
> > 	(format_warning_at_char): Update for new param of
> > 	format_warning_va.
> > 	(check_format_info_main): Pass "fki" to check_format_types.
> > 	(check_format_types): Add param "fki" and pass it to
> > 	format_type_warning.
> > 	(deref_n_times): New function.
> > 	(get_modifier_for_format_len): New function.
> > 	(selftest::test_get_modifier_for_format_len): New function.
> > 	(get_format_for_type): New function.
> > 	(format_type_warning): Add param "fki" and use it to attempt
> > 	to provide hints for argument types when calling
> > 	format_warning_at_substring.
> > 	(selftest::get_info): New function.
> > 	(selftest::assert_format_for_type_streq): New function.
> > 	(ASSERT_FORMAT_FOR_TYPE_STREQ): New macro.
> > 	(selftest::test_get_format_for_type_printf): New function.
> > 	(selftest::test_get_format_for_type_scanf): New function.
> > 	(selftest::c_format_c_tests): New function.
> > 
> > gcc/c/ChangeLog:
> > 	PR c/64955
> > 	* c-lang.c (LANG_HOOKS_RUN_LANG_SELFTESTS): If CHECKING_P, wire
> > 	this up to selftest::run_c_tests.
> > 	(selftest::run_c_tests): New function.
> > 
> > gcc/ChangeLog:
> > 	PR c/64955
> > 	* langhooks-def.h (LANG_HOOKS_RUN_LANG_SELFTESTS): New default
> > 	do-nothing langhook.
> > 	(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_RUN_LANG_SELFTESTS.
> > 	* langhooks.h (struct lang_hooks): Add run_lang_selftests.
> > 	* selftest-run-tests.c: Include "tree.h" and "langhooks.h".
> > 	(selftest::run_tests): Call lang_hooks.run_lang_selftests.
> > 
> > gcc/testsuite/ChangeLog:
> > 	PR c/64955
> > 	* gcc.dg/format/diagnostic-ranges.c: Add fix-it hints to
> > expected
> > 	output.
> So presumably we always use the type of the argument as the "correct"
> type and assume the format string is what needs to be fixed (with the
> exception of getting the right amount of *s handled).  That seems 
> intuitively the right thing to do, but do we have a hit rate better
> than 
> 50% in practice?
> 
> This is OK.  I'm really just curious about your thoughts/experiences
> on 
> the heuristics.

Our current behavior is to emit a warning of the form:

test.c:9:18: warning: format ‘%i’ expects argument of type ‘int’, but
argument 2 has type ‘const char *’ [-Wformat=]
   printf("hello %i", msg);
                  ^

Note that the text of the diagnostic tells the user the two types
involved within the message.

My experience with printf-style issues of this form is that I know want
expression I want to print, but I don't necessarily know exactly
whether it's say, an int vs a long: it's not that I want to print "an
int", it's that I wanted to print some specific expression.  Hence 
 (assuming this experience is typical) for mismatches of printf-style
calls, I think it's most helpful to the user to tell the user what
format code they need for the expression they want to print, which the
patch does, via the fix-it:

test.c:9:18: warning: format ‘%i’ expects argument of type ‘int’, but
argument 2 has type ‘const char *’ [-Wformat=]
   printf("hello %i",
msg);
                 ~^
                 %s

Note how the text of the diagnostic is unchanged; it's just the fixit that's new (and the underline, which is patch 3 of the kit).

For scanf-style calls this argument may not be so strong, but I think it still holds for the "do I have an int or a long?" cases (giving int * vs long *).   It would be nice for scanf to detect the "you forgot to put an & in front of the destination lvalue" case and offer a fix-it hint for it, but I think that's a followup.

So I think it's likely to be much better than 50%, but that's a gut feeling, based on the above arguments.

I'm not aware of anyone who's done formal usability testing of a compiler's diagnostics.  I think there are two distinct types of activity:
(a) the edit->try to compile->edit->try to compile -> "it compiles!" cycle (nested within a debug cycle)
(b) compiling pre-existing code, perhaps with a different configuration than it was written on, often written by someone else

I'd be very interested in seeing usability studies of both aspects of a compiler, but in particular of (a): is there a published corpus somewhere of the kind of half-written non-compiling code that happens during (a) out there?  (I know this invites various sarcastic responses, but I'm serious :) )

(For myself, I attempt to run with a relatively recent build of gcc trunk as my day-to-day compiler, and if I see a diagnostic that could be improved whilst I'm hacking on something I make a note of it, and try to come back to it later as an RFE; bug reports of this form are most welcome).

Dave

References:
- [PATCH 1/4] selftest.h: Add ASSERT_TRUE_AT and ASSERT_FALSE_AT
  - From: David Malcolm
- [PATCH 4/4] c-format.c: suggest the correct format string to use (PR c/64955)
  - From: David Malcolm
- Re: [PATCH 4/4] c-format.c: suggest the correct format string to use (PR c/64955)
  - From: Jeff Law

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]