Bug 78822 - [cleanup] replace static char buffers by build_message_string
Summary: [cleanup] replace static char buffers by build_message_string
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 7.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-15 17:50 UTC by janus
Modified: 2017-09-18 18:16 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-12-15 00:00:00


Attachments
draft patch (5.78 KB, patch)
2016-12-16 10:48 UTC, janus
Details | Diff
patch using build_message_string (5.47 KB, patch)
2016-12-16 15:03 UTC, janus
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description janus 2016-12-15 17:50:47 UTC
Several functions in interface.c use static-size char buffers for constructing error messages, e.g.:

* gfc_compare_interfaces
* gfc_check_result_characteristics
* gfc_check_dummy_characteristics#
* ...

It would be more efficient to replace those by std::string, in order to allocate the memory dynamically as needed (and only if needed at all).
Comment 1 Jakub Jelinek 2016-12-16 06:34:43 UTC
You would need to make sure it uses a xmalloc based allocator first or at least calls xmalloc_failed upon allocation failure, otherwise it will be a serious regression.
Comment 2 janus 2016-12-16 09:41:22 UTC
(In reply to Jakub Jelinek from comment #1)
> You would need to make sure it uses a xmalloc based allocator first or at
> least calls xmalloc_failed upon allocation failure, otherwise it will be a
> serious regression.

Could you please elaborate in which respect it would be a regression?

To be honest I'm not extremely familiar with the memory management principles in GCC. Basically any dynamical allocation of memory should use xmalloc instead of the standard malloc, right? (Could you remind me why that is so?)

Apparently std::string is not used in the Fortran front end up to now, but indeed it is being used in other parts of GCC (e.g. it appears extensively in the Go front end). Doesn't your comment apply there, or are they actually doing what you describe?
Comment 3 Jakub Jelinek 2016-12-16 09:47:09 UTC
I have no idea what the Go FE is doing, in the middle-end there are indeed 3-4 spots that use std::string in one file, but I'd just call it a mistake.
The reason for ensuring xmalloc_failed is used is to get sane and consistent behavior on memory allocation failures (same error message with the same detail), rather than just crashing the compiler, or saying terminate has been called and similar.  I don't know what exactly you want to clean up with std::string, if all you want is just a heap concatenation of strings, we have also the concat function.  Of course not in all places a heap allocation is desirable, but at that point std::string which will always do a heap allocation is not desirable either.
Comment 4 janus 2016-12-16 10:44:49 UTC
(In reply to Jakub Jelinek from comment #3)
> I don't know what exactly you want to clean up with std::string

Well, my main motivation was to get rid of some statically-sized buffers (used for error messages, typically 200 bytes large), which are either
* never used (because no errors occurred), thus wasting memory, or
* they might as well be too small for the error message (I think currently they are sufficient for typical cases, but one can construct cases where they are not).

std::string seemed like a convenient tool here, but if you say it's a bad idea, there are other ways to go, I guess.

(Btw, how would I actually make std::string use a xmalloc-based allocator?)
Comment 5 janus 2016-12-16 10:48:34 UTC
Created attachment 40346 [details]
draft patch

Here is a draft patch that implements what I had in mind (regtests cleanly).
Comment 6 Jakub Jelinek 2016-12-16 11:10:59 UTC
Comment on attachment 40346 [details]
draft patch

Dunno, you can perhaps try it (put a breakpoint somewhere before you construct the std::string, when you reach it, put a breakpoint on malloc and when you reach it, change the return value from it to NULL and see what happens.

Also, including STL headers after gcc headers is highly problematic (mainly because of all the poisoning in system.h).

E.g. the
+      ss << "INTENT mismatch in argument '" << s1->name << "'";
+      errmsg = ss.str();
stuff could be easily written as:
       errmsg = concat ("INTENT mismatch in argument '", s1->name, "'", NULL);
and just free it after use, but for the %i you'd have to write some helper routines (e.g. if you ever use at most two %i, then you could have 2 routines
which just snprintf %i into a smallish static buffer (3 * sizeof (int) should be good enough) and return address of that buffer, or one routine that has extra argument which says which of the two buffers you want).
Comment 7 janus 2016-12-16 11:54:35 UTC
(In reply to Jakub Jelinek from comment #6)
> Dunno, you can perhaps try it (put a breakpoint somewhere before you
> construct the std::string, when you reach it, put a breakpoint on malloc and
> when you reach it, change the return value from it to NULL and see what
> happens.

So that would tell me what happens in the rare event that malloc returns NULL because it cannot allocate the memory I need (i.e. something that xmalloc does not do), right? What would I learn from that?


> Also, including STL headers after gcc headers is highly problematic (mainly
> because of all the poisoning in system.h).

Ouch, ok. So if I include STL first, I'm fine?

OTOH, it looks like the 'poisoning' in system.h redefines malloc:

#define malloc xmalloc

I guess that is not sufficient to make the STL use xmalloc instead of malloc, right?

 
> E.g. the
> +      ss << "INTENT mismatch in argument '" << s1->name << "'";
> +      errmsg = ss.str();
> stuff could be easily written as:
>        errmsg = concat ("INTENT mismatch in argument '", s1->name, "'",
> NULL);
> and just free it after use, but for the %i you'd have to write some helper
> routines (e.g. if you ever use at most two %i, then you could have 2 routines
> which just snprintf %i into a smallish static buffer (3 * sizeof (int)
> should be good enough) and return address of that buffer, or one routine
> that has extra argument which says which of the two buffers you want).

Well, that is essentially why I said that std::string is convenient. I really don't wanna reimplement all of std::string and std::stringstream here, and I don't think anyone would wanna do that.
Comment 8 janus 2016-12-16 12:00:39 UTC
In general, doesn't this whole memory issue severely limit the usefulness of C++ as an implementation language for GCC? It seems to imply that we cannot use *any* STL containers? Or do I misunderstand something here?

Again, in the go FE, I see not only extensive usage of std::string, but also std::vector etc. I'd like to know why that is ok in the go FE, but not in gfortran?
Comment 9 Jakub Jelinek 2016-12-16 12:08:06 UTC
(In reply to janus from comment #8)
> In general, doesn't this whole memory issue severely limit the usefulness of
> C++ as an implementation language for GCC? It seems to imply that we cannot
> use *any* STL containers? Or do I misunderstand something here?

C++ can be useful even without the STL containers.  Though I'm certainly not one of the proponents of conversion to C++.

> Again, in the go FE, I see not only extensive usage of std::string, but also
> std::vector etc. I'd like to know why that is ok in the go FE, but not in
> gfortran?

E.g. for std::vector we have an alternative, vec.h, so it is better not to use std::vector, but vec, for consistency.  About the allocation, I don't know if somebody hasn't done anything about it, best discuss this on the mailing list rather than in the PR where others can more easily see it and comment on it.
Comment 10 janus 2016-12-16 12:23:01 UTC
(In reply to Jakub Jelinek from comment #9)
> best discuss this on the
> mailing list rather than in the PR where others can more easily see it and
> comment on it.

Agreed. Will do.
Comment 11 Jakub Jelinek 2016-12-16 12:27:09 UTC
BTW, yet another option (and I'd say much more readable) is just using
build_message_string function, so
	  snprintf (errmsg, err_len, "Rank mismatch in argument '%s' (%i/%i)",
		    s1->name, symbol_rank (s1), symbol_rank (s2));
would be converted to
          *errmsg
            = build_message_string ("Rank mismatch in argument '%s' (%i/%i)",
                                    s1->name, symbol_rank (s1), symbol_rank (s2));
etc.
The caller would need to free it afterwards.  Note, the Fortran FE already uses this function as well.
Comment 12 Jakub Jelinek 2016-12-16 12:32:49 UTC
Oh, and yet another thing, if these messages is something you pass to the user, then the fact that you mix up the diagnostic text with expanded arguments precludes translation.  With using std::stringstream etc. I'm afraid it is out of luck to translate it, with build_message_string you'd need to wrap the format strings you want to translate into _(...), or add a wrapper for build_message_string that would have a *gmsgid called first argument and would invoke _(gmsgid) on it before passing it over to build_message_string (and verify exgettext picks those up properly).
Comment 13 janus 2016-12-16 12:41:27 UTC
(In reply to Jakub Jelinek from comment #11)
> BTW, yet another option (and I'd say much more readable) is just using
> build_message_string function

Yes, I think that's a good suggestion. Thanks for all your comments.
Comment 14 janus 2016-12-16 15:03:43 UTC
Created attachment 40350 [details]
patch using build_message_string

So here is another patch, similar to the one in comment 5, but using build_message_string instead of std::string.

Apparently this works pretty nicely as well (at least it regtests cleanly). Regarding readability, I'd say it's a matter of taste (and what you're used to).

One advantage of the std::string version is obviously that it cleans up after itself and one doesn't need to free up things manually. But in general I'm still not sure how well it plays along with GCC's ideas of memory management.
Comment 15 Jakub Jelinek 2016-12-16 15:15:35 UTC
Comment on attachment 40350 [details]
patch using build_message_string

Some comments:
1) this still doesn't solve the translations issue in many places (gfc_compare_interfaces etc.); in some places you use _(...) already because it used to be there before as well; note as I said earlier, you can either always use it explicitly, or add build_message_string/xasvprintf wrapper that will automatically mark its argument for translation and translate it
2) if in the different functions that return bool you always return true if you set *errmsg and return false if you don't set it (or vice versa), then you could
change those functions to return const char * and in the caller test for == NULL (meaning success) and != NULL (return value is the error string to emit and free afterwards.
3)
+      *errmsg = build_message_string ("CONTIGUOUS attribute mismatch in "
 		"function result");
the formatting looks wrong, "function should be below "CONTIGUOUS.
Comment 16 janus 2016-12-16 15:28:43 UTC
(In reply to Jakub Jelinek from comment #15) 
> Some comments:

Thanks!


> 1) this still doesn't solve the translations issue in many places
> (gfc_compare_interfaces etc.);

True. I don't wanna open up too many building sites at once, therefore I'm sticking with the status quo here. It seems there are several inconsistencies in this area, so maybe we want a follow-up PR for that.


> 2) if in the different functions that return bool you always return true if
> you set *errmsg and return false if you don't set it (or vice versa), then
> you could
> change those functions to return const char * and in the caller test for ==
> NULL (meaning success) and != NULL (return value is the error string to emit
> and free afterwards.

Yes, might be a good idea.


> 3)
> +      *errmsg = build_message_string ("CONTIGUOUS attribute mismatch in "
>  		"function result");
> the formatting looks wrong, "function should be below "CONTIGUOUS.

Right. Will fix.
Comment 17 Dominique d'Humieres 2016-12-16 15:33:57 UTC
There are several PRs open for translation: pr52274 and links.
Comment 18 janus 2016-12-16 15:37:13 UTC
(In reply to Dominique d'Humieres from comment #17)
> There are several PRs open for translation: pr52274 and links.

Thanks of the hint. I think in particular PR 38573 is relevant for our discussion here.
Comment 19 Pedro Alves 2016-12-16 17:22:05 UTC
> The reason for ensuring xmalloc_failed is used is to get sane and consistent 
> behavior on memory allocation failures (same error message with the same 
> detail), rather than just crashing the compiler, or saying terminate has been 
> called and similar.

You can replace the global operator new/new[] to call xmalloc instead of malloc.  Then memory allocation by std::string etc. ends up in xmalloc -> xmalloc_failed.  That's what I did for GDB:

  https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/common/new-op.c;h=c67239cbe87c1f7e22e234a1a2cc2920f9ed22a4;hb=HEAD

Replacing global new is better than installing a new_handler that calls xmalloc_failed with std::set_new_handler, IMO, since replacing global new handles allocation in global static constructors too.
OTOH, dynamic allocation in global objects is best avoided anyway, so you can always go the std::set_new_handler route if replacing global new doesn't work on some non-confirming host.
Comment 20 Pedro Alves 2016-12-16 17:30:34 UTC
And in addition, since GCC is already using new/new[] to heap allocate its own classes, GCC is _already_ inconsistent with allocation failure -- if one of those currently fails, you'll end up with a C++ exception that nobody is catching, so GCC dies a horrible death.  Replacing operator new of course would handle those too.
Comment 21 janus 2016-12-17 11:07:23 UTC
Given the amount of discussion about std::string (and resistance against it), I will for now go with using build_message_string (thus changing PR title) ...
Comment 22 Jerry DeLisle 2016-12-17 16:54:06 UTC
(In reply to janus from comment #0)
> Several functions in interface.c use static-size char buffers for
> constructing error messages, e.g.:
> 
> * gfc_compare_interfaces
> * gfc_check_result_characteristics
> * gfc_check_dummy_characteristics#
> * ...
> 
> It would be more efficient to replace those by std::string, in order to
> allocate the memory dynamically as needed (and only if needed at all).

Efficient in what way? Memory use? Insignificant. Performance? No.

I suggest leaving this alone and close it as dont bother. If its not broke don't fix it. Just my 2 cents worth.