Bug 77488 - Proposal for __FILENAME_ONLY__
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: preprocessor
Version: unknown
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
 
Reported: 2016-09-05 13:48 UTC by R. Diez
Modified: 2020-04-25 21:16 UTC

Description R. Diez 2016-09-05 13:48:53 UTC
Hi all:

I am writing embedded software for a memory-constrained embedded target. However, I would still like to use assert() in debug builds to help debug the software.

The assert() macro includes in the final binary a lot of text, like the source code filename, the function name and assert expression. So much text is blowing my memory budget.

This is newlib's definition of assert:

# define assert(__e) ((__e) ? (void)0 : __assert_func (__FILE__, __LINE__, \
						       __ASSERT_FUNC, #__e))

Most of the time, I only need a filename and a line number, so I wrote a small patch that builds my toolchain without the rest. The patch is here:

https://github.com/rdiez/JtagDue/blob/master/Toolchain/NewlibSmallerAssert.patch
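
The idea of an assert that keeps only a filename and a line number can be sketched roughly as follows. This is not the content of the linked patch: the names SMALL_ASSERT, small_assert_fail, last_failed_file and last_failed_line are hypothetical, and the handler returns only so that the sketch is easy to exercise.

```c
#include <stdio.h>

/* Hypothetical bookkeeping so the effect of a failed check can be observed. */
static const char *last_failed_file;
static int last_failed_line;

/* Hypothetical failure handler that takes only a filename and a line number.
   A real handler would normally not return (it would abort or reset). */
void small_assert_fail(const char *file, int line)
{
    last_failed_file = file;
    last_failed_line = line;
    fprintf(stderr, "%s:%d: assertion failed\n", file, line);
}

/* Unlike the newlib definition above, neither the function name nor the
   text of the expression ends up in the binary. */
#define SMALL_ASSERT(e) ((e) ? (void)0 : small_assert_fail(__FILE__, __LINE__))
```

The remaining per-assert cost is then dominated by the __FILE__ string, which is what the rest of this report is about.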

The trouble is, the only built-in symbol that yields the filename of a source file is __FILE__, which includes the full path. But the full path is often too long, and its length varies depending on where the software is built, so the generated binary may or may not fit in the target's program memory depending on the source code's path length at compilation time.

This is not obvious and rather annoying. For example, when the overnight build suddenly blows the flash memory (program memory) budget, it is not obvious that the reason is that the source path on the server is longer than on the developer's PC.

A related question has been asked before:

  __FILE__ macro shows full path
  https://stackoverflow.com/questions/8487986/file-macro-shows-full-path

I cannot easily control the path depth, because I am using the autotools in my project to generate the makefiles.

In any case, the usual advice regarding filenames in makefiles and so on is to use absolute paths in order to avoid surprises.

The following suggested solution does not work for me either:

  #define __FILENAME__ (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)

Function strrchr() is recognised as a GCC intrinsic but, unlike strlen(), it is not optimised away at compilation time. That alone could be an improvement to GCC.

A new predefined symbol like __FILENAME_ONLY__, which should yield __FILE__ without the full path, would be the most comfortable solution for me.
Sure, some source files are going to have the same name, but it is fairly easy to tell which assert failed from a filename and a line number even if 2 or 3 source files happen to share a name.

A more advanced solution would be to have a predefined symbol like __FILENAME_WITHOUT_PREFIX__( filename, prefix_to_remove ), but I do not think that this is worth the trouble.

I have seen source code bases where each .cpp file got assigned a textual ID or a number manually, so that the strings passed to assert() are always of a determined length, but this is a pain to maintain. Every time you add a new source file, you need to update the ID table, which tends to trigger a recompilation of all files. But maybe the compiler could automatically assign an ID per file and then generate a map file with those IDs for later look-up.

Or maybe I could write an __assert_func() that prints the __builtin_return_address() instead of a source filename, so that I can manually look up the function's name in the generated linker .map file.

However, I have to enable some optimisation even on debug builds for my memory-constrained targets, and I wonder if even minor optimisations could render those addresses meaningless for the purposes of correlating to a source code line.
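
A minimal sketch of the return-address idea, assuming GCC or a compatible compiler. The names ADDR_ASSERT, report_failure_address and last_failure_address are hypothetical, and the handler returns only so that the sketch can be exercised; a real __assert_func replacement would not return.

```c
#include <stdio.h>

/* Hypothetical record of the most recent failure address. */
static void *last_failure_address;

/* noinline keeps this a real call, so there is a return address to read.
   __builtin_return_address(0) yields the address this call returns to,
   which lies inside the function containing the failed check. */
__attribute__((noinline)) void report_failure_address(void)
{
    last_failure_address = __builtin_return_address(0);
    fprintf(stderr, "assertion failed, return address %p\n",
            last_failure_address);
}

/* No string literal at all is stored per assertion. */
#define ADDR_ASSERT(e) ((e) ? (void)0 : report_failure_address())
```

The printed address still has to be mapped back to a function by hand, for example with the linker .map file, and inlining or other optimisations may make the enclosing function differ from what the source suggests, which is the concern raised above.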

Any other ideas?

Thanks in advance,
  rdiez
Comment 1 Jakub Jelinek 2016-09-05 15:35:42 UTC
__FILE__ expands to whatever you pass on the command line as the base file (and for headers, whatever the directory is where they are found + the header filename).  So, if you compile with gcc -c something.c -I subdir/ , then __FILE__ will be "something.c" or e.g. "subdir/something.h".  If you use absolute filenames, that is what you get.  So, if you care about the lengths of these, just tweak your Makefiles so that you do the right thing (compile in the directory with the sources, with -o to the directory of object files if different), rather than compiling in the object directory, with absolute filenames as sources.  Adding yet another __FILE__-like builtin preprocessor constant is IMHO undesirable, especially in the spelling you're proposing.
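
To make the size effect concrete, here is a small sketch that reports how many bytes one copy of the __FILE__ literal occupies. The function name file_literal_size and the absolute path in the comment are made up for illustration.

```c
#include <stddef.h>

/* Size in bytes of the __FILE__ string literal for this translation unit,
   including the terminating NUL.
 *
 * Following the rule described above:
 *   gcc -c something.c
 *       -> __FILE__ is "something.c" (12 bytes)
 *   gcc -c /home/user/project/src/something.c
 *       -> __FILE__ is "/home/user/project/src/something.c" (35 bytes)
 */
size_t file_literal_size(void)
{
    return sizeof(__FILE__);
}
```

The same source therefore yields a different amount of read-only data depending only on how the compiler was invoked.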

As for strrchr folding, I see it folded properly, e.g. __builtin_strrchr (__FILE__, '/') is optimized into "foo/bar/baz.c" + 7.  Optimizing it into "baz.c" would be incorrect: later on you can do e.g. ptr -= 4; with the returned pointer, which would be undefined behavior if everything before the last / were removed from the string literal.
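
The point about the returned pointer can be illustrated with a short sketch. The helper name path_basename is hypothetical; the logic is the strrchr() expression from the original report.

```c
#include <assert.h>
#include <string.h>

/* Returns a pointer to the part of path after the last '/', or path itself
   if it contains no '/'. The result points INTO the original string, so the
   characters before it are still reachable through pointer arithmetic. */
const char *path_basename(const char *path)
{
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

For "foo/bar/baz.c" the result is the original pointer plus 8 (one past the '/' at offset 7), and stepping backwards from it remains valid. That is why the compiler cannot simply replace the literal with "baz.c".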
Comment 2 R. Diez 2016-09-06 12:58:54 UTC
> __FILE__ expands to whatever you pass on the command line as the base file
> (and for headers whatever the directory is where they are found + the header
> filename.  So, if you compile with gcc -c something.c -I subdir/ , then
> __FILE__ will be "something.c" or e.g. "subdir/something.h".  If you use
> absolute filenames, that is what you get.  So, if you care about the lengths
> of these, just tweak your Makefiles so that you do the right thing (compile
> in the directory with the sources, with -o to the directory of object files
> if different), rather than compiling in the object directory, with absolute
> filenames as sources.  Adding yet another __FILE__-like builtin preprocessor
> constant is IMHO undesirable,

I already wrote that I cannot easily control the path depth, because I am using the autotools in my project to generate the makefiles. Some libraries come with their own (complex) makefiles or build systems, so your advice is not practicable in real life. Moreover, the GCC toolchain itself decides where some of the include files lie and what their paths look like.

I also said that using absolute paths in the makefiles may be desirable for other reasons. I am concerned that the binary easily depends on the source code and toolchain location, which makes it hard to get a reproducible build. With __FILE__, there is no practicable solution.


> especially in the spelling you're proposing.

I do not actually mind about the spelling. I am just looking for some workable way (in real life) to include filename (source code position) information in assert messages that does not depend on full paths (and enables easily reproducible builds).


> As for strrchr folding, I see it folded properly, e.g. __builtin_strrchr
> (__FILE__, '/') is optimized into "foo/bar/baz.c" + 7.  Optimizing it into
> "baz.c" would be incorrect, you can do later on with the returned pointer
> e.g. ptr -= 4; etc., which would be undefined behavior if everything before
> the last / is removed from the string literal.

OK, now I understand the issue with strrchr(), thanks.
Comment 3 Martin Sebor 2016-09-06 17:17:40 UTC
Perhaps it might be possible to generalize the enhancement request to a space optimization for GCC to store only the trailing part of strings that are ever referenced in the program.  For instance, since in the file below only the range [s + 5, s + 10] is ever used it should be possible to store just "56789" instead of all of "0123456789" (this would only be possible if all computations involving the literal were known and would have to be avoided if a pointer to the literal escaped to an external function that could derive from it a pointer to the initial substring).

static const char* const s = "0123456789" + 5;

void f (void)
{
  __builtin_printf ("%s\n", s);
}
Comment 4 Jakub Jelinek 2016-09-06 17:23:49 UTC
(In reply to Martin Sebor from comment #3)
> Perhaps it might be possible to generalize the enhancement request to a
> space optimization for GCC to store only the trailing part of strings that
> are ever referenced in the program.  For instance, since in the file below
> only the range [s + 5, s + 10] is ever used it should be possible to store
> just "56789" instead of all of "0123456789" (this would only be possible if
> all computations involving the literal were known and would have to be
> avoided if a pointer to the literal escaped to an external function that
> could derive from it a pointer to the initial substring).
> 
> static const char* const s = "0123456789" + 5;
> 
> void f (void)
> {
>   __builtin_printf ("%s\n", s);
> }

How would that help here?  You obviously pass the address of the string literal to __assert_func (or however the assertion failure is reported), thus it escapes.

Though, especially when using autoconf/automake, it really is very easy, not hard, to tweak the Makefiles to use short paths.  And, in many cases, the basename of source files isn't unique and sufficient to recognize where the problem is.  E.g. glibc ships with tons of headers like <string.h>, <bits/string.h>, <elf.h>, <sys/elf.h> etc.
Comment 5 Martin Sebor 2016-09-06 17:35:53 UTC
> How would that help here?  You obviously pass the address of the string
> literal to the __assert_func or how is the assertion passed, thus it escapes.

With the optimization in place the assert macro could be defined similarly to the one below.  __builtin_printf is known not to escape string arguments of its %s directives (or form pointers prior to their beginning).

#define FILE  (__builtin_strrchr (__FILE__, '/') ? __builtin_strrchr (__FILE__, '/') + 1 : __FILE__)

#define assert(e) ((e) ? (void)0 : (void)(__builtin_printf ("%s:%i: assertion %s failed\n", FILE, __LINE__, #e), __builtin_abort ()))

To make the optimization more generally applicable (so that it could work with arbitrary functions including __assert_func) a new attribute might need to be introduced to tell GCC about the same guarantee.
Comment 6 Piotr Henryk Dabrowski 2018-08-17 12:26:17 UTC
You can use:

#line 2 "FileName.cpp"

at the very top (!) of all your files
to change the content of __FILE__.
This also affects compiler messages.
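
A minimal sketch of the effect of #line, using a made-up file name "short_name.c" and a hypothetical helper file_after_line_directive. Everything after the directive in the same file sees the new name in __FILE__, and compiler diagnostics use it as well.

```c
/* From here on, the compiler treats the next line as line 100 of a file
   called "short_name.c", regardless of the real path of this source file. */
#line 100 "short_name.c"
const char *file_after_line_directive(void)
{
    return __FILE__;
}
```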
Comment 7 R. Diez 2018-08-17 13:08:15 UTC
(In reply to Piotr Henryk Dabrowski from comment #6)
> You can use:
> 
> #line 2 "FileName.cpp"
> 
> at the very top (!) of all your files
> to change the content of __FILE__.
> This also affects compiler messages.

I do not want to override __FILE__. Its original content may be needed for something else. I just want an alternative to generate smaller asserts.

Besides, maintaining such a "#line" hack in all files is uncomfortable. I already mentioned that I am including other libraries with their own source code and build systems. I need a way to tweak the assert definition for all of them in Newlib. Otherwise, I have to patch all components everywhere.
Comment 8 Fangrui Song 2020-04-25 21:16:44 UTC
Clang since version 9 supports `__FILE_NAME__` (basename) as an extension https://reviews.llvm.org/D61756

I don't know whether it has been proposed on WG14 or WG21 mailing lists, though (seems not).
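
A hedged sketch of how code might use __FILE_NAME__ where the compiler predefines it and fall back to a run-time basename elsewhere. The names SHORT_FILE, path_basename_at_runtime and short_file are hypothetical. Note that in the fallback branch the full path is still stored in the binary, so only the first branch saves space.

```c
#include <string.h>

/* Fallback: find the basename at run time. The full __FILE__ literal is
   still stored in the binary in this case. */
const char *path_basename_at_runtime(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

#if defined(__FILE_NAME__)
#  define SHORT_FILE __FILE_NAME__   /* compiler-provided basename literal */
#else
#  define SHORT_FILE path_basename_at_runtime(__FILE__)
#endif

/* Name of this source file without any directory part. */
const char *short_file(void)
{
    return SHORT_FILE;
}
```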