Bug 56549 - #pragma once ineffective with BOM in include file
Summary: #pragma once ineffective with BOM in include file
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: preprocessor (show other bugs)
Version: 4.6.3
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog
: 49837 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-03-06 09:47 UTC by Yuri Khan
Modified: 2021-09-03 06:45 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Minimal example demonstrating the problem (495 bytes, application/zip)
2013-03-06 09:47 UTC, Yuri Khan
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Yuri Khan 2013-03-06 09:47:16 UTC
Created attachment 29594 [details]
Minimal example demonstrating the problem

When an include file starts with a UTF-8 byte order mark and is included in a precompiled header, it is not protected against repeated inclusion.

Environment:
 * Ubuntu 12.04 amd64
 * GCC 4.6.3 from Ubuntu repo

To reproduce:

$ unzip pragma-once-bom.zip
$ make

The contents of the archive are listed below for convenience of discussion.

Foo.h:
===
<BOM>#pragma once

struct Foo {};
===

stable.h:
===
#include "Foo.h"
===

main.cpp:
===
#include "stable.h"

int main() { Foo foo; }
===

Makefile:
===
all: main.o

pragma-once.gch/c++: stable.h Foo.h
	@test -d pragma-once.gch || mkdir pragma-once.gch
	g++ -x c++-header -c stable.h -o pragma-once.gch/c++

main.o: main.cpp stable.h Foo.h pragma-once.gch/c++
	g++ -c -include pragma-once -o main.o main.cpp

clean:
	@rm -rf pragma-once.gch main.o
===

Expected behavior:

===
$ make
g++ -x c++-header -c stable.h -o pragma-once.gch/c++
g++ -c -include pragma-once -o main.o main.cpp
===

Observed behavior:

===
$ make
g++ -x c++-header -c stable.h -o pragma-once.gch/c++
g++ -c -include pragma-once -o main.o main.cpp
In file included from stable.h:1:0,
                 from main.cpp:1:
Foo.h:3:8: error: redefinition of ‘struct Foo’
Foo.h:3:8: error: previous definition of ‘struct Foo’
make: *** [main.o] Error 1
===

Workarounds/observations:

* If the header file does not contain a BOM, the problem does not occur.
* With include guards instead of #pragma once, the problem does not occur.
* In a real project, changing an #include <FooLib/Foo.h> to #include <FooLib/./Foo.h> in the precompiled header stable.h, or changing an #include <FooLib/Foo.h> to #include "Foo.h" in a different header included by stable.h, also fixes the immediate problem.
Comment 1 Yuri Khan 2013-03-06 09:52:01 UTC
Also reproduced on 4.7.2.
Comment 2 Danil Ilinykh 2013-09-25 05:22:14 UTC
*** Bug 49837 has been marked as a duplicate of this bug. ***
Comment 3 Tom Tanner 2015-08-10 20:46:41 UTC
confirmed with g++ 4.2.1, 4.7.4, 4.8.5 and 4.9.3
Comment 4 leonard gerard 2019-01-14 02:59:30 UTC
Confirmed with g++ (Ubuntu 7.3.0-27ubuntu1~18.04)
Comment 5 xaizek 2020-06-21 17:34:42 UTC
The bug is still present on current master of GCC 11 (8ee2640bfdc62f835ec9740278f948034bc7d9f1).
Comment 6 Julien Ruffin 2020-11-09 17:45:17 UTC
I have been having the same issue with GCC 9.2.0 for a while and ended up finding the cause of this error. It can be traced back to function _cpp_save_file_entries in gcc/libcpp/files.c.

Short explanation: the function saves the sizes and MD5 checksums of files without any encoding conversion or BOM removal into the PCH's file list, even though it should.

Long explanation: the function fills the PCH's files list which contains, among other things, the sizes and MD5 checksums of all files in the PCH. Later, when using the PCH, the compiler compares the files it loads with the files in that list. If it finds an entry with the same size and checksum as the loaded file, it is in the PCH and the compiler skips processing it: see check_file_against_entries for the implementation, also in files.c.

The issue here is that the matching never succeeds for headers that contain a BOM. The PCH entry is always 3 Bytes longer than the file loaded by the compiler and the checksums always differ. The following code in _cpp_save_file_entries is why:

      if (f->buffer_valid)
	md5_buffer ((const char *)f->buffer,
		    f->st.st_size, result->entries[count].sum);
      else
	{
	  FILE *ff;
	  int oldfd = f->fd;

	  if (!open_file (f))
	    {
	      open_file_failed (pfile, f, 0, 0);
	      free (result);
	      return false;
	    }
	  ff = fdopen (f->fd, "rb");
	  md5_stream (ff, result->entries[count].sum);
	  fclose (ff);
	  f->fd = oldfd;
	}
      result->entries[count].size = f->st.st_size;

libcpp caches the contents of the files it reads into their own buffers, here f->buffer. The read_file function implements this loading and converts the file's encoding on the fly with _cpp_convert_input. *This conversion strips the BOM,* so the contents of f->buffer differ from those of the file whenever a BOM is used.

If f->buffer_valid is not true, which seems to always be the case in the code above as far as I could test it, the function reopens the file by hand and computes the MD5 checksum directly from it, without any conversion. open_file() also overwrites the data size in f->st.st_size with the size of the unconverted file. That is why the checksum and size of the unconverted file end up in the PCH's file list.

The compiler later compares those with the files it loads through read_files. There never is a match because the checksums and sizes differ and the compiler thinks it it has loaded a different file, so it processes the header with the BOM a second time and the error we have been observing happens.

I have managed to solve this issue by replacing the manual loading of the unconverted file in the else block above with a loading through read_file, yielding the converted buffer and the correct size and, in the end, the correct checksum. I do not have a patch to offer yet for various reasons but my amateurish attempt at a fix made me able to build a large C++ code base successfully with precompiled headers, so it is rather encouraging.

Somebody with more experience in the preprocessor might want to take a look at this.
Comment 7 Julien Ruffin 2020-11-30 12:51:38 UTC
Here is the patch for the current master. I have tested it on large C++ code bases. So far, it builds successfully and significantly faster.

diff --git a/libcpp/files.c b/libcpp/files.c
index 301b2379a23..cbc2b0f4540 100644
--- a/libcpp/files.c
+++ b/libcpp/files.c
@@ -1978,25 +1978,28 @@ _cpp_save_file_entries (cpp_reader *pfile, FILE *fp)
       result->entries[count].once_only = f->once_only;
       /* |= is avoided in the next line because of an HP C compiler bug */
       result->have_once_only = result->have_once_only | f->once_only;
+
       if (f->buffer_valid)
-	md5_buffer ((const char *)f->buffer,
-		    f->st.st_size, result->entries[count].sum);
+        {
+          md5_buffer ((const char *)f->buffer,
+                      f->st.st_size, result->entries[count].sum);
+        }
       else
-	{
-	  FILE *ff;
-	  int oldfd = f->fd;
-
-	  if (!open_file (f))
-	    {
-	      open_file_failed (pfile, f, 0, 0);
-	      free (result);
-	      return false;
-	    }
-	  ff = fdopen (f->fd, "rb");
-	  md5_stream (ff, result->entries[count].sum);
-	  fclose (ff);
-	  f->fd = oldfd;
-	}
+        {
+          if (!read_file (pfile, f, 0))
+            {
+              return false;
+            }
+
+          md5_buffer ((const char *)f->buffer,
+                      f->st.st_size, result->entries[count].sum);
+
+          const void* to_free = f->buffer_start;
+          f->buffer_start = NULL;
+          f->buffer = NULL;
+          f->buffer_valid = false;
+          free ((void*) to_free);
+        }
       result->entries[count].size = f->st.st_size;
     }