Bug 41041 - Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c
Version: 4.3.0
Importance: P3 normal
Target Milestone: 10.5
Assignee: Jonathan Wakely
URL:
Keywords: documentation
Duplicates: 41040 41042
Depends on:
Blocks:
 
Reported: 2009-08-12 08:44 UTC by Samuel Thibault
Modified: 2022-11-05 12:45 UTC (History)
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-11-03 00:00:00


Attachments
testcase (180 bytes, text/plain)
2009-08-12 08:45 UTC, Samuel Thibault
fix (345 bytes, patch)
2009-08-12 08:45 UTC, Samuel Thibault

Description Samuel Thibault 2009-08-12 08:44:27 UTC
Hello,

The manual page says `Set the wide execution character set, used for wide string and character constants.  The default is UTF-32 or UTF-16, whichever corresponds to the width of "wchar_t".'

This should read UCS-4 or UCS-2 instead. See the behavior of the attached program when compiled without -fwide-exec-charset and with -fwide-exec-charset set to UTF-32, UTF-16, UCS-4 or UCS-2. The attached patch fixes it.

Samuel
Comment 1 Samuel Thibault 2009-08-12 08:45:13 UTC
Created attachment 18342
testcase
Comment 2 Samuel Thibault 2009-08-12 08:45:32 UTC
Created attachment 18343
fix
Comment 3 Samuel Thibault 2009-08-12 08:46:27 UTC
*** Bug 41042 has been marked as a duplicate of this bug. ***
Comment 4 Ludovic Brenta 2009-08-18 12:01:50 UTC
Please change the component of this PR from "ada" to "c".
Comment 5 Eric Botcazou 2015-12-05 17:52:11 UTC
.

*** This bug has been marked as a duplicate of bug 41040 ***
Comment 6 Jonathan Wakely 2022-11-03 11:04:08 UTC
I'm reopening this one, and closing 41040 as the dup, because this has all the attachments.

Samuel, please send patches to the gcc-patches mailing list (as documented in the contribution docs) instead of attaching them in bugzilla where they get ignored for over a decade.
Comment 7 Jonathan Wakely 2022-11-03 11:04:47 UTC
*** Bug 41040 has been marked as a duplicate of this bug. ***
Comment 8 Jonathan Wakely 2022-11-03 11:10:19 UTC
The difference with an explicit -fwide-exec-charset=UTF-32 seems to be the BOM. It looks like the default is UTF-32LE. Are you sure it's UCS-4?
Comment 9 Samuel Thibault 2022-11-03 13:38:56 UTC
It seems the default is indeed a UTF encoding rather than a UCS encoding:

$ LANG= gcc -fshort-wchar test.c -o test
$ LANG= gcc -fshort-wchar test.c -o test   -fwide-exec-charset=UTF-16LE 
$ LANG= gcc -fshort-wchar test.c -o test   -fwide-exec-charset=UCS-2LE 
test.c: In function `main':
test.c:7:27: error: converting to execution character set: Invalid or incomplete multibyte or wide character
    7 |         wchar_t s[] = L"𝄞";
      |                           ^

Now there is indeed the question of the BOM. Ideally the text could mention all of UTF-32LE, UTF-32BE, UTF-16LE and UTF-16BE, but I'm not sure it's really worth it.
Comment 10 Jonathan Wakely 2022-11-04 10:29:54 UTC
Now that we have macros exposing the execution character set, we can check it easily:

$ gcc -E -dM -x c /dev/null | grep EXEC
#define __GNUC_WIDE_EXECUTION_CHARSET_NAME "UTF-32LE"
#define __GNUC_EXECUTION_CHARSET_NAME "UTF-8"

So the docs are misleading. I think I'll take this bug myself and try to document it without too much verbosity.
Comment 11 Jonathan Wakely 2022-11-04 10:53:08 UTC
Something like this:

--- a/gcc/doc/cppopts.texi
+++ b/gcc/doc/cppopts.texi
@@ -318,9 +318,10 @@ supported by the system's @code{iconv} library routine.
 @opindex fwide-exec-charset
 @cindex character set, wide execution
 Set the wide execution character set, used for wide string and
-character constants.  The default is UTF-32 or UTF-16, whichever
-corresponds to the width of @code{wchar_t}.  As with
-@option{-fexec-charset}, @var{charset} can be any encoding supported
+character constants.  The default is one of UTF-32BE, UTF-32LE, UTF-16BE,
+or UTF-16LE, whichever corresponds to the width of @code{wchar_t} and the
+big-endian or little-endian byte order being used for code generation.  As
+with @option{-fexec-charset}, @var{charset} can be any encoding supported
 by the system's @code{iconv} library routine; however, you will have
 problems with encodings that do not fit exactly in @code{wchar_t}.
Comment 12 GCC Commits 2022-11-05 12:37:11 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:e50ea3a42f058c14ee29327d5277ab0435e3d36b

commit r13-3694-ge50ea3a42f058c14ee29327d5277ab0435e3d36b
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 4 12:10:32 2022 +0000

    doc: Document correct -fwide-exec-charset defaults [PR41041]
    
    As shown in the PR, the default is not UTF-32 but rather UTF-32BE or
    UTF-32LE, avoiding the need for a byte order mark in literals.
    
    gcc/ChangeLog:
    
            PR c/41041
            * doc/cppopts.texi: Document -fwide-exec-charset defaults
            correctly.
Comment 13 GCC Commits 2022-11-05 12:38:06 UTC
The releases/gcc-12 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:1342c7f46e6e3f8f29d7971531a0af18cd8429bc

commit r12-8893-g1342c7f46e6e3f8f29d7971531a0af18cd8429bc
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 4 12:10:32 2022 +0000

    doc: Document correct -fwide-exec-charset defaults [PR41041]
    
    As shown in the PR, the default is not UTF-32 but rather UTF-32BE or
    UTF-32LE, avoiding the need for a byte order mark in literals.
    
    gcc/ChangeLog:
    
            PR c/41041
            * doc/cppopts.texi: Document -fwide-exec-charset defaults
            correctly.
    
    (cherry picked from commit e50ea3a42f058c14ee29327d5277ab0435e3d36b)
Comment 14 GCC Commits 2022-11-05 12:38:36 UTC
The releases/gcc-11 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:ae31f6acb2cf9d43a265f42c12f95e4687ac1fa4

commit r11-10365-gae31f6acb2cf9d43a265f42c12f95e4687ac1fa4
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 4 12:10:32 2022 +0000

    doc: Document correct -fwide-exec-charset defaults [PR41041]
    
    As shown in the PR, the default is not UTF-32 but rather UTF-32BE or
    UTF-32LE, avoiding the need for a byte order mark in literals.
    
    gcc/ChangeLog:
    
            PR c/41041
            * doc/cppopts.texi: Document -fwide-exec-charset defaults
            correctly.
    
    (cherry picked from commit e50ea3a42f058c14ee29327d5277ab0435e3d36b)
Comment 15 GCC Commits 2022-11-05 12:45:20 UTC
The releases/gcc-10 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:87b0935ed43d971a6eeebca963fb673628f138dd

commit r10-11071-g87b0935ed43d971a6eeebca963fb673628f138dd
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 4 12:10:32 2022 +0000

    doc: Document correct -fwide-exec-charset defaults [PR41041]
    
    As shown in the PR, the default is not UTF-32 but rather UTF-32BE or
    UTF-32LE, avoiding the need for a byte order mark in literals.
    
    gcc/ChangeLog:
    
            PR c/41041
            * doc/cppopts.texi: Document -fwide-exec-charset defaults
            correctly.
    
    (cherry picked from commit e50ea3a42f058c14ee29327d5277ab0435e3d36b)
Comment 16 Jonathan Wakely 2022-11-05 12:45:46 UTC
Docs fixed for 10.5, 11.4 and 12.3