This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] New pragma exec_charset


On Fri, Oct 20, 2017 at 1:19 PM, Andreas Krebbel
<krebbel@linux.vnet.ibm.com> wrote:
> On 10/20/2017 10:28 AM, Richard Biener wrote:
>> On Fri, Oct 20, 2017 at 9:53 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Fri, Oct 20, 2017 at 09:48:38AM +0200, Richard Biener wrote:
>>>> How does it work semantically to have different exec charsets?  That is,
>>>> if "strings" flow from a region with one -fexec-charset setting to a region
>>>> with another one is that undefined behavior?  Do we now require
>>>> external function declarations to be in the proper region (declared under
>>>> the appropriate exec charset flag)?  This would mean that passing
>>>> the exec charset in effect as additional argument isn't a possibility.
>>>>
>>>> Or do we have to treat -fexec-charset similar to -frounding-math, that is,
>>>> we can't ever _interpret_ any string in the compiler?  [unless -fexec-charset
>>>> is the same everywhere]
>>>>
>>>> I think the -frounding-math route is probably the easiest (and wisest
>>>> given the quite low test coverage we'll get) route.  Thus, add a -fmixed-charset
>>>> flag and reject any exec-charset attribute/pragma if that flag is not set?
>>>> With LTO we could always add this and/or merge -fexec-charset flags
>>>> appropriately, injecting -fmixed-charset in case TUs use different
>>>> settings.
>>>
>>> It wouldn't have to be an option, simply mark in cfun all functions that
>>> have more than one exec charset and give up on all optimizations/warnings
>>> that require to read the characters and merge that unknown exec_charset
>>> flag during inlining etc.  Though, that might still not be enough, e.g.
>>> the whole function might have one exec charset, but a global const char []
>>> variable might have another one and during optimization we might be looking
>>> at that.  So perhaps it would need to be a per-TU flag merged during LTO.
>>
>> There's also IPA flow of strings between functions so unless mixing
>> exec charsets invokes undefined behavior I can't see how a
>> per-function flag would help.
>>
>> But yes, if we can reliably detect whether multiple exec charsets are
>> used in a TU we can make this a flag that doesn't have to be set by
>> the user.  But that means the pragma probably _always_ forces that
>> flag given we have that forced pre-included file on some targets and
>> the pragma token would occur after that...
>
> Would it make sense to mark the string literals themselves as not using the default charset? Then we
> could disable all interpretation only for these strings instead of disabling it for the entire TU?

I think that would work, too.  Though I'd then rather explicitly state the
charset the string literal is in.  For efficiency we'd then need a mapping
from charset id to actual charset, stored globally somewhere, which we'd
need to stream and merge for LTO; the "default" would then always get id
zero, with the default charset being streamed to LTO.

It looks like tree_base.u.bits is unused for STRING_CST in the middle-end,
though you'd have to check whether the FEs use a lang-specific flag there.
Then we could stick the exec charset number there (a 32-bit index even -
whoo).  Bah, C++ of course uses a single lang flag
(PAREN_STRING_LITERAL_P).  Sticking it in the literal's type would work as
well, but I find that a bit ugly.  We could reuse bits.address_space for a
max of 256 exec charsets; a special value of 255 could indicate "unknown,
too many charsets".  That value could also be used in an initial
implementation that doesn't provide the actual mapping and just
distinguishes default from non-default.
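
To make the id scheme concrete, here is a rough standalone sketch in C of
what such a mapping could look like.  It is not GCC code: every name in it
(charset_table, charset_table_intern, charset_id_interpretable_p, the
CHARSET_ID_* macros) is made up for illustration, and it assumes that
charset names can be compared as plain case-sensitive strings, which real
iconv names would not guarantee.  Id 0 is the default exec charset, ids
1..254 are other charsets, and 255 is the "unknown, too many charsets"
value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; none of this is existing GCC API.  */
#define CHARSET_ID_DEFAULT 0u   /* the default exec charset of the TU */
#define CHARSET_ID_UNKNOWN 255u /* "unknown, too many charsets" */
#define CHARSET_TABLE_MAX  255u /* ids 0..254 name real charsets */

struct charset_table
{
  /* Borrowed pointers: the caller keeps the name strings alive.  */
  const char *names[CHARSET_TABLE_MAX];
  unsigned count;
};

/* Slot 0 always holds the default exec charset.  */
static void
charset_table_init (struct charset_table *t, const char *default_name)
{
  t->names[CHARSET_ID_DEFAULT] = default_name;
  t->count = 1;
}

/* Return the 8-bit id for NAME, adding a new entry if needed.  Once the
   table is full, further new names all map to CHARSET_ID_UNKNOWN.
   Assumption: names are compared case-sensitively with strcmp.  */
static uint8_t
charset_table_intern (struct charset_table *t, const char *name)
{
  assert (t->count >= 1); /* charset_table_init must have run */
  for (unsigned i = 0; i < t->count; i++)
    if (strcmp (t->names[i], name) == 0)
      return (uint8_t) i;
  if (t->count == CHARSET_TABLE_MAX)
    return CHARSET_ID_UNKNOWN;
  t->names[t->count] = name;
  return (uint8_t) t->count++;
}

/* Conservative policy for an initial implementation: only literals in
   the default charset may have their bytes interpreted.  */
static int
charset_id_interpretable_p (uint8_t id)
{
  return id == CHARSET_ID_DEFAULT;
}
```

With something like this, the id returned by charset_table_intern is what
would be stored in the 8-bit field of the literal, and any folding or
warning code that reads the characters would first ask
charset_id_interpretable_p.  Merging for LTO would then amount to
re-interning each TU's names into one combined table and remapping the ids.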

The interesting part is of course libcpp/cc1 interaction and getting this
all right.

Richard.

> -Andreas-
>

