This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] New pragma exec_charset


On Fri, Oct 20, 2017 at 1:34 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Oct 20, 2017 at 1:19 PM, Andreas Krebbel
> <krebbel@linux.vnet.ibm.com> wrote:
>> On 10/20/2017 10:28 AM, Richard Biener wrote:
>>> On Fri, Oct 20, 2017 at 9:53 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>> On Fri, Oct 20, 2017 at 09:48:38AM +0200, Richard Biener wrote:
>>>>> How does it work semantically to have different exec charsets?  That is,
>>>>> if "strings" flow from a region with one -fexec-charset setting to a region
>>>>> with another one is that undefined behavior?  Do we now require
>>>>> external function declarations to be in the proper region (declared under
>>>>> the appropriate exec charset flag)?  This would mean that passing
>>>>> the exec charset in effect as additional argument isn't a possibility.
>>>>>
>>>>> Or do we have to treat -fexec-charset similar to -frounding-math, that is,
>>>>> we can't ever _interpret_ any string in the compiler?  [unless -fexec-charset
>>>>> is the same everywhere]
>>>>>
>>>>> I think the -frounding-math route is probably the easiest (and wisest
>>>>> given the quite low test coverage we'll get) route.  Thus, add a -fmixed-charset
>>>>> flag and reject any exec-charset attribute/pragma if that flag is not set?
>>>>> With LTO we could always add this and/or merge -fexec-charset flags
>>>>> appropriately, injecting -fmixed-charset in case TUs use different settings.
>>>>
>>>> It wouldn't have to be an option, simply mark in cfun all functions that
>>>> have more than one exec charset and give up on all optimizations/warnings
>>>> that need to read the characters and merge that unknown exec_charset
>>>> flag during inlining etc.  Though, that might still not be enough, e.g.
>>>> the whole function might have one exec charset, but a global const char []
>>>> variable might have another one and during optimization we might be looking
>>>> at that.  So perhaps it would need to be a per-TU flag merged during LTO.
>>>
>>> There's also IPA flow of strings between functions, so unless mixing
>>> exec charsets invokes undefined behavior I can't see how a per-function
>>> flag would help.
>>>
>>> But yes, if we can reliably detect whether multiple exec charsets are
>>> used in a TU, we can make this a flag that doesn't have to be set by the
>>> user.  But that means the pragma probably _always_ forces that flag, given
>>> we have that forced pre-included file on some targets and the pragma token
>>> would occur after that...
>>
>> Would it make sense to mark the string literals themselves as not using the default charset? Then we
>> could disable all interpretation only for these strings instead of disabling it for the entire TU?
>
> I think that would work, too.  Though I'd then rather explicitly state
> the charset the string literal is in (for efficiency we'd then need some
> mapping of charset id to actual charset that we store globally somewhere
> and which we'd need to stream and merge for LTO - the "default" would
> then always get zero, with the default charset being streamed to LTO).
> Looks like tree_base.u.bits is unused for STRING_CST in the middle-end;
> you'd have to check FEs if they use a lang-specific flag though.  Then
> we could stick the exec charset number there (32bit index even - whoo).
> Bah, C++ of course uses a single lang flag (PAREN_STRING_LITERAL_P).
> Sticking it in the literal's type would work as well but I find that a
> bit ugly.  We could reuse bits.address_space for a max of 256 exec
> charsets; a special value of 255 could indicate 'unknown, too many
> charsets', also used in an initial implementation without providing the
> actual mapping, just distinguishing default from non-default.
>
> The interesting part is of course libcpp/cc1 interaction and getting
> this all right.

Oh, and there are plenty of bits unused for STRING_CST so if the C++
FE could stop using lang specific tree bits we could shrink tree_string
by moving length to tree_base.u.  Re-using address_space would block
this improvement.  Finding a single bit for default vs. non-default wouldn't.
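
To make the id scheme concrete, here is a rough standalone sketch in plain C.
It is not GCC code: charset_table, tagged_string and every function name
below are made up for illustration, and the struct only stands in for a
STRING_CST carrying an 8-bit id.  It assumes id 0 is always the default
charset, 255 means 'unknown, too many charsets', and an LTO-style merge
remaps ids by charset name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ids, not GCC identifiers.  */
enum {
  CHARSET_ID_DEFAULT = 0,    /* the -fexec-charset default of this table */
  CHARSET_ID_UNKNOWN = 255,  /* 'unknown, too many charsets' */
  CHARSET_TABLE_MAX = 255    /* usable slots: ids 0..254 */
};

/* Global mapping of charset id to charset name.  */
struct charset_table {
  const char *names[CHARSET_TABLE_MAX];
  unsigned count;
};

/* Start a table whose slot 0 is the default charset.  */
void charset_table_init(struct charset_table *t, const char *default_name)
{
  assert(default_name != NULL);
  t->names[CHARSET_ID_DEFAULT] = default_name;
  t->count = 1;
}

/* Return the id for NAME, registering it on first use.
   Returns CHARSET_ID_UNKNOWN once the table is full.  */
unsigned char charset_table_intern(struct charset_table *t, const char *name)
{
  for (unsigned i = 0; i < t->count; i++)
    if (strcmp(t->names[i], name) == 0)
      return (unsigned char) i;
  if (t->count == CHARSET_TABLE_MAX)
    return CHARSET_ID_UNKNOWN;
  t->names[t->count] = name;
  return (unsigned char) t->count++;
}

/* Toy stand-in for a string constant tagged with a charset id.  */
struct tagged_string {
  const char *bytes;
  size_t length;
  unsigned char charset_id;
};

/* Only strings in the default charset may be read by optimizations
   or warnings; everything else is treated as opaque bytes.  */
int may_interpret(const struct tagged_string *s)
{
  return s->charset_id == CHARSET_ID_DEFAULT;
}

/* Translate an id from one TU's table (SRC) into the merged table (DST).
   A TU's default can become non-default here if the defaults differ.  */
unsigned char charset_table_remap(struct charset_table *dst,
                                  const struct charset_table *src,
                                  unsigned char id)
{
  if (id == CHARSET_ID_UNKNOWN || id >= src->count)
    return CHARSET_ID_UNKNOWN;
  return charset_table_intern(dst, src->names[id]);
}
```

With a merged table defaulting to "UTF-8" and a TU table defaulting to
"IBM1047", remapping the TU's id 0 yields a non-zero id, so may_interpret()
starts rejecting that TU's strings after the merge.  A single
default/non-default bit would give the same may_interpret() answer without
the table.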

Richard.

> Richard.
>
>> -Andreas-
>>

