This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] [Ada] Make middle-end string literals NUL terminated
- From: Olivier Hainque <hainque at adacore dot com>
- To: Bernd Edlinger <bernd dot edlinger at hotmail dot de>
- Cc: Olivier Hainque <hainque at adacore dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Richard Biener <rguenther at suse dot de>, Eric Botcazou <ebotcazou at adacore dot com>, Arnaud Charlet <charlet at adacore dot com>, Jakub Jelinek <jakub at redhat dot com>
- Date: Mon, 6 Aug 2018 20:01:08 +0200
- Subject: Re: [PATCH] [Ada] Make middle-end string literals NUL terminated
- References: <AM5PR0701MB2657F9BAF49A613759D90D13E42E0@AM5PR0701MB2657.eurprd07.prod.outlook.com> <7D840828-B2C9-4C20-BFE9-57249E324ED3@adacore.com> <AM5PR0701MB2657A54CA2CDBA64E39DBDA8E4230@AM5PR0701MB2657.eurprd07.prod.outlook.com>
Hi Bernd,
Things are moving fast so I'll answer your
messages incrementally :-)
> On 3 Aug 2018, at 20:37, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
[...]
> Oh, how interesting.
>
> My patch only intends to add a dummy byte that is stripped again
> in the assembly. The purpose is to make it trivial to find out if
> a string constant is NUL-terminated in the middle-end by comparing
> TYPE_SIZE_UNIT (TREE_TYPE (decl)) with TREE_STRING_LENGTH (decl).
Ah, Ok. I hadn't quite realized that yet.
> TYPE_SIZE_UNIT >= TREE_STRING_LENGTH means zero terminated
> TYPE_SIZE_UNIT < TREE_STRING_LENGTH means not necessarily zero terminated,
That is opposite to the situation we were having
for Ada before the patch series:
> Such strings shall never chop off more than one nul character.
> Ada strings are generally not zero terminated, right?
Right. Strings have explicit bounds in Ada, and a NUL character
is like any other. It can appear anywhere. For example, the
excerpt below constructs a string of 5 characters, bounds 1 to 5,
with a nul character in second position:
X : String (1 .. 5) := "1" & Ascii.Nul & "345";
('&' is the catenation operator)
The problem we had was that string literals not explicitly
NUL terminated (as in X : String := "123" & Ascii.NUL) would
never go in mergeable sections. For example:
Ada.Text_IO.Put_Line ("Hello World!");
Ada has so called "generic" units that are sometimes
instanciated many many times so there is real interest
in improving the situation there.
> So in your search loop you would not have to look after
> type_len, because that byte would be guaranteed to be zero.
>
> That is if we agree that we want to restrict the string constants
> in that way as I proposed.
>
> In the case str_len > type_len I do not understand if that is right.
In your new model, I don't know what sense it would make, indeed.
In the previous model, it was clearly expected to be a possibility.
> Because varasm will only output type_size bytes,
> this seems only to select a string section
> but the last nul byte will not be assembled.
> Something that is not contained in the type_size
> should not be assembled.
>
> What exactly do you want to accomplish?
Based on the previous (say, gcc-7 or gcc-8) code base, arrange
for most Ada string literals to end up in a mergeable section
on ELF targets.
In at least gcc-7 and gcc-8, our gigi adjustment + the
patch I posted achieve this effect.
For example, for the following source:
procedure Hello is
procedure Process (S : String) with Import, Convention => C;
begin
Process ("1234");
end;
Without the gigi change, I get (x86_64-linux,
-O2 -fmerge-all-constants):
.section .rodata
.LC0:
.ascii "1234"
Regular section, no terminating zero.
The gigi change alone gets us the terminating nul (*),
but not yet a mergeable section, because the extra nul
isn't covered by the type bounds:
.section .rodata
.LC0:
.string "1234"
With the varasm change on top, checking beyond the
type bounds when the string constant isn't an initializer,
we get:
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "1234"
(*) The terminating zero, part of the string value,
not spanned by the type bounds, gets assembled through:
assemble_constant_contents (...)
...
size = get_constant_size (exp);
then:
get_constant_size (tree exp)
size = int_size_in_bytes (TREE_TYPE (exp));
if (TREE_CODE (exp) == STRING_CST)
size = MAX (TREE_STRING_LENGTH (exp), size);
Again, this is just providing more context on the
original patch I posted, based on your questions.
I understand there are proposals to change things
pretty significantly in this area, so ...
now on to your next messages and ideas :-)
Olivier