This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] [Ada] Make middle-end string literals NUL terminated


Hi Olivier,

On 08/06/18 20:01, Olivier Hainque wrote:
> Hi Bernd,
> 
> Things are moving fast so I'll answer your
> messages incrementally :-)
> 
>> On 3 Aug 2018, at 20:37, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> [...]
>> Oh, how interesting.
>>
>> My patch only intends to add a dummy byte that is stripped again
>> in the assembly.  The purpose is to make it trivial to find out if
>> a string constant is NUL-terminated in  the middle-end by comparing
>> TYPE_SIZE_UNIT (TREE_TYPE (decl)) with TREE_STRING_LENGTH (decl).
> 
> Ah, Ok. I hadn't quite realized that yet.
> 
>> TYPE_SIZE_UNIT >= TREE_STRING_LENGTH means zero terminated
>> TYPE_SIZE_UNIT < TREE_STRING_LENGTH means not necessarily zero terminated,
> 
> That is opposite to the situation we were having
> for Ada before the patch series:
> 
>> Such strings shall never chop off more than one nul character.
>> Ada strings are generally not zero terminated, right?
> 
> Right. Strings have explicit bounds in Ada, and a NUL character
> is like any other. It can appear anywhere. For example, the
> excerpt below constructs a string of 5 characters, bounds 1 to 5,
> with a nul character in second position:
> 
>   X : String (1 .. 5) := "1" & Ascii.Nul & "345";
> 
> ('&' is the catenation operator)
> 
> The problem we had was that string literals not explicitly
> NUL terminated (as in X : String := "123" & Ascii.NUL) would
> never go in mergeable sections. For example:
> 
> Ada.Text_IO.Put_Line ("Hello World!");
> 
> Ada has so called "generic" units that are sometimes
> instanciated many many times so there is real interest
> in improving the situation there.
> 
>> So in your search loop you would not have to look after
>> type_len, because that byte would be guaranteed to be zero.
>>
>> That is if we agree that we want to restrict the string constants
>> in that way as I proposed.
>>
>> In the case str_len > type_len I do not understand if that is right.
> 
> In your new model, I don't know what sense it would make, indeed.
> 
> In the previous model, it was clearly expected to be a possibility.
> 
>> Because varasm will only output type_size bytes,
>> this seems only to select a string section
>> but the last nul byte will not be assembled.
>> Something that is not contained in the type_size
>> should not be assembled.
>>
>> What exactly do you want to accomplish?
> 
> Based on the previous (say, gcc-7 or gcc-8) code base, arrange
> for most Ada string literals to end up in a mergeable section
> on ELF targets.
> 
> In at least gcc-7 and gcc-8, our gigi adjustment + the
> patch I posted achieve this effect.
> 
> For example, for the following source:
> 
> procedure Hello is
>    procedure Process (S : String) with Import, Convention => C;
> begin
>   Process ("1234");
> end;
> 

Yes, thanks for this example,

$ cat hello.adb
procedure Hello is
    procedure puts  (S : String) with Import, Convention => C;
    X : String(1..8) := "abcdefg" & Ascii.Nul;
begin
   puts ("1234" & Ascii.Nul );
   puts (X);
end;

produces again identical output with the latest patch that I posted
right now.



> Without the gigi change, I get (x86_64-linux,
> -O2 -fmerge-all-constants):
> 
>    .section .rodata
>    .LC0:
>        .ascii "1234"
> 
> Regular section, no terminating zero.
> 
> The gigi change alone gets us the terminating nul (*),
> but not yet a mergeable section, because the extra nul
> isn't covered by the type bounds:
> 
>    .section .rodata
>    .LC0:
>        .string "1234"
> 
> With the varasm change on top, checking beyond the
> type bounds when the string constant isn't an initializer,
> we get:
> 
>    .section .rodata.str1.1,"aMS",@progbits,1
>    .LC0:
>        .string "1234"
> 
> (*) The terminating zero, part of the string value,
> not spanned by the type bounds, gets assembled through:
> 
> assemble_constant_contents (...)
> ...
> size = get_constant_size (exp);
> 
> then:
> 
> get_constant_size (tree exp)
> 
>    size = int_size_in_bytes (TREE_TYPE (exp));
>    if (TREE_CODE (exp) == STRING_CST)
>      size = MAX (TREE_STRING_LENGTH (exp), size);
> 
> Again, this is just providing more context on the
> original patch I posted, based on your questions.
> 

Ah, good point. That needs to be changed...
Actually I would like to have get_constant_size
return just the type size.

> I understand there are proposals to change things
> pretty significantly in this area, so ...
> 
> now on to your next messages and ideas :-)
> 
> 

When I try this example:

$ cat array9.adb
-- { dg-do run }

procedure Array9 is

   V1 : String(1..10) := "1234567890";
   V2 : String(1..-1) := "";

   procedure Compare (S : String) is
   begin
     if S'Size /= 8*S'Length then
       raise Program_Error;
     end if;
   end;

begin
   Compare ("");
   Compare ("1234");
   Compare (V1);
   Compare (V2);
end;

I see that "1234" is put in the merge section,
but V1 is not.  Maybe because of the alignment requirement?

But it sould not be much different from the C test case,
which is now able to merge the non-zero terminated strings.


Thanks
Bernd.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]