This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] PR 78534 Change character length from int to size_t


On Mon, Dec 26, 2016 at 12:32 PM, FX <fxcoudert@gmail.com> wrote:
> Hi Janne,
>
> Thanks for the patch, it is hard and tedious work. Here is the formal review. I don’t want to be a pain, but I have several questions about the patch, and given its size and the importance I think we should be double-sure :)

Thanks for the in-depth review, I appreciate it!

>> I also changed the _size member in vtables from int to size_t, as
>> there were some cases where character lengths and sizes were
>> apparently mixed up and caused regressions otherwise. Although I
>> haven't tested, this might enable very large derived types as well.
>
> Regarding that one, why are you making it an explicit size_t and not a charlen type? I know the two will be the same, at least for now, but given that it’s explicitly a character length we should use that variable type. This is a preexisting issue with the front-end and library, where we generally use a mix of types (because they end up being the same anyway, such as C int and GFC_INTEGER_4).

The _size member is the size of the object; for polymorphic objects
the size isn't known at compile time, so it is to be stored in the
vtable (in principle, since the vtable tells the exact class of a
polymorphic object, it should be possible to figure out the size
without an explicit _size member, but I'll let the OOP experts handle
that, if they want). There is another vtable member _len which is used
for character lengths, and that is indeed of type
gfc_charlen_type_node.

I think when the dust settles, I might make sense to get rid of
gfc_charlen_type_node/gfc_charlen_type and just use
size_type_node/size_t also for string lengths, but at least for now I
think it's clearer to use separate names.

> Regarding the introduction of is_charlen in gfc_typespec, I am unclear as to why it is needed. It is used exclusively in arith.c, which is not where we should be checking character lengths I think. It is visible by the fact that we normally shouldn’t need access to middle-end headers (tree.h and trans-types.h) at that level. So, can’t we make the check where we currently do it, i.e. later when we translate the constant string? That sounds more reasonable that introducing a new special-cased entity.

This is, well, a leftover from my attempts to make
gfc_charlen_type_node an alias for size_type_node, an unsigned type.
Since the gfc_typespec doesn't understand unsigned integers, it had to
be extended in some fashion.  You can see that the check in arith.c
checks that the charlen is in the range [0,
TYPE_MAX_VALUE(gfc_charlen_type_node)], which the "normal" check isn't
able to do.

So strictly speaking it's not necessary, as long as
gfc_charlen_type_node is a signed integer.

OTOH, if/when one wants to make gfc_charlen_type_node unsigned,
perhaps some more far-reaching changes are needed and that work is not
necessary anymore. Say, introducing BT_UNSIGNED_INTEGER (??) and
teaching gfc_typespec to handle unsigned integers? Or maybe it's
better to put a tree node specifying the type in the typespec and use
that instead of the bt type + kind to tell what type it is?

Do you want me to remove that for the time being?

> There are other cases (resolve.c, simplify.c) where you introduce a dependency on middle-end entities (tree.h, trans-types.h) in what are pure Fortran front-end stages. This breaks the separation that currently exists, and which I strongly think we should keep.

These changes are similar to the above, i.e. a check that uses
get_type_static_bounds() and works also if gfc_charlen_type_node is
changed to be an unsigned type.

> ** libgfortran **
>
> - in io/write.c, the “for” clauses in in namelist_write() have weird spacing around their semicolons (i.e. space should be after, not before)

Ah, I hadn't noticed that. As one can see, it's a pre-existing issue,
but I might as well fix it when I'm changing that line. Will do.

> - in intrinsics/extends_type_of.c, use gfc_charlen_type instead of size_t vtype->size

As I explained earlier, this is because the size member is the size of
the object rather than the charlen, so I think it should stay a
size_t.

> ** front-end **
>
> - class.c: use gfc_charlen_int_kind instead of gfc_size_kind

Same here.

> - class.c, in find_intrinsic_vtab(): in the call to gfc_get_int_expr, the third argument cannot be cast from (size_t) to (long), as this would fail on LLP64 hosts

Yes, but... the issue is that gfc_get_int_expr uses mpz_set_si, so
can't handle more than long anyway. But hmm, maybe that typecast
should be removed anyway, so that when/if gfc_get_int_expr is fixed
this would be automatically fixed as well.

> - expr.c, regarding gfc_extract_long(): we could definitely extract an host wide int or an mpz value. This new function is called twice: once in resolve_charlen() where we could use the GMP function mpz_sgn() to check if the constant value is negative; the second time in gfc_simplify_repeat, where we should simply bail out (return NULL) if the integer is too big to fit into a long (we would bail out a few lines later anyway, see “semi-arbitrary limit”).

Yes, similar to the above. In general, code which use
mpz_{get,set}_{s,u}i and are handling string lengths should be changed
to do something else instead in order to work on LLP64 hosts. It might
be possible to go via a wide_int to convert between tree's and mpz_t's
and vice versa, but I haven't looked further into it yet. But I think
that can be left for a follow-up patch.

> - iresolve.c, extra space after NULL in call to gfc_get_int_expr() in gfc_resolve_repeat()
> - match.c, in select_intrinsic_set_tmp(), charlen should be a gfc_charlen_t and mpz_get_si will break for long string sizes
> - in resolve.c, like in arith.c, we should not use tree.h and trans-types.h. We should do the comparison by looking at integer kinds, not through the charlen_type_node
> - in resolve.c, in resolve_select_type(), another case of mpz_get_si() that will break for long string sizes
> - in simplify.c, again, we should not use tree.h and trans-types.h

Yes, so these again boil down to what I've written above:

- Should I remove the so-far preliminary work to handle
gfc_charlen_type_node being unsigned?

- Should I fix the uses of mpz_{get,set}_{s,u}i?

> - trans-decl.c seems like unrelated changes

Ah yes, leftover from some debugging. Will remove.

> - trans-types.h: why do we now need to include trans.h?

IIRC this was due to some of the new .c files including trans-types.h
but not trans.h and failing to compile. AFAIU the convention is that
headers should include whatever is necessary to use said header. So
this is some latent bug that has been exposed by my other changes.


-- 
Janne Blomqvist


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]