This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen


On 08/20/2018 04:17 PM, Jeff Law wrote:
On 08/18/2018 11:38 AM, Martin Sebor wrote:
On 08/17/2018 09:32 PM, Jeff Law wrote:
On 08/17/2018 02:17 PM, Martin Sebor wrote:
On 08/17/2018 12:44 PM, Bernd Edlinger wrote:
On 08/17/18 20:23, Martin Sebor wrote:
On 08/17/2018 06:14 AM, Joseph Myers wrote:
On Fri, 17 Aug 2018, Jeff Law wrote:

On 08/16/2018 05:01 PM, Joseph Myers wrote:
On Thu, 16 Aug 2018, Jeff Law wrote:

restores previous behavior.  The sprintf bits want to count
element
sized chunks, which for wchars is 4 bytes (that count will then be

   /* Compute the range the argument's length can be in.  */
-  fmtresult slen = get_string_length (arg);
+  int count_by = dir.specifier == 'S' || dir.modifier ==
FMT_LEN_l ? 4 : 1;

I don't see how a hardcoded 4 is correct here.  Surely you need to
example
wchar_type_node to determine its actual size for this target.
We did kick this around a little.  IIRC Martin didn't think that it
was
worth handling the 2 byte wchar case.

Sorry, I think we may have miscommunicated -- I didn't think it
was useful to pass a size of the character type to the function.
I agree that passing in a hardcoded constant doesn't seem right
(even if GCC's wchar_t were always 4 bytes wide).

I'm still not sure I see the benefit of passing in the expected
element size given that the patch causes c_strlen() fail when
the actual element size doesn't match ELTSIZE.  If the caller
asks for the number of bytes in a const wchar_t array it should
get back the number bytes.  (I could see it fail if the caller
asked for the number of words in a char array whose size was
not evenly divisible by wordsize.)


I think in this case c_strlen should use the type which the %S format
uses at runtime, otherwise it will not have anything to do with
the reality.

%S is not what I'm talking about.

Failing in the case I described (a caller asking for the size
in bytes of a constant object whose type is larger) prevents
callers that don't care about the type from detecting the actual
size of a constant.

Specifically for sprintf, it means that the buffer overflow
below is no longer diagnosed:

  struct S { char a[2]; void (*pf)(void); };

  void f (struct S *p)
  {
    const char *q = (char*)L"\x41424344\x45464748";

    sprintf (p->a, "%s", q);
  }
I don't think this is in the testsuite, is it?  I verified that there
was no regressions when I installed Bernd's patch and when I installed
yours.

No, there are very few tests that exercise these kinds of mixed
argument types.  Code like that is most likely the result of
a mistake, but it's not the kind of a bug I had even thought
about until some of the codegen issues with mixed argument types
were brought up (PR 86711/86714).
Phew.  I was worried I'd somehow missed the failure or tested the wrong
patch or who knows what.

Can you add that test, xfailed for now to the testsuite?

Done in r263676.

I would just like the ability to get the length/size somehow
so that the questionable code that started us down this path
can be detected.  This is not just the sprintf(d, "%s", L"1")
kind of a mismatch but also the missing nul detection in cases
like:
[ ... ]
I know.  And ideally we'll be able to handle everything you want to.
But we also have to realize that sometimes we may have to punt.

Also remember that incremental progress is (usually) good.



  const wchar_t a[1] = L"\xffffffff";
  strcpy(d, (char*)a);
This touches on both the representational issue (excess elements in the
initializer) and how to handle a non-terminated string.  Both are issues
we're debating right now.


I don't think this is necessarily an important use case to
optimize for but it is one that GCC optimizes already nonetheless,
and always has.  For example, this has always been folded into 4
by GCC and still is even with the patch:

  const __WCHAR_TYPE__ wc[] = L"\x12345678";

  int f (void)
  {
    const char *p = (char*)wc;
    return strlen (p);            // folded to 4
  }

It's optimized because fold_const_call() relies on c_getstr()
which returns the underlying byte representation of the wide
string, and despite c_strlen() now trying to prevent it.
And I think you're hitting on some of issues already raised in the thread.

In this specific case though ISTM 4 is the right answer.  We casted to a
char *, so that's what we should be counting.  Or am I missing
something?  Also note that's the value we'd get from the strlen C
library call IIUC.

It is the right answer.  My point is that this is optimized
but the change to c_strlen() prevents the same optimization
in other similar cases.  For example, GCC 6 folds this into
memcpy:

  __builtin_strcpy (d, (char*)L"\x12345678");

GCC 7 and 8 do too but get the byte count wrong (my bad).

Current trunk doesn't optimize it.  If restoring the original
behavior is the intent (and not just fixing the counting bug)
then c_strlen() should be fixed to fold this again.

(I'm not a fan of the strcpy to memcpy transformation because
it makes other things difficult but that's beside the point.)

The missing nul variant of the same test case isn't folded (it
ends up calling the library strlen) but the bug cannot be
detected using the modified c_strlen():

  const __WCHAR_TYPE__ wc[1] = L"\x12345678";   // no nul

  int f (void)
  {
    const char *p = (char*)wc;
    return strlen (p);            // not folded
  }

To detect it, I'd have to use some other function, like
c_getstr().  I could certainly do that but I don't think
I should need to.
And that's on the list of things we're going to try and address next.  I
don't think it needs to be conflated with change which indicated to
c_strlen how to count.

The two are one and the same: if c_strlen() doesn't count bytes
in arrays of wide characters (wchar_t, char16_t, or char32_t)
then the original GCC behavior is not restored and the missing
nul detection won't work for these "mixed type" cases.

I do hope you're getting all these testcases recorded.  I'd even support
installing them into the suite now, properly xfailed.  It's easier to
see how any patch under consideration affects the wider set of tests.

I haven't been recording any of them --  I have no idea where
this is going.  As I've said, these patches prevent some of
the work I have already submitted.  That's why I have been
objecting to them, including adding the ELTSIZE argument to
c_strlen().

Martin


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]