Bug 95177 - error: array subscript has type char
Summary: error: array subscript has type char
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libfortran (show other bugs)
Version: 10.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: build, diagnostic
Depends on:
Blocks:
 
Reported: 2020-05-17 19:40 UTC by Roland Illig
Modified: 2020-10-13 03:00 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-05-20 00:00:00


Attachments
patch (907 bytes, patch)
2020-05-17 19:40 UTC, Roland Illig
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Roland Illig 2020-05-17 19:40:44 UTC
Created attachment 48552 [details]
patch

The GCC code should correctly use the <ctype.h> functions.

Compiling GCC with -Werror=char-subscripts should succeed.
Comment 1 kargl 2020-05-19 00:43:23 UTC
Why cast to unsigned char?  The prototypes for tolower(), toupper(),
isdigit(), etc show that the type of the argument is int.

Also, why are errors being issued?  None of the places where 
you have inserted an (unsigned char) cast is a subscripts.
Comment 2 Roland Illig 2020-05-19 04:38:32 UTC
>--- Comment #1 from kargl at gcc dot gnu.org ---
>Why cast to unsigned char?  The prototypes for tolower(), toupper(),
>isdigit(), etc show that the type of the argument is int.

See https://stackoverflow.com/a/60696378 for a detailed explanation.
Comment 3 Eric Gallager 2020-05-19 12:40:53 UTC
(In reply to Roland Illig from comment #2)
> >--- Comment #1 from kargl at gcc dot gnu.org ---
> >Why cast to unsigned char?  The prototypes for tolower(), toupper(),
> >isdigit(), etc show that the type of the argument is int.
> 
> See https://stackoverflow.com/a/60696378 for a detailed explanation.

...which in turn led to bug 94182
Comment 4 Steve Kargl 2020-05-19 14:52:50 UTC
On Tue, May 19, 2020 at 04:38:32AM +0000, roland.illig at gmx dot de wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95177
> 
> --- Comment #2 from Roland Illig <roland.illig at gmx dot de> ---
> >--- Comment #1 from kargl at gcc dot gnu.org ---
> >Why cast to unsigned char?  The prototypes for tolower(), toupper(),
> >isdigit(), etc show that the type of the argument is int.
> 
> See https://stackoverflow.com/a/60696378 for a detailed explanation.
> 

Ah, yeah, so?

There are no subscripts in the code you are changing.
Why does -Werror=char-subscripts trigger if there are
no subscripts?  Is the error flag misnamed?

If you're going to fix unbroken code, why not cast the
argument to the declared type of the ctype functions?
Comment 5 Martin Sebor 2020-05-19 16:20:08 UTC
Calling toupper() or any other character classification function declared in <ctype.h> with a negative value other than EOF is undefined.  When char is a signed type, using any value outside the 7-bit ASCII set runs the risk of accessing the char classification array, commonly used to implement the functions, outside its bounds due to sign extension.  The Stack Overflow post describes the technique in the abstract.  An example of a real implementation is Glibc (see for instance its __isctype macro in <ctype.h>).

Glibc uses casts or other conversions from char to a signed type before using the character value which suppresses GCC's -Wchar-subscripts, but the problem still exists.

To avoid the out-of-bounds access the argument to these functions should be cast to unsigned char first.  This is described in some detail in the CERT C Secure Coding Standard rule STR37-C. Arguments to character-handling functions must be representable as an unsigned char: https://wiki.sei.cmu.edu/confluence/x/BNcxBQ.
Comment 6 Martin Sebor 2020-05-19 16:34:02 UTC
See also pr78155 for a request to get GCC to warn for some instances of this problem.
Comment 7 Thomas Koenig 2020-05-20 04:02:36 UTC
As a matter of style, I do not like random casts, they clutter the code and make
adhering to the 80 character line limit even more of a pain.

A macro might be better here, or a function safe_toupper.
Comment 8 Thomas Koenig 2020-05-20 04:10:50 UTC
Alternatively, it might make sense to change some variable types to
GFC_UINTEGER_1 aka unsigned char.
Comment 9 Steve Kargl 2020-05-20 04:30:58 UTC
On Wed, May 20, 2020 at 04:10:50AM +0000, tkoenig at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95177
> 
> Thomas Koenig <tkoenig at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>      Ever confirmed|0                           |1
>              Status|UNCONFIRMED                 |NEW
>    Last reconfirmed|                            |2020-05-20
> 
> --- Comment #8 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
> Alternatively, it might make sense to change some variable types to
> GFC_UINTEGER_1 aka unsigned char.
> 

That could work.  I'm still trying to understand how an
option names -Werror=char-subscripts could trigger an
error.  There are no subscripts.  AFAIK, the patched 
routines are not general purpose routines, so the char
arguments can only take on values from the Fortran
character set, which is a subset of the 7-bit ASCII
set.  

Just commit the patch.  I've wasted too much time trying
to get an answer about how the option works.  Nothing
like cluttering working code.
Comment 10 Roland Illig 2020-05-20 06:19:59 UTC
(In reply to Steve Kargl from comment #9)
> That could work.  I'm still trying to understand how an
> option names -Werror=char-subscripts could trigger an
> error.  There are no subscripts.

The C standard allows every library function to also be defined as a macro.
This also applies to the functions from <ctype.h>.

The typical implementation of the <ctype.h> functions is:

extern int __ctype_classes[1 /* for EOF */ + UCHAR_MAX];

int isalpha(int ch) {
  return (__ctype_classes + 1)[ch] & 0x0008;
}

The corresponding macro is defined like this:

#define isalpha(ch) ((__ctype_classes + 1)[(ch)] & 0x0008)

That's where the subscript comes from.

The macro has no implicit type conversion (at least on NetBSD; the GNU libc may differ) and thus produces the warning.