Created attachment 48552 [details] patch The GCC code should correctly use the <ctype.h> functions. Compiling GCC with -Werror=char-subscripts should succeed.
Why cast to unsigned char? The prototypes for tolower(), toupper(), isdigit(), etc show that the type of the argument is int. Also, why are errors being issued? None of the places where you have inserted an (unsigned char) cast is a subscripts.
>--- Comment #1 from kargl at gcc dot gnu.org --- >Why cast to unsigned char? The prototypes for tolower(), toupper(), >isdigit(), etc show that the type of the argument is int. See https://stackoverflow.com/a/60696378 for a detailed explanation.
(In reply to Roland Illig from comment #2) > >--- Comment #1 from kargl at gcc dot gnu.org --- > >Why cast to unsigned char? The prototypes for tolower(), toupper(), > >isdigit(), etc show that the type of the argument is int. > > See https://stackoverflow.com/a/60696378 for a detailed explanation. ...which in turn led to bug 94182
On Tue, May 19, 2020 at 04:38:32AM +0000, roland.illig at gmx dot de wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95177 > > --- Comment #2 from Roland Illig <roland.illig at gmx dot de> --- > >--- Comment #1 from kargl at gcc dot gnu.org --- > >Why cast to unsigned char? The prototypes for tolower(), toupper(), > >isdigit(), etc show that the type of the argument is int. > > See https://stackoverflow.com/a/60696378 for a detailed explanation. > Ah, yeah, so? There are no subscripts in the code you are changing. Why does -Werror=char-subscripts trigger if there are no subscripts? Is the error flag misnamed? If you're going to fix unbroken code, why not cast the argument to the declared type of the ctype functions?
Calling toupper() or any other character classification function declared in <ctype.h> with a negative value other than EOF is undefined. When char is a signed type, using any value outside the 7-bit ASCII set runs the risk of accessing the char classification array, commonly used to implement the functions, outside its bounds due to sign extension. The Stack Overflow post describes the technique in the abstract. An example of a real implementation is Glibc (see for instance its __isctype macro in <ctype.h>). Glibc uses casts or other conversions from char to a signed type before using the character value which suppresses GCC's -Wchar-subscripts, but the problem still exists. To avoid the out-of-bounds access the argument to these functions should be cast to unsigned char first. This is described in some detail in the CERT C Secure Coding Standard rule STR37-C. Arguments to character-handling functions must be representable as an unsigned char: https://wiki.sei.cmu.edu/confluence/x/BNcxBQ.
See also pr78155 for a request to get GCC to warn for some instances of this problem.
As a matter of style, I do not like random casts, they clutter the code and make adhering to the 80 character line limit even more of a pain. A macro might be better here, or a function safe_toupper.
Alternatively, it might make sense to change some variable types to GFC_UINTEGER_1 aka unsigned char.
On Wed, May 20, 2020 at 04:10:50AM +0000, tkoenig at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95177 > > Thomas Koenig <tkoenig at gcc dot gnu.org> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > Ever confirmed|0 |1 > Status|UNCONFIRMED |NEW > Last reconfirmed| |2020-05-20 > > --- Comment #8 from Thomas Koenig <tkoenig at gcc dot gnu.org> --- > Alternatively, it might make sense to change some variable types to > GFC_UINTEGER_1 aka unsigned char. > That could work. I'm still trying to understand how an option names -Werror=char-subscripts could trigger an error. There are no subscripts. AFAIK, the patched routines are not general purpose routines, so the char arguments can only take on values from the Fortran character set, which is a subset of the 7-bit ASCII set. Just commit the patch. I've wasted too much time trying to get an answer about how the option works. Nothing like cluttering working code.
(In reply to Steve Kargl from comment #9) > That could work. I'm still trying to understand how an > option names -Werror=char-subscripts could trigger an > error. There are no subscripts. The C standard allows every library function to also be defined as a macro. This also applies to the functions from <ctype.h>. The typical implementation of the <ctype.h> functions is: extern int __ctype_classes[1 /* for EOF */ + UCHAR_MAX]; int isalpha(int ch) { return (__ctype_classes + 1)[ch] & 0x0008; } The corresponding macro is defined like this: #define isalpha(ch) ((__ctype_classes + 1)[(ch)] & 0x0008) That's where the subscript comes from. The macro has no implicit type conversion (at least on NetBSD; the GNU libc may differ) and thus produces the warning.