Bug 78155 - missing warning on invalid usage of functions/macros from <ctype.h> (isalpha et al.)
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: c
Version: 7.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: diagnostic
Depends on:
Blocks: new-warning, new_warning
Reported: 2016-10-29 03:28 UTC by Martin Sebor
Modified: 2022-06-17 19:20 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-07-30 00:00:00


Attachments
Test case (149 bytes, text/plain)
2020-05-04 11:51 UTC, Bruno Haible

Description Martin Sebor 2016-10-29 03:28:30 UTC
isalpha and other character classification functions/macros defined in the <ctype.h> header require their argument to be representable as unsigned char or equal to EOF, and have undefined behavior otherwise.  On some implementations (such as glibc), calling them with an argument outside that range crashes the program (see below).  To help detect this common error, GCC should issue a warning when the argument is known to be invalid (as in the test case below) or, when its value or range is unknown, when its type is plain char, which is subject to sign extension.

$ cat b.c && /build/gcc-git/gcc/xgcc -B/build/gcc-git/gcc  -Wall -Wextra -Wpedantic -fdump-tree-optimized=/dev/stdout b.c && ./a.out 
int main (void)
{
    __builtin_printf ("%i\n", __builtin_isalpha (999999));
}


;; Function main (main, funcdef_no=2, decl_uid=1965, cgraph_uid=2, symbol_order=2)

main ()
{
  int D.1968;
  int _1;
  int _4;

  <bb 2>:
  _1 = __builtin_isalpha (999999);
  __builtin_printf ("%i\n", _1);
  _4 = 0;

<L0>:
  return _4;

}


Segmentation fault (core dumped)
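
For illustration only, the two situations described above can be sketched as a run-time predicate. This is a minimal sketch, not anything GCC or <ctype.h> provides: ctype_arg_ok and promote_plain_char are hypothetical names introduced here. The first checks the documented requirement (EOF or a value representable as unsigned char); the second shows what a plain char looks like after promotion to int, which is negative for bytes above 0x7F when char is signed.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper (not part of <ctype.h>): nonzero if c is a value
   the <ctype.h> functions accept, i.e. EOF or a value representable
   as unsigned char.  */
int ctype_arg_ok (int c)
{
  return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}

/* Hypothetical helper: the int value a plain char argument has after
   the usual promotion.  Where char is signed, bytes above 0x7F come
   out negative.  */
int promote_plain_char (char ch)
{
  return ch;
}
```

Under these assumptions, ctype_arg_ok (999999) is 0, and ctype_arg_ok (promote_plain_char ('\xE9')) is 0 wherever plain char is signed, while the same byte cast to unsigned char first is accepted.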
Comment 1 Eric Gallager 2017-07-30 22:12:53 UTC
When I run the program, it prints 0 rather than crashing. Confirming that a warning would be nice though, for portability to platforms where it would cause a crash.
Comment 2 Eric Gallager 2018-07-31 04:39:32 UTC
(In reply to Eric Gallager from comment #1)
> When I run the program, it prints 0 rather than crashing. 

(probably a difference between the Darwin Libc and glibc; it might be worth investigating what other libcs like musl or uclibc do...)

> Confirming that a warning would be nice though, for portability to platforms
> where it would cause a crash.
Comment 3 Eric Gallager 2019-08-05 04:58:34 UTC
Would you expect this warning to go under an existing flag, or a new one, Martin?
Comment 4 Martin Sebor 2019-08-05 16:58:27 UTC
I don't really see what existing warning this might fall under, except perhaps -Wchar-subscripts because isalpha and friends use the argument as an index into an array of 257 characters, but that seems like a stretch.

I think maybe adding a more general warning option, say something like -Wargument-range, and using it to diagnose all such problems, might be the way to go.  To generalize the solution I would even consider adding a new function attribute, let's call it range, to specify the range of valid values of a function argument.  Then isalpha (or any other such function) could be declared like so:

  __attribute__ ((range (/* position = */1, -1, UCHAR_MAX)))
  int isalpha (int);

GCC would then check every call to the function to see whether its argument is in the expected range and, if not, issue a warning.  The attribute could even be applied multiple times to specify disjoint ranges.  Position zero could denote the return value, so that toupper could be declared like so:

  __attribute__ ((range (/* returns = */ 0, -1, UCHAR_MAX),
                  range (/* position = */ 1, -1, UCHAR_MAX)))
  int toupper (int);
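
The range attribute above is only a proposal and does not exist in GCC, so it cannot be compiled as written. As a rough sketch of the semantics it describes, the check can be emulated at run time. Everything named here (struct value_range, in_any_range, the two tables) is hypothetical and introduced only for illustration; the second table is a made-up example of two disjoint ranges, as if the attribute had been applied twice.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical run-time stand-in for one use of the proposed attribute:
   a closed interval of permitted values.  */
struct value_range { long lo, hi; };

/* Nonzero if v lies in at least one of the n ranges.  */
int in_any_range (long v, const struct value_range *r, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (v >= r[i].lo && v <= r[i].hi)
      return 1;
  return 0;
}

/* The range used in the isalpha declaration above: -1 .. UCHAR_MAX.  */
const struct value_range ctype_arg_range[] = { { -1, UCHAR_MAX } };

/* Made-up example of disjoint ranges: -1 alone, or 'a' .. 'z'.  */
const struct value_range disjoint_example[] = { { -1, -1 }, { 'a', 'z' } };
```

With these tables, in_any_range (999999, ctype_arg_range, 1) is 0, which is the case a compile-time diagnostic would flag when the argument is a known constant.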
Comment 5 Eric Gallager 2019-08-05 18:47:47 UTC
ok, making this block the "new-warning" meta-bug then
Comment 6 Bruno Haible 2020-05-04 11:51:02 UTC
Created attachment 48440
Test case

Another test case is the attached program, alpha.c. When run on glibc systems on x86, x86_64, and other CPUs (not powerpc), it sign-extends the 'char' argument, so the character 'ÿ' (byte 0xFF in ISO-8859-1 encoding) becomes -1, which is the value of EOF there, and the <ctype.h> function returns 0.

$ LC_ALL=de_DE.ISO-8859-1 xterm
$ ./a.out ÿ
not alphabetic

The corrected program (with a cast to 'unsigned char' in the isalpha() argument) behaves as expected:

$ LC_ALL=de_DE.ISO-8859-1 xterm
$ ./a.out ÿ
alphabetic
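
The attachment itself is not reproduced in this report, so the following is only a guess at the shape of the corrected call, not the attached alpha.c. The helper name classify_first_byte is hypothetical.

```c
#include <ctype.h>

/* Hypothetical helper, not taken from the attachment: classify the
   first byte of s.  The cast maps a negative plain char into the
   range 0 .. UCHAR_MAX, so byte 0xFF is passed as 255 rather than
   as -1 (EOF).  */
const char *classify_first_byte (const char *s)
{
  return isalpha ((unsigned char) s[0]) ? "alphabetic" : "not alphabetic";
}
```

A complete program would presumably call setlocale (LC_ALL, "") before using this, so that isalpha follows the locale selected in the environment, and then print classify_first_byte (argv[1]).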
Comment 7 Eric Gallager 2022-04-28 11:57:48 UTC
retitling to help me find it more easily again later
Comment 8 Eric Gallager 2022-06-08 17:39:23 UTC
(In reply to Martin Sebor from comment #4)
> I don't really see what existing warning this might fall under, except
> perhaps -Wchar-subscripts because isalpha and friends use the argument as an
> index into an array of 257 characters, but that seems like a stretch.
> 
> I think maybe adding a more general warning option, say something like
> -Wargument-range, and using it to diagnose all such problems, might be the
> way to go.  To generalize the solution I would even consider adding a new
> function attribute, let's call it range, to specify the range of valid
> values of a function argument.  Then isalpha (or any other such function)
> could be declared like so:
> 
>   __attribute__ ((range (/* position = */1, -1, UCHAR_MAX)))
>   int isalpha (int);
> 
> GCC would then check every call to the function to see if its argument is in
> the expected range and, if not, issue a warning.  The attribute could even
> be applied multiple times to specify disjoint ranges.  Position zero could
> denote the return value so that toupper could be declared like so
> 
>   __attribute__ ((range (/* returns = */ 0, -1, UCHAR_MAX),
>                   range (/* position = */ 1, -1, UCHAR_MAX)))
>   int toupper (int);

There's been an attempt to add an attribute like this recently on the mailing lists: https://gcc.gnu.org/pipermail/gcc/2022-June/238819.html