This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: PR78888 - add value range info for tolower/toupper


On Thu, Aug 03, 2017 at 02:38:35PM +0000, Joseph Myers wrote:
> On Thu, 3 Aug 2017, Jakub Jelinek wrote:
> 
> > In any case, you should probably investigate all the locales say in glibc or
> > some other big locale repository whether tolower/toupper have the expected
> > properties there.
> 
> They don't.  In tr_TR.UTF-8, toupper ('i') == 'i', because 'İ', the 
> correct uppercase version (as returned in locale tr_TR.ISO-8859-9), is a 
> multibyte character and toupper can only return single-byte characters.

Indeed,
#include <ctype.h>
#include <locale.h>

int
main ()
{
  setlocale (LC_ALL, "");
  int i;
  for (i = -1000; i < 1000; i++)
    if (tolower (i) >= 'A' && tolower (i) <= 'Z')
      __builtin_abort ();
    else if (toupper (i) >= 'a' && toupper (i) <= 'z')
      __builtin_abort ();
  return 0;
}
fails for LC_ALL=tr_TR.UTF-8, because tolower ('I') is 'I' there: the Turkish
lowercase 'ı' is a multibyte character in UTF-8, so tolower cannot return it.
Not to mention that the behavior is undefined if the functions are called
with a value that is neither representable as unsigned char nor equal to EOF.
Therefore, this optimization is invalid.
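
For anyone who wants to repeat the check across more locales, here is a
rough sketch of the same test restricted to the arguments the C standard
actually permits (0 .. UCHAR_MAX plus EOF), so that the check itself stays
within defined behavior.  The names violates and count_case_violations are
made up for this example, and the -1 return for an unavailable locale is
just a convention chosen here:

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <stdio.h>

/* Return 1 if tolower (C) lands in 'A'..'Z' or toupper (C) lands in
   'a'..'z', i.e. if C breaks the range assumption under discussion.  */
static int
violates (int c)
{
  int lo = tolower (c);
  int up = toupper (c);
  return (lo >= 'A' && lo <= 'Z') || (up >= 'a' && up <= 'z');
}

/* Hypothetical helper: switch to LOCALE_NAME and count the valid
   arguments that break the assumption.  Returns -1 if the locale
   cannot be selected (e.g. it is not installed).  */
int
count_case_violations (const char *locale_name)
{
  if (setlocale (LC_ALL, locale_name) == NULL)
    return -1;

  /* EOF is the only valid argument outside 0 .. UCHAR_MAX.  */
  int count = violates (EOF);
  for (int c = 0; c <= UCHAR_MAX; c++)
    count += violates (c);
  return count;
}
```

Calling count_case_violations ("tr_TR.UTF-8") from a small main and printing
the result should give a nonzero count wherever that locale is installed and
behaves as described above; a result of -1 only means the locale was not
available on that system.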

	Jakub
