This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: PR78888 - add value range info for tolower/toupper
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: Prathamesh Kulkarni <prathamesh dot kulkarni at linaro dot org>, gcc Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 3 Aug 2017 16:50:36 +0200
- Subject: Re: PR78888 - add value range info for tolower/toupper
- References: <CAAgBjMmtn1_bTRbKv=LdaqVioaDTObjs+ubnS+QSvcqR6fecNg@mail.gmail.com> <20170803075112.GP2123@tucnak> <CAAgBjM=yGcNsV0D0xE5U=gXuPfGWaa0EcZdiYZFy+n=64s_nzQ@mail.gmail.com> <20170803105519.GX2123@tucnak> <alpine.DEB.2.20.1708031436150.19390@digraph.polyomino.org.uk>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Thu, Aug 03, 2017 at 02:38:35PM +0000, Joseph Myers wrote:
> On Thu, 3 Aug 2017, Jakub Jelinek wrote:
>
> > In any case, you should probably investigate all the locales say in glibc or
> > some other big locale repository whether tolower/toupper have the expected
> > properties there.
>
> They don't. In tr_TR.UTF-8, toupper ('i') == 'i', because 'İ', the
> correct uppercase version (as returned in locale tr_TR.ISO-8859-9), is a
> multibyte character and toupper can only return single-byte characters.
Indeed,
#include <ctype.h>
#include <locale.h>

int
main ()
{
  setlocale (LC_ALL, "");
  int i;
  for (i = -1000; i < 1000; i++)
    if (tolower (i) >= 'A' && tolower (i) <= 'Z')
      __builtin_abort ();
    else if (toupper (i) >= 'a' && toupper (i) <= 'z')
      __builtin_abort ();
  return 0;
}
fails for LC_ALL=tr_TR.UTF-8, because tolower ('I') is 'I'.
Not to mention that the behavior is undefined if the functions are called
with a value that is neither representable as unsigned char nor equal to EOF.
Therefore, this optimization is invalid.
Jakub
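
For readers who want to reproduce the check without the out-of-range
arguments, here is a minimal sketch that limits the loop to the values ISO C
defines for these functions (0 through UCHAR_MAX, plus EOF) and counts
offending values instead of aborting. The function name
count_case_violations is hypothetical and is not part of the patch or of
this thread. Like the program above, it assumes an ASCII-compatible execution
character set, so that 'A'..'Z' and 'a'..'z' are contiguous.

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper, not from the thread: count the valid arguments
   for which tolower still yields an ASCII uppercase letter or toupper
   still yields an ASCII lowercase letter in the current locale.  */
int
count_case_violations (void)
{
  int count = 0;
  int i;

  /* Only values representable as unsigned char are valid here;
     EOF is handled separately below.  */
  for (i = 0; i <= UCHAR_MAX; i++)
    {
      int lo = tolower (i);
      int up = toupper (i);

      if ((lo >= 'A' && lo <= 'Z') || (up >= 'a' && up <= 'z'))
        count++;
    }

  /* EOF is the only other argument with defined behavior.  */
  {
    int lo = tolower (EOF);
    int up = toupper (EOF);

    if ((lo >= 'A' && lo <= 'Z') || (up >= 'a' && up <= 'z'))
      count++;
  }

  return count;
}
```

To use it, call setlocale (LC_ALL, "") from main, then print or test the
return value. In the "C" locale the count is zero. Going by the results
reported in this thread, a nonzero count would be expected under
LC_ALL=tr_TR.UTF-8, provided that locale is installed on the system.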