This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Building gcc3 on Mandrake 8.0


"Zack Weinberg" <zackw@stanford.edu> writes:

> On Tue, Jun 19, 2001 at 02:08:10PM -0400, Jean-Marc Valin wrote:
> > Hi,
> > 
> > I really don't know where to send this. I don't even know if it's a bug.
> > Building gcc 3.0 (the official release) breaks on Mandrake 8.0 because
> > the following header: i686-pc-linux-gnu/libstdc++-v3/include/bits/gthr.h
> > is not generated properly from gthr.h-in
> > 
> > The correct result obtained for another system when running:
> > cat gthr.h-in | sed '/^#/s/\([A-Z_][A-Z_]*\)/_GLIBCPP_\1/g'
> > 
> > line 29:
> > #ifndef _GLIBCPP___gthr_GLIBCPP__h
> > #define _GLIBCPP___gthr_GLIBCPP__h
> > 
> > However when I run that on Mandrake 8.0, I get:
> > 
> > line 29:
> > #_GLIBCPP_ifndef _GLIBCPP___gthr_h
> > #_GLIBCPP_define _GLIBCPP___gthr_h
> 
> This happens when the locale forces sed operations to be case
> insensitive.  Do

This is the answer I have from our i18n expert:

--=-=-=
Date: Wed, 20 Jun 2001 16:27:42 +0200
From: Pablo Saratxaga <pablo@mandrakesoft.com>
To: Chmouel Boudjnah <chmouel@mandrakesoft.com>
Cc: Pablo Saratxaga <pablo@mandrakesoft.com>,
	Geoffrey Lee <snailtalk@mandrakesoft.com>
Subject: Re: ["Zack Weinberg" <zackw@stanford.edu>] Re: Building gcc3 on Mandrake 8.0

Kaixo!

On Wed, Jun 20, 2001 at 02:34:39PM +0200, Chmouel Boudjnah wrote:
 
> > cat gthr.h-in | sed '/^#/s/\([A-Z_][A-Z_]*\)/_GLIBCPP_\1/g'
 
> This happens when the locale forces sed operations to be case
> insensitive.  Do
 
> I consider it to be a bug that the locale does this sort of thing, but
> the C library maintainers don't agree.

No, it is not a locale or libc problem, but a sed problem.
If sed wants to parse an X-Y range in a locale-sensitive way, then it
should use a locale-sensitive function; if it wants to parse it in a
locale-insensitive way (e.g. by the numeric values of X and Y), then it
must use a locale-insensitive function (and if libc doesn't provide one,
implement it).

Using a locale-sensitive function, like most (or all?) of the str*()
family, when the intention is locale-insensitive parsing is simply
incorrect, unless locale sensitivity is the goal of the sed maintainer.
In any case, it should be fixed inside the sed code.
(A workaround, to be safe, is to call sed with LC_ALL=C set in scripts
where you want locale insensitivity.)
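
As an illustration of the LC_ALL=C workaround, here is a minimal sketch.
The input line is an assumption: it is inferred from the output quoted
earlier in the thread, not copied from gthr.h-in itself, and the variable
names are only for this example.

```shell
# Sketch: pin the locale for this single sed invocation so that [A-Z]
# is matched by byte value rather than by the locale's collation rules.
# 'line' is a hypothetical sample inferred from the quoted output.
line='#ifndef __gthr_h'

result=$(printf '%s\n' "$line" \
  | LC_ALL=C sed '/^#/s/\([A-Z_][A-Z_]*\)/_GLIBCPP_\1/g')

printf '%s\n' "$result"
```

Setting LC_ALL only on the sed command leaves the locale of the rest of
the script untouched.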

Note that you don't want locale insensitivity everywhere; for example, if
you are in a UTF-8 locale you may want to be able to do some sed
manipulations on UTF-8 characters, and not on bytes!

The real problem is how to handle a range (if the command had used
[ABCDEFGHIJKLMNOPQRSTUVWXYZ_] it would have worked, I think): should it
be locale sensitive (e.g., does [a-b] include 'â' in French?) or not (but
then [A-T] with A=cyrillicA and T=cyrillicT may not include some letters
the user expects)?
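
The explicit-list alternative mentioned above could be sketched like this.
It spells out every uppercase letter instead of relying on the A-Z range,
so it should not depend on how sed interprets ranges in the current
locale. The sample input line and the variable names are assumptions made
for the example.

```shell
# Sketch: replace the [A-Z_] range with an explicit list of characters,
# so no range has to be interpreted at all.
# 'line' is a hypothetical sample inferred from the output quoted in the thread.
upper='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
line='#ifndef __gthr_h'

# Double quotes let the shell expand ${upper}; \( \) and \1 reach sed unchanged.
result=$(printf '%s\n' "$line" \
  | sed "/^#/s/\([${upper}_][${upper}_]*\)/_GLIBCPP_\1/g")

printf '%s\n' "$result"
```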

Maybe the right solution would be to create a new way to express ranges
in regexps, for locale-sensitive ranges, and keep [ - ] for
locale-insensitive ones. But that may have some drawbacks too...


Anyway, after having thought about it a lot, and participated in various
threads on the topic (sometimes heated), I now finally think that, indeed,
the [ - ] ranges should be locale insensitive, and that all programs using
them (sed, bash, perl, ...) should ensure that they parse them in a
locale-insensitive way.


Another thing I think would be nice is if libc could provide a set of
functions like the str*() family, but that would always be locale
independent (they could be called byte*() or something).


In fact the str*() functions are evil, and broken.
Their names imply they are designed for handling text strings; however, only
half of them do it correctly (e.g., in a multibyte locale half of them will
fail), and they are not byte functions either, as they also depend on the
locale. Half the people use them for string manipulation, half the people
for byte manipulation. A real mess.
They should be kept for compatibility but their use discouraged, and two
different sets of functions should be created: one for byte manipulation
and one for text manipulation.


-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975
--=-=-=

