This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Git conversion: fixing email addresses from ChangeLog files

From: "Richard Earnshaw (lists)" <Richard dot Earnshaw at arm dot com>
To: Joseph Myers <jsm at polyomino dot org dot uk>
Cc: Jakub Jelinek <jakub at redhat dot com>, gcc at gcc dot gnu dot org
Date: Sat, 28 Dec 2019 17:23:38 +0000
Subject: Re: Git conversion: fixing email addresses from ChangeLog files
References: <c23ff406-c3af-52c8-5c1e-4c921790389f@arm.com> <20191228120427.GQ10088@tucnak> <8aea9992-b24e-8e31-d515-a55fb45639e0@arm.com> <alpine.DEB.2.21.1912281709001.24695@digraph.polyomino.org.uk>

On 28/12/2019 17:14, Joseph Myers wrote:
> On Sat, 28 Dec 2019, Richard Earnshaw (lists) wrote:
> 
>> My suggestion would be that we try to canonicalize all the author
>> entries to UTF-8 as that avoids the limitations of ISO-8859-1, but that
>> would probably need further fixups to detect the additional names that
>> need rewriting.
> 
> What I've implemented in bugdb.py already includes converting ISO-8859-1 
> to UTF-8 (in any case where the author name is not valid UTF-8 - a general 
> property of text encodings is that if something is valid UTF-8, it almost
> certainly is encoded in ASCII or UTF-8 already), with special
> handling of NBSP and with fixups for all the cases where the results of 
> converting ISO-8859-1 to UTF-8 looked wrong (i.e. where it looked like the
> name in the original ChangeLog was not in fact ISO-8859-1).
> 
> I've also now made bugdb.py check the list of fixups both before and after 
> recoding (which may help in some cases where e.g. a fixup is putting a 
> name in canonical form, meaning such a fixup doesn't need to be given in 
> forms with both UTF-8 and ISO-8859-1 encodings even if the name appears 
> with both those encodings in the history).
> 
> Because the author extraction is based on the ChangeLog entry included in 
> the original commit, any subsequent commits that (wrongly or correctly) 
> recoded ChangeLog entries are not relevant.
> 
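
For anyone following along, here is a minimal sketch of the recoding and
fixup lookup described above.  It is not the code in bugdb.py: the function
names, the fixup table and the example name are all hypothetical, and
replacing NBSP with a plain space is only an assumption about what the
"special handling" might be.

```python
from typing import Mapping

NBSP = "\u00a0"

# Hypothetical fixup table, keyed by the UTF-8 bytes of the name as it
# appears in a ChangeLog and mapping to the canonical form we want.
EXAMPLE_FIXUPS = {
    "André Example".encode("utf-8"): "André P. Example",
}


def recode(raw: bytes) -> str:
    """Decode an author name, falling back to ISO-8859-1 if not valid UTF-8."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this cannot fail.
        text = raw.decode("iso-8859-1")
    # Assumed NBSP handling: treat it as an ordinary space.
    return text.replace(NBSP, " ")


def canonical_author(raw: bytes, fixups: Mapping[bytes, str]) -> str:
    """Look up fixups both before and after recoding."""
    # Before recoding: match the bytes exactly as found in the history.
    if raw in fixups:
        return fixups[raw]
    # After recoding: match the UTF-8 form of the recoded name, so one
    # table entry covers both the UTF-8 and the ISO-8859-1 spellings.
    text = recode(raw)
    return fixups.get(text.encode("utf-8"), text)
```

With this table, both b"Andr\xe9 Example" (ISO-8859-1) and the UTF-8
encoding of the same name resolve to the single canonical entry, which is
the reason for checking the fixups on both sides of the recoding step.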

I've added the list of emails that I posted yesterday to the conversion
scripts.  I've not written anything to reprocess that yet.  I want to
leave that until we've completed the general review of the changes we
want.  Auto-generating that data from the list will probably
be easier than maintaining it inside bugdb.py for now.

R.

Follow-Ups:
- Re: Git conversion: fixing email addresses from ChangeLog files
  - From: Joseph Myers

References:
- Git conversion: fixing email addresses from ChangeLog files
  - From: Richard Earnshaw (lists)
- Re: Git conversion: fixing email addresses from ChangeLog files
  - From: Jakub Jelinek
- Re: Git conversion: fixing email addresses from ChangeLog files
  - From: Richard Earnshaw (lists)
- Re: Git conversion: fixing email addresses from ChangeLog files
  - From: Joseph Myers
