This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git


> On Jun 7, 2019, at 2:36 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> 
> On Wed, 5 Jun 2019, Jason Merrill wrote:
> 
>>> I think failing to credit (by name and email address) the person implied
>>> by the commit metadata, in the absence of positive evidence (such as a
>>> ChangeLog entry) for the change being authored by someone else, is just
>>> wrong, in the same way it's wrong not to use --author when committing for
>>> someone else in git.
>> 
>> It's wrong, but it's not importantly wrong.
> 
> I think it's importantly wrong not to have a name and email address for 
> the committer in the absence of using such information for the author.  
> (Whereas if the name or email address refer to the right person but are 
> anachronistic for that commit, that's what I'd consider not importantly 
> wrong.)
> 
>> For email addresses, I think that using @gcc.gnu.org would be the best
>> approach for people that have such accounts, rather than an employer address
>> from an arbitrary point in time.
> 
> I'm fine with use of @gcc.gnu.org (used together with a name for the 
> person in question that is or was valid, at or after the time of some 
> commit they made) for committers who in fact do have or did have such an 
> address (as opposed to inventing such addresses for committers from the 
> gcc2 era who never had such addresses, or anyone who only committed in the 
> egcs.cygnus.com era and who no longer had an account by the time of the 
> move to gcc.gnu.org).
> 
> When the commit adds a ChangeLog entry and thus contains contemporaneous 
> information about the preferred name and email address for the author at 
> that time, I think using that information (via the reposurgeon 
> "changelogs" feature) is preferable to a generic author map entry (thus, 
> the author map entries should be considered a fallback for those commits 
> that didn't add a ChangeLog entry (or added one with bad syntax for which 
> parsing fails, etc.)).

I've uploaded a first version of a "pretty" trunk [*] with author information rewritten in the commits.  Please take a look and comment.

The approach I used:
1. Start with the SVN $userid -> $name mapping from https://gitlab.com/esr/gcc-conversion/blob/master/gcc.map .
2. For every $commit, look through the "git log -p $commit" history and try to find the first occurrence of "$name <some@email.com>" [**].
3. If successful, update the $userid -> $name -> $current_email mapping.
4. Use the latest entry in the $userid -> $name -> $current_email mapping.  If no such entry exists, use $userid@gcc.gnu.org as the email address.
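
As a rough illustration of steps 3 and 4 (not the actual script), the mapping update can be sketched in Python as below.  The sketch assumes the commits are visited oldest first and that the email search of step 2 has already been done, so each commit arrives as a hypothetical (commit_id, userid, found_email) tuple with found_email set to None when nothing was found.  The function name and the sample data are made up for the example, and "latest entry" is read here as "latest entry known at that commit", which is only one possible reading.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical input record: (commit id, SVN userid, email found in history or None).
Commit = Tuple[str, str, Optional[str]]


def assign_authors(names: Dict[str, str],
                   commits: Iterable[Commit]) -> List[Tuple[str, str]]:
    """Return (commit id, "Name <email>") pairs for commits given oldest first."""
    current_email: Dict[str, str] = {}  # userid -> most recently seen email
    result = []
    for commit_id, userid, found_email in commits:
        if found_email is not None:
            # Step 3: a successful search updates the mapping.
            current_email[userid] = found_email
        # Step 4: use the latest entry, else fall back to userid@gcc.gnu.org.
        email = current_email.get(userid, f"{userid}@gcc.gnu.org")
        result.append((commit_id, f"{names[userid]} <{email}>"))
    return result


if __name__ == "__main__":
    names = {"alice": "Alice Example"}  # made-up userid and name
    history = [
        ("c1", "alice", None),
        ("c2", "alice", "alice@old.example"),
        ("c3", "alice", None),
        ("c4", "alice", "alice@new.example"),
    ]
    for commit_id, author in assign_authors(names, history):
        print(commit_id, author)
```

With this reading, a commit made before any email was found gets the @gcc.gnu.org fallback, and later commits pick up whichever address was seen most recently.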

This approach tries to take account of people switching companies and changing their names.

The resulting mapping is attached.  Please let me know if you spot any mistakes in it.

Entries that contain a sha1 hash, like
mycroft = Charles Hannum (a1c19ad21c0fb2395a2793cb4b9db71528a51c8e)
mean that the script was not able to find "Charles Hannum <some@email.com>" anywhere in the "git log -p $sha1" history.
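
For anyone who wants to list just those unresolved entries when reviewing the map, a small Python sketch follows.  It assumes an unresolved line has exactly the shape shown above (userid, "=", name, then a 40-character hex hash in parentheses) and that every other line is something else; the function name and the shape of the resolved line used in the demo are assumptions, not taken from the attachment.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as "userid = Some Name (<40 hex digits>)".
UNRESOLVED = re.compile(
    r"^(?P<userid>\S+)\s*=\s*(?P<name>.+?)\s*\((?P<sha1>[0-9a-f]{40})\)\s*$"
)


def unresolved_entries(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (userid, name, sha1) for each entry that carries a hash."""
    found = []
    for line in lines:
        match = UNRESOLVED.match(line)
        if match:
            found.append((match["userid"], match["name"], match["sha1"]))
    return found


if __name__ == "__main__":
    sample = [
        "mycroft = Charles Hannum (a1c19ad21c0fb2395a2793cb4b9db71528a51c8e)",
        # Assumed shape of a resolved entry, with a made-up address:
        "alice = Alice Example <alice@example.org>",
    ]
    for userid, name, sha1 in unresolved_entries(sample):
        print(userid, name, sha1)
```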

[*] https://github.com/maxim-kuvyrkov/gcc/commits/trunk-pretty
[**] The actual regex is sed -e "s#.*$name[ <(,\t]\+\([0-Z\._-]\+@[0-Z\._-]\+\.[0-Z_-]\+\).*#EMAIL: \1#" .
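
To make the intent of that pattern easier to read, here is an approximate Python rendering, offered as a sketch rather than an exact equivalent.  It differs from the sed command in a few ways that are my assumptions: the name is escaped before being placed in the pattern, the character class is spelled out as [A-Za-z0-9._-] instead of the 0-Z range (whose meaning depends on the locale), and the search returns the first match in the text instead of rewriting each line.  The function name and sample strings are made up.

```python
import re
from typing import Optional


def find_email(name: str, text: str) -> Optional[str]:
    """Return the first email address that directly follows `name`, or None."""
    pattern = re.compile(
        re.escape(name)              # the literal author name
        + r"[ <(,\t]+"               # separators allowed between name and email
        + r"([A-Za-z0-9._-]+"        # local part
        + r"@[A-Za-z0-9._-]+"        # domain labels
        + r"\.[A-Za-z0-9_-]+)"       # final label after the last dot
    )
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    log_text = "2019-06-07  Alice Example  <alice@example.org>\n\n\t* foo.c: Fix typo.\n"
    print(find_email("Alice Example", log_text))
    print(find_email("Alice Example", "Alice Example fixed a typo."))
```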

--
Maxim Kuvyrkov
www.linaro.org

Attachment: authors.map
Description: Binary data

