Re: Repository for the conversion machinery

On 7 October 2016 at 22:26, Joseph Myers wrote:
> On Fri, 7 Oct 2016, Frank Ch. Eigler wrote:
>> FWIW, I thought at one point the consensus was that the mailmap would
>> expand only to $ rather than $userid@$organization,
>> esp. considering the case where there is no single $organization that
>> accurately covers the whole contribution timespan of the given $userid.
> I don't think there was any such consensus (older ids weren't from
> anyway so would be nonsense for that part of the
> history).
> My view is: contributors are free to specify what name and email address
> they want used, but if they want something other than a single name and
> email address for the whole commit history with a given username, it's the
> contributor's responsibility to come up with lists of commits that use
> each mapping rather than a hypothetical recipe based on examining
> ChangeLogs.

We'd only need to look at the actual ChangeLogs if the commit message
doesn't include a name and email address. And if we just use the
committer, how do we record the author of a change?

As Richi said a year ago (my reply was drafted then but never sent) ...

On 17 September 2015 at 11:44, Richard Biener wrote:
> Maybe I'm missing sth but apart from the CVS imported revisions each
> SVN revision should contain the actual change plus the changes to the
> ChangeLog files (you can't count on the commit message itself, I guess,
> as not all people replicate the ChangeLog entries there).

It's probably a good start though. If the commit message does have:

YYYY-MM-DD  John Doe  <>

then it's probably reliable. If the commit message doesn't have that
(when I'm committing my own work I don't include that line in the
commit message) then look for ChangeLog entries in the commit.

> There may be cases we can't handle and then doing some commit ID
> mapping might be ok, but I expect 95% of the cases to work out nicely
> so we should preserve what is in the ChangeLog entry (note that we have
> very strict formatting requirements for the authors there).

Particularly since the ChangeLog entry gives the Author, which is
often not the same as the Committer.
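To make the lookup order above concrete, here is a minimal sketch in Python. It assumes the conversion tooling can hand us the commit message, the lines added to ChangeLog files in the same commit (without the diff "+" markers), and the committer's name and address; `guess_author`, `find_header` and that input shape are hypothetical, not part of reposurgeon or any existing script. It prefers a ChangeLog-style header in the commit message, then one in the ChangeLog additions, and only falls back to the committer when neither is found.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches a ChangeLog entry header such as "YYYY-MM-DD  Name  <email>".
# GNU style uses two spaces between fields; any whitespace run is accepted here.
HEADER_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<name>.+?)\s+<(?P<email>[^<>\s]+)>\s*$"
)


def find_header(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (name, email) from the first ChangeLog-style header line, if any."""
    for line in lines:
        m = HEADER_RE.match(line)
        if m:
            return m.group("name"), m.group("email")
    return None


def guess_author(
    message: str,
    changelog_additions: List[str],
    committer: Tuple[str, str],
) -> Tuple[str, str]:
    """Hypothetical author heuristic: message header, then ChangeLog, then committer."""
    # 1. A header copied into the commit message is probably reliable.
    found = find_header(message.splitlines())
    if found:
        return found
    # 2. Otherwise use the entry added to a ChangeLog file in the same commit.
    found = find_header(changelog_additions)
    if found:
        return found
    # 3. Last resort: attribute the change to whoever committed it.
    return committer
```

Entries with several authors (continuation lines of the form "<TAB>Name  <email>") are ignored in this sketch; only the first author of the first header is returned.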

> [reposurgeon aside, from observations with other conversions where
> different author maps were needed for different revisions: the revision
> range for commits from the gcc2 repository works in the GCC case because
> that revision range came from CVS and so there are no tags with valid
> commit authors in that range.  But if you have a repository with different
> ranges of commits having different author maps *and* those ranges contain
> SVN tags, simply specifying a range <SVN-commit>..<SVN-commit> doesn't
> work as expected, since ranges are interpreted in reposurgeon's ordering
> of events, not SVN's ordering, and the tag events are out of sequence with
> the commit events.]
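The range problem in the aside can be shown with a toy model. This Python sketch is a hypothetical simplification, not reposurgeon's actual data model or selection syntax: each event records the SVN revision that created it, and all tag events are assumed to be emitted after the commits, which is one way they can end up out of sequence. Selecting by stream position between two commits misses a tag created inside the revision range, while filtering on the recorded revision number picks it up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    kind: str     # "commit" or "tag"
    svn_rev: int  # SVN revision that created this event
    name: str


def select_by_position(events: List[Event], first_rev: int, last_rev: int) -> List[Event]:
    """Everything between the two commits' positions in the event stream."""
    start = next(i for i, e in enumerate(events) if e.kind == "commit" and e.svn_rev == first_rev)
    end = next(i for i, e in enumerate(events) if e.kind == "commit" and e.svn_rev == last_rev)
    return events[start:end + 1]


def select_by_revision(events: List[Event], first_rev: int, last_rev: int) -> List[Event]:
    """Every event whose SVN revision falls inside the range, wherever it sits."""
    return [e for e in events if first_rev <= e.svn_rev <= last_rev]


# Hypothetical stream: commits in revision order, tags appended at the end.
events = [
    Event("commit", 100, "r100"),
    Event("commit", 150, "r150"),
    Event("commit", 200, "r200"),
    Event("commit", 300, "r300"),
    Event("tag", 160, "v1"),  # created inside r100..r200, but emitted late
    Event("tag", 320, "v2"),
]

print([e.name for e in select_by_position(events, 100, 200)])
print([e.name for e in select_by_revision(events, 100, 200)])
```

The first call returns only the three commits; the second also includes tag `v1`, which is the event an author map keyed on an SVN range would be expected to cover.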
