From: Eric S. R. <es...@th...> - 2015-11-24 22:15:59
Daniel J Sebald <dan...@ie...>:
> >But the real show-stopper is that, if this project is anything like
> >typical, a single ChangeLog entry often actually summarizes work from
> >*multiple* commits. It may not be clear which ones, since one of
> >the bedevilling quirks of older CVSes was that commit timestamps were
> >taken *client-side* and tended to be flaky.
>
> Could the documentation be integrated into the git commit history in an
> approximate way? Say, associate all the comments for a particular day with
> the last CVS entry for that date? But it would have to be that the full
> Changelog header and date appears in the comment. So for example, one
> commit message might have three Changelog entries listed. It would only be
> an approximate alignment of commit messages and the changelog, but anyone
> who would go through the history to manually decipher and reconstruct things
> would be faced with an approximate association with what is in CVS anyway.

Yes, something like that could be attempted. The reason it's never been done
is that the implementation would be a swamp of complexity, and the results
would be of very dubious quality.
The easiest way I could imagine to do it (all the alternatives would require
a larger volume of custom code) would be to:
(1) Write a Python program that converts the entire sequence of ChangeLog
entries into a shelf object keyed by date, with each value being a pair
consisting of a name and the comment text.
(2) Write a custom plugin for reposurgeon that loads the shelf object
produced by the previous program and walks through the commit sequence
looking for where a copy of each item should be inserted.
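Step (1) might look roughly like the sketch below. It assumes GNU-style
ChangeLog headers of the form "YYYY-MM-DD  Name  <email>", which may not
match this project's files. The names parse_changelog and save_shelf are
made up for illustration. Since several entries can share a date, each
shelf value here is a list of (name, text) pairs rather than a single pair.
Nothing is shown for step (2), which depends on reposurgeon internals.

```python
import re
import shelve

# Assumed header format: "2015-11-24  Jane Doe  <jane@example.org>"
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.*?)\s+<([^>]*)>\s*$")


def parse_changelog(text):
    """Return {date_string: [(author, comment_text), ...]} from ChangeLog text."""
    grouped = {}
    current = None  # (date, author, list_of_body_lines) of the entry being read
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            author = "%s <%s>" % (m.group(2), m.group(3))
            current = (m.group(1), author, [])
            grouped.setdefault(current[0], []).append(current)
        elif current is not None:
            current[2].append(line)
    # Collapse each entry's body lines into one stripped string.
    return {
        day: [(author, "\n".join(body).strip()) for (_, author, body) in items]
        for day, items in grouped.items()
    }


def save_shelf(entries, path):
    """Store the parsed entries in a shelf keyed by date string."""
    with shelve.open(path) as db:
        for day, items in entries.items():
            db[day] = items
```

Lines that appear before the first recognizable header are silently
dropped, which is one of many places where real ChangeLog files would
need special handling.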
The heart of the code would be a predicate that takes as arguments the
following:
* the git commit date
* the git commit committer ID
* a ChangeLog entry date
* a ChangeLog author ID
and returns yes or no according as the entry should or should not be appended
to the comment of the specified commit.
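To make the shape of that predicate concrete, here is a deliberately naive
sketch. The name should_append, the slack_days tolerance, and the
case-insensitive ID comparison are all assumptions for illustration, not
anything reposurgeon provides. The commit date is expected to be a
timezone-aware datetime; the ChangeLog date is a plain date.

```python
from datetime import date, datetime, timezone


def should_append(commit_date, committer_id, entry_date, entry_author_id,
                  slack_days=1):
    """Return True if the ChangeLog entry plausibly belongs to the commit.

    commit_date:     timezone-aware datetime of the git commit
    committer_id:    git committer ID string
    entry_date:      datetime.date of the ChangeLog entry (day resolution)
    entry_author_id: ChangeLog author ID string
    slack_days:      assumed tolerance for clock skew and time-zone drift
    """
    # Compare at day resolution, since the ChangeLog has nothing finer.
    commit_day = commit_date.astimezone(timezone.utc).date()
    if abs((commit_day - entry_date).days) > slack_days:
        return False
    # Naive identity check: exact match after trimming and lowercasing.
    return committer_id.strip().lower() == entry_author_id.strip().lower()
```

For example, a commit stamped 2015-11-24 22:15:59 UTC by
"Jane Doe <jane@example.org>" matches an entry dated 2015-11-24 with the
same ID, and is rejected for an entry dated 2015-11-20.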
Good luck writing a predicate that produces consistently reasonable results.
Here are some of the complications:
* ChangeLog entry dates only have resolution to a day
* The commit dates are unreliable both due to clock skew and unreported
time-zone offsets (remember CVS commit stamps are taken client side)
* Git committer IDs may not match ChangeLog committer IDs even when they
refer to the same person. There are just too many ways for email addresses
and personal names to have variations that are transparent to a human
but not to a string-matching algorithm.
As an example of the latter, I've often run into situations where a committer
used a correct spelling of his name (featuring, for example, a Latin-1 umlaut)
in one context and a plain-ASCII approximation in the other.
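One partial mitigation for the umlaut case is to fold names to an
accent-free form before comparing. The sketch below uses Python's standard
unicodedata module; the function name fold_name is hypothetical. It only
handles the case where the ASCII form simply drops the diacritic. A
transliteration such as "ue" for "u-umlaut" still fails to match, which
illustrates why string matching stays unreliable here.

```python
import unicodedata


def fold_name(name):
    """Return a lowercase, accent-stripped, whitespace-normalized name."""
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Case-fold and collapse runs of whitespace to single spaces.
    return " ".join(stripped.casefold().split())


# "J\u00fcrgen M\u00fcller" is the Latin-1 spelling with u-umlauts.
same = fold_name("J\u00fcrgen M\u00fcller") == fold_name("Jurgen  Muller")
different = fold_name("J\u00fcrgen M\u00fcller") != fold_name("Juergen Mueller")
```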
My prediction is that the attempt will not end well.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>