This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
coalesced repies

From: Tom Lord <lord at emf dot net>
To: gcc at gnu dot org
Date: Mon, 9 Dec 2002 02:20:17 -0800 (PST)
Subject: coalesced repies
I've compiled a list of all asked and implied questions, and
provided a reply to each.

-t

================================================================


* A CVS server can die, leaving locks that need to be removed by
  hand (Zack Weinberg)

An ill-timed system crash can leave a wedged lock in just about any
system that records locks in persistent storage.  Even in systems
where locks aren't stored persistently, it is likely that the state of
persistent storage will go through critical states during which an
ill-timed crash will necessitate some form of recovery procedure.  The
reference implementation of arch is such a system (it stores locks in
persistent storage).  However, arch addresses the problem of "wedged
repositories" in two ways:

   A) The time during which locks must be held is minimized.

   B) Any user who has write access to a repository, including remote
      users, can safely break stale locks.  In other words, recovery
      doesn't require local access to the repository -- it requires
      only write access.

================================================================


* Zack has changes blocked, at least in part, by lack of version 
  control that supports renames.  (Zack Weinberg)

* Proper file renaming support is desirable.  (Joseph S. Myers)

* CVS supports tree rearrangements. (DJ Delorie)

arch has "best of breed" support for project tree rearrangements.
If you use arch in the recommended way (which involves the trade-off
of adding one-line inventory tags to your source files), then you can
rearrange trees freely in ordinary ways (`mv', `rm', etc.) and arch
will recognize these changes.  In other words, you don't have to worry
about revision control at all while you're moving files around -- arch
will "catch up" after the fact.
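
As a rough illustration of why inventory tags make after-the-fact
rename detection possible, here is a minimal Python sketch.  It is not
arch's code: the snapshot format (a dict from tag to path), the
function name, and the tag values are all made up for the example.
The only idea taken from the text above is that each file carries a
stable tag, so two tree snapshots can be compared by tag rather than
by path.

```python
def detect_tree_changes(old_tree, new_tree):
    """Compare two tree snapshots keyed by inventory tag.

    Each snapshot maps a file's inventory tag (a stable ID stored in
    the file itself) to the path where the file currently lives.
    """
    # Same tag, different path: the file was moved or renamed.
    renamed = {
        tag: (old_tree[tag], new_tree[tag])
        for tag in old_tree.keys() & new_tree.keys()
        if old_tree[tag] != new_tree[tag]
    }
    # Tag vanished: the file was deleted.
    deleted = {tag: old_tree[tag] for tag in old_tree.keys() - new_tree.keys()}
    # Tag appeared: the file is new.
    added = {tag: new_tree[tag] for tag in new_tree.keys() - old_tree.keys()}
    return renamed, deleted, added


# Hypothetical snapshots taken before and after plain `mv' and `rm'.
before = {"tag-main": "src/main.c", "tag-util": "src/util.c", "tag-old": "src/old.c"}
after = {"tag-main": "src/main.c", "tag-util": "lib/util.c", "tag-new": "src/new.c"}
renamed, deleted, added = detect_tree_changes(before, after)
```

Because the comparison never looks at how the tree got from one state
to the other, the user is free to rearrange files with ordinary tools.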

In addition, in case you don't want to add inventory tags to your
source files, arch has a mode of operation in which you "declare" tree
arrangements with explicit arch commands (`larch move', `larch
delete', etc.)


================================================================

* BK uses extensive checksumming to protect against HW failures.
  What about arch and svn?  (Joseph S. Myers)

Arch is, at present, very weak in this area.  It is very
straightforward (O(500) lines of code) to add these protections to the
reference implementation.  Because this is a very important issue
(IMO), I would want those 500 lines of code to be backed up by
carefully crafted design documents and testing.  I've been intrigued
by some of the lessons offered by the OpenCM design in this area (file
revisions in opencm are _identified_ by their unique hashes).  This is
one of the items on the agenda for my 6-engineer/1-year/$1.2M plan for
arch 1.0.
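
To make the "revisions identified by their hashes" idea concrete,
here is a small Python sketch of a store that names each blob by a
hash of its content and re-checks the hash on every read.  This is an
assumption-laden illustration, not the OpenCM design and not anything
in arch: the class and function names are invented, the store is an
in-memory dict, and SHA-256 is simply the hash picked for the example.

```python
import hashlib


def revision_id(content: bytes) -> str:
    """Name a revision by the hash of its content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


class HashedStore:
    """Toy in-memory store; a real one would sit on disk."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        key = revision_id(content)
        self._blobs[key] = content
        return key

    def get(self, key: str) -> bytes:
        content = self._blobs[key]
        # Recompute the hash on read: silent corruption shows up as a mismatch.
        if revision_id(content) != key:
            raise ValueError(f"corrupt revision {key}")
        return content


store = HashedStore()
key = store.put(b"int main(void) { return 0; }\n")
restored = store.get(key)
```

The appeal of this layout is that the name of a revision doubles as
its checksum, so verification needs no separate bookkeeping.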

================================================================


* Atomic checkins across multiple files are desirable, though
  lack of this feature is rarely a problem in practice (Joseph
  S. Myers)

Two points in response:

A) arch does indeed have atomic checkins.

B) When you say it is "rarely a problem in practice", I suspect you
are thinking about interference between concurrent checkins.  But that
is not the only reason atomic checkins are desirable.

Atomic checkins are also desirable because of the
_reference librarian_ role of revision control.  For example: some
feature, X, is merged in.  Some time later, you want to query: what
were all of the changes (to all files) associated with feature X?  Or
you might want to say: I have this branch here, and the mainline
includes not only X, but Y, and Z as well -- please help me merge in
just feature X, leaving out Y and Z.  An atomic commit of X (such as
supported by arch) makes such queries and requests practical.
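
A toy Python model may help show why recording a commit as one unit
makes these queries easy.  Nothing here is arch's data model: the
`Changeset' class, the helper names, and the file names are
hypothetical, and a real changeset would carry diffs rather than just
a list of paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    name: str     # label for the whole commit, e.g. "feature-X"
    files: tuple  # every file the commit touched, recorded together


def files_changed_by(history, name):
    """Answer: which files did the named change touch?"""
    for changeset in history:
        if changeset.name == name:
            return set(changeset.files)
    raise KeyError(name)


def select(history, wanted):
    """Pick only the named changesets, keeping mainline order."""
    return [c for c in history if c.name in wanted]


# Hypothetical mainline containing X, Y and Z.
mainline = [
    Changeset("feature-X", ("parser.c", "parser.h")),
    Changeset("feature-Y", ("parser.h",)),
    Changeset("feature-Z", ("lexer.c", "parser.c")),
]
only_x = select(mainline, {"feature-X"})
```

With per-file commits, as in CVS, the grouping in `files' has to be
guessed from timestamps and log messages; with atomic commits it is
simply recorded.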



================================================================

* O(1) tag and branch operations are highly desirable (Joseph
  S. Myers)


In arch, as in svn, tagging and branching are the same thing.

In arch, these are fast O(1) operations.  Unfortunately, there is a
bug in the reference implementation that causes this operation to not
be O(1) -- but it is straightforward to fix, and is part of what is on
the agenda for 6 engineers/1 year/$1.2M.

If you want some gory details:  the cost of forming a tag in the
reference implementation is roughly the cost of creating a one-line
file in the repository, plus the cost of storing a log message for the
tag revision.  The bug is that some automatically generated headers in
the log message summarize log information from the tagged revision,
but currently, computing that summary is (for stupid reasons) an
expensive operation.
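
The cost model described above (one small file plus a log message)
can be sketched in a few lines of Python.  The directory layout and
the file names `TAGGED-REVISION' and `log' are invented for this
example and are not the reference implementation's on-disk format;
the revision name is likewise a placeholder.

```python
import os
import tempfile


def create_tag(repo_dir, tag_name, tagged_revision, log_message):
    """Record a tag as a one-line pointer file plus a log message.

    The work done does not depend on the size of the tagged tree.
    """
    tag_dir = os.path.join(repo_dir, tag_name)
    os.makedirs(tag_dir)
    with open(os.path.join(tag_dir, "TAGGED-REVISION"), "w") as f:
        f.write(tagged_revision + "\n")
    with open(os.path.join(tag_dir, "log"), "w") as f:
        f.write(log_message)
    return tag_dir


def resolve_tag(repo_dir, tag_name):
    """Follow the pointer file back to the tagged revision."""
    with open(os.path.join(repo_dir, tag_name, "TAGGED-REVISION")) as f:
        return f.read().strip()


with tempfile.TemporaryDirectory() as repo:
    create_tag(repo, "release-1.0", "mainline--patch-42", "tag release 1.0\n")
    resolved = resolve_tag(repo, "release-1.0")
```

In this model the expensive part would be anything that walks the
tagged revision's history to build the log message, which matches the
bug described above.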


================================================================


* Good performance for basic operations (checkout, update, ...) on
  branches is desirable.  (Joseph S. Myers)

* CVS is slow.  Operations on a branch are painful. (David S. Miller)

* Performance is so important that a faster system will convince
  many people.  Bitkeeper is far nicer in this regard than CVS.
  (David S. Miller)


Nothing interesting has a simple answer.  You've asked an interesting
question.

In the most important case, arch is very fast at these operations.
For example, most of my checkouts happen at roughly the speed of `cp
-r'.

But there are non-rare cases that are not that fast.  How fast are
they?  Often they are much faster than CVS -- but you can certainly
run arch in ways that make them stupidly slow.

How much depth would you like to go into on this issue?  I can
describe how to use arch so that all of those operations are pretty
much always about as fast as could possibly be.  It isn't an admin
headache -- it's something you set up once, and thereafter it "just
works".  The deepest way to understand the issues is to learn about
the various storage management options in arch (archives and revision
libraries, and the options for caching).


================================================================


* Better testing tools are desirable.  I'm particularly interested in
  more testing, and more testcases.  (Joseph S. Myers)

Obviously arch isn't going to write test cases for you.

What arch _can_ do is help implement "more testing".  In particular, a
single test server can grab trees to test from multiple branches,
spanning multiple repositories.  To developers, all those multiple
repositories are unified -- it's just one big archive.  To repository
administrators, those repositories are strictly orthogonal -- the fact
that Joe Schmoe has made some branches from my branches imposes no
admin costs on me.  To the automated testing tools: just give me the
globally unique name of the development line to test, and I'll run
nightly tests on it.
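
Here is one way a test driver could consume such names, sketched in
Python.  The `archive/line' syntax, the archive names, and the
example.org locations are placeholders chosen for the illustration;
they are not arch's actual naming grammar or any real repository.

```python
def split_global_name(name):
    """Split 'archive/line' into an archive name and a development line."""
    archive, sep, line = name.partition("/")
    if not (sep and archive and line):
        raise ValueError(f"not a global name: {name!r}")
    return archive, line


def locate(name, archive_locations):
    """Find where the archive lives; the caller never hard-codes this."""
    archive, line = split_global_name(name)
    return archive_locations[archive], line


def nightly_queue(names, archive_locations):
    """Build the list of (name, location, line) jobs for tonight's run."""
    return [(name, *locate(name, archive_locations)) for name in names]


# Hypothetical registry: two archives run by two different people.
locations = {
    "main@example.org": "http://example.org/archives/main",
    "joe@example.org": "http://example.org/~joe/archive",
}
jobs = nightly_queue(
    ["main@example.org/gcc--mainline", "joe@example.org/gcc--experiment"],
    locations,
)
```

Adding a branch from somebody else's repository to the nightly run is
then a one-line change to the list of names.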


================================================================



* Better bug-tracking is desirable: in particular, to assist with
  vetting bug reports.  It might be nice if bug-reports had enough 
  well-defined fields for automatic testing.

* Bug tracking linkage is already automated: when a commit happens
  and a PR number is mentioned in the log, the bug database notices
  this. (Phil Edwards)



The underlying issue here is the role of revision control as archive
and reference librarian.   In other words, between the bug database
and the source database, we have an (open ended) system of
cross-references -- and recognize some value in automating processes
with respect to those cross-references.

This is an open-ended problem, and arch can only contribute towards a
solution -- not completely provide one (that's part of why I started
the `arch-tech' project: a project that includes unifying bug-tracking
and revision control).

arch itself can contribute namespaces here.  arch defines a namespace
in which every revision, regardless of what repository it's in, or what
branch it's on, has a globally unique name.   `arch' commands treat all
the repositories in the world as just one big repository, addressed
via those globally unique names.  Those names are one of the critical
hooks arch provides for linking up with bug-tracking in process
automation hacks.
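
As an illustration of such a hack, the Python sketch below builds a
cross-reference from PR numbers to the revisions whose log messages
mention them.  The regular expression assumes PRs are written as
`PR component/number'; the revision names, log messages, and PR
numbers are invented for the example.

```python
import re
from collections import defaultdict

# Assumed convention: "PR c++/1234", "PR optimization/5678", ...
PR_PATTERN = re.compile(r"\bPR\s+([a-z0-9+\-]+)/(\d+)", re.IGNORECASE)


def index_prs(log_messages):
    """Map each PR number to the globally named revisions that mention it."""
    index = defaultdict(list)
    for revision_name, message in log_messages.items():
        for _component, number in PR_PATTERN.findall(message):
            if revision_name not in index[int(number)]:
                index[int(number)].append(revision_name)
    return dict(index)


# Hypothetical log messages from two different repositories.
logs = {
    "joe@example.org/gcc--mainline--patch-7":
        "Fix PR c++/1234: reject invalid template argument",
    "jane@example.org/gcc--cleanup--patch-2":
        "Tidy headers; see also PR c++/1234 and PR optimization/5678",
    "jane@example.org/gcc--cleanup--patch-3":
        "Update docs",
}
pr_index = index_prs(logs)
```

Because the keys on the revision side are globally unique, the index
stays valid no matter which repository a revision lives in.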

Another set of hooks that arch provides are "notifiers" -- events that
can be triggered in response to changes to arch repositories.  Lots of
systems provide variations on this: arch is no exception.



================================================================

* File renames should not wait for a new revision control system.
  A future revision control system should "reconstruct" renames
  from the record of CVS delete+add operations.

There are a number of projects that are working on recovering "whole
tree" history from the imperfect evidence in CVS repositories.




================================================================

* Long-lived branches tend to fall out of sync with HEAD.  When they
  are merged, old problems are reintroduced to HEAD.  (Joel Sherrill and
  Joseph S. Myers)


arch has quite a bit to contribute in this area.

Let's suppose we have a mainline, A, and a branch of that, B.

Let's suppose they evolve concurrently - that seems to be the
situation you are describing.

As B hits stable and useful milestones, we'll want to merge it into
A.  As A evolves for orthogonal changes, we'll want to merge it into
B.

But this kind of back and forth merging is poorly supported by CVS,
svn, and, well, everything except for arch.   That's understandable.
Repeated merging like this requires seven different merging
techniques, depending on what merges have gone before.

So, no wonder branches fall out of sync: keeping up (repeated merging)
without automated help is a pain in the ass.

But arch addresses that.  The star-merge command, and underlying that:
its changeset semantics and history-tracking mechanisms, make it
practical to keep B up-to-date with A as development on both
proceeds.
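
The role of history tracking can be shown with a deliberately
simplified Python sketch.  This is not the star-merge algorithm; it
only models the bookkeeping idea that each line of development keeps
a log of the changesets it already contains, so a later merge applies
only what is missing.  All names are made up.

```python
def missing_changesets(source_history, target_log):
    """Changesets in the source that the target has not yet merged."""
    seen = set(target_log)
    return [c for c in source_history if c not in seen]


def merge_from(source_history, target_log):
    """Return the target's new log and the changesets that were applied."""
    pending = missing_changesets(source_history, target_log)
    return target_log + pending, pending


# Hypothetical histories: B branched at base-0 and already merged A-1.
a_history = ["base-0", "A-1", "A-2"]
b_log = ["base-0", "B-1", "A-1"]

# Bring B up to date with A: only A-2 is new to B.
b_log, applied_to_b = merge_from(a_history, b_log)

# Later, merge B back into A: only B-1 is new to A.
a_history, applied_to_a = merge_from(b_log, a_history)
```

Without the per-branch log, each of those merges would have to
re-apply, or manually skip, changes that had already crossed over.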

In other words, in an arch based world, you can feel quite reasonable
about demanding that branches (B) are fully up-to-date with HEAD (A)
before even considering them for merger.



================================================================



* Important ChangeLog information is easily lost when merging a
  branch into the mainline.  For example, sometimes ChangeLogs refer
  to changes stored in a non-main repository -- and that information
  is lost if the non-main repository goes away.  This could be fixed
  if those branches could be somehow "adopted" into the main
  repository. (Joseph S. Myers)


There are a couple of relevant features of arch here.

First, arch is designed, in part, around the concept of mirroring
repositories.  If there's some branch, in some remote repository, and
you want to merge it into your critical branch -- you'll have no
problem making a local mirror of that remote branch's repository.
History is easy to keep track of with arch.

Second, arch has an (optional) feature to autogen changelogs from log
messages.  It generates a hypertext (e.g., the merge message has links
to all the merged revisions).  That hypertext is formatted wiki-style:
plain text that reads like a nice changelog, but that is also
amenable to regexp-based parsing.  You can see an example of this in
my arch distributions:  there is a primary ChangeLog that tells the
main story, and a subdirectory of ChangeLogs from branches that fill
in the details.   These are plain-text-hypertext and all automatically
generated from log messages.  (If you like, I can try to bring back up
my web server -- so that you can see these translated to HTML.)
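
To give a feel for "plain text that is also regexp-parsable", here is
a small Python sketch that renders log entries as a changelog and
then recovers the merge links with one regular expression.  The entry
fields, the `merged:' line format, and the sample data are invented
for this example; they are not the format arch actually generates.

```python
import re


def format_changelog(entries):
    """Render log entries as plain text, one stanza per revision."""
    lines = []
    for e in entries:
        lines.append(f"{e['date']}  {e['author']}  {e['revision']}")
        lines.append("")
        lines.append(f"    {e['summary']}")
        # Each merged revision gets its own line, which acts as a "link".
        for merged in e.get("merged", []):
            lines.append(f"    merged: {merged}")
        lines.append("")
    return "\n".join(lines)


# The same text a human reads can be scanned for the links.
MERGED_LINK = re.compile(r"^ +merged: (\S+)$", re.MULTILINE)

# Hypothetical log entries.
entries = [
    {"date": "2002-12-09", "author": "A. Hacker",
     "revision": "mainline--patch-12",
     "summary": "merge the rename-support branch",
     "merged": ["rename-support--patch-1", "rename-support--patch-2"]},
    {"date": "2002-12-08", "author": "A. Hacker",
     "revision": "mainline--patch-11",
     "summary": "fix typo in docs"},
]
changelog_text = format_changelog(entries)
links = MERGED_LINK.findall(changelog_text)
```

A converter to HTML would only need to turn each match into an anchor
pointing at the named revision's own changelog.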


================================================================

* Users without write access cannot do "cvs add" and then produce
  diffs that include the added files.  They can "fake" this locally,
  but it is a notable problem.  (Joseph S. Myers)


In arch, remote users without write privilege just need to create a
directory -- their own personal repository.   Their branches reside
there.  arch unifies all repositories, everywhere, into one big
database.   All developers have all the features of revision control,
regardless of their privileges in particular repositories.



================================================================

* Testing is already quite well automated, thank you very much.
   (Phil Edwards and DJ Delorie)

* Comparative analysis of historic revisions and of various patch
   combinations is already automated: see Diego's SPEC2000 pages.
   (Phil Edwards)

* Regressions are being identified essentially by hand, though
   running the tests is an automated process, and some people have
   scripts to do that automatically.  Improvements in this area
   are welcome.  (Craig Rodrigues)

* Contrary to your claims, HEAD is less broken than other branches.
   Other branches are at least as broken, but we don't keep track of 
   that.  (DJ Delorie)


I think Diego's pages are awesome.   That's the direction to go.

arch can help by making it easier to add new branches to tests.
It can help by giving branches globally unique names, and by making
the repository-location of branches immaterial (from the testing
tools' perspective).

In other words, where Diego's pages talk about two branches, I think
they should talk about _every_ branch, including those owned by people
without write access to the main repository.  And if Zack or Mark
wants to set up a temporary branch for convenience, while merging
things for the next release milestone -- it should be pretty trivial
to add pages for that branch.

arch's features that can help: the global namespace, distributed
repositories, and the "notifier" feature (trigger on repository
events). 


================================================================


* Yes, there is a tension between release cycle and feature
  development, highlighted by freezes, but I don't see that
  revision control is relevant to this: the issue is people-hours.
  (Phil Edwards and DJ Delorie)


I have a _tentative_ answer, based on what I read in this discussion
and on the GCC list generally.

It looks like merging is a big chore -- people put it off until they
can't.  arch's "best of breed" use of intelligent merging and merge
histories makes it easier for people to "merge early and often".

So, sure, you can freeze the mainline -- but during that freeze,
people working on branches can keep up by merging into their branches,
secure in the knowledge that the reverse merge will be
straightforward.  Again, the `star-merge' command is really useful.



================================================================


* The term "genuine uptake", a term from feminist philosophy, is
  incomprehensible to me -- indeed, the concept of "feminist
  philosophy" sounds pretty dubious.  (Kai Henningsen)


You are in good company.  Many respectable intellectuals are similarly
skeptical.  In my opinion, this is largely due to lack of exposure to
high quality source materials.

Here are two book recommendations: the first is a
not-particularly-technical work of feminist philosophy, the second is
fairly technical, not feminist philosophy, but closely related.

Personally, I find the term "feminist philosophy" to be problematic.
I think we need a category like "philosophical analysis of oppression"
-- and both of these recommendations would fit in that category.


	A) "The Politics of Reality: Essays in Feminist Theory"
	   Marilyn Frye.  Crossing Press; ISBN: 089594099X; (May 1983)

	   http://www.amazon.com/exec/obidos/tg/detail/-/089594099X/qid=1039426345/sr=8-1/ref=sr_8_1/002-5637916-3974411?v=glance&s=books&n=507846

	   From an Amazon customer review:

	       I use "The Politics of Reality" as the primary textbook
	       for my course on philosophical issues in feminism. Frye
	       is extremely lucid and intelligent; beginning students
	       appreciate her useful analogies and well-structured
	       essays, and I still get something new out of the book
	       every time I read it.



	B) "Power: Essential Works of Foucault, 1954-1984, Volume III"
	   Michel Foucault, James D. Faubion, Robert Hurley, Paul
	   Rabinow

	   http://www.amazon.com/exec/obidos/tg/detail/-/1565847091/qid=1039426628/sr=1-1/ref=sr_1_1/002-5637916-3974411?v=glance&s=books

	   From an Amazon customer review:

	        This collection of Foucault's essays, lectures,
	        interviews, and editorials, offers even the casual
	        reader of Foucault welcome insights into his methods,
	        his intellectual biography and the development of his
	        own methods. Most valuable perhaps are interviews
	        collected from various magazines where he is
	        challenged by his interviewers to respond to their
	        criticisms and the criticisms of others. In one, for
	        instance, Foucault tries hard to correct those who
	        read his works as a totalizing critique of capitalism,
	        or the current penal system, or the mental
	        institution. He insists that his works are only
	        intended to be seen as the history of various specific
	        institutions and that those critics and followers who
	        are tempted to project his findings onto current
	        practices distort his intent.