April 2011

Karl Fogel holding a paperback copy of 'Produire Logiciels Libres'.

The folks at Framabook graciously sent me some copies of the print version of their French translation of my book (in French, “Produire Logiciels Libres”, in English, “Producing Open Source Software”). They also sent some questions for an online interview to accompany the release, and Olivier Rosseler translated my responses.

The French version of the interview is now up at www.framablog.org/index.php/post/2011/04/10/karl-fogel-interview. I’m posting the English original here, and thank them very much for asking such provocative questions.

  From: Karl Fogel
  To: Christophe Masutti, Alexis Kauffmann
  Subject: Re: Interview french version POSS
  Date: Fri, 11 Mar 2011 19:05:00 -0500

  Christophe Masutti writes:

  > Hi Karl, could you say a few words about yourself to our French-speaking
  > readers?
  > The French version of POSS has just been published, and your book was
  > translated or is being translated into other languages. What are your
  > feelings about all these remixes of your work, all made possible
  > because you chose to put your book under a free licence?
  My feelings are 100% positive.  This has simply no downside for me.  The
  translation makes the book accessible to more readers, and that's
  exactly what I want.  I'm very grateful to all the translators.
  > If you were to write a second version of POSS today, what would you
  > change in it or add to it? By the way, do you plan on doing such a
  > rewriting?
  Well, in fact I am always adjusting it as open source practices change.
  The online version evolves steadily; maybe eventually we'll announce
  that some kind of official "version 2.0" has been reached, but really
  it's a continuous process.
  For example, five or six years ago, it was more common for projects to
  run their own development infrastructure.  People would set up a server,
  install a version control system, a bug tracker, a mailing list manager,
  maybe a wiki, and that would be where project development happens.
  But there's been a lot of consolidation since then.  Nowadays, only the
  very largest and very smallest projects run their own infrastructure.
  The vast majority use one of the prebuilt hosting sites, like GitHub,
  Google Code Hosting, SourceForge, Launchpad, etc.  Most open source
  developers have interacted with most of these sites by now.
  So I've been updating the part of the book that talks about hosting
  infrastructure to talk more about using "canned hosting" sites like the
  above, instead of rolling your own.  People now recognize that running a
  hosting platform, with all its collaboration services, is a big
  operational challenge, and that outsourcing that job is pretty much
  required if you want to have time to get any work done on your project.
  I've also updated the book to talk about new versions of open source
  licenses (like the GNU General Public License version 3, which came out
  after the book was first published), and I've adjusted some of the
  recommendations of particular software, since times have changed.  For
  example, Git is much more mature now than it was when I first wrote the
  > FLOSS is being produced pretty much the same way now as five years
  > ago. But forges have appeared that differ from the SourceForge model.
  > I'm thinking of GoogleCode, and especially GitHub. GitHub can be
  > considered as the "Facebook" of Open Source forges, in the way that
  > they offer social network functionalities, and that it is possible to
  > commit directly from one's browser. The notion of "fork" here is
  > different from what we are used to. What do you think about all that?
  Actually, I think the notion of forking has not changed -- there has
  been some terminological shift, perhaps, but no conceptual shift.
  When I look at the dynamics of how open source projects work, I don't
  see huge differences based on what forge the project is using.  GitHub
  has a terrific product, but they also have terrific marketing, and
  they've promoted this idea of projects inviting users to "fork me on
  GitHub", meaning essentially "make a copy of me that you can work with".
  But even though there is a limited technical sense in which a copy of a
  git-based project is in theory a "fork", in practice it is not a fork --
  because the concept of a fork is fundamentally political, not technical.
  To fork a project, in the old sense, meant to raise up a flag saying "We
  think this project has been going in the wrong direction, and we are
  going to take a copy of it and develop it in the right direction --
  everyone who agrees, come over and join us!"  And then the two projects
  might compete for developer attention, and for users, and perhaps for
  money, and maybe eventually one would win out.  Or sometimes they'd
  merge back together.  Either way, the process was a political one: it
  was about gaining adherents.
  That dynamic still exists, and it still happens all the time.  So if we
  start to use the word "fork" to mean something else, that's fine, but it
  doesn't change anything about reality, it just changes the words we use
  to describe reality.
  GitHub started using "fork" to mean "create a workable copy".  Now, it's
  true that the copy has a nice ability to diverge and remerge with the
  original on which it was based -- this is a feature of git and of all
  decentralized version control systems.  And it's true that divergence
  and "remergence" are harder with centralized version control systems,
  like Subversion and CVS.  But all these Git forks are not "forks" in the
  real sense.  Most of the time, when a developer makes a git copy and
  does some work in it, she is hoping that her work will eventually be
  merged back into the master copy.  When I say "master" copy, I don't
  mean "master" in some technical sense, I mean it exactly in the political
  sense: the master copy is the copy that has the most users following it.
  So I think these features of Git and of GitHub are great, and I enjoy
  using them, but there is nothing revolutionary going on here.  There may
  be a terminology shift, but the actual dynamics of open source projects
  are the same: most developers make a big effort to get their changes
  into the core distribution, because they do not want the effort of
  maintaining their changes independently.  Even though Git somewhat
  reduces the overhead of maintaining an independent set of changes, it
  certainly does not reduce it so much that it is no longer a factor.
  Smart developers form communities and try to keep the codebase unified,
  because that's the best way to work.  That is not going to change.
  > In June 2010, Benjamin Mako Hill remarked in his "Free Software Needs
  > Free Tools" article that hosting open source projects on proprietary
  > platforms was kind of a problem. According to you, is this a major
  > problem, a minor one, or is it no problem at all?
  > http://mako.cc/writing/hill-free_tools.html
  Well, I know Mako Hill, and like and respect him a great deal!  I think
  I disagree with him on this question, though, for a couple of reasons.
  First, we have to face reality.  It is not possible to be a software
  developer today without using proprietary tools.  Only by narrowing the
  definition of "platform" in an arbitrary way is it possible to fool
  ourselves into thinking that we are using exclusively free tools.  For
  example, I could host my project at Launchpad, which is free software,
  but can I realistically write code without looking things up in Google's
  search engine, which is not free software?  Of course not.  Every good
  programmer uses Google, or some other proprietary search engine, daily.
  Google Search is part of the platform -- we cannot pretend otherwise.
  But let's take the question further:
  When it comes to project hosting, what are the important freedoms?  You
  are using a platform, and asking others to use it to collaborate with
  you, so ideally that platform would be free.  That way, if you want to
  modify its behavior, you can do so; and if someone wants to fork your
  project (in the old, grand sense), they can replicate the hosting
  infrastructure somewhere under their control if absolutely necessary.
  Well, that's nice in theory, but frankly, if you had all the source code
  to (say) Google Code Hosting, under an open source license, you still
  would not be able to replicate Google Code Hosting.  You'd need Google's
  operations team, their server farms... an entire infrastructure that has
  nothing to do with source code.  Realistically, you cannot do it.  You
  can fork the project, but generally you are not going to fork its
  hosting platform, because you don't have the resources.  And since you
  can't run the service yourself, you also can't tweak the service to
  behave in the ways you want -- because the people who run the physical
  servers have to decide which tweaks are acceptable and which aren't.  So
  in practice, you can't have either of these freedoms.
  (Some hosting services do attempt to give their users as much freedom as
  possible.  For example, Launchpad's code is open source, and they do
  accept patches from community members.  But the company that hosts
  Launchpad still approves every patch that they incorporate, since they
  have to run the servers.  I think SourceForge is about to try a similar
  arrangement, given their announcement of Allura yesterday.)
  So, given this situation, what freedom is possible?
  What remains is the freedom to get your data in and out.  In other
  words, the issue is really about APIs -- that is, "application
  programming interfaces", ways to move data to and from a service in a
  reliable, automatable way.  If I can write a program to pull all of my
  project data out of one forge and move it to a different forge, that is
  a useful freedom.  It means I am not locked in.  It's not the only
  freedom we can think of; it's not even the ideal freedom.  But it's the
  practical freedom we can have in a world in which running one's own
  servers has become prohibitively difficult.
  I'm not saying I like this conclusion.  I just think it is reality.  The
  "hunter gatherer" phase of open source is over; we have moved into the
  era of dependency on agricultural and urban infrastructure.  You can't
  dig your own irrigation ditches; you can't build your own sewer system.
  It's too hard.  But data portability means that if someone else is doing
  a bad job of those things, you can at least move to someplace that is
  doing a better job.
  So I don't care very much that GitHub's platform is proprietary, for
  example.  Of course I would prefer it to be entirely open source, but
  the fact that it is not does not seem like a huge problem.  The thing I
  look at first, when I'm evaluating any forge-like service, is: how
  complete are their APIs?  Can I get all my data off, if I need to?  If
  they provide complete APIs, it means they are serious about maintaining
  the quality of the service, because they are not trying to lock in their
  users through anything other than quality of service.
  > In France, high school and junior high students don't have computing
  > classes. Do you think computing as a subject -- and not only as a tool
  > for other subjects -- should be taught in schools?
  Absolutely.  The ability to understand data and symbolic processing is
  now very important.  It's a form of literacy.  You don't have to be a
  programmer, but you need to understand roughly how data works.  I had a
  conversation the other day that showed this gap in a very clear way.
  I was at the doctor, having some tests done.  The test involved a video
  image of my heart beating (using an ultrasound device), and the entire
  sequence was recorded.  It was amazing to see!  So afterwards, I asked
  at the front desk if I could get the data.  Those were my exact words:
  "Can I please get all the data from that echocardiogram?"  The clerk's
  reply was that they could give me a sheet with low-resolution pictures.
  "Thanks, but I actually want the data," I replied.  Yes, she said,
  that's what she was offering.  To her, the phrase "the data" did not
  have the very specific meaning it does to the data-literate.  What I
  meant, of course, was that I wanted every single bit that they had
  recorded.  That's what "all the data" means, right?  It means you don't
  lose any information: it's a bit-for-bit copy.  But she didn't have a
  definite concept of data.  To her, data means "something that I can
  recognize as being related to the thing requested".  For me, it was
  informational and computational; for her, it was perceptual.
  I realize this sounds harsh, but I really believe that is a form of
  illiteracy today.  You have to recognize when you are getting real
  information versus fake information, and you have to understand the vast
  difference in potential between the two.  If I go to another doctor,
  imagine the difference between me handing her a USB thumb drive with the
  complete video recording of my echocardiogram, and handing her some
  printouts with a few low-resolution still images of my heart.  One of
  these is useful, while the other is utterly pointless.
  Increasingly, companies that have a deep understanding of data -- of
  data about you -- have ways to use that data that are very profitable
  for them, but are not necessarily to your advantage.  So computing
  classes, of some kind, are a form of defense against this, an immune
  response to a world in which possession of and manipulation of data is
  increasingly a form of power.  You can only understand how data can be
  used if you have tried to use it yourself.
  So yes, computing classes... but not only as a defense :-).  It's also a
  great opportunity for schools to do something collaborative.  Too much
  of learning is about individual learning.  In fact, schools outlaw
  many forms of collaboration and call it cheating.  But in computing
  classes, the most natural thing to do is have the students form open
  source projects, or participate in existing open source projects.  Of
  course, the majority of students will not be good at it and should not
  be forced to do it.  This is true of any subject.  But for those who
  find it a natural fit, optional computer classes are a great opportunity
  that they might not have had otherwise.  So as a chance to expose people
  early to the pleasures of collaborative development, I think computing
  classes are important.  It will have an amazing effect for a subset of
  students, just as (say) music classes do.
  > Now one last question: what would be your advice to young programmers
  > wishing to enter the FLOSS community? Please answer with just one
  > sentence and not a whole book :-)
  Find an open source project you like (preferably one you use already)
  and start participating; you'll never regret it.