The folks at Framabook graciously sent me some copies of the print version of their French translation of my book (in French, “Produire Logiciels Libres”, in English, “Producing Open Source Software”). They also sent some questions for an online interview to accompany the release, and Olivier Rosseler translated my responses.
The French version of the interview is now up at www.framablog.org/index.php/post/2011/04/10/karl-fogel-interview. I’m posting the English original here, and thank them very much for asking such provocative questions.
From: Karl Fogel
To: Christophe Masutti, Alexis Kauffmann
Subject: Re: Interview french version POSS
Date: Fri, 11 Mar 2011 19:05:00 -0500
Christophe Masutti writes:
> Hi Karl, could you say a few words about yourself to our French-speaking
> readers?
>
> The French version of POSS has just been published, and your book has
> been translated or is being translated into other languages. What are
> your feelings about all these remixes of your work, all made possible
> because you chose to put your book under a free licence?
My feelings are 100% positive. This has simply no downside for me. The
translation makes the book accessible to more readers, and that's
exactly what I want. I'm very grateful to all the translators.
> If you were to write a second version of POSS today, what would you
> change in it or add to it? By the way, do you plan on doing such a
> rewriting?
Well, in fact I am always adjusting it as open source practices change.
The online version evolves steadily; maybe eventually we'll announce
that some kind of official "version 2.0" has been reached, but really
it's a continuous process.
For example, five or six years ago, it was more common for projects to
run their own development infrastructure. People would set up a server,
install a version control system, a bug tracker, a mailing list manager,
maybe a wiki, and that would be where project development happens.
But there's been a lot of consolidation since then. Nowadays, only the
very largest and very smallest projects run their own infrastructure.
The vast majority use one of the prebuilt hosting sites, like GitHub,
Google Code Hosting, SourceForge, Launchpad, etc. Most open source
developers have interacted with most of these sites by now.
So I've been updating the part of the book that talks about hosting
infrastructure to talk more about using "canned hosting" sites like the
above, instead of rolling your own. People now recognize that running a
hosting platform, with all its collaboration services, is a big
operational challenge, and that outsourcing that job is pretty much
required if you want to have time to get any work done on your project.
I've also updated the book to talk about new versions of open source
licenses (like the GNU General Public License version 3, which came out
after the book was first published), and I've adjusted some of the
recommendations of particular software, since times have changed. For
example, Git is much more mature now than it was when I first wrote the
book.
> FLOSS is being produced pretty much the same way now as five years
> ago. But forges have appeared that differ from the SourceForge model.
> I'm thinking of GoogleCode, and especially GitHub. GitHub can be
> considered as the "Facebook" of Open Source forges, in the way that
> they offer social network functionalities, and that it is possible to
> commit directly from one's browser. The notion of "fork" here is
> different from what we are used to. What do you think about all that?
Actually, I think the notion of forking has not changed -- there has
been some terminological shift, perhaps, but no conceptual shift.
When I look at the dynamics of how open source projects work, I don't
see huge differences based on what forge the project is using. GitHub
has a terrific product, but they also have terrific marketing, and
they've promoted this idea of projects inviting users to "fork me on
GitHub", meaning essentially "make a copy of me that you can work with".
But even though there is a limited technical sense in which a copy of a
git-based project is in theory a "fork", in practice it is not a fork --
because the concept of a fork is fundamentally political, not technical.
To fork a project, in the old sense, meant to raise up a flag saying "We
think this project has been going in the wrong direction, and we are
going to take a copy of it and develop it in the right direction --
everyone who agrees, come over and join us!" And then the two projects
might compete for developer attention, and for users, and perhaps for
money, and maybe eventually one would win out. Or sometimes they'd
merge back together. Either way, the process was a political one: it
was about gaining adherents.
That dynamic still exists, and it still happens all the time. So if we
start to use the word "fork" to mean something else, that's fine, but it
doesn't change anything about reality, it just changes the words we use
to describe reality.
GitHub started using "fork" to mean "create a workable copy". Now, it's
true that the copy has a nice ability to diverge and remerge with the
original on which it was based -- this is a feature of git and of all
decentralized version control systems. And it's true that divergence
and "remergence" are harder with centralized version control systems,
like Subversion and CVS. But all these Git forks are not "forks" in the
real sense. Most of the time, when a developer makes a git copy and
does some work in it, she is hoping that her work will eventually be
merged back into the master copy. When I say "master" copy, I don't
mean "master" in some technical sense, I mean it exactly in the political
sense: the master copy is the copy that has the most users following it.
So I think these features of Git and of GitHub are great, and I enjoy
using them, but there is nothing revolutionary going on here. There may
be a terminology shift, but the actual dynamics of open source projects
are the same: most developers make a big effort to get their changes
into the core distribution, because they do not want the effort of
maintaining their changes independently. Even though Git somewhat
reduces the overhead of maintaining an independent set of changes, it
certainly does not reduce it so much that it is no longer a factor.
Smart developers form communities and try to keep the codebase unified,
because that's the best way to work. That is not going to change.
> In June 2010, Benjamin Mako Hill remarked in his "Free Software Needs
> Free Tools" article that hosting open source projects on proprietary
> platforms was kind of a problem. According to you, is this a major
> problem, a minor one, or is it no problem at all?
> http://mako.cc/writing/hill-free_tools.html
Well, I know Mako Hill, and like and respect him a great deal! I think
I disagree with him on this question, though, for a couple of reasons.
First, we have to face reality. It is not possible to be a software
developer today without using proprietary tools. Only by narrowing the
definition of "platform" in an arbitrary way is it possible to fool
ourselves into thinking that we are using exclusively free tools. For
example, I could host my project at Launchpad, which is free software,
but can I realistically write code without looking things up in Google's
search engine, which is not free software? Of course not. Every good
programmer uses Google, or some other proprietary search engine, daily.
Google Search is part of the platform -- we cannot pretend otherwise.
But let's take the question further:
When it comes to project hosting, what are the important freedoms? You
are using a platform, and asking others to use it to collaborate with
you, so ideally that platform would be free. That way, if you want to
modify its behavior, you can do so: if someone wants to fork your
project (in the old, grand sense), they can replicate the hosting
infrastructure somewhere under their control if absolutely necessary.
Well, that's nice in theory, but frankly, if you had all the source code
to (say) Google Code Hosting, under an open source license, you still
would not be able to replicate Google Code Hosting. You'd need Google's
operations team, their server farms... an entire infrastructure that has
nothing to do with source code. Realistically, you cannot do it. You
can fork the project, but generally you are not going to fork its
hosting platform, because you don't have the resources. And since you
can't run the service yourself, you also can't tweak the service to
behave in the ways you want -- because the people who run the physical
servers have to decide which tweaks are acceptable and which aren't. So
in practice, you can't have either of these freedoms.
(Some hosting services do attempt to give their users as much freedom as
possible. For example, Launchpad's code is open source, and they do
accept patches from community members. But the company that hosts
Launchpad still approves every patch that they incorporate, since they
have to run the servers. I think SourceForge is about to try a similar
arrangement, given their announcement of Allura yesterday.)
So, given this situation, what freedom is possible?
What remains is the freedom to get your data in and out. In other
words, the issue is really about APIs -- that is, "application
programming interfaces", ways to move data to and from a service in a
reliable, automatable way. If I can write a program to pull all of my
project data out of one forge and move it to a different forge, that is
a useful freedom. It means I am not locked in. It's not the only
freedom we can think of; it's not even the ideal freedom. But it's the
practical freedom we can have in a world in which running one's own
servers has become prohibitively difficult.
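
To make that concrete, here is a minimal sketch of what such a program might look like. Everything in it is an assumption made for illustration: the `Forge` and `Issue` classes, and the `export_issues`, `import_issues`, and `migrate` names, are hypothetical in-memory stand-ins and not the API of any real hosting service. The only point is that a complete export on one side and a matching import on the other let the data move without loss.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Issue:
    # Hypothetical, simplified record of one bug-tracker entry.
    title: str
    body: str
    state: str = "open"


@dataclass
class Forge:
    """Hypothetical in-memory stand-in for a hosting service's API."""
    name: str
    issues: list = field(default_factory=list)

    def export_issues(self) -> str:
        # Serialize every issue to JSON: the kind of full dump a complete API allows.
        return json.dumps([asdict(issue) for issue in self.issues])

    def import_issues(self, payload: str) -> int:
        # Load a JSON dump produced by export_issues() and return how many were added.
        loaded = [Issue(**item) for item in json.loads(payload)]
        self.issues.extend(loaded)
        return len(loaded)


def migrate(source: Forge, target: Forge) -> int:
    """Copy all issue data from one forge to another; return the number moved."""
    return target.import_issues(source.export_issues())


if __name__ == "__main__":
    old = Forge("old-forge", [
        Issue("Crash on start", "Steps to reproduce..."),
        Issue("Typo in docs", "See README", state="closed"),
    ])
    new = Forge("new-forge")
    moved = migrate(old, new)
    print(f"moved {moved} issues; identical: {old.issues == new.issues}")
```

A real migration would talk to each service over the network and would have to cover code history, mailing list archives, and wiki pages as well as issues, but the shape of the operation is the same.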
I'm not saying I like this conclusion. I just think it is reality. The
"hunter gatherer" phase of open source is over; we have moved into the
era of dependency on agricultural and urban infrastructure. You can't
dig your own irrigation ditches; you can't build your own sewer system.
It's too hard. But data portability means that if someone else is doing
a bad job of those things, you can at least move to someplace that is
doing a better job.
So I don't care very much that GitHub's platform is proprietary, for
example. Of course I would prefer it to be entirely open source, but
the fact that it is not does not seem like a huge problem. The thing I
look at first, when I'm evaluating any forge-like service, is: how
complete are their APIs? Can I get all my data off, if I need to? If
they provide complete APIs, it means they are serious about maintaining
the quality of the service, because they are not trying to lock in their
users through anything other than quality of service.
> In France, high school and junior high students don't have computing
> classes. Do you think computing as a subject -- and not only as a tool
> for other subjects -- should be taught in schools?
Absolutely. The ability to understand data and symbolic processing is
now very important. It's a form of literacy. You don't have to be a
programmer, but you need to understand roughly how data works. I had a
conversation the other day that showed this gap in a very clear way.
I was at the doctor, having some tests done. The test involved a video
image of my heart beating (using an ultrasound device), and the entire
sequence was recorded. It was amazing to see! So afterwards, I asked
at the front desk if I could get the data. Those were my exact words:
"Can I please get all the data from that echocardiogram?" The clerk's
reply was that they could give me a sheet with low-resolution pictures.
"Thanks, but I actually want the data," I replied. Yes, she said,
that's what she was offering. To her, the phrase "the data" did not
have the very specific meaning it does to the data-literate. What I
meant, of course, was that I wanted every single bit that they had
recorded. That's what "all the data" means, right? It means you don't
lose any information: it's a bit-for-bit copy. But she didn't have a
definite concept of data. To her, data means "something that I can
recognize as being related to the thing requested". For me, it was
informational and computational; for her, it was perceptual.
I realize this sounds harsh, but I really believe that is a form of
illiteracy today. You have to recognize when you are getting real
information versus fake information, and you have to understand the vast
difference in potential between the two. If I go to another doctor,
imagine the difference between me handing her a USB thumb drive with the
complete video recording of my echocardiogram, and handing her some
printouts with a few low-resolution still images of my heart. One of
these is useful, while the other is utterly pointless.
Increasingly, companies that have a deep understanding of data -- of
data about you -- have ways to use that data that are very profitable
for them, but are not necessarily to your advantage. So computing
classes, of some kind, are a form of defense against this, an immune
response to a world in which possession of and manipulation of data is
increasingly a form of power. You can only understand how data can be
used if you have tried to use it yourself.
So yes, computing classes... but not only as a defense :-). It's also a
great opportunity for schools to do something collaborative. Too much
of learning is about individual learning. In fact, schools outlaw
many forms of collaboration and call it cheating. But in computing
classes, the most natural thing to do is have the students form open
source projects, or participate in existing open source projects. Of
course, the majority of students will not be good at it and should not
be forced to do it. This is true of any subject. But for those who
find it a natural fit, optional computer classes are a great opportunity
that they might not have had otherwise. So as a chance to expose people
early to the pleasures of collaborative development, I think computing
classes are important. They will have an amazing effect for a subset of
students, just as (say) music classes do.
> Now one last question: what would be your advice to young programmers
> wishing to enter the FLOSS community? Please answer with just one
> sentence and not a whole book :-)
Find an open source project you like (preferably one you use already)
and start participating; you'll never regret it.
Best,
-Karl