Brewster Kahle Impressionistic Transcript

Brewster Kahle on Universal Access to All Knowledge

An Impressionistic Transcript of Brewster Kahle's talk at Wikimania...

You are really on to something. There is something big going on here, and a lot of these talks are about trying to figure it out. I'll start with things I know about and end with things I don't know. How is that for a VC pitch?

I'll talk about open source, open content and the rise of the technical non-profits. Universal Access to All Knowledge is a big goal, and if you accomplish it, what are you going to do? Move to Florida?

Wikipedia is the 15th most popular site on the web. This is because of the Enlightenment goal. The goal isn't a technical one, but a structural one. Copyright had a centuries-old balance, and then 1976 brought a radical expansion of US copyright regulation. Treating information as intellectual property is perhaps the worst idea since the Domino Theory. Information is knowledge, not property. Jack Valenti's crowning achievement was radicalizing copyright regulation. Most people talk about the 130-year protection term, but the real issue is the vast scope and its repercussions.

The first casualty was software, and the response was open source licenses. MIT's sale to Symbolics forked development, and RMS's experience of that led to open source. This is Brewster's revisionist history, but it may be where it came from.

The second casualty was music and video, and the response was Creative Commons licenses. Another response was building organizations to facilitate community effort. We lost the help of institutions like MIT, so we built new ones, such as the Free Software Foundation. DejaNews was a for-profit; it was sold to Google and dissipated. IMDB, a six-person community project, was bought by Amazon. CDDB became Gracenote, Inc. WAIS Inc. was sold to AOL. FTP Software was sold to NetManage. Cygnus was sold to Red Hat. These were all commercial companies built on community effort, and they don't last long. The FSF is still around.

The response is the rise of the technical non-profit. The Apache Software Foundation has no full-time employees, but it is incorporated to last a while. OSAF has gotten money not only from Mitch Kapor but from foundations. The Mozilla Foundation is a great success: with Firefox and the Google toolbar (money), they spun off a for-profit company. It is an interesting ecology to watch and try to understand. Linux. The Internet Archive is based on the open access model: can we get paid for the administration we do, so that everything we do can be openly accessible? The Wikimedia Foundation you know about. The rise of the technical non-profit is an interesting addition to the ecology; we went wrong with the over-corporatization after WWII. EFF, Public Knowledge and the Open Content Alliance exist to enforce rights and serve us. We massively screwed up our legal structure and the general approach of treating knowledge as property.

Open hardware: the Petabox, a cheap machine that is open sourced. The $100 Laptop program has interest on the order of 5-10 million. What would happen if the next major laptop company were a non-profit? It is because they are non-profit that they are trusted, and their work is built on open work.

The structure is now in place to proceed towards Universal Access to All Knowledge. We have institutions dedicated towards these goals, but how are we doing towards it?

In text, the goal is the 26-28 million books in the Library of Congress. At 1 megabyte per book, that is 26 terabytes, or about $60k to hold the entire library on a Linux machine.

But I actually like books, the printed page. We created the bookmobile, which has printed a million books. The cost is a penny a page; at a buck a book, you can give books away. Its first debut was at the Supreme Court, when they were hearing arguments about extending copyright another 20 years, but we lost that one. Eric Eldred has one, there are two in India, one in Egypt, one in Uganda... This gets closer to universal access, but what we realized is that we need to scan more books.

One way to do this is to send them somewhere else. The Million Book Project sends them to India, but we had to buy 100k books to send, because not many others wanted to send theirs. So the Indians were scanning their own books, which may be the right thing for them to do: put the scanners next to the books. Scanning in India costs $10 per book; in the US it is $30. The automatic scanners are not effective, so we made our own scanner, which can do it at 10 cents a page. We are scanning 400 books a day. It would take $750 million to digitize the Library of Congress, about a year and a half of the LoC budget.
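A quick back-of-envelope check of the text figures, sketched in Python. The book count, 1 megabyte per book and 10 cents per page come from the talk; the 300-page average book is a hypothetical assumption the talk does not state, so treat the scanning total as illustrative only.

```python
# Back-of-envelope estimates using figures quoted in the talk.
# PAGES_PER_BOOK is a hypothetical assumption; the talk does not give it.

BOOKS = 26_000_000          # low end of the "26-28 million books" figure
MB_PER_BOOK = 1             # "1 megabyte for a book"
COST_PER_PAGE_USD = 0.10    # Archive-built scanner, "10 cents a page"
PAGES_PER_BOOK = 300        # hypothetical average page count, not from the talk


def storage_terabytes(books: int, mb_per_book: float) -> float:
    """Total text storage in decimal terabytes (1 TB = 1,000,000 MB)."""
    return books * mb_per_book / 1_000_000


def scanning_cost_usd(books: int, pages_per_book: int, cost_per_page: float) -> float:
    """Total scanning cost at a flat per-page rate."""
    return books * pages_per_book * cost_per_page


if __name__ == "__main__":
    print(f"Storage: {storage_terabytes(BOOKS, MB_PER_BOOK):.0f} TB")
    print(f"Scanning: ${scanning_cost_usd(BOOKS, PAGES_PER_BOOK, COST_PER_PAGE_USD):,.0f}")
```

With that assumed page count the scanning total comes out around $780 million, the same order as the $750 million quoted; a different average page count scales the result proportionally.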

Books are within our grasp technologically. There are open questions about whether it will be done by non-profits or by projects like Google Books. We have an orphaned works problem. The way you ask a question in the US is through a lawsuit: Kahle and Eldred. But if you get to frame the problem (orphaned works), you have already won. Who would forget the orphans? Give the orphans a home!

Next are in-print works. Amazon is working the other way, from in-print to out-of-print. With the Open Content Alliance we have found something that works. Even Microsoft is giving us money.

In audio, if you take all the published works, there are 2-3 million musical works. It is a heavily litigated area, with some precedent that ripping them and putting them online might not be okay. But a lot of musicians are just looking to be put on the internet. The Grateful Dead allowed people to trade their music; the key was that no one was making any money, which allowed people to feel good about it. Other bands allowed legitimate bootleg copies too. So we went to this community and said: "Would you like unlimited storage and bandwidth for free?" They said, "We don't believe you." And they didn't like lossy compression. We said, try us. We got lots of okays: 2k bands, 30k recordings, everything the Grateful Dead played. There are many versions of each concert, as there are debates over microphone types. If you give something for free, not only is it not taxed, but you get a tax rebate. Getting Slashdotted is a nightmare; your ISP bills could make you sell your guitar or your house. Europe has a different copyright scheme for performances (50 years), so we are working with the Dutch government to make old material free.

In moving images, there are 100,000-200,000 films. That is not many, which makes putting them online conceivable. We want to do this at DVD quality, but we are finding lots of archival films that never had distribution. We have 30k films on the Archive, dwarfed by YouTube, which is cool. We are discovering genres like Lego movies. Lots of these things end up in closets. Putting them online costs $15 per video hour. We will host it if it generally belongs in a library and it is okay to share it.

In television, we have a big TiVo that has captured a petabyte so far from 20 channels over a couple of years. We made one week available, the week of 9/11, which we put online a month later. We are now understanding in the US that the news comes with a point of view. Chomsky used to say you should read seven newspapers a day; lately this might make sense, to get multiple points of view.

Television is technologically possible. There are some rights issues, but we could do it all: all text, music, movies and TV are within our grasp. We got a change in the DMCA, yay! But we need a lot of help.

Web. We are best known for our web collection, about a petabyte in size. In the history of libraries, they tend to get burned, usually by governments, and then the governments are sorry for 100 years, but it is too late. The lesson from the Library of Alexandria is: don't have just one copy. Give copies away. Our first shot at this was with the Library of Alexandria version 2. If we had six or seven of these around the world, I could sleep at night. We are trying to do this through large-scale swap agreements.

Here is Wikipedia in the Archive. But most people are using the Archive to look at their own stuff, their old websites. One of the reasons this works is that we are a non-profit.

Books, Music, Video, Software and Web -- it is all possible. Some open questions if it is public or private, for-profit or non-profit. Is Google the only shot we are going to have at scanning Harvard's library? Looks like it.

I'm going to use this opportunity to advertise some projects we need help on.

Non-profit open networks like SeattleWireless.net or MIT Roofnet. Telecom company interests are not aligned with an open internet.

Distributed ownership network -- SFlan mesh network.

Open and transparent web search system: Nutch. Let's build some alternatives and be more creative. Recall, which does time-based search on the whole Archive, was a project done by one woman that indexed more pages than Google; then she went to work for Google, and hopefully she will come back.

Privacy and Anonymity. It is now known that the US Government is monitoring us. Tor.

Defensive Patent License. What if you did a GPL for Patents? The DPL is a license that reflects a public commitment to defense, so our patents are forever defensive. Any organization may freely use these licensed patents while so publicly committed to defense.

An Open Textbook system, started by Wikipedia. The number one request we get for books is textbooks.

Add attribution to Wikipedia. The Gutenberg guys were nervous about the copyright thing. We should know where the facts in Wikipedia came from. Go read Ted Nelson on transclusion and backpointers. Richard Feynman, the physicist, was talking in 1982 about how many layers it would take to go from the Propaedia to the Micropaedia to books as sources.

Open Library: annotate the book collection. Why is this book interesting to someone in the modern world? What can we do to re-inject old books into today?

We can pull off Universal Access to All Knowledge. This is where Wiki is heading: one of the great things humanity will be remembered for, up there with a Man on the Moon in the mythology of humanity.

Provided by the notetaker.