What would it take to create a free online library of all human knowledge? Brewster Kahle, founder of the Internet Archive, addressed the question at Museums & the Web yesterday. A “library of everything,” which he likens to a “Library of Alexandria, version two,” is within our grasp, he says, but it’ll take time, money, and, most important of all, the political will to digitize cultural content and make it accessible to all.
Best known for its Wayback Machine, a collection of snapshots of web sites from the last decade, the San Francisco-based Internet Archive was founded in 1996 and has been digitizing material ever since. In that time, Kahle has come up with some pretty good estimates of how much it’d cost to put various types of information online.
In all cases (books, videos, audio recordings, and software), the problem isn’t the cost of storage or digitizing; it’s getting rights from authors or the corporate copyright holders. For instance, storing the entire 28-million-volume Library of Congress would take about 100 terabytes of storage space, at a cost of around $150,000. Likewise, scanning and digitizing books would cost around $30 apiece. But securing a narrow three-year exemption to copyright laws so the Archive could preserve computer software cost $30,000 in legal fees alone.
Books: For starters, Kahle said it would cost only around $150,000 to store digitized versions of the U.S. Library of Congress’s 28 million books (he said it’d take around 100 terabytes of storage space). The problem is how to affordably scan those books, or the remaining 70-odd million titles in the world. Kahle bought 100,000 books and shipped them to India, where they were scanned and then shipped back to the U.S., at around $10 apiece. If libraries could scan book pages here in the U.S., it’d be much cheaper: around a dime a page, using the Scribe, a book-scanning station the Internet Archive devised.
He also spoke enthusiastically about accessibility projects, like the Archive’s digital bookmobile, which allows individuals to print and bind titles from a list of a million digital books. The cost is a penny a page, or a dollar for a 100-page book. Compare that, Kahle said, to the cost of lending books, which a Harvard study put at $3 per book. The Archive had bookmobiles in Uganda (“It was kind of cool to have kids making the first book they’ve ever owned”) and Egypt, near the site of the Library of Alexandria.
Kahle showed off one of the first 300 $100 laptops created by MIT through the One Laptop Per Child project. The tiny computers have a swivel screen that turns a traditional laptop into a flat e-book reader, and they have access to the Archive’s book list.
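The per-book arithmetic behind the storage and printing figures above can be sketched in a few lines of Python. The numbers are the estimates quoted from Kahle’s talk; the constant and function names are illustrative only, and the terabyte is assumed to be decimal (10^12 bytes), which the talk did not specify.

```python
# Estimates quoted in the talk. Names are illustrative, not from any Archive tool.
LOC_VOLUMES = 28_000_000         # Library of Congress volumes
STORAGE_TB = 100                 # estimated storage needed, in terabytes
STORAGE_COST_USD = 150_000       # estimated cost of that storage
PRINT_COST_PER_PAGE_USD = 0.01   # bookmobile print-and-bind cost per page
LENDING_COST_USD = 3.00          # cost of lending one book, per the Harvard study

BYTES_PER_TB = 10**12            # assumption: decimal terabytes


def storage_cost_per_book() -> float:
    """Storage cost spread evenly across every volume, in dollars."""
    return STORAGE_COST_USD / LOC_VOLUMES


def megabytes_per_book() -> float:
    """Average storage per volume, in decimal megabytes."""
    return STORAGE_TB * BYTES_PER_TB / LOC_VOLUMES / 10**6


def print_cost(pages: int) -> float:
    """Cost in dollars to print and bind a book of the given length."""
    return pages * PRINT_COST_PER_PAGE_USD


if __name__ == "__main__":
    print(f"Storage cost per book: ${storage_cost_per_book():.4f}")
    print(f"Storage per book: {megabytes_per_book():.1f} MB")
    print(f"Printing a 100-page book: ${print_cost(100):.2f}")
    print(f"Lending one book: ${LENDING_COST_USD:.2f}")
```

Under these assumptions, storage works out to roughly half a cent and about 3.6 MB per volume, which is consistent with Kahle’s point that storage is not the bottleneck.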
Audio: As with books, storage of audio is relatively affordable, and the Internet Archive makes a pretty hard-to-refuse deal: they’ll host anyone’s audio online free, forever. Music tapers have jumped at that offer, and Kahle says his site offers free access to more than 36,000 live concerts, all posted with the permission of artists and “including everything the Grateful Dead has ever done.” In its ever-growing collection, the Internet Archive has 100,000 audio items online, from Mother Jones Radio to Berkeley Groks Science Radio, the Tse Chen Ling Buddhist Lectures to Free Speech Radio News’ broadcasts.
Moving Images, Software: There are probably as many as 200,000 large-scale movies (of the Hollywood/Bollywood type) in the world. Only around 800 public-domain movies are online, plus around 55,000 other videos of user-generated content, political speeches, historic films like those in the Prelinger Archive, etc. Kahle said he’s surprised at the popularity of two kinds of items: stop-action videos made using Lego figures, and “speed runs,” in which videogamers record themselves playing games as fast as possible and document their process (Kahle said the IA server crashed this week because of the popularity of one such video).
The Internet Archive also collects software, retrieving old applications from floppy disks and other old media. With funding and storage space set up, the problem, again, has been rights. Thanks to the Digital Millennium Copyright Act, the Archive had to spend $30,000 in legal fees to get a three-year, “very narrow exemption” that would allow them to archive software. (He described the DMCA as “a sort of Soviet-era law. Everything’s illegal unless we tell you it’s OK.”)
As Kahle linked the discussion to the museum technologists in the room, he pointed out the problem with creating an online Library of Alexandria.
“The Library of Alexandria,” he said, “is best known for, er, burning.”
The lesson: “Don’t just have one copy.” He said the Internet Archive has multiple copies of everything, including a set that was gifted to the Library of Alexandria itself. In exchange, the library traded materials in Arabic.
As he closed, Kahle challenged museums and nonprofits to step up and be more active in the digitization and presentation of materials. If nonprofits don’t, corporations will govern the discussion, likely putting material behind paywalls or making it accessible only through their own proprietary websites.
He summed it up succinctly: “Public or perish.”
“If we don’t take a strong role in building public services in the public sphere, I think we’ll have a diminishing role in the future except as a physical repository of artifacts,” he said.