Tuesday, December 13, 2005

Bookplanet: founder of Project Gutenberg (17,000 classic books digitized) on Johnny-come-lately Google

Project Gutenberg Fears No Google -- by VAUHINI VARA

Internet giants like Google Inc. and Yahoo Inc. are making headlines with their rival plans to create online libraries of books. Long before those companies even existed, though, there was Project Gutenberg: an ambitious, offbeat effort to digitize classic books by typing them out by hand.

The approach made a lot of sense back in 1971, when Project Gutenberg's founder, Michael Hart, was a student at the University of Illinois. He enlisted an army of volunteers to help in the effort by pulling their own dusty volumes from attic shelves and transcribing them, word for word. The electronic versions were sent to Mr. Hart, who stored them on clunky university computers. Nearly 35 years later, Project Gutenberg has put more than 17,000 so-called e-books on its Web site. It continues to add more titles each week -- though most texts are now scanned rather than typed.

Mr. Hart, an eccentric technologist and bibliophile, still shepherds the effort on a shoestring budget from his computer-filled home in Urbana, Ill. But as bigwigs like Google, Yahoo Inc. and Microsoft Corp. launch their own high-tech versions of what Mr. Hart has been doing for years, the 58-year-old Internet pioneer is feeling left behind. Mr. Hart spoke with the Wall Street Journal Online's Vauhini Vara about where Project Gutenberg stands, the challenges ahead and scanning Shakespeare.

Q: How and when did you first learn about the Google and Yahoo book-scanning projects? Did any of these companies approach you for advice or ask to collaborate with you?

Michael Hart: I talked to Google a year before Google's big announcement [in December 2004]. … They approached us. They sent us an email saying, "Hey, we'd like to talk to you." They let us tell them about all that we were doing. It took place at the big Google headquarters in Silicon Valley. They gave us a free lunch and everything. They were very polite, but very business-plan oriented. At a certain point, they sort of talked us out the door. So I heard about [Google Book Search] along with everybody else. Same with Yahoo.

Q: Why should there be multiple book-scanning efforts? Why don't you send your volunteers to work with one of the other guys? Wouldn't that be more efficient in achieving your goal of creating as many e-books as possible?

A: It's not that we don't want to work with them. Google didn't want to have anything to do with us. They want to do their own project. All of these places can legally use all of our books. If Google put up all of our books, that would be fine. I would have gladly worked with Google.

Q: How is Project Gutenberg different from what Google is doing?

A: Google is working from the top down. It's very centralized. Project Gutenberg is the opposite: It's decentralized, it's grassroots. From the consumer's point of view, if you're trying to get a quotation from a book, you could get the book from Project Gutenberg and cut and paste, say, the whole "Hamlet" soliloquy. On Google, you can't. Also, ours is totally non-commercial. You won't find advertising on any of our pages.

Q: Google and Yahoo are getting lots of attention for their efforts. Has the renewed interest in digitization helped Project Gutenberg's cause, or has it diverted attention from your project?

A: Google certainly got a billion dollars' worth of publicity last December. I think we should have at least been mentioned. If you watched the whole media explosion, Project Gutenberg wasn't even mentioned. Anybody watching that would think that Google had just invented e-books. I do feel that the publicity has all gone to the people with the PR departments. We don't have a PR department. We don't have any press kits. We don't have any glossy little things to send you.

Q: How many Internet users download Project Gutenberg's books? Which are the most popular of the more than 17,000 books available on the project's Web site?

A: In a typical week, there are at least a million downloads. We get a lot of Thackeray downloads, a lot of James Joyce, a lot of Dickens. "Pride and Prejudice" is always up there. Sherlock Holmes is always up there. … There are always some you don't expect, like "Manners, Customs, and Dress During the Middle Ages, and During the Renaissance Period" by Paul Lacroix. … We also have reference material, which most people probably wouldn't think of -- like Roget's Thesaurus. Plus, the Koran, along with the Bible.

Q: Who are the Project Gutenberg volunteers?

A: Every once in a while I do a little survey: How old are you? Where are you from? What kind of schooling have you had? Why are you interested? There's no pattern at all, other than that they like books. It's everybody from kids to people who are 80 years old.

Q: Have the project's methods changed as technology has evolved?

A: Every year, more people use scanners than type them in by hand. Everything was typed in by hand until about 1989. Now, about 90% are scanned. But we do have enough people that like to type that some people worry that we might alienate them by stressing scanning. If we do put [an emphasis] on scanning, we might find that our volunteers that do typing will go away.

Q: What about you? Do you still manually type books or do you use a page scanner?

A: I'm mostly just holding on to the reins. I'm mostly an administrator. I don't get a chance to do the real thing. I've done both, and I can tell you from personal experience that it's a lot more fun to type a book. I once typed in a book that was 1,000 pages and it took nine months. Then I scanned the sequel, which was about 750 pages, and it took maybe three weeks. It's a huge amount more efficient, and a huge amount less fun.

Q: When you founded Project Gutenberg back in 1971, the Internet didn't yet exist. What was your goal for the project at the time and how has it changed as the Internet has moved into homes?

A: Nobody paid any attention to it at all for the first 17 years. The only attention I got was, "Oh, you're the guy who wants to put Shakespeare on the computer. Aren't you crazy." I talked about laptop computers that would hold the entire Library of Congress. The idea is the same today; it's just that the means of accomplishing the idea have accelerated.

Q: What are some of your favorite books or authors?

A: "Alice in Wonderland" was a family classic for us, and my dad was a Shakespeare professor. I do love Shakespeare.

Q: How long can Project Gutenberg keep going?

A: Every year, people have said, Project Gutenberg isn't going to be here next year. … And every year, I get to say, "Ha ha. Gotcha. You want to bet me on it next year?"

