Adam Ash

Your daily entertainment scout. Whatever is happening out there, you'll find the best writing about it in here.

Thursday, April 27, 2006

Bookplanet: Amazon decides how easy books are to read

Book, How Do I Love Thee? Let Me Count the Words
By NOAM COHEN

WHO would compare "The Story of Babar" to the prize-winning novel "Everything Is Illuminated"? Who would call James Joyce's "Ulysses," the bane of many an undergrad, a work for a seventh grader?

With the aid of software at Amazon.com known as Text Stats, anyone can make such comparisons, which are based on the crudest sort of computer analysis of a book: how many big words there are, and how long the sentences run.

Such simple statistical scrutiny has been around for decades — used to determine a book's appropriateness for a certain grade level, among other things. But software like Amazon's automates the process, and the Internet lets anyone see the results.

To what end? ask some literary scholars, who see such techniques as little more than superficial gimmicks. But others say they are a tool to gain insight into the authorship of and influences on a text, whether the work of Bob Dylan, Shakespeare or your average high school student.

When Amazon gets the right from a publisher to let readers "search inside" a book, Text Stats tallies the average length of a sentence and amasses little piles for each word used. (Or big piles, as in the case of the King James Bible, for example, where the count for "loin" is 1,548; "behold," 1,426; and "lord," 7,082.) The software then ranks a book for clarity and ease of reading on a variety of indexes.
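For readers curious what that tallying looks like in practice, here is a minimal Python sketch. It is not Amazon's code, which the article does not describe; the function names (`split_sentences`, `tokenize`, `text_stats`) and the naive sentence-splitting rule are assumptions made for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercase the text and pull out runs of letters, allowing one inner apostrophe."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def text_stats(text, top_n=100):
    """Return sentence count, word count, average sentence length and a word tally."""
    sentences = split_sentences(text)
    words = tokenize(text)
    counts = Counter(words)  # one "pile" per distinct word
    avg = len(words) / len(sentences) if sentences else 0.0
    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_sentence_length": avg,
        "top_words": counts.most_common(top_n),
    }


if __name__ == "__main__":
    sample = "Behold the lord. The lord said behold! Who is the lord?"
    print(text_stats(sample, top_n=3))
```

A real system would need a sturdier sentence splitter, since abbreviations like "Mr." fool this one, but the piles of words work the same way.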

For example, "The Story of Babar" has a Flesch-Kincaid Index score of 6.1 (sixth-grade level), the same as "Everything Is Illuminated" by Jonathan Safran Foer. Their "fogginess" quotients, an index similar to Flesch-Kincaid, are very close, too, though the Foer book is slightly less clear — 8 percent of its words are "complex," compared with 7 percent for "Babar." Text Stats also produces concordances, lists of the 100 most-used words in a book.
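The two indexes mentioned here have published formulas, so the arithmetic is easy to sketch. The Flesch-Kincaid grade level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, and the Gunning fog index is 0.4 × (words per sentence + percentage of "complex" words), where a complex word has three or more syllables. The Python below applies both. The syllable counter is a rough heuristic chosen for illustration, and nothing here is claimed to match how Text Stats counts syllables or sentences.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def readability(text):
    """Compute Flesch-Kincaid grade, Gunning fog and the share of complex words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    # "Complex" here means three or more syllables by the heuristic above.
    complex_share = sum(1 for s in syllables if s >= 3) / len(words)
    return {
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_share),
        "complex_word_pct": 100 * complex_share,
    }


if __name__ == "__main__":
    print(readability("The cat sat. The elephant sat."))
```

Because both formulas see only sentence length and syllable counts, short plain sentences score low no matter what they say, which is how a picture book and a prize-winning novel can land on the same grade.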

It is no surprise that the ratings made by computers, and the connections between books that they reveal, are often bizarre, since the software is not concerned with meaning and context and is unaffected by subjective factors like author reputation.

"It's machine reading; it is the kind of reading no one person can do," said Ben Marcus, director of the graduate fiction program at Columbia University and a novelist whose works are not accessible to Amazon's computers. "I think it is really fascinating, anything that takes us closer to a text, that makes us aware that it is put together to create an illusion."

The flaw is obvious, too. "The computer doesn't recognize how sentences relate to each other," he said. "Gertrude Stein or Beckett may write in elementary sentences, but they take such huge leaps between them." But that thickheadedness can be useful, some scholars say.

In "Alice in Wonderland," for example, a statistical study can "place this text against a large collection of 19th-century fiction to see which other works it resembles on a stylistic basis — what genre does it fit best, judging, say, from patterns of use of very common words?" Hugh Craig, who teaches at the University of Newcastle in Australia, wrote in an e-mail message. "But it would be essential to do the reading and analysis in the normal way as well, to see what it is that makes the patterns."
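One simple way to picture the kind of comparison Mr. Craig describes is to turn each text into a profile of how often it uses a handful of very common words, then measure how closely two profiles line up. The Python sketch below uses cosine similarity for that step. This is a stand-in technique chosen for brevity, not Mr. Craig's method, and the ten-word `COMMON_WORDS` list and the function names are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Illustrative list of very common words; a real study would use many more.
COMMON_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "i"]


def profile(text, vocabulary=COMMON_WORDS):
    """Relative frequency of each vocabulary word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in vocabulary]


def cosine_similarity(p, q):
    """1.0 means identical usage pattern; 0.0 means no overlap at all."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def closest(target, corpus):
    """Return the title in corpus (a dict of title -> text) most like target."""
    target_profile = profile(target)
    return max(
        corpus,
        key=lambda title: cosine_similarity(target_profile, profile(corpus[title])),
    )


if __name__ == "__main__":
    corpus = {
        "first": "the cat and the dog and the bird",
        "second": "i was in it i was in it",
    }
    print(closest("the fox and the hen", corpus))
```

As the quotation says, a match like this only flags a resemblance. Finding out what produces the pattern still takes ordinary reading.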

Richard Abrams of the University of Southern Maine said that he could get the big picture of a writer from statistical analysis. In preparing for a seminar on Mr. Dylan's lyrics, he said, he found it useful to consult a concordance of the 10 most used words in the lyrics, which included, he said, "babe" and "dark."

"For someone who had Dylan on the brain, there was an absolute sense of familiarity," he said. "You knew you were looking at a Dylan favorite word list, it showed Dylan as a Romantic."

Still, statistical analysis like this can bring to mind the reported critique of Mozart by the Austrian emperor Josef II: "too many notes."

Helen Vendler, the Shakespeare critic at Harvard, had not heard of Text Stats but speculated that "people will get bored by it — especially if it insults your intelligence by saying 'Ulysses' is at seventh-grade level." Likewise, she said a "concordance is not particularly interesting reading."

Amazon says it likes Text Stats because it keeps readers at the site longer comparing and contrasting books. "It is definitely a feature that we view as having a 'sticky' aspect," said Brian Williams, the senior product manager in charge of the Text Stats functions at Amazon. Mr. Williams said he had heard complaints about the rating of "Ulysses" but explained that Text Stats was "just one tool." He said he had read blog postings from authors discussing their score, always tongue and cheek. "It should be tongue and cheek," he said.
