Data Analysis Suggests Only Six Book Plots Exist

By Angela Chen

Everyone who secretly thought that all novels have the same plot can feel vindicated by new research supporting that suspicion. Sort of.

Researchers from the Computational Story Lab at the University of Vermont in Burlington used sentiment analysis, the analysis of emotion in a string of words, to map the plots of more than 1,700 works of fiction. By tracking how the emotional tone of a story changes from moment to moment, the researchers could see the overall emotional arc of each story.
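To make the idea concrete, here is a minimal sketch of how an emotional arc can be traced by sliding a window over a text and averaging word scores. It is an illustration only, not the researchers' method: the tiny word list, its scores, the function name emotional_arc and the window size are all made up for this example.

```python
# Hypothetical mini-lexicon: word -> emotional score (illustrative values only).
LEXICON = {
    "love": 1.0, "joy": 1.0, "happy": 1.0, "hope": 0.5,
    "death": -1.0, "grief": -1.0, "fear": -0.5, "lost": -0.5,
}

def emotional_arc(text, window=5, step=1):
    """Return the average lexicon score of the words in each sliding window."""
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    arc = []
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = words[start:start + window]
        scores = [LEXICON[w] for w in chunk if w in LEXICON]
        # Windows with no scored words count as neutral (0.0).
        arc.append(sum(scores) / len(scores) if scores else 0.0)
    return arc

story = (
    "Hope and joy filled the house. Then came fear, grief and death. "
    "At last love and joy returned."
)
# Each number is the mood of one window; read in order, they trace the arc.
print([round(score, 2) for score in emotional_arc(story, window=6, step=3)])
```

A real analysis would need a far larger scored vocabulary and much wider windows, so that single words do not swing the curve.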

They found six main arcs:

  • Fall-rise-fall, like Oedipus Rex
  • Rise and then a fall, like what happens to most villains
  • Fall and then a rise, like what happens to most superheroes
  • Steady fall, like in Romeo and Juliet
  • Steady rise, like in a rags-to-riches story
  • Rise-fall-rise, like in Cinderella

Within these general shapes, of course, there were many mini emotional arcs. The most popular stories follow the “fall-rise-fall” and “rise-fall” arcs, which might explain how superhero movies and Greek tragedies continue to be so enduring. The researchers even created a website where you can look at the exact graphs of the books they analysed.

This line of inquiry is hardly new: literary theorists have tried for centuries to figure out how many plots there really are. In 1895, the French critic Georges Polti declared that there were “36 dramatic situations.” In the 1920s, scholars like Vladimir Propp, a Russian folklorist, travelled widely and built careers diagramming the basic plots of fairy tales. Since then, folklorists have built this work into formal systems like the Aarne-Thompson classification system. And in 1995, Kurt Vonnegut gave a lecture where he drew “the shapes of stories” on a board and said he believed they could be “fed into computers.”

Of course, none of these people had the data-mining techniques that the Burlington researchers did, but the new findings should be taken with a grain of salt too. First, the researchers analysed only 1,700 pieces of fiction, which is hardly representative of the state of literature. Second, these were all English-language stories that had been downloaded from Project Gutenberg at least 150 times, meaning that they were all in the public domain and not contemporary works. It’s likely that the results would be different if the sample had included non-English or more recent literature.

The arcs themselves are so broad as to be almost obvious, and it can be hard to decide which one fits a given story. Sure, Romeo and Juliet can be seen as a steady downfall, but wasn’t there also a rise when they agree to be together? It’s interesting to have data backing up the work of so many scholars, so long as nobody takes this to mean they only ever need to read six books. [arXiv via MIT Technology Review]