How scientists are using big data analysis to deconstruct the art of storytelling
Our most beloved works of fiction follow well-trodden narratives, and most fiction is based on far fewer storylines than you might imagine. To reach this conclusion, big data scientists worked with colleagues in natural language processing to analyse the narrative in more than a thousand works of fiction.
By deconstructing some of the magic of narrative in fiction, they have confirmed that six common ways of telling a story recur time and time again in popular stories. They were inspired by the work of US fiction author Kurt Vonnegut, who originally proposed the similarity of emotional storylines in a master’s thesis rejected by the University of Chicago. These findings have just been published in EPJ Data Science by Andrew Reagan from the University of Vermont, USA, and colleagues.
Annotated emotional arc of Harry Potter and the Deathly Hallows, by JK Rowling.
The authors selected 1,327 books, representative of English works of fiction, from the 50,000 books included in Project Gutenberg, a major open access literature digitisation project. They then applied three different natural language processing filters used for sentiment analysis to extract the emotional content of 10,000-word stories. The first filter, known as singular value decomposition, reveals the underlying basis of the emotional storyline. The second, referred to as hierarchical clustering, helps differentiate between different groups of emotional storylines. The third, which is a type of neural network, uses a self-learning approach to sort the actual storylines from the background noise. Used together, these three approaches provide robust findings, as documented on the hedonometer.org website.
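To make the idea of an "emotional arc" concrete, here is a minimal Python sketch of the first step, scoring sentiment over a sliding window of text. It is an illustration under stated assumptions, not the authors' pipeline: the tiny word list `TOY_LEXICON`, its scores, and the function name `emotional_arc` are all hypothetical, and a real analysis would use a large sentiment lexicon and windows of thousands of words.

```python
from statistics import mean

# Hypothetical toy lexicon: word -> sentiment score (higher = happier).
# The words and numbers are invented for illustration only.
TOY_LEXICON = {
    "love": 8.4, "happy": 8.3, "joy": 8.2, "win": 7.5,
    "lost": 2.8, "cry": 2.2, "war": 1.8, "death": 1.5,
}


def emotional_arc(words, window, step):
    """Return the average sentiment of each sliding window over `words`.

    Windows that contain no lexicon words yield None.
    """
    arc = []
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = words[start:start + window]
        scores = [TOY_LEXICON[w] for w in chunk if w in TOY_LEXICON]
        arc.append(mean(scores) if scores else None)
    return arc


# A very short "story" that moves from sad words to happy ones.
story = ["war", "death", "cry", "lost", "joy", "love", "happy", "win"]
arc = emotional_arc(story, window=4, step=2)
```

For this toy story the three window averages increase from start to finish, which is the 'rags to riches' pattern in miniature. The three filters described above would then operate on a large collection of such arcs rather than on a single one.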
Reagan and colleagues thus determined that there were only six main emotional storylines: ‘rags to riches’ (sentiment rises), ‘riches to rags’ (fall), ‘man in a hole’ (fall-rise), ‘Icarus’ (rise-fall), ‘Cinderella’ (rise-fall-rise) and ‘Oedipus’ (fall-rise-fall). This approach could, in turn, be used to create compelling stories by gaining a better understanding of what has previously made for great storylines. It could also help teach common sense to artificial intelligence systems.
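As a rough illustration of how the six shapes differ, the Python sketch below matches an arc to the closest of six idealised curves by correlation. This is a simplification and not the method used in the study: the cosine templates, the `SHAPES` table and the function names are assumptions made for this example only.

```python
import math

# Hypothetical idealised templates: sign * cos(k * pi * x) for x in [0, 1].
# k counts the rising/falling segments; sign picks the starting direction.
SHAPES = {
    "rags to riches": (1, -1),   # rise
    "riches to rags": (1, 1),    # fall
    "man in a hole": (2, 1),     # fall-rise
    "Icarus": (2, -1),           # rise-fall
    "Cinderella": (3, -1),       # rise-fall-rise
    "Oedipus": (3, 1),           # fall-rise-fall
}


def template(k, sign, n):
    """Sample the idealised curve at n evenly spaced points."""
    return [sign * math.cos(k * math.pi * i / (n - 1)) for i in range(n)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def closest_shape(arc):
    """Return the name of the template that correlates best with `arc`."""
    if len(arc) < 2:
        raise ValueError("an arc needs at least two points")
    n = len(arc)
    return max(SHAPES, key=lambda name: pearson(arc, template(*SHAPES[name], n)))


label = closest_shape([5, 3, 1, 3, 5])  # an arc that falls, then recovers
```

Here the arc that dips in the middle and recovers is matched to 'man in a hole'. Correlation ignores the overall level and scale of the scores, so only the shape of the arc matters in this sketch.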