Breaking Bard: Using Microsoft AI to unlock Shakespeare’s greatest works
Spoiler alert: At the end of Romeo and Juliet, they both die.
OK, as spoilers go, it’s not big. Most people have read the play, watched one of the famous films or sat through countless school lessons devoted to William Shakespeare and his work. They know it doesn’t end well for Verona’s most famous couple.
In fact, the challenge is finding something no one knows about the world-famous play, which is more than 400 years old. That’s where artificial intelligence can help.
Phil Harvey, a Cloud Solution Architect at Microsoft in the UK, used the company’s Text Analytics API on 19 of The Bard’s plays. The API, which is available to anyone as part of Microsoft’s Azure Cognitive Services, can be used to identify sentiment and topics in text, as well as pick out key phrases and entities. This API is one of several Natural Language Processing (NLP) tools available on Azure.
By creating a series of colourful Power BI graphs (below) showing how negative (red) or positive (green) the language used by The Bard’s characters was, he hoped to shine a new light on some of the greatest pieces of literature, as well as make them more accessible to people who worry the plays are too complex to understand easily.
Harvey said: “People can see entire plotlines just by looking at my graphs on language sentiment. Because visual examples are much easier to absorb, it makes Shakespeare and his plays more accessible. Reading language from the 16th and 17th centuries can be challenging, so this is a quick way of showing them what Shakespeare is trying to do.
“It’s a great example of data giving us new things to know and new ways of knowing it; it’s a fundamental change to how we process the world around us. We can now pick up Shakespeare, turn it into a data set and process it with algorithms in a new way to learn something I didn’t know before.”
What Harvey’s graphs reveal is that Romeo struggles with more extreme emotions than Juliet. Love has a much greater effect on him, which challenges the stereotype of the time that women – the fairer sex – were more prone to the highs and lows of relationships.
“It’s interesting to see that the male lead is the one with more extreme emotions,” Harvey added. “The longest lines, both positive and negative, are spoken by him. Juliet is steadier; she is positive and negative but not extreme in what she says. Romeo is a fellow of more extreme emotion, he’s bouncing around all over the place.
“Macbeth is also interesting because there are these two peaks of emotion, and Shakespeare uses the wives at these points to turn the story. I also looked at Helena and Hermia in A Midsummer Night’s Dream, because they have a crossed-over love story. They are both positive at the start but then they find out something and it gets negative towards the end.”
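One way to put a number on "more extreme" is to measure how far each character's line scores sit from the neutral midpoint. The sketch below is an illustration rather than Harvey's method: it assumes every line of dialogue already carries a sentiment score between zero and one, and both the `extremity` helper and the sample scores are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def extremity(scored_lines):
    """Return each speaker's mean distance from the neutral midpoint (0.5)."""
    by_speaker = defaultdict(list)
    for speaker, score in scored_lines:
        by_speaker[speaker].append(abs(score - 0.5))
    return {speaker: mean(dists) for speaker, dists in by_speaker.items()}


# Invented scores for illustration only; not output from the real analysis.
sample = [
    ("ROMEO", 0.95), ("ROMEO", 0.05), ("ROMEO", 0.90),
    ("JULIET", 0.60), ("JULIET", 0.40), ("JULIET", 0.65),
]
result = extremity(sample)
```

With these made-up numbers, the speaker whose lines swing between very high and very low scores gets the larger value, even if the two speakers' average scores are similar.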
His Shakespeare graphs are the final step in a long process. After downloading a text file of The Bard’s plays from the internet, Harvey had to process the data to prepare it for Microsoft’s AI algorithms. He removed all the stage directions, keeping the act and scene numbers, the characters’ names and what they said. He then uploaded the text to the Microsoft Cognitive Services API, part of a set of tools that can be used in apps, websites and bots to see, hear, speak, understand and interpret users through natural methods of communication.
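The clean-up step can be sketched in a few lines of Python. The article does not say what Harvey's text file looked like, so the layout here is an assumption: acts and scenes are marked with "ACT" and "SCENE" headings, speaker names are in capitals followed by a full stop, and stage directions sit in square brackets on a single line. The `parse_play` function is a hypothetical helper, not his code.

```python
import re

# Assumed layout of the plain-text play (hypothetical, see lead-in).
ACT_RE = re.compile(r"^ACT\s+([IVX]+)\b")
SCENE_RE = re.compile(r"^SCENE\s+([IVX]+)\b")
SPEAKER_RE = re.compile(r"^([A-Z][A-Z ]+)\.\s*(.*)$")
STAGE_DIRECTION_RE = re.compile(r"\[[^\]]*\]")


def parse_play(text):
    """Return one record per speech: act, scene, speaker and spoken text."""
    records = []
    act = scene = None
    current = None
    for raw in text.splitlines():
        # Drop bracketed stage directions, then normalise whitespace.
        line = " ".join(STAGE_DIRECTION_RE.sub(" ", raw).split())
        if not line:
            continue
        match = ACT_RE.match(line)
        if match:
            act, scene, current = match.group(1), None, None
            continue
        match = SCENE_RE.match(line)
        if match:
            scene, current = match.group(1), None
            continue
        match = SPEAKER_RE.match(line)
        if match:
            current = {"act": act, "scene": scene,
                       "speaker": match.group(1).strip(),
                       "text": match.group(2)}
            records.append(current)
        elif current is not None:
            # Continuation of the current speech.
            current["text"] = (current["text"] + " " + line).strip()
    return records


sample = """ACT I
SCENE I. A street.
[Enter ROMEO]
ROMEO. But soft, what light
through yonder window breaks?
JULIET. O Romeo, Romeo! [Aside] wherefore art thou Romeo?
"""
records = parse_play(sample)
```

Each record keeps the act, scene and speaker alongside the spoken text, so the scores that come back later can be plotted per character and in play order. Real editions vary, and stage directions that span several lines would need extra handling.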
The Text Analytics API is pre-trained with an extensive body of text with sentiment associations. The model uses a combination of techniques during text analysis, including text processing, part-of-speech analysis, word placement and word associations.
After scanning the Shakespeare plays, Microsoft’s NLP tool gave the lines of dialogue a score between zero and one – scores close to one indicated a positive sentiment, and scores close to zero indicated a negative sentiment.
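To turn those scores into the red and green of a chart, each one has to be mapped to a label. The sketch below shows one way to do that. The cut-off values and the grey "neutral" band are assumptions made for illustration; the article only says that scores near one are positive and scores near zero are negative.

```python
# Colours follow the article's red/green scheme; grey is an assumption.
COLOURS = {"negative": "red", "neutral": "grey", "positive": "green"}


def label_sentiment(score, negative_below=0.4, positive_above=0.6):
    """Map a 0-1 sentiment score to a label using assumed thresholds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < negative_below:
        return "negative"
    if score > positive_above:
        return "positive"
    return "neutral"


def colour_for(score):
    """Return the chart colour for a score."""
    return COLOURS[label_sentiment(score)]
```

Moving the two thresholds closer together or further apart changes how much dialogue is treated as neutral, which is a judgment call rather than something the scores decide.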
However, before you start imagining a world in which only robots read books before telling humans the gist of what happened, Harvey discovered some unexpected challenges with his test.
While the AI system worked well for Shakespeare plays that contained straightforward plots and dialogue, it struggled to determine if more nuanced speech was positive or negative. The algorithm couldn’t work out whether Hamlet’s mad ravings were real or imagined, or whether characters were being deceptive or telling the truth. That meant that the AI labelled events as positive when they were negative, and vice-versa. The AI believed The Comedy of Errors was a tragedy because of the physical, slapstick moments in the play.
Harvey realised that the parts of the plays that dealt with what truly makes us unique as humans – joking, elation, lying, double meanings, subterfuge, sarcasm – could only be noticed and interpreted by human readers. His project required AI working alongside humans to truly understand and fully appreciate Shakespeare.
Harvey insists that his experiments with Shakespeare’s plays are just a starting point but that the same combination of AI and humans can eventually be extended to companies and their staff, too.
“Take the example of customers phoning their energy company,” he said. “With Microsoft’s NLP tools, you could see if conversations that happen after 5pm are more negative than those that happen at 9am, and deploy staff accordingly. You could also see if a call centre worker turns conversations negative, even if they start out positive, and work with that person to ensure that doesn’t happen in the future.
“It can help companies engage with data in a different way and assist them with everyday tasks.”
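The call-centre comparison Harvey describes comes down to grouping scored conversations by the hour they started. A minimal sketch, assuming each call has already been given a sentiment score between zero and one; the `mean_score_by_hour` helper and the sample calls are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_score_by_hour(calls):
    """Average sentiment score for each hour of the day (0-23)."""
    by_hour = defaultdict(list)
    for started_at, score in calls:
        by_hour[started_at.hour].append(score)
    return {hour: mean(scores) for hour, scores in sorted(by_hour.items())}


# Invented calls for illustration only: (start time, sentiment score).
calls = [
    (datetime(2024, 1, 8, 9, 5), 0.8),
    (datetime(2024, 1, 8, 9, 40), 0.7),
    (datetime(2024, 1, 8, 17, 10), 0.3),
    (datetime(2024, 1, 8, 17, 45), 0.2),
]
hourly = mean_score_by_hour(calls)
```

Grouping by staff member instead of by hour would follow the same pattern and could support the second use Harvey mentions.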
Harvey also said journalists could use the tool to see how readers are responding to their articles, and social media experts could get an idea of how consumers view their brand.
For now, Harvey is concentrating on the Classics and is turning his attention to Charles Dickens, if he can persuade the V&A in London to let him study some of their manuscripts.
“In the V&A manuscripts, you can see where Dickens has crossed out words. I would love to train a custom vision model on that to get a page by page view of his changes. I could then look at a published copy of the text and see which parts of the book he worked on most; maybe that part went well but he had trouble with this bit. Dickens’s work was serialised in newspapers, so we might be able to deduce whether he was receiving feedback from editors that we didn’t know about. I think that’s amazing.”