“Cosette Sweeping,” illustration from Victor Hugo
Literary translation is a subjective effort to convey a writer's creative choices into another language. Word-for-word accuracy is neither required nor expected -- but fidelity to the overall meaning and emotional content is what determines a successful endeavor.
How emotionally faithful is a piece of translated text? Is there a consistent, accurate way to measure that dimension? Can sentiment analysis be a useful tool? Let's look at short but emotionally and structurally significant passages from three novels in the Western canon.
Methodology: Data project for the Lede Program at Columbia University, summer 2023
Tools and Processes:
-- Excerpts from Les Miserables, In Search of Lost Time and Wuthering Heights and their translations obtained from Project Gutenberg
-- Sentiment analysis performed with the Cardiff NLP twitter-XLM-roBERTa-base model
-- Data visualization performed with Flourish and D3/Svelte
-- Code and data available on GitHub
Anyone who's read the rousing final volume of Victor Hugo's protean novel -- and anybody who's seen the long-running musical -- knows that this is an epic full of tears, death, suffering and redemption. Between the workhouses, the prison labor camps and the student riots, the body count is high and the wine flows. But even against this colorful backdrop, the death of street urchin Gavroche represents the culmination of all the injustices and cruelties endured by the dispossessed. It is the absolute moral low point of the entire novel.
The insurgency has failed, and the small group of students retrenched behind their crumbling barricades know that their fate is sealed. Courageous little Gavroche ventures into the open, within shooting distance of the soldiers, in a desperate attempt to collect ammunition. He dances with death until, at last, he is hit by a marksman's bullet. He continues to sing for a brief moment until a second bullet kills him. There's horror but also playfulness, childish fun -- Gavroche feels invulnerable until he's shot dead.
The original French version and its English translation were carefully divided into 58 pieces of text, then analyzed by twitter-XLM-roBERTa-base for Sentiment Analysis, a multilingual model trained on almost 200 million tweets.
(RoBERTa, short for “Robustly Optimized BERT Approach,” was developed at Facebook AI as a variant of BERT (Bidirectional Encoder Representations from Transformers), a model that originated at Google.)
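For readers who want to reproduce this step, here is a minimal sketch of how each piece of text could be labeled. It assumes the Hugging Face `transformers` library and the hub ID `cardiffnlp/twitter-xlm-roberta-base-sentiment`, which is inferred from the model name above; the function names `load_classifier` and `label_segments` are illustrative and not taken from the project's code.

```python
from typing import Callable, Dict, List

# Assumption: hub ID inferred from the model name cited in the article.
MODEL_ID = "cardiffnlp/twitter-xlm-roberta-base-sentiment"


def load_classifier() -> Callable[[str], Dict]:
    """Return a function mapping a text to {"label", "score"}.

    The import is done lazily so the rest of the module works without
    the library installed (it typically needs `transformers` and `torch`).
    """
    from transformers import pipeline

    pipe = pipeline("sentiment-analysis", model=MODEL_ID, tokenizer=MODEL_ID)

    def classify(text: str) -> Dict:
        top = pipe(text)[0]  # one result dict per input string
        return {"label": top["label"].lower(), "score": float(top["score"])}

    return classify


def label_segments(segments: List[str],
                   classify: Callable[[str], Dict]) -> List[Dict]:
    """Label each piece of text in order, keeping its position in the passage."""
    rows = []
    for index, text in enumerate(segments):
        result = classify(text)
        rows.append({"index": index, "text": text,
                     "label": result["label"], "score": result["score"]})
    return rows


# Usage (downloads the model on first run):
# rows = label_segments(["and moved no more."], load_classifier())
```

Passing the classifier in as an argument keeps the splitting and bookkeeping separate from the model, so the same loop can run over the French and English pieces alike.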
The French original is on the left, and the English translation on the right. Deep blue denotes a "positive" sentiment with a score of more than 0.5; light blue is "positive" with a score of less than 0.5; light gray is "neutral"; light red is a "negative" sentiment with a score of less than 0.5; and finally the deep red that dominates the French version means "negative" with a score of more than 0.5.
In other words: blue is positive, red is negative, gray is neutral.
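The color legend can be written as a small lookup. This is only a sketch of the rule as described; the function name `color_bucket` is hypothetical, and because the text does not say how a score of exactly 0.5 is treated, the code assumes it falls in the lighter shade.

```python
def color_bucket(label: str, score: float) -> str:
    """Map a sentiment label and its score to the legend's color name."""
    label = label.lower()
    if label == "neutral":
        return "light gray"  # neutral is a single shade regardless of score
    if label not in ("positive", "negative"):
        raise ValueError(f"unexpected label: {label!r}")
    hue = "blue" if label == "positive" else "red"
    # Assumption: a score of exactly 0.5 is drawn in the light shade.
    shade = "deep" if score > 0.5 else "light"
    return f"{shade} {hue}"


print(color_bucket("negative", 0.69))
print(color_bucket("positive", 0.47))
```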
Click on any bar to read the actual text.
No need to be bilingual to be struck by the vast differences in sentiment between the two versions. Out of 58 text elements, the English translation has 26 neutrals and just 28 negatives, while the French original has 41 negatives and just 12 neutrals -- the latter an unsurprising ratio for a text that describes the death of a child.
The most striking discrepancy lies at the very end of the text, when a second bullet fatally strikes the wounded boy:
"This time he fell face downward on the pavement / and moved no more. / This grand little soul had taken its flight."
While the French version gets a "negative" label for all three pieces of text, with scores ranging from 0.48 to 0.69, the English text gets: "neutral" with a 0.52 score, "neutral" with a 0.54 score, and a puzzling "positive" with a 0.47 score.
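Because both versions are split into the same number of pieces, disagreements like this one can be listed mechanically. The sketch below uses only the labels quoted above for the three closing pieces; `label_mismatches` is a hypothetical helper, not part of the project's code.

```python
from typing import List, Tuple


def label_mismatches(original: List[str],
                     translation: List[str]) -> List[Tuple[int, str, str]]:
    """Return (position, original label, translation label) where the labels differ."""
    if len(original) != len(translation):
        raise ValueError("both versions must be split into the same number of pieces")
    return [(i, o, t)
            for i, (o, t) in enumerate(zip(original, translation))
            if o != t]


# Labels reported for the last three pieces of the Gavroche passage.
french = ["negative", "negative", "negative"]
english = ["neutral", "neutral", "positive"]

for position, fr_label, en_label in label_mismatches(french, english):
    print(position, fr_label, "->", en_label)
```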
What could explain such a contrast? Could it be that the twitter-XLM-roBERTa-base model, trained on the syntax and content of 21st-century tweets, fails to correctly interpret a work of 19th-century fiction full of metaphors and emotion?
Another hypothesis: could it be that translations, by nature, tend to lean toward a safer, more neutral version of the same phrases and ideas?
Let's take another work famous for its depiction of complex human emotions: Marcel Proust's titanic seven-volume "In Search of Lost Time". In this passage, from the first novel, "Swann's Way", Swann's long-buried memories of a painful lost love are searingly, "maddeningly" awakened by a little musical phrase from a violin sonata. Even the harshest pang of suffering is intimately connected with the remembered joy of lost love.
Here again, the original French version and its English translation were divided into 40 pieces of text and analyzed by twitter-XLM-roBERTa-base.
The French original is on the left, and the English translation on the right.
Again, deep blue denotes a "positive" sentiment with a score of more than 0.5; light blue is "positive" with a score of less than 0.5; light gray is "neutral"; light red is a "negative" sentiment with a score of less than 0.5; and deep red means "negative" with a score of more than 0.5.
Click on a bar to read the text.
While the contrast is a bit less striking, it is undeniable that the original has more red bars of negative sentiment than the translation, which is dominated by neutrals.
The French version starts briskly: the red bars in the first 10 lines illustrate the physical reaction to the sudden recollection ("and this apparition tore him with such anguish / that his hand rose impulsively to his heart"). After fluttering in a desperate attempt to stave off the flood of memories (blues and neutrals in the middle section), it finally gives way to their "maddening" song -- in a confusing mix of pain and joy.
The model clearly finds the English version more neutral: 23 of its 40 pieces are labeled neutral and only 9 negative. The French version gets 18 negatives and 14 neutrals.
Still, puzzling labels abound: why does the phrase "When Odette was in love with him" score a 0.6 negative in French, but a 0.65 neutral in English? And why does the model rate the final clause, "the forgotten strains of happiness", a 0.48 negative in English and a 0.66 positive in French?
For good measure, I decided to take a look at an English-to-French translation, and use twitter-XLM-roBERTa-base on another emotionally charged work of fiction: Wuthering Heights. In this scene, Heathcliff reappears after a three-year absence. Everything has changed -- his beloved Cathy is married to Edgar Linton, and Heathcliff himself is a grown, somber man bent on revenge. But when he sees her, their shared delight consumes everything else.
"They were too much absorbed in their mutual joy / to suffer embarrassment." "You don't deserve this welcome / to be absent and silent for three years / and never to think of me!" "I've fought through a bitter life / since I last heard your voice; / and you must forgive me, / for I struggled only for you!"
In a bit more than 40 slices of text, there is joy, longing, anger and self-destruction. Will the French translation successfully convey this explosion of feelings, or will it water things down as we saw in our first two examples?
The French version is on the left, and the English on the right.
Click on a bar to read the text.
And yet again, a full half of the French text is negative, with just 11 neutrals and 10 positives. In the English text, neutral labels account for half of the rows, with 17 negatives and just 4 positives.
Based on this small corpus, we observe that when it comes to literary translations, the Cardiff NLP XLM-RoBERTa model finds French sentences more polarized than their English counterparts.
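One rough way to put a number on "more polarized" is the share of pieces labeled anything other than neutral. The sketch below is an illustration, not the project's method. The negative and neutral counts come from the text; the remaining counts are derived by subtraction, assuming totals of 58 and 40 pieces for the first two passages and, for Wuthering Heights, a total of 42 implied by reading "a full half" literally.

```python
from typing import Dict

# Counts quoted in the article; values marked "derived" are obtained by
# subtraction from the assumed totals (58, 40 and 42 pieces).
COUNTS: Dict[str, Dict[str, Dict[str, int]]] = {
    "Les Miserables": {
        "French": {"negative": 41, "neutral": 12, "positive": 5},   # positive derived
        "English": {"negative": 28, "neutral": 26, "positive": 4},  # positive derived
    },
    "Swann's Way": {
        "French": {"negative": 18, "neutral": 14, "positive": 8},   # positive derived
        "English": {"negative": 9, "neutral": 23, "positive": 8},   # positive derived
    },
    "Wuthering Heights": {
        "French": {"negative": 21, "neutral": 11, "positive": 10},  # negative derived
        "English": {"negative": 17, "neutral": 21, "positive": 4},  # neutral derived
    },
}


def polarized_share(counts: Dict[str, int]) -> float:
    """Fraction of pieces labeled positive or negative rather than neutral."""
    total = sum(counts.values())
    return (total - counts.get("neutral", 0)) / total


for title, versions in COUNTS.items():
    shares = {lang: round(polarized_share(c), 2) for lang, c in versions.items()}
    print(title, shares)
```

This measure ignores the scores entirely, so it says nothing about how confident the model was in each label.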
Hugo, with his intentional digressions and infinite subplots, and Proust, with his spellbinding sentences, are both challenging to translate into another language.
This experiment only used three tiny samples, but could serve as a jumping-off point for broader comparisons. Sentiment analysis may very well be a way to test translation quality and translation effectiveness. New translations are published all the time, to better fit the evolution of the destination language. Much more could be done, with more languages, and on texts well outside the Western canon.
Visualizing these diverging sentiment readings led us to experiment with several options.
One option was to visualize not only the label but also the score of each text element. Using the Svelte framework, we gave a shape (circles) and colors to the labels and used a force simulation algorithm to show the contrasting analyses.
While it has the benefit of displaying the score of each text element, the force graph doesn't allow a side-by-side comparison of each specific snippet. A horizontal bar chart does, and that is the option we eventually chose. What is lost by dropping the score is offset by a clear display of the sequential nature of the text being analyzed.