Introduction
We're still analyzing the comments submitted to President Ryan’s Ours to Shape website.
In this fourth installment of the series (we’re almost done, I promise), we’ll look at the sentiment – aka positive-negative tone, polarity, affect – of those comments.
We don’t have a pre-labeled set of comments, with negative or positive sentiment already identified, so we can’t use a supervised classification method (and I’m not committed enough to hand-code a sample of comments). Instead, we’ll use a lexicon-based approach: take a predefined dictionary of positive and negative words and count how often those words appear in the corpus of comments.
In the last post, we removed a few duplicate comments by the same contributor, so we’ll be working with a corpus of 842 distinct contributed comments (as of December 7, 2018).
The full code to recreate the analysis in the blog posts is available on GitHub.
Sentiment Analysis
There are many, many, many packaged sentiment dictionaries available. They should always be chosen with care, with attention to how they were created – crowdsourcing, grounded theory, algorithmically based on a labelled corpus – and for what purpose or context – for tweets, novels, newspapers.
I’ll use one with which I’m familiar – the Lexicoder Sentiment Dictionary (LSD). The LSD was created from previous sentiment dictionaries widely used in political science and psychology, but cleaned of ambiguous and problematic words. It’s tailored to political texts – did I mention I’m a political scientist – but I’d suggest the feedback to UVA represented by the Ours to Shape comments is political.
Here’s a sampling of the words categorized as positive...
[1] "gentlest" "upside" "kidding" "rightful*" "laud*"
[6] "good" "high end" "nabob*" "courting" "curious*"
and as negative...
[1] "chilli*" "dictat*" "dung" "kick" "carps"
[6] "vice" "unfavourab*" "misspent*" "direful" "drunk*"
There are 2,858 negative words and 1,709 positive words in all.
I apply this dictionary to the document-feature matrix (dfm) of the comments to count, for each comment, how many times words from the positive dictionary appear and how many times words from the negative dictionary appear.
Document-feature matrix of: 5 documents, 2 features (10.0% sparse).
5 x 2 sparse Matrix of class "dfm"
    features
docs negative positive
   1        0       10
   2        4        2
   3        2       16
   4       17       31
   5       19       20
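As a rough illustration of what a dictionary lookup like this does, here is a minimal sketch in Python. It is not the code behind this post. The mini-lexicon, the function names, and the example sentence are all hypothetical, and the starred entries are treated as glob-style wildcards on the assumption that this is what the asterisks in the samples above mean.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical mini-lexicon built from a few of the sample entries above.
# Assumption: a trailing "*" is a wildcard (e.g. "laud*" matches "lauded").
LEXICON = {
    "positive": ["good", "upside", "laud*", "rightful*", "curious*"],
    "negative": ["vice", "dung", "kick", "dictat*", "misspent*"],
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_matches(tokens, patterns):
    """Count the tokens that match at least one glob-style pattern."""
    return sum(any(fnmatchcase(tok, pat) for pat in patterns) for tok in tokens)

def lexicon_counts(text, lexicon=LEXICON):
    """Return per-category match counts plus the total token count."""
    tokens = tokenize(text)
    counts = {cat: count_matches(tokens, pats) for cat, pats in lexicon.items()}
    counts["words"] = len(tokens)
    return counts

# Made-up sentence, not one of the contributed comments.
example = "A good plan, lauded by many, but the dictates were misspent."
print(lexicon_counts(example))
```

This sketch matches single tokens only, so multi-word entries such as "high end" would need phrase matching that is not shown here.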
I divide the positive and negative counts by the number of words in the comment, and multiply by 100 to generate the percent of positive or negative words, then take the difference (% positive - % negative) to create a measure of tone.
     words             pos              neg              tone
Min.   :  2.00   Min.   : 0.000   Min.   : 0.000   Min.   :-66.667
1st Qu.: 29.00   1st Qu.: 8.333   1st Qu.: 0.000   1st Qu.:  4.000
Median : 41.00   Median :12.245   Median : 2.222   Median :  9.212
Mean   : 55.94   Mean   :13.355   Mean   : 3.493   Mean   :  9.861
3rd Qu.: 51.75   3rd Qu.:17.228   3rd Qu.: 5.263   3rd Qu.: 15.000
Max.   :739.00   Max.   :54.545   Max.   :66.667   Max.   : 54.545
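The arithmetic behind the tone measure is simple enough to sketch directly. The Python function below is a hypothetical helper (its name and signature are mine, not from the analysis code) that turns raw counts into percent positive, percent negative, and net tone.

```python
def tone(pos_count, neg_count, n_words):
    """Return (% positive, % negative, net tone) for one comment."""
    if n_words <= 0:
        raise ValueError("a comment needs at least one word")
    pct_pos = 100 * pos_count / n_words
    pct_neg = 100 * neg_count / n_words
    return pct_pos, pct_neg, pct_pos - pct_neg

# A three-word comment with two negative words and no positive words
pct_pos, pct_neg, net = tone(pos_count=0, neg_count=2, n_words=3)
print(round(pct_pos, 3), round(pct_neg, 3), round(net, 3))
```

A three-word comment with two negative words lands at the minimum tone in the summary, which is a reminder that very short comments can produce extreme percentages.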
On average, comments have 56 words, 13% of which are positively valenced and 3.5% of which are negatively valenced. The average tone is 10% net positive, though it ranges quite a bit. Let’s look at the distribution.
The comments definitely lean net positive, with quite a few extremely positive comments, and only one really uber negative comment. Here are the most extreme comments based on this metric, along with the categories to which they were submitted. First, the most positive:
type tone
1 service 54.54545
text
1 By a strong renewed focus on character, including honesty, integrity, fairness, openness, and a spirit of service.
And the most negative:
type tone text
1 community -66.66667 Reject racism outright
The first comment gets a score of 55% – over half of the words here have positive connotations. The second comment has a score of -67% – two of the three words are negatively valenced. While I wouldn’t disagree with the scores here – reject and racism are negative words, honesty and integrity are positive – this highlights some of the challenges of measuring tone. It’s not clear to me that the first comment was intended as a compliment to UVA – renewed focus suggests a lapse. And while the short, pithy second comment rings as a critique, it’s probably not the most negative comment here; its brevity overemphasizes the negative words.
Still, we persist!
Sentiment by Category/Connections
Next we compare our measure of comment tone by comment category – are the comments about community or service or discovery more positive?
Well, no. Comments in each category appear to have similarly net positive distributions. Except for the outlier (Reject racism outright), there isn’t much to distinguish the categories.
Let’s try one more comparison – tone by the primary connection of the contributor.
There’s a little more going on here – while the center of the distribution for each connection type is similar, the tails are more variable. Comments by community members, for instance, don’t tend to get quite as positive as at least some comments by other contributors; and comments by supporters never veer into the net negative.
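Both comparisons come down to grouping the per-comment tone scores by a label (category or connection) and comparing the centers and tails of each group. Here is a minimal Python sketch of that grouping step. The rows are invented for illustration and are not the Ours to Shape data, and `summarize_by_group` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (group label, tone) pairs, made up for illustration only.
scored = [
    ("community", 10.0), ("community", -20.0), ("community", 5.0),
    ("service", 30.0), ("service", 10.0),
    ("discovery", 8.0), ("discovery", 16.0), ("discovery", 4.0),
]

def summarize_by_group(rows):
    """Collect tone scores per group and report size, center, and extremes."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {
        group: {
            "n": len(values),
            "min": min(values),
            "median": median(values),
            "max": max(values),
        }
        for group, values in groups.items()
    }

for group, stats in summarize_by_group(scored).items():
    print(group, stats)
```

Reporting the minimum and maximum alongside the median is one way to capture the pattern described above, where the centers are similar but the tails differ.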
Lexicon-Based Analysis: Moral Foundations
Of course, there are multiple ways to think about sentiment, and sentiment is only one dimension of text that might be extracted via dictionaries. There’s been some work on uncovering moral rhetoric, or the dimensions of morality emphasized in speech and text. This work in moral foundations proposes five universal dimensions for ethical judgement, each arrayed from virtue to vice. The moral foundations are summarized below (adapted from the link above), labeled as virtue/vice, and a sampling of the words associated with each is provided.
- Care/harm: associated with virtues of kindness, gentleness, and nurturance.
Care-Virtue: safety, pities, protected, mothering, sympathy, hospital, consoling, patient, benefits, sharer
Care-Vice: agony, fighting, murderess, bullyboy, injurious, harassed, bullying, murderer, inflicting, genocide
- Fairness/cheating: supports ideas of justice, rights, and autonomy.
Fairness-Virtue: equals, compensating, fair, level the playing field, been objective, equalize, due processes, lawyers, compensated, retaliated
Fairness-Vice: betraying, double crosses, misleaders, crook, freeloading, economic disparity, untrustworthiness, mislead, fleeced, dupe
- Loyalty/betrayal: provides the basis of patriotism, faithfulness, and self-sacrifice.
Loyalty-Virtue: allegiant, loyal, fellowship, companion, trooper, families, war, loyalty, collectively, herd
Loyalty-Vice: apostate, unpatriotic, heretic, outsider, treachery, betray, cheated on, rebellions, cheats on, treason
- Authority/subversion: underlies ideas of leadership and followership, deference to legitimate authority, respect for traditions.
Authority-Virtue: top dog, submit, fathering, forbidding, venerating, bosses, slaved, supervised, governor, traitor
Authority-Vice: anarchy, disobedient, rebellion, heretical, trouble maker, nonconformists, bedlam, tumult, insurrectional, overthrows
- Sanctity/degradation: related to the idea that the body is a temple which can be desecrated by immoral activities and contaminants.
Sanctity-Virtue: pure, exterminating, divinities, exalts, glories, modesty, angel, jesus, prophet, chaste
Sanctity-Vice: puked, douchebags, deviant, drug, repugnant, deformities, excreting, whoring, curse, impurities
I apply the moral foundations lexicon to the comments to see if we can uncover any dominant moral rhetoric in this conversation about the university. After getting the count of words for each dimension, I convert these to a percent of total words to normalize across comment length.
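The same counting idea extends to a dictionary with more than two categories. The Python sketch below is only an illustration: the lexicon is a tiny hypothetical subset drawn from the sample words listed above, the dimension names mimic the column names that appear in the output further down, and the example sentence is made up.

```python
import re

# Tiny hypothetical subset of the sample words listed above;
# the real lexicon is far larger.
MORAL_LEXICON = {
    "carevirtue": {"safety", "protected", "sympathy", "benefits"},
    "carevice": {"agony", "fighting", "bullying", "harassed"},
    "loyalvirtue": {"loyal", "fellowship", "families", "loyalty"},
    "authorityvirtue": {"submit", "bosses", "supervised", "governor"},
}

def moral_percentages(text, lexicon=MORAL_LEXICON):
    """Percent of a comment's words that fall in each moral dimension."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {dim: 0.0 for dim in lexicon}
    return {
        dim: 100 * sum(tok in words for tok in tokens) / len(tokens)
        for dim, words in lexicon.items()
    }

# Made-up sentence, not one of the contributed comments.
example = "Loyal families deserve safety and sympathy"
for dim, pct in moral_percentages(example).items():
    print(dim, round(pct, 2))
```

Dividing by the number of tokens is what lets a short comment and a long one be compared on the same scale, with the same caveat about brevity noted for the tone measure.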
Some of the moral dimensions don’t arise in the contributed comments at all – sanctity or ideas of purity don’t seem especially prominent (or relevant), and the negative poles of authority (subversion), care (harm), and loyalty (betrayal) don’t appear with any frequency; more surprising (to me) is the relative absence of fairness as an underpinning construct.
The moral dimensions that do come out are loyalty, care, and authority. Let’s see what that’s about by looking at the highest-scoring comment on each dimension.
type loyalvirtue
1 community 20
text
1 UVA focuses on understanding our broader community versus focusing on the community understanding UVA
type carevirtue
1 discovery 33.33333
text
1 How about protecting our community from Nazis, and not hiring fascist sympathizers?
type authorityvirtue
2 service 16.66667
text
2 Please maintain the honor code and the single sanction.
The first does get at the us/them element of loyalty; the second is clearly about protection from harm; and the third definitely calls out to respect for tradition. All in all, not bad.
Finally, let’s compare this across comment categories – perhaps ideas about community or discovery or service rest on distinct moral frames.
In fact, there are some differences. While loyalty, care, and authority are the most frequent moral dimensions for all three comment categories, comments about community rest notably more on ideas of loyalty than on care or authority. Service comments, too, rely more on the loyalty dimension, but reference ideas of care and kindness more than the other comment categories. And feedback on discovery appeals more to authority than the other categories of comments.
Still to Come
After some additional unsupervised exploration – via cluster analysis and topic modeling – the goal is to model the relationship among these extracted features to see what we can learn. Stay tuned!
Michele Claibourn
Director, Research Data Services
University of Virginia Library
December 19, 2018
For questions or clarifications regarding this article, contact statlab@virginia.edu.