Analysis of Ours to Shape Comments, Part 4

Introduction

We're still analyzing the comments submitted to President Ryan’s Ours to Shape website.

In this fourth installment of the series (we’re almost done, I promise), we’ll look at the sentiment – aka positive-negative tone, polarity, affect – of those comments.

We don’t have a pre-labeled set of comments, with negative or positive sentiment already identified, so we can’t use a supervised classification method (and I’m not committed enough to hand-code a sample of comments). Instead, we’ll use a lexicon-based approach: take a predefined dictionary of positive and negative words and count how often those words occur in the corpus of comments.

In the last post, we removed a few duplicate comments by the same contributor, so we’ll be working with a corpus of 842 distinct contributed comments (as of December 7, 2018).

The full code to recreate the analysis in the blog posts is available on GitHub.

Sentiment Analysis

There are many, many, many packaged sentiment dictionaries available. They should always be chosen with care, with attention to how they were created – crowdsourcing, grounded theory, algorithmically based on a labelled corpus – and for what purpose or context – for tweets, novels, newspapers.

I’ll use one with which I’m familiar – the Lexicoder Sentiment Dictionary (LSD). The LSD was created from previous sentiment dictionaries widely used in political science and psychology, but cleaned of ambiguous and problematic words. It’s tailored to political texts – did I mention I’m a political scientist? – but I’d suggest the feedback to UVA represented by the Ours to Shape comments is political.

Here’s a sampling of the words categorized as positive...


 [1] "gentlest"  "upside"    "kidding"   "rightful*" "laud*"    
 [6] "good"      "high end"  "nabob*"    "courting"  "curious*"

and as negative...


 [1] "chilli*"     "dictat*"     "dung"        "kick"        "carps"      
 [6] "vice"        "unfavourab*" "misspent*"   "direful"     "drunk*"

There are 2,858 negative words and 1,709 positive words in all.

I apply this dictionary to the document-feature matrix (dfm) of the comments to count, for each comment, how many times words from the positive dictionary appear and how many times words from the negative dictionary appear.


Document-feature matrix of: 5 documents, 2 features (10.0% sparse).
5 x 2 sparse Matrix of class "dfm"
    features
docs negative positive
   1        0       10
   2        4        2
   3        2       16
   4       17       31
   5       19       20
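
For readers who want to see the mechanics, here is a minimal sketch of the lookup step in Python. It is not the R code behind this post: the `LEXICON` below is a tiny illustrative subset built from the sample entries shown above, the tokenizer is a simplifying assumption, and the function names are hypothetical. Entries ending in `*` are treated as wildcards, and multi-word entries such as "high end" are ignored.

```python
import re
from fnmatch import fnmatchcase

# Tiny illustrative subset of dictionary entries (NOT the full lexicon).
# A trailing "*" is a wildcard that matches any ending, including none.
LEXICON = {
    "positive": ["good", "laud*", "rightful*", "curious*"],
    "negative": ["dictat*", "kick", "vice", "drunk*", "misspent*"],
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def count_matches(tokens, patterns):
    """Count the tokens that match at least one dictionary pattern."""
    return sum(
        1 for tok in tokens if any(fnmatchcase(tok, pat) for pat in patterns)
    )


def sentiment_counts(text):
    """Return the number of positive and negative dictionary hits in a text."""
    tokens = tokenize(text)
    return {label: count_matches(tokens, pats) for label, pats in LEXICON.items()}


# A made-up sentence, used only to exercise the function.
example = "We laud the good work, but the dictates from above kick morale."
counts = sentiment_counts(example)
print(counts)  # {'positive': 2, 'negative': 2}
```

Here "laud" and "good" count as positive, while "dictates" (via `dictat*`) and "kick" count as negative.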

I divide the positive and negative counts by the number of words in the comment, and multiply by 100 to generate the percent of positive or negative words, then take the difference (% positive - % negative) to create a measure of tone.


     words             pos              neg              tone        
 Min.   :  2.00   Min.   : 0.000   Min.   : 0.000   Min.   :-66.667  
 1st Qu.: 29.00   1st Qu.: 8.333   1st Qu.: 0.000   1st Qu.:  4.000  
 Median : 41.00   Median :12.245   Median : 2.222   Median :  9.212  
 Mean   : 55.94   Mean   :13.355   Mean   : 3.493   Mean   :  9.861  
 3rd Qu.: 51.75   3rd Qu.:17.228   3rd Qu.: 5.263   3rd Qu.: 15.000  
 Max.   :739.00   Max.   :54.545   Max.   :66.667   Max.   : 54.545
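
The tone measure itself is simple arithmetic. A small sketch in Python, assuming the counts are already in hand (the function name `tone` and its arguments are illustrative, not taken from the original code):

```python
def tone(n_pos, n_neg, n_words):
    """Percent positive words minus percent negative words in one comment."""
    if n_words <= 0:
        raise ValueError("a comment needs at least one word")
    pct_pos = 100 * n_pos / n_words
    pct_neg = 100 * n_neg / n_words
    return pct_pos - pct_neg


# A three-word comment with two negative words and no positive words
# sits at the bottom of the scale reported above.
print(round(tone(n_pos=0, n_neg=2, n_words=3), 3))  # -66.667
```

Because the denominator is the comment's length, very short comments can reach extreme values with only one or two dictionary hits.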

On average, comments have 56 words, of which 13% are positively valenced and 3.5% are negatively valenced. The average tone is about 10 percentage points net positive, though it ranges quite a bit. Let’s look at the distribution.

Histogram of tone measure.

The comments definitely lean net positive, with quite a few extremely positive comments, and only one really uber negative comment. Here are the most extreme comments based on this metric, along with the categories to which they were submitted. First, the most positive:


     type     tone
1 service 54.54545
                                                                                                                text
1 By a strong renewed focus on character, including honesty, integrity, fairness, openness, and a spirit of service.

And the most negative:


       type      tone                   text
1 community -66.66667 Reject racism outright

The first comment gets a score of 55% – over half of the words here have positive connotations. The second comment has a score of -67% – two of the three words are negatively valenced. While I wouldn’t disagree with the scores here – reject and racism are negative words, honesty and integrity are positive – this highlights some of the challenges of measuring tone. It’s not clear to me that the first comment was intended as a compliment to UVA – renewed focus suggests a lapse. And while the short, pithy second comment rings as a critique, it’s probably not the most negative comment here; its brevity overemphasizes the negative words.

Still, we persist!

Sentiment by Category/Connections

Next we compare our measure of comment tone by comment category – are the comments about community or service or discovery more positive?

Violin plot of overall tone by comment category.

Well, no. Comments in each category appear to have similarly net positive distributions. Except for the outlier (Reject racism outright), there isn’t much to distinguish the categories.

Let’s try one more comparison – tone by the primary connection of the contributor.

Violin plot of overall tone by primary connection.

There’s a little more going on here – while the center of the distribution for each connection type is similar, the tails are more variable. Comments by community members, for instance, don’t tend to get quite as positive as at least some comments by other contributors; and comments by supporters never veer into the net negative.

Lexicon-Based Analysis: Moral Foundations

Of course, there are multiple ways to think about sentiment, and sentiment is only one dimension of text that might be extracted via dictionaries. There’s been some work on uncovering moral rhetoric, or the dimensions of morality emphasized in speech and text. This work in moral foundations proposes five universal dimensions for ethical judgement, each arrayed from virtue to vice. The moral foundations are summarized below (adapted from the link above), labeled as virtue/vice, and a sampling of the words associated with each is provided.

Care/harm: associated with virtues of kindness, gentleness, and nurturance.


Care-Virtue: safety, pities, protected, mothering, sympathy, hospital, consoling, patient, benefits, sharer
Care-Vice: agony, fighting, murderess, bullyboy, injurious, harassed, bullying, murderer, inflicting, genocide

Fairness/cheating: supports ideas of justice, rights, and autonomy.


Fairness-Virtue: equals, compensating, fair, level the playing field, been objective, equalize, due processes, lawyers, compensated, retaliated
Fairness-Vice: betraying, double crosses, misleaders, crook, freeloading, economic disparity, untrustworthiness, mislead, fleeced, dupe

Loyalty/betrayal: provides the basis of patriotism, faithfulness, and self-sacrifice.


Loyalty-Virtue: allegiant, loyal, fellowship, companion, trooper, families, war, loyalty, collectively, herd
Loyalty-Vice: apostate, unpatriotic, heretic, outsider, treachery, betray, cheated on, rebellions, cheats on, treason

Authority/subversion: underlies ideas of leadership and followership, deference to legitimate authority, respect for traditions.


Authority-Virtue: top dog, submit, fathering, forbidding, venerating, bosses, slaved, supervised, governor, traitor
Authority-Vice: anarchy, disobedient, rebellion, heretical, trouble maker, nonconformists, bedlam, tumult, insurrectional, overthrows

Sanctity/degradation: related to the idea that the body is a temple which can be desecrated by immoral activities and contaminants.


Sanctity-Virtue: pure, exterminating, divinities, exalts, glories, modesty, angel, jesus, prophet, chaste
Sanctity-Vice: puked, douchebags, deviant, drug, repugnant, deformities, excreting, whoring, curse, impurities

I apply the moral foundations lexicon to the comments to see if we can uncover any dominant moral rhetoric in this conversation about the university. After getting the count of words for each dimension, I convert these to a percent of total words to normalize across comment length.
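
The same counting logic extends to a dictionary with several categories. Below is a hedged Python sketch of the normalization step. The three categories hold only a few of the sample words listed above, the category labels and the function name `foundation_percents` are hypothetical, and multi-word entries such as "top dog" are left out for simplicity.

```python
import re

# A few sample words per category, taken from the lists above.
# This is an illustrative subset, not the full moral foundations lexicon.
FOUNDATIONS = {
    "care.virtue": {"safety", "protected", "sympathy"},
    "loyalty.virtue": {"loyal", "fellowship", "families"},
    "authority.virtue": {"submit", "supervised", "governor"},
}


def foundation_percents(text):
    """Percent of a comment's words that fall in each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in FOUNDATIONS}
    return {
        name: 100 * sum(tok in words for tok in tokens) / len(tokens)
        for name, words in FOUNDATIONS.items()
    }


# A made-up six-word sentence, used only to exercise the function.
result = foundation_percents("Loyal families value safety and fellowship")
for name, pct in result.items():
    print(f"{name}: {pct:.1f}")
# care.virtue: 16.7
# loyalty.virtue: 50.0
# authority.virtue: 0.0
```

Dividing by the comment length keeps long comments from dominating a dimension simply because they contain more words.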

Boxplots of percent of words present by moral foundation.

Some of the moral dimensions don’t arise in the contributed comments at all – sanctity or ideas of purity don’t seem especially prominent (or relevant), and the negative poles of authority (subversion), care (harm), and loyalty (betrayal) don’t appear with any frequency; more surprising (to me) is the relative absence of fairness as an underpinning construct.

The moral dimensions that do come out are loyalty, care, and authority. Let’s see what that’s about by looking at the highest rated comment on each dimension.


       type loyalvirtue
1 community          20
                                                                                                   text
1 UVA focuses on understanding our broader community versus focusing on the community understanding UVA

       type carevirtue
1 discovery   33.33333
                                                                                 text
1 How about protecting our community from Nazis, and not hiring fascist sympathizers?

     type authorityvirtue
2 service        16.66667
                                                     text
2 Please maintain the honor code and the single sanction.

The first does get at the us/them element of loyalty; the second is clearly about protection from harm; and the third definitely calls out to respect for tradition. All in all, not bad.

Finally, let’s compare this across comment categories – perhaps ideas about community or discovery or service rest on distinct moral frames.

Barplot of average percent of words present by moral foundation and comment type.

In fact, there are some differences. While loyalty, care, and authority are the most frequent moral dimensions for all three comment categories, comments about community rest notably more on ideas of loyalty than on care or authority. Service comments, too, rely more on the loyalty dimension, but reference ideas of care and kindness more than the other comment categories. And feedback on discovery appeals more to authority than the other categories of comments.

Still to Come

After some additional unsupervised exploration – via cluster analysis and topic modeling – the goal is to model the relationship among these extracted features to see what we can learn. Stay tuned!

Michele Claibourn
Director, Research Data Services
University of Virginia Library
December 19, 2018

For questions or clarifications regarding this article, contact statlab@virginia.edu.
