Friday, January 9, 2015

Python TextBlob Sentiment Analysis

I am taking Python TextBlob for a spin. TextBlob is a Python library for processing natural language. Modules like this are what make Python so fun and awesome. This module does a lot of heavy lifting, and my first impressions are pretty good.

So what does it do? I am using the sentiment analysis portion of the module. Sentiment analysis is the process of identifying and extracting subjective information from natural language. You can take text, run it through TextBlob, and the program will tell you whether the text is positive, neutral, or negative by analyzing the language used.

Why is this cool? By hooking this up to Twitter you can get the pulse of how people feel about something. Feed in some text from an email you received and you can measure its tone to see if the message is positive, neutral, or negative. If that is not cool enough for you than that is a you problem.

Take that statement for example: "If that is not cool enough for you than that is a you problem."

I ran that through the sentiment analysis, and here are the results.


Sentiment Analysis
Text            If that is not cool enough for you than that is a you problem.
Polarity        -0.0875
Subjectivity    0.575
Classification  neg
P_Pos           0.344455873
P_Neg           0.655544127

What does that mean?

  • Polarity - a score from -1 to 1 measuring how negative (below 0), neutral (0), or positive (above 0) the text is
  • Subjectivity - a value from 0 to 1 measuring the subjectivity of the text. 0 is objective, 1 is subjective
  • Classification - either pos or neg, indicating whether the text is classified as positive or negative
  • P_Pos - the probability that the text is positive
  • P_Neg - the probability that the text is negative; P_Pos and P_Neg add up to 1
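
For anyone who wants to reproduce these five values, here is a minimal sketch. It assumes the polarity and subjectivity come from TextBlob's default analyzer and that the classification, p_pos, and p_neg come from its NaiveBayesAnalyzer, which needs the corpora installed by `python -m textblob.download_corpora`. The helper names `polarity_label` and `analyze` are made up for illustration, and treating exactly 0 as neutral is my own choice.

```python
def polarity_label(polarity):
    """Map a polarity score in [-1.0, 1.0] to a coarse label (hypothetical helper)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def analyze(text):
    """Return the five values described above as a dict.

    TextBlob is imported inside the function so that polarity_label()
    stays usable even when the library is not installed.
    """
    from textblob import TextBlob
    from textblob.sentiments import NaiveBayesAnalyzer

    # Default analyzer returns Sentiment(polarity, subjectivity).
    default = TextBlob(text).sentiment
    # Naive Bayes analyzer returns Sentiment(classification, p_pos, p_neg).
    bayes = TextBlob(text, analyzer=NaiveBayesAnalyzer()).sentiment

    return {
        "text": text,
        "polarity": default.polarity,
        "subjectivity": default.subjectivity,
        "classification": bayes.classification,
        "p_pos": bayes.p_pos,
        "p_neg": bayes.p_neg,
    }
```

Calling `analyze("Hummus is good")` should give back a dict with those six keys. The Naive Bayes analyzer trains itself before its first prediction, so expect the first call to be slow.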

Pretty cool if you ask me. I started playing with the text to see if I could shift the values. Check out this test on various sentences about hummus.


Text                                              Polarity  Subjectivity  Classification  P_Pos        P_Neg
Humus is good                                     0.7       0.6           pos             0.504226543  0.495773457
Hummus is good. Hummus is terrible.               -0.15     0.8           neg             0.234730651  0.765269349
Hummus is great. Hummus is terrible.              -0.1      0.875         neg             0.302246018  0.697753982
Hummus is awesome. Hummus is terrible.            0         1             neg             0.432733051  0.567266949
Hummus is amazingly awesome. Hummus is terrible.  0         1             pos             0.58423913   0.41576087

Terrible is a pretty negative word. It is a lot stronger than the word good. Notice in row 2 that good did not overcome terrible. Great could not overcome terrible either. Awesome brought the polarity up to 0, but the classification stayed negative, so it still could not cancel terrible. Amazingly awesome finally shifted the statement to the positive classification.
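
The polarity and subjectivity columns in rows 2 through 4 are consistent with a plain average of per-word scores. The sketch below is only that arithmetic, not TextBlob's actual code. The scores for good come from row 1 of the table; the scores for great, awesome, and terrible are back-calculated from the other rows under the averaging assumption. It ignores intensifiers like amazingly.

```python
# Per-word (polarity, subjectivity) scores. "good" is taken from row 1 of the
# table; the rest are inferred from rows 2-4 assuming a simple average, so
# treat them as illustrative rather than as the library's real lexicon.
WORD_SCORES = {
    "good": (0.7, 0.6),
    "great": (0.8, 0.75),
    "awesome": (1.0, 1.0),
    "terrible": (-1.0, 1.0),
}


def average_sentiment(text):
    """Average the scores of every known word in the text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scored = [WORD_SCORES[w] for w in words if w in WORD_SCORES]
    if not scored:
        return 0.0, 0.0
    polarity = sum(p for p, _ in scored) / len(scored)
    subjectivity = sum(s for _, s in scored) / len(scored)
    # Round to hide floating-point noise such as -0.15000000000000002.
    return round(polarity, 4), round(subjectivity, 4)


for sentence in (
    "Hummus is good. Hummus is terrible.",
    "Hummus is great. Hummus is terrible.",
    "Hummus is awesome. Hummus is terrible.",
):
    print(sentence, average_sentiment(sentence))
```

Under these assumed scores, terrible sits at the bottom of the polarity scale, which is why only a word scoring a full 1.0 can pull the average back up to 0.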

I then started hooking this up to Twitter. Since I already knew how to authenticate to Twitter using Python, all I had to do was figure out the search functions to search for various topics. Here is some sample code.

So we are importing the modules needed to do the TextBlob analysis. We take in some arguments that we use as Twitter search terms. I have hard-coded the tweet count to 250. We loop through the list of arguments and execute a Twitter search for each term, then send the resulting tweets through the TextBlob sentiment analysis. I take the sentiment results and stuff them into a database.
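
The flow described above can be sketched roughly like this. It is not the original script: the Twitter search and the scoring step are passed in as functions (`fetch_tweets` and `score_text` are hypothetical names), the database is assumed to be SQLite, and the table layout is invented for the example. The demo at the bottom uses canned tweets and a toy scorer so the sketch runs without Twitter credentials or TextBlob.

```python
import sqlite3
from datetime import datetime, timezone

TWEET_COUNT = 250  # hard-coded, as described in the post


def record_sentiment(terms, fetch_tweets, score_text, conn):
    """Search each term, score every tweet, and store one row per tweet.

    fetch_tweets(term, count) -> list of tweet texts (stand-in for the
        real Twitter search call, which is not shown here)
    score_text(text) -> (polarity, subjectivity), for example the values
        from TextBlob(text).sentiment
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentiment ("
        "run_at TEXT, term TEXT, tweet TEXT, polarity REAL, subjectivity REAL)"
    )
    # One timestamp per run, so later runs can be compared as a trend.
    run_at = datetime.now(timezone.utc).isoformat()
    for term in terms:
        for tweet in fetch_tweets(term, TWEET_COUNT):
            polarity, subjectivity = score_text(tweet)
            conn.execute(
                "INSERT INTO sentiment VALUES (?, ?, ?, ?, ?)",
                (run_at, term, tweet, polarity, subjectivity),
            )
    conn.commit()


# Demo with canned data (assumptions, not real tweets or real scores).
def fake_fetch(term, count):
    return [f"{term} is good", f"{term} is terrible"][:count]


def fake_score(text):
    return (0.7, 0.6) if "good" in text else (-1.0, 1.0)


conn = sqlite3.connect(":memory:")
record_sentiment(["hummus"], fake_fetch, fake_score, conn)
print(conn.execute("SELECT term, AVG(polarity) FROM sentiment GROUP BY term").fetchall())
```

In a real script the search terms would come from the command line, for example `sys.argv[1:]`, and `fake_fetch` would be replaced by whatever search call your Twitter client provides.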

From there I made a quick web site to display the current sentiment of some search terms. Green background means a positive sentiment and red means a negative sentiment. Here is a sample of Kansas City, KC Royals, KC Chiefs, and Sporting Kansas City.





This is just version 1. I plan to color the background based on the polarity, which will offer a more granular representation of the overall feel of a topic. I also plan to add the numbers to the site so you can see the polarity and the other measures. Trending is in the works: since I am storing the sentiment values on each run of the script, I can display how the sentiment for a search term changes over time.
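
One way the polarity-based coloring could work is a simple fade from red through white to green. This is a sketch of the idea, not what the site does today; the function name `polarity_to_hex` and the choice of white for neutral are assumptions.

```python
def polarity_to_hex(polarity):
    """Map a polarity in [-1.0, 1.0] to a hex color: red, white, or green."""
    p = max(-1.0, min(1.0, polarity))  # clamp out-of-range input
    # 255 at neutral (white), 0 at either extreme (pure red or pure green).
    fade = round(255 * (1 - abs(p)))
    if p >= 0:
        red, green, blue = fade, 255, fade
    else:
        red, green, blue = 255, fade, fade
    return f"#{red:02x}{green:02x}{blue:02x}"


for value in (-1.0, 0.0, 1.0):
    print(value, polarity_to_hex(value))
```

A strongly negative term comes out pure red, a neutral one white, and a strongly positive one pure green, with paler shades in between.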

I call the sentiment site RT_Lean, which stands for Real Time Lean: how people are leaning on a search term in real time.

4 comments:

  1. I've been playing around with this library for a couple of days, and I am struggling to grasp what "subjectivity" really means here, and how it's measured. Any idea?
    Adel

  2. Using my example: "Hummus is Greek food" is an objective statement, since hummus is usually considered Greek food. "Hummus is the most awesome dip to put on a pita" is subjective, because it is influenced by my personal taste and preference.

    I ran them both through the TextBlob sentiment analysis.

    Hummus is Greek Food. Subjectivity 0.0, meaning highly objective.
    Hummus is the most awesome dip to put on a pita. Subjectivity 0.75, so that statement is pretty subjective.

    You will have to play around with it. Let me know what you find out.

  3. OK, so hummus apparently originated in Egypt. However, I would bet most people associate it with Greek food.

    I also realized my code samples were not displaying. I think I fixed that problem.

  4. This comment has been removed by the author.
