Monday, December 28, 2015

Twitter Sentiment with Python Part 2

Earlier I blogged about grabbing some tweets from Twitter and running them through a text sentiment analysis module using Python. I recently revisited the project and added a new feature.

Here is the full script. Below I will add some notes.

Some Notes

So I set up the script to run every 15 minutes against the keyword Kansas City to track the sentiment of this fine Midwestern town in the Great USA. I turned the script loose and sorta forgot about it for a month or so.

Kansas City is not the most polarizing of search terms on Twitter. Out of the 5000+ executions of the script, the sentiment went negative a total of 88 times (1.76%). The top 10 negative sentiment scores were recorded at the times in the table below.

Negative score   Date and time
0.999995749      12/6/15 15:38
0.999951771      12/20/15 13:52
0.999894862      12/16/15 15:53
0.999850365      11/29/15 14:52
0.999812706      12/25/15 8:52
0.999426109      12/20/15 12:22
0.999421925      12/17/15 22:22
0.998945902      12/6/15 14:23
0.993400587      11/18/15 16:52
0.992757985      12/13/15 15:53
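
For anyone who wants to pull the same kind of summary out of their own data, here is a minimal sketch. It assumes each execution is stored as a (timestamp, negative score) pair and that a run counts as "negative" when the score is above 0.5. The 0.5 cutoff and the function name `summarize_negative` are assumptions for illustration, not taken from the original script.

```python
from datetime import datetime


def summarize_negative(runs, threshold=0.5, top_n=10):
    """Count negative runs, their share of all runs, and the top N by score.

    runs: iterable of (timestamp, negative_score) pairs, one per execution.
    threshold: assumed cutoff above which a run counts as negative.
    """
    runs = list(runs)
    negative = [r for r in runs if r[1] > threshold]
    share = 100.0 * len(negative) / len(runs) if runs else 0.0
    top = sorted(negative, key=lambda r: r[1], reverse=True)[:top_n]
    return len(negative), share, top


# Two rows from the table plus two made-up non-negative runs.
sample = [
    (datetime(2015, 12, 25, 8, 52), 0.999812706),
    (datetime(2015, 12, 1, 9, 7), 0.12),   # made-up value
    (datetime(2015, 12, 6, 15, 38), 0.999995749),
    (datetime(2015, 12, 2, 9, 7), 0.30),   # made-up value
]

count, share, top = summarize_negative(sample)
print(count, share)
for when, score in top:
    print(score, when.strftime("%m/%d/%y %H:%M"))
```

With exactly 5000 runs, 88 negative hits works out to 100 * 88 / 5000 = 1.76%, which matches the figure quoted above.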

  • Odd that I did not see any negative sentiment hits on Thanksgiving. Black Friday did not even have a hit.
  • 12/25/2015 @ 8:52AM: apparently some folks in Kansas City did not like their Christmas gifts.
  • 12/20/2015 from 12:07 - 21:07: this time span had the most negative sentiment hits of the entire data set. A quick search on Twitter for that day did not yield any clues. I also searched the news outlets and did not see anything obvious.

After reviewing the collected data I noticed a problem with my process. I am parsing the tweets to score the sentiment; however, I cannot go back and review the tweets to see what caused the sentiment.

Therefore I launched version 2 of the script (the script listed above) and added a new table to my database. I am now capturing the actual tweets as well as storing the sentiment score. I also changed the capture interval from every 15 minutes to every 30 minutes.
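
To make the two-table idea concrete, here is a rough sketch of storing each run's score alongside the tweets that produced it. The post does not say which database is used, so this assumes SQLite through Python's built-in `sqlite3` module. The table names (`runs`, `tweets`), column names and helper functions are all hypothetical.

```python
import sqlite3
from datetime import datetime


def open_db(path=":memory:"):
    """Open the database and create both tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               id INTEGER PRIMARY KEY,
               run_at TEXT NOT NULL,
               keyword TEXT NOT NULL,
               neg_score REAL NOT NULL)"""
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id INTEGER PRIMARY KEY,
               run_id INTEGER NOT NULL REFERENCES runs(id),
               text TEXT NOT NULL)"""
    )
    return conn


def save_run(conn, keyword, neg_score, tweets, run_at=None):
    """Store one execution: its score plus every tweet that was scored."""
    run_at = run_at or datetime.now()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO runs (run_at, keyword, neg_score) VALUES (?, ?, ?)",
            (run_at.isoformat(), keyword, neg_score),
        )
        run_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO tweets (run_id, text) VALUES (?, ?)",
            [(run_id, text) for text in tweets],
        )
    return run_id


def tweets_for_run(conn, run_id):
    """Fetch the tweets behind a given score, in insertion order."""
    rows = conn.execute(
        "SELECT text FROM tweets WHERE run_id = ? ORDER BY id", (run_id,)
    )
    return [row[0] for row in rows]


conn = open_db()
run_id = save_run(conn, "Kansas City", 0.93, ["made-up tweet one", "made-up tweet two"])
print(tweets_for_run(conn, run_id))
```

Linking each tweet to a run id means a suspicious score can be traced back to the exact text that was analyzed.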


During the initial script build I also wanted a way to visualize the current sentiment with a heat-map-style color system based on the negative score. I remember searching a while for this code, but now I cannot remember where I found it. Basically, it sets the background color of the web page based on the value of the negative score, which is determined by the sentiment analyzer.

If the negative score is close to 0, the page will be green. If it is close to 1, the page will be red. If it is .5, the page will be blue. Everything in between will be a shade between these 3 main colors. Here is a sample of a high negative sentiment; the numbers are off because I hijacked the code to force a negative sentiment of 1 to show the red color:

All of this is driven by the code below.
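
As a rough illustration of the idea (not the snippet used on the page), the green/blue/red mapping can be sketched in Python. This assumes a straight linear blend between the three anchor colors and pure RGB values for green, blue and red. The function name `score_to_hex` is made up for this example.

```python
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)
RED = (255, 0, 0)


def score_to_hex(neg_score):
    """Map a negative score in [0, 1] to a CSS hex color.

    0 gives green, 0.5 gives blue, 1 gives red. Values in between are
    blended linearly (an assumption; the original may blend differently).
    """
    s = min(max(neg_score, 0.0), 1.0)  # clamp out-of-range scores
    if s <= 0.5:
        start, end, t = GREEN, BLUE, s / 0.5
    else:
        start, end, t = BLUE, RED, (s - 0.5) / 0.5
    r, g, b = (round(a + (z - a) * t) for a, z in zip(start, end))
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


for score in (0.0, 0.5, 1.0):
    print(score, score_to_hex(score))
```

The returned string can be dropped straight into a page's `background-color` style.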

So there you have it. I have cranked up the script again to collect data every 30 minutes. I predict that in about a month or so I will remember that the script is running and will blog the findings. One of the next steps I want to take is to incorporate the values into a D3.js calendar chart. However, I may have to pick a more polarizing topic; otherwise the whole year will be green.

