Monday, December 28, 2015

Twitter Sentiment with Python Part 2

Earlier I blogged about grabbing some tweets from Twitter and running them through a text sentiment analysis module using Python. I recently revisited the project and added a new feature.

Here is the full script; I will add some notes below it.



Some Notes


So I set up the script to run every 15 minutes against the keyword Kansas City to track the sentiment of this fine Midwestern town in the Great USA. I turned the script loose and sort of forgot about it for a month or so.

Kansas City is not the most polarizing of search terms on Twitter. Out of the 5000+ executions of the script, the sentiment went negative a total of 88 times (1.76%). The top 10 negative sentiment scores were recorded at the times in the table below.


Negative Score    Capture Date
0.999995749       12/6/15 15:38
0.999951771       12/20/15 13:52
0.999894862       12/16/15 15:53
0.999850365       11/29/15 14:52
0.999812706       12/25/15 8:52
0.999426109       12/20/15 12:22
0.999421925       12/17/15 22:22
0.998945902       12/6/15 14:23
0.993400587       11/18/15 16:52
0.992757985       12/13/15 15:53


  • Odd that I did not see any negative sentiment hits on Thanksgiving. Black Friday did not even have a hit.
  • 12/25/2015 @ 8:52 AM: apparently some folks did not like their Christmas gifts in Kansas City.
  • 12/20/2015 from 12:07 - 21:07: this time span had the most negative sentiment hits of the entire data set. A quick search on Twitter for that day did not yield any clues. I also searched the news outlets and did not see anything obvious.
After reviewing the collected data I noticed a problem with my process: I am parsing the tweets to score the sentiment, but I cannot go back and review the tweets to see what caused the score.

Therefore I launched version 2 of the script (the script listed above) and added a new table to my database. I am now capturing the actual tweets as well as storing the sentiment score. I also changed the capture interval to 30 minutes.
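For illustration, here is a minimal sketch of what one collection pass might look like; it is not the actual script. It assumes tweepy for the Twitter search, the text-processing.com sentiment endpoint for the negative score, and a local SQLite table standing in for my database (the table and column names are placeholders). The script is meant to be kicked off on a schedule, e.g. by cron.

import datetime
import sqlite3

import requests
import tweepy

# placeholder Twitter app credentials
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_SECRET')
api = tweepy.API(auth)

conn = sqlite3.connect('sentiment.db')
conn.execute("""CREATE TABLE IF NOT EXISTS tweet_sentiment
                (capturedate TEXT, negative REAL, tweet TEXT)""")


def score_negative(text):
    """Return the probability (0.0 - 1.0) that the text is negative."""
    resp = requests.post('http://text-processing.com/api/sentiment/',
                         data={'text': text})
    return resp.json()['probability']['neg']


capturedate = datetime.datetime.now().strftime('%m/%d/%y %H:%M')
for tweet in api.search(q='Kansas City', count=100):
    neg = score_negative(tweet.text)
    conn.execute("INSERT INTO tweet_sentiment VALUES (?, ?, ?)",
                 (capturedate, neg, tweet.text))

conn.commit()
conn.close()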

Website


During the initial script build I also wanted a way to visualize the current sentiment with a sort of heat-map color system based on the negative score. I remember searching a while for this code, but now I cannot remember where I found it. Basically, it sets the background color of this web page based on the value of the negative score, which is determined by the sentiment analyzer.

If the negative score is close to 0, the page will be green. If the negative score is close to 1, the page will be red. If the negative score is .5, the page will be blue. Everything in between will be a shade somewhere between these three main colors. Here is a sample of a high negative sentiment; the numbers are off here because I hijacked the code to force a negative sentiment of 1 to show the red color:


All of this is driven by the code below.
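As a rough sketch of that idea (written here in Python rather than the page's own code, so the file name and markup are just placeholders), the green-to-blue-to-red interpolation works out to something like this:

def negative_to_color(neg):
    """Map a negative-sentiment score (0.0 - 1.0) to an RGB hex string."""
    neg = max(0.0, min(1.0, neg))
    if neg <= 0.5:
        # fade from green (0,255,0) at 0 to blue (0,0,255) at 0.5
        t = neg / 0.5
        r, g, b = 0, int(255 * (1 - t)), int(255 * t)
    else:
        # fade from blue (0,0,255) at 0.5 to red (255,0,0) at 1
        t = (neg - 0.5) / 0.5
        r, g, b = int(255 * t), 0, int(255 * (1 - t))
    return '#{:02x}{:02x}{:02x}'.format(r, g, b)


negative_score = 1.0  # hard-coded to 1 to force the red page shown above
html = ('<html><body style="background-color: {};">'
        '<h1>Negative sentiment: {}</h1></body></html>').format(
            negative_to_color(negative_score), negative_score)

with open('sentiment.html', 'w') as page:
    page.write(html)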


So there you have it. I have cranked up the script again to collect data every 30 minutes. I predict that in about a month or so I will remember the script is running and will blog the findings. One of the next steps I want to take is to incorporate the values into a D3.js calendar chart. However, I may have to pick a more polarizing topic; otherwise the whole year will be green.




Thursday, December 24, 2015

Fusion Table API Insert Data Python

A buddy and I have been hacking at the Google Fusion Tables API for a couple of days now trying to figure it out. We finally had a breakthrough. He sent me his sample code with the authentication piece and the SELECT statement, and I started trying an INSERT. After about 10 rounds of failure I finally got something to work and wanted to post it. We struggled to find a simple example of how to do this; I am sure there is a better way, but at least it's working.

Python Modules Needed


You will need to install some Python modules if you have not authenticated to the Google APIs before. Here is a list of modules I installed. 

  • requests
  • urllib3
  • apiclient
  • gspread
  • PyOpenSSL
  • oauth2client
  • google-api-python-client (used easy_install)
Not all of these are needed for the script. I am working from a new laptop, so I had to install them fresh. All of these were installed using pip except the Google API Python client; for some reason pip did not install that one, so we had to use easy_install.


Authentication


We used the same method detailed in this earlier post using the gspread module. You will want to create the JSON file with your authentication key in it so you can authenticate to the Fusion Table API.

Make sure you grant your client email address access to edit the Fusion Table. You can do this using the Fusion Table Share feature: add the email address from the JSON file you downloaded when you built your key (XXXXXXXXXXXXXXX@developer.gserviceaccount.com).
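As a rough sketch of the authentication step (assuming the same service-account JSON keyfile approach as the gspread post; the keyfile name is a placeholder), building the Fusion Tables service looks like this:

import httplib2
from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

# scope that grants read/write access to Fusion Tables
scope = ['https://www.googleapis.com/auth/fusiontables']

# the JSON keyfile downloaded when the service account key was created
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    'my-google-key.json', scope)

http = credentials.authorize(httplib2.Http())
service = build('fusiontables', 'v2', http=http)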



Script


I am using the USGS Earthquake JSON feed I blogged about earlier today to import earthquake data into a Fusion Table. Here is the script.


Basically we just set up a loop to parse the earthquake JSON data, and on each pass we execute an INSERT statement against the Fusion Table. There is probably a better way than the line-by-line method detailed here, but at least we can get data into Fusion. This is not going to be practical for thousands of rows, but it is fine for our purpose here.
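A sketch of that loop, assuming the service object from the authentication snippet above and a Fusion Table with Date, Place, Magnitude, Latitude and Longitude columns (the table id and column names are placeholders), might look like this:

import datetime

import requests

TABLE_ID = 'YOUR_FUSION_TABLE_ID'
FEED = ('http://earthquake.usgs.gov/earthquakes/feed/v1.0/'
        'summary/all_day.geojson')

quakes = requests.get(FEED).json()

for feature in quakes['features']:
    props = feature['properties']
    lng, lat, depth = feature['geometry']['coordinates']
    # USGS timestamps are milliseconds since the epoch
    when = datetime.datetime.utcfromtimestamp(props['time'] / 1000.0)
    sql = ("INSERT INTO {0} (Date, Place, Magnitude, Latitude, Longitude) "
           "VALUES ('{1}', '{2}', {3}, {4}, {5})").format(
               TABLE_ID,
               when.strftime('%m/%d/%Y %H:%M'),
               props['place'].replace("'", ''),  # drop quotes so the SQL stays valid
               props['mag'], lat, lng)
    # one API call per row -- slow for big feeds, fine for a daily import
    service.sql().query(sql=sql).execute()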

Earthquake Data JSON feed

Previously on the blog I consumed earthquake data from a CSV produced by the USGS website. Recently I revisited the USGS website and worked on consuming their JSON feed.

You can find information on their feed here: http://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php

I am as giddy as an 11-year-old girl at a One Direction concert when I see a website has a JSON feed. Here is the Python script I used to consume the data and create the text file to import.
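For reference, here is a sketch along the same lines: pull the past day of quakes from the GeoJSON feed and write a comma-delimited text file ready for a Fusion Tables import. The output file name and column layout are my own choices, not necessarily what the original script used.

import csv
import datetime

import requests

FEED = ('http://earthquake.usgs.gov/earthquakes/feed/v1.0/'
        'summary/all_day.geojson')

quakes = requests.get(FEED).json()

with open('earthquakes.txt', 'w') as out:
    writer = csv.writer(out)
    writer.writerow(['Date', 'Place', 'Magnitude', 'Latitude', 'Longitude'])
    for feature in quakes['features']:
        props = feature['properties']
        lng, lat, depth = feature['geometry']['coordinates']
        # USGS timestamps are milliseconds since the epoch
        when = datetime.datetime.utcfromtimestamp(props['time'] / 1000.0)
        writer.writerow([when.strftime('%m/%d/%Y %H:%M'),
                         props['place'], props['mag'], lat, lng])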



From there I was able to import the data into Google Fusion Tables. I tried using the Fusion Table API to knock this out automagically, but did not have much luck. I plan to keep hacking at the Fusion Table API to see what I can come up with. In the meantime, here is a map of all the earthquakes that occurred on 12/23/2015.







Here is a link to the full screen map:
https://www.google.com/fusiontables/embedviz?q=select+col4+from+1aXGZQoukkHHSAKSHuCakYuVbh2qSaBxN5C-APpOE&viz=MAP&h=false&lat=9.66632685018197&lng=1.11668359374994&t=1&z=2&l=col4&y=2&tmplt=2&hml=TWO_COL_LAT_LNG




Wednesday, December 23, 2015

Link Dump 12/23/2015

Trying to start a semi-frequent blog post with links: hand-curated links about stuff I find. More for me than you.