Tuesday, January 27, 2015

Consume CSV with Python

From time to time I will find a CSV out there on the web ready for consumption (Lucky). Using Python you can easily consume the data with a few lines of code and then do your thing.

Python Download


So using python and the URLLIB2 module you can hit the CSV, convert it to a string, and then output it out to whatever CSV file you want. This will allow you to massage the data before moving on using Python. Handy if you need to remove feilds, convert data, etc.

I connect to a database using the pyodbc module and then execute the stored procedure detailed below.



SQL Data Load


Once we have the CSV file downloaded we can use BULK INSERT to throw the data into a SQL Server database for us. Sometimes you get real lucky and you can just do a straight insert into the database. As you work with more and more data you realize that is usually not the case.

The procedure below includes a quick and dirty column conversion on the location column. The data source decided to wrap quotes around the location column only. Since it was at the end of the file and I did not need the last column, a REPLACE was used and a cludgy hack was born to combine the last two columns. This works for this data load, but you may not be so fortunate and other manipulations may need done to your data.



Why?


You may be asking, "Why not just go to the USGS web site to view this data. The map is already made, searchable, etc." That is a really great question, and for most people that is the way to go. Just use the site. However grabbing the data allows you to do other fun stuff that you cannot do through the web site.

Combine this data with the previously blogged about GPS distance calculations using SQL Server and you can build a script to Tweet when an earthquake happens within a 250 mile radius of some given point. Stuff like that.

0 comments:

Post a Comment