A docker image that can be used as a standardized starting point for python development work. Additionally,
Use pyodbc to connect to SQL Server
The smaller footprint, the better
Seems like a relatively straightforward ask. However, if you’re not careful, you’re going to hit 3 different roadblocks along your journey.
Block#1 : Docker Image Size
Maybe it was my search criteria, but many tutorials ask you to start building a python docker container by using the official Python image. But if you’re not careful, when you’re ready to deploy outside of your workstation, you’re going to realize that the size of the underlying image is over 900MB. This, followed by the immediate assertion that to reduce image size you can use the alpine flavor instead. Yes!! You’ve solved the image size issue.
Block#2: The Pyodbc module
Installing pyodbc on the official python image is a fairly straight forward command
pip install pyodbc
However, plugging this command into a Dockerfile using a python alpine image is certainly going to give you issues. As smarter people on the internet will tell you, this to do with the lack of gcc and/or g++ installed within the underlying alpine base (you’re starting to see why the image is really small in size). But, no worries, all you need is the right set of dependencies installed and we can take care of this block too.
Block#3: SQL Server as a Data Source
Top search results will tell you the best way to connect to SQL Server from docker is to make sure you have Microsoft’s ODBC drivers for SQL Server installed. Great! There’s also a dedicated page from Microsoft to do it on Linux. And when you’ve gone down the rabbit hole a bit longer, you realize that Microsoft is probably not going to directly support drivers for Alpine. So what do we do now? Pivot to using the FreeTDS driver all things are right with the world.
Block#4: FreeTDS Driver Location within the container
Now hold on a hot minute, why is there a 4th one listed here? Because once you’ve hopped past all the previous blocks you’re going to probably find that pyodbc is now having trouble finding the driver you just installed. Again, smarter folks of the tech world to the rescue. Essentially what we have to do is tell pydobc where to find the installed driver since it can’t seem to pick the right one up right off the bat.
Putting this all together
TL;DR – Here are the steps we need to follow to get everything working
Use python (alpine flavor) as your image base
Pip install pyodbc
Set connection string
Explicitly tell pyodbc, within your script, where to find your driver
Here’s a GitHub repo with what I managed to put together – with help from /u/jpens from the Docker subreddit and all the other articles listed.
Now we have a docker image with a relatively small footprint that uses pyodbc and FreeTDS to connect to SQL Server.
As most fans of Zombieland will tell you, it’s been a long 10 year roller coaster of a journey to the sequel. After the first movie was well received, there was so much news, it was hard to determine what was real and not. For every seemingly positive news, there was also an immediate, inevitable setback. And eventually, after years of back and forth you give up the notion of another movie (and yet hold out on the hope of a 1% chance – just like any optimist might). Half Life fans know exactly what I’m talking about. In any case, the wait was so worth it.
There is no sequel curse in the this franchise. After waiting a decade for the sequel, one would think the mojo might have disappeared – things have certainly changed over a decade after all. However, everything seems to pick off right where we left off. Even the cast didn’t seem to age one bit (in the trailers!). There’s more than enough odes and nods to the first movie to satisfy even the most hardcore of original fans. And while the original cast still brings it, the whole exercise in reminiscing is made better with the introduction of new cast members. Special shout out must go out to Zoey Duetch as Madison (valley girl tropes usually devolve into annoyance fairly quickly, but not this time!).
One of the things that made the original movie great were #TheRules. However, I’m not sure who ok-ed the editing in the sequel as there was just not enough time to read the rules displayed on the screen – especially in the beginning of the movie….unless it was supposed to be an attempt at subliminal messaging or something of that sort.
Within the zombieland universe, there is no such thing.
If you’re a fan of the first movie or just someone looking for a good zombie comedy, this movie is a must see! This is a befitting sequel to what should have been at least a trilogy had it been done right years ago.
The BAE Texts Analysis (BAETA – pronounced bay-ta) project was designed to analyze text messages between bae and I. There were also a few constraints to keep in mind:
Ability to analyze messages from both Google Voice and iOS
Ability to visualize findings
Ability to replicate the process for the future
With these constraints in mind, I figured the best way to approach this, was with the following tools
Data scripting – Python with the following modules/packages
bs4 a.k.a BeautifulSoup (you’ll see why below)
nltk – for NLP (Stemming / Lemmatization)
Data Storage – SQL Server
Data Visualization – Tableau / Power BI
With some of my messages on Google voice, I had to figure out a way to collect this data. A quick search led me to Google Takeout. Sign in to my account, select all the data I want Google to zip up (depending on your data history this could take a while), and once all done, I get an email saying my data is ready for download. I followed the steps and, voila, I have all my history from Google Voice downloaded to my computer with the following structure
As you can see, all texts are housed within files with the name Text. Here’s a snippet from one of the files.
Now that I know what the structure of a text / conversation looks like, I use the BeautifulSoup module from Python to navigate the document structure and extract the information I need.
In order to get access to iMessages I needed three things
A Macbook – there are articles out there that talked about how creating an iTunes backups could give me access to my texts and after failing a couple of attempts (perhaps I could’ve resolved this with more troubleshooting) on a Windows machine, I decided to switch tactics and use a Macbook instead
Messages App on Mac OS – Once on the Mac, I had to set it up to sync my iMessages. Here’s an article that gave me more information on where I needed to look for the required file. What I needed to find, here, was a chat.db file that was generated as a consequence of the iMessage sync. This sqlite database houses all the chats and I can hone in on what i’m looking for from there.
SQLite DB Browser – Once I had the chat.db file from step#2, I moved it to my Windows machine and got the DB Browser for SQLite that helps browse the data within the chat.db file. Here’s what the db structure looked like.
Now that I know what’s in the database, I try to figure out what tables are likely to contain the data I need – message, chat and chat_message_join. The message table has the actual message but I’m also going to need some other info
ROWID to join with message_id from the chat_message_join table
chat_id from the chat_message_join to join against the ROWID from the chat table
epoch time manipulation – the article I linked to in the section above, gives excellent guidance as to what needs to be done to convert the epoch time to human readable time.
Once I got a handle on what information was where, I knew I wanted to extract and retain the following key pieces of information from both data sets
Message Author – who sent a message
Message Time – what time was the message sent
Message – the actual message itself
And now that I know what I want from my source data, I wrote a Python program to do 3 things
Extract key information from Google Voice Takeout data
Extract key information from iMessage Data
Analyse the message content from the extracted data – as part of the exercise I also wanted to analyse what words / emojis were used the most
Store the data for later use – write it to a db within my local SQL Server instance
While creating the script and figuring out how to store the data, there were a couple of things I had to keep in the back of my mind.
utf-8 encoding – the data from Google Voice needed to be utf-8 encoded
utf-8 encoding on sql server table – in order to retain the encoding when data was written to the table, I needed to use nvarchar as the column datatype as opposed to the standard varchar
Now that I extracted the information and put it into tables within my db, it’s time to start visualizing. I did a version of all visualizations in Tableau and kept hitting roadblocks with Unicode displays. Power BI on the other hand, had no such issues. In any case, I wanted to answer the following questions.
Who Texts More?
Do Messaging Habits (message count) Change With Platform?
Messaging Over Time
Who Initiates Conversations
Who Talks More
Word & Emoji Analysis
Having answered the questions I set out to, there are a couple of things I would like to do in the next iteration of this project
Sentiment Analysis – scour texts and assign a sentiment score to either the text themselves on some cluster / aggregation of them
Word Use By Author – what words use trends can we find by looking at messages from each author
Visualization – Use the workarounds for the Unicode character roadblocks in Tableau (although it’s a complete shame that a powerful tool like Tableau has this as a problem to begin with) and publish to Tableau
This was such a fun project to undertake.Besides sharpening some old skills and learning new ones, the most important part was sharing this with Bae and while she doesn’t share the same nerd level of enthusiasm as me, it was certainly quite exciting. And with the project base all set, it’s just a matter of time before I return and work on some of the future enhancements.
Earlier this year, after being with Apple for over 10 years, I decided to switch back over to Windows, primarily because laptops this side of the fence seemed to offer better configurations for comparable pricing. And having used Windows 10 for over a couple of years, it seemed like a decent enough bet. The one thing I didn’t anticipate, however, was the Windows Update system.
Back in the good ‘ol days, if you had a Windows machine, you considered yourself lucky if you got a notification warning you about an impending restart – usually the computer just restarted itself and applied all the updates..because why not. Since those days, Microsoft decided to institute the “Active Hours” methodology – allowing you to schedule hours where it was a complete no no to install any updates. Hmm, not bad!
So the other day there were a few updates that needed to get installed and since I wasn’t working on anything important I decided to give the go ahead and install whatever updates was recommended. And, moreover, assuming that the system would apply updates and land back at the login screen as it had done many times before, I pressed “Restart” and didn’t come back until 24 hours had passed. Bad mistake.
The laptop in question, being my personal machine, I had decided from Day 1 that BitLocker was probably overkill. Worst case, I’d just reformat the drive and we’d be good to go. So far so good. Imagine my surprise when no authentication method would work. Ok, what do we do now? Everyone knows the IT adage…
“Have you tried turning it off and on again?” – Any IT Person
Alright, restart it is. Here’s where things get interesting – right after the laptop manufacturer logo, I was greeted by this.
Ok, what in God’s good name is BitLocker asking me? I’ve never activated it. And because of that simple precursor of a step, there is no access code, no recovery key, nothing – again… because nothing was ever setup. Why in the world, then, does Windows 10 think I have an encrypted drive? And what recovery keys are you talking about?
Now, I’m not sure if anyone else has run into this, but can you imagine waking up one day to find your laptop had decided, on its own, that it was in your best interest to turn on an encryption service without warning? I have read reports of windows updates deleting files, breaking other things but this is not something I had heard of or come across until now. Frantic Googling was of little help. Some resolution activities included buying a whole new hard drive! Why thank you!
So a whole fresh Windows 10 installation and a manual data backup plan later, thanks Windows… I think I can take it from here.
A filled world map with the world happiness rankings of 2017. As a added feature, I used the data from 2015 and 2016 to find which year the country was the happiest.
Most & Least Happiest Countries of 2017
Let’s take a look at the most and least happiest countries of 2017. In addition to the rankings, I also took the countries that improved the most and regressed the most as an indicator of happiness. To visualize this, I used a combination of maps (to show the countries on a map) and bar charts (to show the rank progression – plotting the change in rank from 2016 to 2017).
What I have learnt over many attempts of writing and maintaining a personal blog, especially after having gone from shared hosting to cloud self-managed hosting, is that maintaining the underlying infrastructure is hard work. Having gone from paying a pretty penny on Bluehost, I decided to switch to AWS with disastrous results – I found addressing database leaks that would take down the entire EC2 instance.
In any case it’s time to get back to writing again and this time I decided to go with Digital Ocean. Now we keep writing and seeing if my experiment with Digital Ocean will yield the results I’m looking for.