Gus Cavanaugh

Getting to Second Base with HBase

Just enough to have some fun but not get you in trouble with Dad

What non-coding business analysts need to know to manipulate Big Data

NoSQL databases are a class of databases used in Big Data applications. They offer Big Data developers significant advantages: fault tolerance, flexible schemas, and fast query performance. HBase is one of the most popular . . .

January 18, 2016

IBM Watson: Bieber has more self-discipline than Plato

Using Compose, Twitter, and Watson to compare two philosophical titans


This is an introduction to using Compose, a database hosting service, to quickly store and retrieve data. Compose eliminates the annoyances of hosting your own database server, saving you both time and money. It is also easy to use, which shortens the learning curve for new developers (like me).

So what should we do with this . . .

January 07, 2016

Install Spark on Windows

No Hadoop, No Problem

Install Spark

I have installed Spark directly on Windows, which is unusual. Most people will probably run Spark through a VM (virtual machine - a separate computer that runs as software within your computer) or a Docker container (same idea, but a higher level of abstraction). Unlike writing MapReduce or Pig (Pig is the scripting language . . .

January 02, 2016

Tolstoy's Favorite Word

Using Apache Spark and Watson Analytics to analyze the most frequent words in War and Peace


This is a quick example of using Apache Spark, the big data processing engine, and Watson Analytics, a new analysis and visualization tool, to do a basic word count and analyze the results. For fun, we are using Leo Tolstoy's War and Peace. As you're likely aware, Tolstoy's magnum opus is one hefty book, so figuring out which . . .

January 02, 2016

5 Reasons to Host a Hackathon in Ghana

Yes, Ghana, go find it on a map

Hackathon in Ghana

This weekend I took a surprisingly pleasant 10.5-hour direct flight from DC to tropical Accra, Ghana (I was never picked first for basketball at recess but I slept just fine in coach - winning!). I was in Ghana to help facilitate a hackathon at Ashesi University, a new school outside of Accra. For those not familiar with . . .

October 28, 2015

Retrieving Big Data

For the Non-developer

I recently gave a talk at Data Wranglers DC on retrieving big data. The goal of this talk was to introduce Hadoop to business analysts in the barest, no-nonsense form and cover one Pig hurdle I faced. Read: I gave really shitty explanations of Hadoop core components like "All you need to know about MapReduce is . . .

July 03, 2015

One Keyboard Shortcut

To Rule them All (In Excel)

No Shortcuts in Life (but not in Excel MotherFucker!!)

You’ve probably heard there are no shortcuts in life. I’m not the fucking Buddha so I’ll leave that existential shit to someone else. In Excel, my mouse-using neophyte, there are some sweet goddamn shortcuts you need to get on quickly.

You use the mouse in Excel? No no, please tell me . . .

May 29, 2015