Sasha Laundy

HOWTO Make Your Future Data Science Team Love You

My talk at Strata Santa Clara, Nov 22 2014

Abstract

It’s a common story. Software developers are working hard to get a project off the ground. They set up logging to catch errors, but when they go to do data science down the road, they find that their logs are missing crucial information. A few days up front doing a “data audit” could have saved them time, made them money, and helped them gain insight into their customers.

This talk will give you the toolkit you need to collect data properly, years before you bring on a data scientist. You will be able to do your own data audit, even if you don’t know anything about data science. You will learn the three major things to check: Is your data complete? Is it correct? Is it connectable?
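To make the three checks concrete, here is a minimal sketch of a data audit in plain Python. Everything in it is an assumption made for illustration: the sample data, the column names, the helper names, and my reading of each check ("complete" as no empty required fields, "correct" as values that parse the way you expect, "connectable" as keys that match up across tables). Your own audit will need rules that fit your data.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data, made up for illustration only.
EVENTS_CSV = """user_id,event,timestamp
1,login,2014-01-05T10:00:00
2,purchase,
,login,2014-01-05T10:02:00
9,login,not-a-date
"""

USERS_CSV = """user_id,email
1,a@example.com
2,b@example.com
"""


def load(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_counts(rows, fields):
    """Complete? Count rows where each field is empty or absent."""
    return {
        f: sum(1 for r in rows if not (r.get(f) or "").strip())
        for f in fields
    }


def invalid_values(rows, field, is_valid):
    """Correct? Return the non-empty values that fail a validity check."""
    return [r[field] for r in rows if r[field] and not is_valid(r[field])]


def orphan_keys(rows, other_rows, key):
    """Connectable? Return keys in rows that have no match in other_rows."""
    known = {r[key] for r in other_rows}
    return sorted({r[key] for r in rows if r[key] and r[key] not in known})


def looks_like_timestamp(value):
    """Assumed rule: timestamps are ISO-style, e.g. 2014-01-05T10:00:00."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
        return True
    except ValueError:
        return False


events = load(EVENTS_CSV)
users = load(USERS_CSV)

missing = missing_counts(events, ["user_id", "event", "timestamp"])
bad_timestamps = invalid_values(events, "timestamp", looks_like_timestamp)
orphans = orphan_keys(events, users, "user_id")

print("missing per field:", missing)
print("unparseable timestamps:", bad_timestamps)
print("user_ids with no user record:", orphans)
```

Each helper answers one question and returns plain data, so you can run it over a sample of your logs and read the result without any data science background.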

You’ll also get a concise list of tools—Python and the command line—to quickly look through your data to get some intuition for what’s hiding in those CSVs. Be a hero to your future data team.
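As one example of a quick look, here is a small sketch that counts the most common values in each column of a CSV, using only the Python standard library. The sample data and the `summarize` helper are hypothetical; this is one way to get a first feel for a file, not a substitute for a dedicated tool.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data, made up for illustration only.
SAMPLE_CSV = """country,plan
US,free
US,paid
DE,free
,free
us,free
"""


def summarize(text, top=3):
    """Return the row count and the most common values for each column."""
    reader = csv.DictReader(io.StringIO(text))
    counters = {name: Counter() for name in reader.fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name, value in row.items():
            counters[name][value] += 1
    return total, {name: c.most_common(top) for name, c in counters.items()}


total, summary = summarize(SAMPLE_CSV)

print("rows:", total)
for column, common in summary.items():
    print(column, common)
```

Even on a toy file like this, the counts surface things worth asking about: an empty country, and "US" and "us" being recorded as two different values.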

Video

I’ll post the video as soon as it’s available.

Slides

My slides tend to be sparse and visual. I recommend watching the video.

Useful Books

Data Science at the Command Line

Thinking with Data

Handy Tools

csvkit

bitly data hacks

Have other favorite tools for quick interrogation? Tell me about them!

Questions, reactions, and follow-up