Sasha Laundy

HOWTO Make Your Future Data Scientist Love You

My talk at PyData NYC, Nov 22, 2014

Abstract

It’s a common story. Software developers are working hard to get a project off the ground. They set up logging to catch errors, but when they go to do data science down the road, they find that their logs are missing crucial information. A few days up front doing a “data audit” could have saved them time, made them money, and helped them gain insight into their customers. This talk will give you the toolkit you need to collect data properly, years before you bring on a data scientist. You will be able to do your own data audit, even if you don’t know anything about data science. You will learn the three major things to check—is your data complete? Is it correct? And is it connectable? You’ll also get a concise list of tools—Python and the command line—to quickly look through your data to get some intuition for what’s hiding in those CSVs. Be a hero to your future data team.
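The three checks from the abstract (complete, correct, connectable) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the method presented in the talk. The event log, its columns (user_id, event, timestamp), the set of known user IDs, and the audit() helper are all hypothetical. "Correct" is reduced here to a single example rule: timestamps must parse as ISO 8601.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical event log; in practice you would open() your own CSV file.
EVENTS_CSV = """user_id,event,timestamp
1,signup,2014-11-01T09:30:00
2,purchase,2014-11-02T14:05:00
,purchase,2014-11-02T14:07:00
3,login,not-a-date
9,login,2014-11-03T08:00:00
"""

# Hypothetical user IDs taken from a separate users table.
KNOWN_USER_IDS = {"1", "2", "3"}


def audit(csv_text, known_ids):
    """Hypothetical helper: report on completeness, correctness, connectability."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Complete: how many rows leave each column blank?
    missing = Counter(col for row in rows for col, val in row.items() if not val)

    # Correct: which timestamps fail to parse as ISO 8601?
    bad_timestamps = []
    for row in rows:
        try:
            datetime.fromisoformat(row["timestamp"])
        except ValueError:
            bad_timestamps.append(row["timestamp"])

    # Connectable: which non-empty user IDs have no match in the users table?
    orphan_ids = sorted({row["user_id"] for row in rows if row["user_id"]} - known_ids)

    return {
        "rows": len(rows),
        "missing": dict(missing),
        "bad_timestamps": bad_timestamps,
        "orphan_ids": orphan_ids,
    }


report = audit(EVENTS_CSV, KNOWN_USER_IDS)
print(report)
# → {'rows': 5, 'missing': {'user_id': 1}, 'bad_timestamps': ['not-a-date'], 'orphan_ids': ['9']}
```

Each check is deliberately crude so it is easy to swap in your own file, columns, and join keys. The point is to surface blank fields, unparseable values, and IDs that won't join before anyone needs them.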

Video

Slides

Much of what I said in the talk doesn’t appear in the slides, so I recommend watching the video.

Recommended Books

Data Science at the Command Line

Thinking with Data

Handy Tools

csvkit

bitly data hacks

Questions, reactions, and follow-up

Blog posts that mention my talk

Michael Becker’s PyData NYC recap

Julia Evans’s PyData NYC recap

Tweets