My talk at PyData NYC, Nov 22 2014
It’s a common story. Software developers are working hard to get a project off the ground. They set up logging to catch errors, but when they go to do data science down the road, they find that their logs are missing crucial information. A few days up front doing a “data audit” could have saved them time, made them money, and helped them gain insight into their customers. This talk will give you the toolkit you need to collect data properly, years before you bring on a data scientist. You will be able to do your own data audit, even if you don’t know anything about data science. You will learn the three major things to check—is your data complete? Is it correct? And is it connectable? You’ll also get a concise list of tools—Python and the command line—to quickly look through your data to get some intuition for what’s hiding in those CSVs. Be a hero to your future data team.
There’s a lot in the talk itself that doesn’t appear in the slides, so I recommend watching the video.
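To give a flavor of the three checks named in the abstract (complete, correct, connectable), here is a minimal sketch using only Python's standard library. It is not taken from the talk: the helper names, the column names, and the sample CSV contents are all hypothetical and chosen purely for illustration.

```python
import csv
import io


def completeness(rows, columns):
    """Count empty or missing values per column."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if not (row.get(col) or "").strip():
                missing[col] += 1
    return missing


def correctness(rows, column, is_valid):
    """Return the non-empty values in `column` that fail `is_valid`."""
    return [
        row[column]
        for row in rows
        if (row.get(column) or "").strip() and not is_valid(row[column])
    ]


def connectability(rows, column, known_keys):
    """Return the values in `column` that have no match in `known_keys`."""
    seen = {row[column] for row in rows if row.get(column)}
    return sorted(seen - set(known_keys))


def is_non_negative_number(text):
    """Example validity rule (an assumption): amounts must parse and be >= 0."""
    try:
        return float(text) >= 0
    except ValueError:
        return False


# Hypothetical sample data standing in for two log exports.
EVENTS_CSV = """user_id,event,amount
u1,purchase,19.99
u2,purchase,
u3,purchase,-5
,signup,
u9,purchase,4.50
"""

USERS_CSV = """user_id,email
u1,a@example.com
u2,b@example.com
u3,c@example.com
"""

events = list(csv.DictReader(io.StringIO(EVENTS_CSV)))
users = list(csv.DictReader(io.StringIO(USERS_CSV)))

# Complete: which columns have blanks, and how many?
missing = completeness(events, ["user_id", "event", "amount"])
# Correct: which filled-in amounts break the validity rule?
bad_amounts = correctness(events, "amount", is_non_negative_number)
# Connectable: which event user_ids cannot be joined to the users table?
orphans = connectability(events, "user_id", [u["user_id"] for u in users])

print("missing values per column:", missing)
print("invalid amounts:", bad_amounts)
print("user_ids with no matching user:", orphans)
```

To run the same checks on real files, one option is to replace each `io.StringIO(...)` with `open(path, newline="")` and swap in the columns and validity rules that matter for your own data.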