Sasha Laundy

I like science, Python, and bacon.

Unexpected Lessons From ORDCamp: Unstructure Is the Best Structure

I had the pleasure of spending the weekend attending ORDCamp in Chicago. It’s an unconference bringing together a wide range of artists, founders, engineers, educators, writers, and even a couple of beekeepers. I learned plenty of the things I expected to learn (cool new ideas, tools, and stories), but the gestalt of the experience brought several unexpected realizations.

The whole crowd at the closing session.

This is my fifth or sixth unconference. The format brings together people around some common philosophy or topic. Unlike a conference, there is no content set ahead of time. No confirmed talks, no pre-arranged speakers. The attendees construct the schedule together on the first night. This works way better than you’d think.

Session leaders are not necessarily even experts in their topic. It’s perfectly acceptable to just start a session around a question you have been pondering, creating a space for an interesting discussion. I ran a session called “Training Animals (and Humans) for Fun and Profit.” I’m not an animal trainer and I’ve never had a pet. But a dog training book [1] changed the way I think about parenting, strengthening relationships, software user interfaces, and how to change my own habits. I wanted to share what I’d learned.

Sessions don’t even have to be about ideas. One of the more popular sessions was watching someone play the video game Spelunky. Others were more hands-on. I’m particularly sad I missed Flaming Nunchucks, which was exactly what it sounds like.

One reason this format works so well is that it creates just enough structure. Organizers make sure social norms and expectations are clear, give advice on how best to navigate the experience, and then put the content squarely in the hands of the attendees. This works so well precisely because there is a void to fill: if you don’t step up and offer something interesting to the community, it’s going to be a pretty boring day. It puts people in a contributor mindset, which changes everything.

At normal conferences, you can really phone it in. Sit in the back, half-listen to the sage on the stage, flip through Twitter, hang out with people you already know. But at well-run unconferences, opportunities to engage feel precious and fleeting. You want to attend every session and meet everyone.

At the intro session, organizers Fitz and Zach encouraged us to settle in, “carefully place your stuff precisely anywhere” and lay off the tweeting and posting to indulge in a weekend for ourselves. I left my laptop and phone in my bag the whole weekend. I knew it would help my focus, but I didn’t expect it to so radically improve my energy level. Instead of sliding into Twitter or email when I had a down moment, I had time to be thoughtful: to say hello, ask a question, get some water, or play Chopsticks on a musical Tesla coil. Twitter and email feel stimulating but are actually shallow and exhausting: aspartame for the mind.

When we felt the urge to check our phones, organizers and ORDCamp veterans encouraged us to see that as a sign to get up, leave the session we were in, and explore others. Making it socially acceptable to say ‘no’ politely and find something that’s a better fit for you isn’t disrespectful. Quite the opposite: you get a better session, and the other folks in the room get engaged, excited partners. [2]

The other piece of unexpected advice came from ORDCamp veteran Bill: “Choose the session you’re least interested in.” I couldn’t resist the interesting sessions for the first few rounds, but for the 5pm slot I chose The Art and Science of Smoking Pigs. I’m never going to have the space to do it myself, and how much could there possibly be to say about pig roasting?

Moshe blew me away. He’s spent years refining his barbeque technique, and showed us the science, art, and culture that went into the best barbeque I’ve ever had. Not only that, but he fed us rib tips and barbeque sauce-covered Cap’n Crunch. And he brought a blow torch and charred up a few different wood samples so we could get a sense for the different smoke flavors.

Luck surface area [3] is an immensely useful concept. Bill’s advice was essentially to increase my serendipity surface area, exposing myself to ideas that perhaps weren’t interesting only because I hadn’t yet seen them in the right light. [4] I’ll certainly never see barbeque the same way again.

Unconferences in general are all about increasing serendipity surface area. They provide just enough structure to create an amazing skeleton, then get out of the way and allow the incredible brains in the room to fill the space deliberately left empty. It’s a structure that gets out of the way.

Hacker School is such a radically efficient educational experience for exactly the same reason. In some sense, it’s a 12-week-long unconference with ad-hoc one-to-two person sessions. It’s a school that gets out of the way of your learning. This is also why I chose an unstructured hack night as the first format for Women Who Code. The group is simply a social platform that brings together phenomenal people and then gets out of their way so they can build the code and relationships that they need.

The most surprising thing I’m taking with me is perspective on my daily life. NYC hardens you around the edges quickly. There’s so much noise and hubbub to filter out, and so many people demanding your time, attention, and money. With world-class everything, it’s easy to slip into casual jadedness. Seeing people share their excitement and passion for beekeeping and parenting and schools and tattoos and nunchucks and games and sousaphones, I couldn’t remember the last time I was that viscerally excited about something. I want to put that feeling in a jar and keep it with me in the big city.

I’ll let you know how it goes. In the meantime, get yourself to an unconference. Or make your own.

[1] Recommended to me by habit formation expert BJ Fogg, whose work has also been immensely useful.

[2] This is also great relationship advice.

[3] Increasing your luck surface area means increasing the opportunities you have to be lucky, either by increasing your exposure to opportunity or by being prepared to take advantage of it when it comes around.

[4] Someone’s offhand comment on Friday gave me a great new metaphor: ant trails. Ants wander around exploring somewhat randomly. As they go, they lay down trails of pheromones. When other ants come across these trails, they follow them. In short order, the entire workforce is no longer exploring, just trucking food back and forth on established trails. It’s the best way to efficiently keep the colony fed…at the expense of exploration. It’s very easy to go to sessions in your wheelhouse. Bill’s advice got me out of my ant trails.

Blaggregator Now Has Comments!

We were having some great conversations about fellow Hacker Schoolers’ blog posts in Humbug, but that channel is too high-volume for most alums.

Enter: Comments! The Hacker School community can now privately discuss blog posts in a pretty interface with a high signal-to-noise ratio. Check it out (Hacker School login required).

Git + Autocomplete = Bliss

Did you know that you can turn on autocomplete for git? A quick survey of other Hacker Schoolers revealed that most Linux distros come with it built in, but most Mac users don’t have it installed.

  1. Streamline your tools
  2. Save time
  3. Profit!

It’s super easy. Put git’s autocomplete script, git-completion.bash, in your home directory. Then add the line source ~/.git-completion.bash to your .bashrc so it loads in every new shell.

There’s more detail here, along with aliases, which are a MUST. I added git unstage FILENAME, which I find much more intuitive than git reset HEAD -- FILENAME. Hooray!

Part of my morning routine here at Hacker School is learning a few new things about Python, bash, and git, since in the long run, knowing these tools like the back of my hand saves tons of time.

Let me know on Twitter/Humbug if you want to hear more about this routine.

Demo Code Review

A big part of life at Hacker School is code reviews. Sometimes facilitators review code, but for the most part it’s our super talented peers.

Getting reviewed is a really valuable way to accelerate learning. It points out unclear parts of your code and blind spots you didn’t know you had. In some of my projects I’ve implemented things in a brute-force way, and my reviewers have pointed me to data structures or libraries I hadn’t yet discovered that allowed me to cut out lots of code.

A picture of our code review session.

Today Allison (a facilitator) offered a demo code review, so we could see how she thinks about reviewing code. We went over Nick’s Lisp interpreter, written in Python.

The crowd watching was pretty diverse, so everyone took something different away. Some learned about Python, some about the Git workflow, and some about Lisps.

On Rakefiles and Rabbit Holes

TL;DR. I noticed a bug in Octopress. In fixing it, I found a separate, truly spectacular error, and learned a lot of interesting things about bash. I would never have learned so much outside of Hacker School, since it gives me the time and space to open up the box.

The original bug.

Octopress’ rake preview lets you preview your post in the browser before deploying. It spawns Rack to serve local requests and Guard to watch for file changes. Guard in turn spawns Fsevent to do the actual watching.

Handy pstree is handy.
$ pstree
...
 | | \-+= 31340 root login -pf sasha
 | |   \-+= 31341 sasha -bash
 | |     \-+= 31430 sasha ruby /.../ruby-1.9.3-p374/bin/rake
 | |       |-+- 31433 sasha ruby /.../ruby-1.9.3-p374/bin/gua
 | |       | \--= 31443 sasha /.../ruby-1.9.3-p374/gems/rb-fs
 | |       \--- 31434 sasha (ruby)
...

But when you ctrl-C, Fsevent stays running in the background until you kill it or modify a file in the watched directory. This throws a TCPServer error if you try to run rake preview a second time. Sad pandas.

Initially I had no idea what was going on here. I’ve dabbled in Ruby but had never modified a Rakefile before. I learned ps and kill ten years ago, but didn’t know much about what was going on under the hood. Ripe conditions for learning a ton.

This is what I love about Hacker School. I have the time and space to go down these rabbit holes. At a startup, priority goes to shipping code. Here, shipping code is the means, not the end.

In the end, the actual bug was pretty small: when you interrupt rake, it sends kill 9 to Guard. Guard ends itself but doesn’t properly terminate its child process, fsevent.

Rake preview catching interrupts
  trap("INT") {
    [guardPid, rackupPid].each { |pid| Process.kill(9, pid) rescue Errno::ESRCH }
    exit 0
  }

This can be duct-taped together by sending 3 (QUIT) instead of 9 (KILL) to Guard, but we’re asking the Guard team if this is a known issue. But it gets better!

Curiouser and curiouser

Things got really weird when I was playing around with the different signals. The Ruby documentation says that if you pass kill a negative signal number, it will signal the entire process group, not just the child process. Promising!

However, it broke very spectacularly. If you want to follow along, grab the development branch of Octopress, currently 2.1:

$ git clone -b 2.1 https://github.com/imathis/octopress.git

Find the preview task and change the signal it passes on interrupt from 9 to -3:

Line 161 as of this writing
  trap("INT") {
    [guardPid, rackupPid].each { |pid| Process.kill(-3, pid) rescue Errno::ESRCH }
    exit 0
  }

Then run rake preview (you will have to run rake install the first time). Wait until you see Guard’s prompt (powered by Pry):

16:10:05 - INFO - Guard is now watching at '/Users/sasha/code/octopress'
[1] guard(main)> 

Then press control-C. You will see some lines of error output, and then BOTH Guard’s prompt and bash’s prompt, instead of one or the other. Wat!

[2] guard(main)> SASHAs-MacBook-Air-2:octopress sasha$ 

Weird! Press any key that sends input. PRY EXPLODES. Press enter and you get a bash prompt back, but now you can’t see anything you type. It’s still getting to bash (try ls <enter>) but some things, like control-L, no longer work.

Lolwut?

Down the rabbit hole

So I started reading about TTYs and POSIX signals, and playing with stty. Interesting stuff, particularly the history of our terminals evolving from teletype printers.

You can also change all sorts of wacky things about your terminal with stty. Try stty -echo (stty echo to undo it). This explains why I wasn’t able to see my own typing after the Pry explosion: when control was reluctantly handed back to bash, the flags on my terminal weren’t properly reset, including the echo flag and the flags governing canonical (line-buffered) versus raw input processing, which is why the terminal wouldn’t process things like control-L until I hit enter.
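If you want to poke at those same flags from Python, the termios module exposes them. A minimal sketch (Unix only; run it in a real terminal):

import sys
import termios

# tcgetattr returns [iflag, oflag, cflag, lflag, ispeed, ospeed, cc];
# the local-flags word (index 3) is where ECHO and ICANON live.
attrs = termios.tcgetattr(sys.stdin)
lflag = attrs[3]
print('echo on?  ', bool(lflag & termios.ECHO))
print('canonical?', bool(lflag & termios.ICANON))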

I didn’t find all the answers in my reading, but I’m asking significantly better questions:

  • How process groups are being used is unclear. I expected the child processes to inherit their group id (gid) from the parent and to be killed cleanly by passing a negative argument to kill, but that didn’t work. Passing -3 kills Rack but only mortally wounds Guard.

  • One remaining mystery: why is the bash prompt printed to the terminal after the Guard prompt when Guard is still running? Maybe control is passed from Guard to bash and then back to Guard?

  • It seems that Pry is getting something unexpected on its stdin (which may or may not be coming from bash), and that triggers its explosion. But how do you intercept it?

  • The Readline library defines some bash shortcuts, like mapping ctrl-L to clear, but lets you override them with an .inputrc. So you can do awesome things to your prompt, like adjusting how ctrl-W works. There are also vim and emacs modes for bash.

Stuff I learned: a little Ruby, Rakefiles, POSIX signals, the history of TTY, more about stdin, stdout, and stderr, stty, working with pids, gids, and the Process module in Ruby. And my first accepted (albeit one-character) pull request of Hacker School, which is certainly the highest learning-to-LOC ratio I can think of.

Turbocharging Octopress

This blog is powered by Octopress, which is basically a set of rake tasks, themes, and other add-ons that generate a blog from Markdown posts. It, in turn, is powered by Jekyll.

The documentation is generally pretty good, but it doesn’t really explain one fundamental thing. It’s pretty simple once you dig into the Rakefile, but here’s a quick explanation if you just want to get up and running.

WTF is going on with the branches?

On github, you will have two branches: source and master. But locally:

$ git br
* source

Huh. Interesting. Locally, you only have one branch: source. Wat?

Basically, source holds your posts in Markdown and other files before they are transmogrified into HTML. Once you run rake generate, Octopress generates all the HTML & CSS and puts it all in public/.

And when you run rake deploy, Octopress pushes the contents of public/ (from your local source branch) to the root of the remote master branch, which is what actually gets served.

So: each time you finish a post, run BOTH rake deploy to deploy and git push origin source to back up your source files to github.

Sublime Text <3 Markdown

So you’re writing your blog posts in Markdown like a boss. There are a few things you can do to make Sublime Text 2 a lean, mean blogging machine:

Custom themes. For example, MarkdownEditing, a series of custom themes and shortcuts for Markdown. (Hint: installing Package Control first will make this easier)

Spell check. Sublime Text 2 uses Hunspell for spell checking, the same open-source library used in OpenOffice and Firefox. This process was a bit more convoluted:

  1. Command-shift-p in ST2 to open Package Control
  2. Choose “Add Repo”. Paste in http://www.sublimetext.com/docs/2/spell_checking.html
  3. Command-shift-p again. Choose “Install Package,” then “Dictionaries”
  4. Find your preferences in Preferences > Package Settings > Markdown Editing > Markdown Settings - User
  5. Add the following to the settings file:
     "spell_check": true,
     "dictionary": "Packages/Language - English/en_US.dic"
    
  6. Restart Sublime Text. Observe red squigglies when you edit Markdown.
  7. Profit

Crashing Heroku (or When Memory Management in Python Really Matters)

David and I paired this week to build a Markov chain generator. It analyzes 15MB of old Hacker News headlines and generates completely new headlines that really sound like they came from HN. You can follow the Twitter bot or check out the code.

Last time, we asked your advice on how to handle all possible trigrams in 15 MB of seed text - a task that took up about 1 GB of memory in our first implementation, rapidly blasting through Heroku’s limit of 512 MB. The culprit? Dictionaries.

Our initial data structure was a dict of dicts. The key was a bigram, or word pair, and the value was a dict of all of the words that followed that bigram in the training text and the number of times they occurred. Wat? Here’s an example:

Our initial data structure
{
  ('how', 'to'): { 'do': 529,
                   'grow': 202,
                   'know': 134,
                   .....
                 },
  ('to', 'do'): { 'the': 1245,
                  'everything': 149,
                  .....
                },
  ...
}

With 15 MB of training text and so much redundancy, this can get out of hand quickly! Every word appears at least 3 times in the matrix, and some appear thousands of times.
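Here, roughly, is how a structure like this gets built (a simplified sketch using defaultdict, not our exact code):

from collections import defaultdict

def build_counts(headlines):
    # For every adjacent word pair, count how often each third
    # word follows it in the training text.
    counts = defaultdict(lambda: defaultdict(int))
    for headline in headlines:
        words = headline.split()
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            counts[(w1, w2)][w3] += 1
    return counts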

Possible solutions:

  • Generators for iterating over the headlines. This helped a lot: it got us from 1.3 GB down to about 850 MB. (See the sketch after this list.)
  • Compression. Tossing out common values, like a count of 1 for the long tail of words.
  • Creative use of data types, e.g. storing words as ints instead of strings. (Also sketched below.)
  • Lists. They take up less memory than dicts. One variation that we tried (and that is in this version of our code) is creating two dicts of lists rather than one dict of dicts.
  • Trade memory for speed. Darius, a Hacker School alum, pointed us to his Stack Overflow answer, where he laid out a few other approaches that mainly trade memory usage for speed. Also, David dug up some recent papers that focus entirely on this problem. Researchers are devising some very clever algorithms to get the most out of their data structures.
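For illustration, here’s a simplified sketch of the generator and int-storage ideas (hypothetical helper names, not our exact code):

def headlines(path):
    # Generator: yield one headline at a time instead of reading
    # the whole 15 MB corpus into a list up front.
    with open(path) as corpus:
        for line in corpus:
            yield line.strip()

# Storing words as ints: keep each distinct word once, and put
# only small integers in the matrix.
word_ids = {}

def word_to_int(word):
    return word_ids.setdefault(word, len(word_ids))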

So where did we leave it?

Darius’ suggestions and the approaches in the literature will certainly help reduce memory usage. But we wanted to maximize learning per unit of time at Hacker School, and we didn’t think implementing these approaches would teach us proportionally more than reading about them. We shipped it by deploying on a machine with more memory. Follow the Twitter bot here!

Stuff I learned along the way:

Generators, classes and methods in Python, pickle, regex, defaultdict, Tweepy, Buffer’s API, how bad the built-in titlecase methods are in Python, a clever way to handle default arguments, and some fundamental principles of NLP.

Help Us Plug Our Memory Leaks

David and I are running our app on Heroku, but it uses too much memory to run! Got any clever optimization suggestions?

We’re building a Markov chain generator that is trained on a corpus of 15MB of old Hacker News headlines. The app currently indexes the training corpus and constructs a matrix of all the possibilities, then queries that matrix to generate a hopefully entertaining new headline. Preview it here.

However, as we first wrote it, it used 1+ GB of memory when we deployed to Heroku, where the limit is 512 MB! We wrangled it down to 570 MB today and will tackle it again tomorrow morning, but need more ideas.

Some things we’ve tried:

  • Pickling the matrix. Not pickling the matrix. (didn’t help)
  • Storing the words as ints in our matrix rather than strings (helped)
  • Tweaking the data structure (helped, could probably do more with this)
  • We ran out of time, but want to try some sort of data store. Redis?

What should we try next? Our code after the jump:

In Search of Funny Gibberish

David and I paired this week to build a bot that analyzes 15MB of old Hacker News headlines and generates completely new headlines that really sound like they came from HN. You can follow the Twitter bot or check out our code.

Inspiration

I was inspired by this Hacker News parody and this Twitter bot to build a Markov chain generator as my warm-up project in Python.

A lightweight introduction to Markov chain generators

Markov chain generators are an interesting and dead simple way to generate a chain of anything - words, weather predictions, market simulations - anything that has happened before and can happen again.

The chain is generated by examining the current link and picking the next link based on what typically followed it in the training corpus. Then it completely forgets about the initial link: it chooses the third link based only on what generally comes after the second link. And so forth.

So there is no history, no memory. The generator can switch between training sentences stochastically, so it can generate sentences that sound kind of like the original source, but don’t necessarily make any sense, and are hopefully funny.
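Here’s the idea in code: a toy sketch (not our exact code) where each link is a single word, and counts maps a word to a dict of the words that followed it, with frequencies:

import random

def weighted_choice(followers):
    # Pick a follower with probability proportional to its count.
    total = sum(followers.values())
    r = random.randrange(total)
    for word, count in followers.items():
        r -= count
        if r < 0:
            return word

def generate(counts, word, length=10):
    # Walk the chain, forgetting everything but the current word.
    chain = [word]
    for _ in range(length):
        followers = counts.get(word)
        if not followers:
            break
        word = weighted_choice(followers)
        chain.append(word)
    return ' '.join(chain)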

Bigrams vs trigrams

The first design decision to make is how much history to examine when choosing a new link in your chain. If you only go one word back, you are looking at pairs of words, or bigrams.

Some sentences generated with bigrams: (please pardon that they’re all seeded with ‘why’ - this is from an early branch)

Why I Am I Talk About To Be Designing For The Daily Check Out Of Braid Creating False
Why Machines Is Right Fit
Why Nokia Partners
Why We Doing In Siberia
Why Are Literally Amp Chrome Opera Singer Is Coming Soon
Why The Twitter Besides Buying Groupon Will Not Good Freely Available To Effectively Off
Why Objective C Safer In Gears
Why Some Sleep Deprived Brains
Why I Like Instapaper Redesigns Foursquare Checkin Offers Readers Cause Problems And Should Set Theorists
Why Dropbox S More Music Gear Online Teaching Ror Developers

They are nice and random, and clearly contain the right buzzwords, but they aren’t very grammatical and therefore can’t be funny. Humor relies on surprise: setting up an expectation, then delivering something different. With bigrams, you never get a coherent enough sentence to generate an expectation, so no shot at being funny.

Let’s smooth out our chains by moving from bigrams to trigrams. So instead of looking at what follows a given individual word in our training corpus, we’ll look at what follows a pair of words. Here are some examples - note that the grammar is significantly better and some are worth a chuckle. I particularly like 7 and 10.

1. My Year Of Experience Is A Big Twist
2. Engine Yard
3. Mini-microsoft: Compensatory Arrangements Of Certain (microsoft) Officers
4. The 12-step Landing Page
5. More Webmaster Questions - Answered
6. Scripting Gnu Screen
7. The Full Social Network Buttons To Operate On Your Terrorist Neighbor
8. Typical Tech Entrepreneur?
9. The Lost Lesson Of 'free'
10. Contact Lenses Are Curing The Founder's Syndrome
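Mechanically, the only change from the toy sketch above is that the state is now a pair of words, looked up in the trigram counts:

def generate_from_trigrams(counts, seed, length=10):
    # counts maps a (w1, w2) pair to its followers, so each new
    # word depends on the two words before it.
    w1, w2 = seed
    chain = [w1, w2]
    for _ in range(length):
        followers = counts.get((w1, w2))
        if not followers:
            break
        w3 = weighted_choice(followers)  # same helper as above
        chain.append(w3)
        w1, w2 = w2, w3
    return ' '.join(chain)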

Which seed?

HN is at its least self-conscious and most easily lampooned when doling out advice. Consider these ‘how’- and ‘why’-seeded sentences:

How To Choose The Right People And The Chance To Present
Why Javascript Is Broken
How Do You Manage Your Startup’s Pr At Tech Startups Are Moving To Rackspace
Why Do Organic Eggs Come In Bunches
How Apple Is The Prevalence Of Qwerty
Why Google Wants To Magically Transfer Gov Debt To Darwin
How To Finance Your App From The Lhc Will See Global History Of Governments And Geeks Parse The World
Why Are Bank Security Questions On Agile
How To Hack The Us - So Stock Up 879.55%
Why Computer Displays Suck For You

When we choose a first word randomly from all the words that have ever been in headlines, we get a bigger assortment, but I think they’re less funny:

Canonical Contributes Only 1% Of Profit
Google Uneveils New Search Results With Google's Closure Of Paid Prioritization
What Do You Deal With Worldnow, Adds 19 Million Potential Users
Diminishing Dead-tree Media And Mobile Computing Is A Beautiful Monster
Ask Hn Yahoos: What Yahoo Should Do To Excel
China Demands New Pcs Is Ruined By A Thousand Years
Freemium: A Business Plan Competition
Buy My Blog, Please
Why I Am In Your Field?
8 Tips To Considerate When Planning To Move Themselves (neural Network)

The other drawback is that they’re more likely to hit on a seed with only one possible resulting sentence, like “Buy My Blog, Please,” above.

$ grep -i "buy my blog" ../hnfull.txt
Gawker media boss Nick Denton: Buy my blog, please

If you think of the bot as walking through the possibilities, a common seed like ‘how’ will branch off in lots of different ways, so there are many paths for the bot to walk and thus many possible sentence outcomes. A less common seed, like “scripting”, will result in fewer possible paths, making it more likely to just return a real headline verbatim. There were only 10 ways to finish “buy my” in our corpus of 350,000 training headlines, compared to 7,710 ways to finish “how to.”
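In terms of the counts structure sketched earlier, a seed’s branching factor is just its number of distinct followers (assuming lower-cased tokens):

len(counts[('buy', 'my')])   # 10 ways to continue in our corpus
len(counts[('how', 'to')])   # 7,710 ways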

Raising the stakes

But how to make it as funny as possible? Some possible improvements:

  • Crowdsource funniness ratings (with mechanical turk or a ‘hot or not’ app, etc). Only tweet out the funniest headlines.
  • Feed the funniness ratings back into the algorithm. For example, only use the funniest seeds.
  • Do semantic analysis of parts of speech in the training corpus and use them with templates. This would improve grammar but decrease spontaneity.
  • Hire Nick and Dave to crank out more of these :)

Any other ideas?