Sasha Laundy

I like science, Python, and bacon.

The Slacker Developer’s Guide to the Mavericks Upgrade

I am not an early adopter.

Until last week, I was running 10.7. Yes, it’s true. Don’t judge. I had code to ship. No time for yak shaving!

But finally I bit the bullet. After dual redundant spatially-distributed backups, I ran the upgrade. And everything went great at the GUI level! And the Messages app is awesome! But Ruby, virtualenv, and Java were all borked. I don’t even Ruby, but I use Octopress and Heroku, so I couldn’t even blog or restart Hubot. Sadness!

Some tips for your upgrade:

  1. Homebrew links will probably be messed up. brew unlink foo and then brew link foo fixed most of these. Consider scripting it to save time and heartache! For example: for i in $( brew list ); do brew unlink $i; brew link $i; done

  2. Ruby has a new version, and RVM now uses .ruby-version instead of .rvmrc to keep track of Ruby versions. It moved these files around automatically without telling me, which is half of what broke my blog.

  3. Ruby 2.whatever will complain about Readline, YAML, or OpenSSL depending on its mood. Try uninstalling Ruby with RVM, installing whatever it demands, then reinstalling Ruby and seeing if it works. At this point, I’m basically thinking of Ruby as a demanding newborn that is astonishingly articulate about its needs.

  4. Java is gone because reasons. Here’s how to reinstall it.

  5. I’m still not sure what’s going on with my virtualenv but I think it has to do with my questionable decision to use a third-party Python distro. Passing in a path to the system Python with -p seems to be a decent workaround.

  6. A bunch of packages wouldn’t install through pip, and it turns out that the new default when flags are not recognized by gcc is to CATASTROPHICALLY CRASH rather than keep calm and carry on. Don’t have time for this drama? Add the following to your environment:

export CFLAGS=-Qunused-arguments
export CPPFLAGS=-Qunused-arguments

Back to work!

Unexpected Lessons From ORDCamp: Unstructure Is the Best Structure

I had the pleasure of spending the weekend at ORDCamp in Chicago. It’s an unconference that brings together a wide range of artists, founders, engineers, educators, writers, and even a couple of beekeepers. I learned many of the things I expected to learn (cool new ideas, tools, and stories), but the overall gestalt of the experience brought several unexpected realizations.

The whole crowd at the closing session.

This is my fifth or sixth unconference. The format brings together people around a common philosophy or topic. Unlike at a conference, there is no content set ahead of time: no confirmed talks, no pre-arranged speakers. The attendees construct the schedule together on the first night. This works way better than you’d think.

Session leaders are not necessarily even experts in their topic. It’s perfectly acceptable to just start a session around a question you have been pondering, creating a space for an interesting discussion. I ran a session called “Training Animals (and Humans) for Fun and Profit.” I’m not an animal trainer and I’ve never had a pet. A dog training book [1] changed the way I think about parenting, strengthening relationships, software user interfaces, and how to change my own habits. I wanted to share what I’d learned.

Sessions don’t even have to be about ideas. One of the more popular sessions was watching someone play the video game Spelunky. Others were more hands-on. I’m particularly sad I missed Flaming Nunchucks, which was exactly what it sounds like.

One reason this format works so well is that it creates just enough structure. Organizers make sure social norms and expectations are clear, give advice on how best to navigate the experience, and then put the content squarely in the hands of the attendees. This works so well precisely because there is a void to fill. If you don’t step up and offer something interesting to the community it’s going to be a pretty boring day. It puts people in a contributor mindset, which changes everything.

At normal conferences, you can really phone it in. Sit in the back, half-listen to the sage on the stage, flip through Twitter, hang out with people you already know. But at well-run unconferences, opportunities to engage feel precious and fleeting. You want to attend every session and meet everyone.

At the intro session, organizers Fitz and Zach encouraged us to settle in, “carefully place your stuff precisely anywhere” and lay off the tweeting and posting to indulge in a weekend for ourselves. I left my laptop and phone in my bag the whole weekend. I knew it would help my focus, but I didn’t expect it to so radically improve my energy level. Instead of sliding into Twitter or email when I had a down moment, I had time to be thoughtful: to say hello, ask a question, get some water, or play Chopsticks on a musical Tesla coil. Twitter and email feel stimulating but are actually shallow and exhausting: aspartame for the mind.

When we felt the urge to check our phones, organizers and ORDCamp veterans encouraged us to take it as a sign to get up, leave the session we were in, and explore others. Politely saying ‘no’ and finding something that’s a better fit for you was socially acceptable, and it isn’t disrespectful. Quite the opposite: you get a better session, and the other folks in the room get engaged, excited partners. [2]

The other piece of unexpected advice came from ORDCamp veteran Bill: “Choose the session you’re least interested in.” I couldn’t resist the interesting sessions for the first few rounds, but for the 5pm slot I chose The Art and Science of Smoking Pigs. I’m never going to have the space to do it myself, and how much could there possibly be to say about pig roasting?

Moshe blew me away. He’s spent years refining his barbeque technique, and showed us the science, art, and culture that went into the best barbeque I’ve ever had. Not only that, but he fed us rib tips and barbeque sauce-covered Cap’n Crunch. And he brought a blow torch and charred up a few different wood samples so we could get a sense for the different smoke flavors.

Luck surface area [3] is an immensely useful concept. Bill’s advice was essentially to increase my serendipity surface area, exposing myself to ideas that perhaps seemed uninteresting only because I hadn’t seen them in the right light before. [4] I’ll certainly never see barbeque the same way again.

Unconferences in general are all about increasing serendipity surface area. They provide just enough structure to form an amazing skeleton, then step back and let the incredible brains in the room fill the space deliberately left empty. It’s a structure that gets out of the way.

Hacker School is such a radically efficient educational experience for exactly the same reason. In some sense, it’s a 12-week-long unconference with ad-hoc one-to-two person sessions. It’s a school that gets out of the way of your learning. This is also why I chose an unstructured hack night as the first format for Women Who Code. The group is simply a social platform to bring together phenomenal people and then get out of their way so they can build the code and relationships they need.

The most surprising thing I’m taking with me is perspective on my daily life. NYC hardens you around the edges quickly. There’s so much noise and hubbub to filter out, and so many people demanding your time, attention, and money. With world-class everything, it’s easy to slip into casual jadedness. Watching people share their excitement and passion for beekeeping and parenting and schools and tattoos and nunchucks and games and sousaphones, I realized I couldn’t remember the last time I was that viscerally excited about something. I want to put that feeling in a jar and keep it with me in the big city.

I’ll let you know how it goes. In the meantime, get yourself to an unconference. Or make your own.

[1] Recommended to me by habit formation expert BJ Fogg, whose work has also been immensely useful.

[2] This is also great relationship advice.

[3] Luck surface area means increasing the opportunities you have to be lucky, either by increasing your exposure to opportunity or by preparation to take advantage of it when it comes around.

[4] Someone’s offhand comment on Friday gave me a great new metaphor: ant trails. Ants wander around exploring somewhat randomly. As they go, they lay down trails of pheromones. When other ants come across these trails, they follow them. In short order, the entire workforce is no longer exploring, just trucking food back and forth on established trails. It’s the best way to keep the colony efficiently fed…at the expense of exploration. It’s very easy to go to sessions in your wheelhouse. Bill’s advice got me out of my ant trails.

Blaggregator Now Has Comments!

We were having some great conversations about fellow Hacker Schoolers’ blog posts in Humbug, but that channel is too high-volume for most alums.

Enter: Comments! The Hacker School community can now privately discuss blog posts in a pretty interface with a high signal-to-noise ratio. Check it out (Hacker School login required).

Git + Autocomplete = Bliss

Did you know that you can turn on autocomplete for git? A quick survey of other Hacker Schoolers revealed that most Linux distros come with this built in, but most Mac users didn’t have it installed.

  1. Streamline your tools
  2. Save time
  3. Profit!

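It’s super easy. Download git’s autocomplete script, git-completion.bash (it lives in contrib/completion in the git source), to your home directory as ~/.git-completion.bash. Then add the line source ~/.git-completion.bash to your .bashrc so every new shell loads it.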

There’s more detail here, along with aliases, which are a MUST. I added git unstage FILENAME (defined with git config --global alias.unstage 'reset HEAD --'), which I find much more intuitive than git reset HEAD -- FILENAME. Hooray!

Part of my morning routine here at Hacker School is learning a few new things about Python, bash, and git, since in the long run, knowing these tools like the back of my hand saves tons of time.

Let me know on Twitter/Humbug if you want to hear more about this routine.

Demo Code Review

A big part of life at Hacker School is code review. Sometimes facilitators review code, but for the most part, it’s our super talented peers.

Getting reviewed is a really valuable way to accelerate learning. It points out unclear parts of your code and blind spots you didn’t know you had. In some of my projects I’ve implemented things in a brute-force way, and my reviewers have pointed me to data structures or libraries I hadn’t yet discovered that let me cut out lots of code.

A picture of our code review session.

Today Allison (a facilitator) offered a demo code review, so we could see how she approaches a review. We went over a Lisp implementation that Nick wrote in Python.

The crowd watching was pretty diverse, so everyone took something different away. Some learned about Python, some about the Git workflow, and some about Lisps.

On Rakefiles and Rabbit Holes

TL;DR. I noticed a bug in Octopress. In fixing it, I found a separate, truly spectacular error, and learned a lot of interesting things about bash. I would never have learned so much outside of Hacker School, since it gives me the time and space to open up the box.

The original bug.

Octopress’ rake preview lets you preview your post in the browser before deploying. It spawns Rack to serve local requests and Guard to watch for file changes. Guard in turn spawns Fsevent to do the actual watching.

Handy pstree is handy.
$ pstree
...
 | | \-+= 31340 root login -pf sasha
 | |   \-+= 31341 sasha -bash
 | |     \-+= 31430 sasha ruby /.../ruby-1.9.3-p374/bin/rake
 | |       |-+- 31433 sasha ruby /.../ruby-1.9.3-p374/bin/gua
 | |       | \--= 31443 sasha /.../ruby-1.9.3-p374/gems/rb-fs
 | |       \--- 31434 sasha (ruby)
...

But when you press Ctrl-C, Fsevent stays running in the background until you kill it or modify a file in the watched dir. This throws a TCPServer Error if you try to run rake preview a second time. Sad pandas.

Initially I had no idea what was going on here. I’ve dabbled in Ruby but not modified a Rakefile before. I learned ps and kill ten years ago, but didn’t know much about what was going on under the hood. Ripe conditions for learning a ton.

This is what I love about Hacker School. I have the time and space to go down these rabbit holes. At a startup, priority goes to shipping code. Here, shipping code is the means, not the end.

In the end, the actual bug was pretty small: when you interrupt rake, it sends signal 9 (KILL) to Guard. Guard dies, but its child process, fsevent, is left running. KILL can’t be caught or handled, so Guard never gets a chance to clean up after itself.

Rake preview catching interrupts
  trap("INT") {
    [guardPid, rackupPid].each { |pid| Process.kill(9, pid) rescue Errno::ESRCH }
    exit 0
  }

This can be duct-taped together by sending 3 (QUIT) instead of 9 (KILL) to Guard, but we’re asking the Guard team if this is a known issue. But it gets better!

Curiouser and curiouser

Things got really weird when I was playing around with different signals. The Ruby documentation says that if you pass Process.kill a negative signal, it signals the entire process group instead of just the one process. Promising!
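For comparison, here is a minimal Python sketch of the group-wide kill idea using the standard os, signal, and subprocess modules. It is an illustration, not what Octopress or Guard actually do. One detail worth knowing: a group-wide signal (Ruby's negative signal, Python's os.killpg) targets the process group whose ID equals the number you pass, and a spawned child normally stays in its parent's group. So the sketch first makes the child a group leader with start_new_session=True. The helpers spawn_group and kill_group and the stand-in child process are hypothetical.

```python
import os
import signal
import subprocess
import sys


def spawn_group(cmd):
    """Start cmd in a new session, which also makes it leader of a new process group."""
    return subprocess.Popen(cmd, start_new_session=True)


def kill_group(proc, sig=signal.SIGTERM, timeout=5):
    """Send sig to every process in proc's group (roughly Ruby's Process.kill(-sig, pid)).

    Returns the leader's exit status; on POSIX, -N means it died from signal N.
    """
    pgid = os.getpgid(proc.pid)  # equals proc.pid because the child leads its own group
    os.killpg(pgid, sig)
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    # Hypothetical stand-in for Guard: a Python process that starts a grandchild
    # (standing in for fsevent) and then waits. Both live in the new process group.
    child_code = (
        "import subprocess, sys, time; "
        "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)']); "
        "time.sleep(60)"
    )
    proc = spawn_group([sys.executable, "-c", child_code])
    print("exit status:", kill_group(proc))
```

Because the signal goes to the whole group, the grandchild is signalled too instead of being orphaned. The demo should report -15 (SIGTERM); it uses TERM rather than QUIT only because QUIT's default action may also dump core.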

However, it broke very spectacularly. If you want to follow along, grab the development branch of Octopress, currently 2.1:

$ git clone -b 2.1 https://github.com/imathis/octopress.git

Find the preview task and change the signal it sends on interrupt from 9 to -3:

Line 161 as of this writing
  trap("INT") {
    [guardPid, rackupPid].each { |pid| Process.kill(-3, pid) rescue Errno::ESRCH }
    exit 0
  }

Then run rake preview (you will have to run rake install the first time). Wait until you see Guard’s prompt (powered by Pry):

16:10:05 - INFO - Guard is now watching at '/Users/sasha/code/octopress'
[1] guard(main)> 

Then press Control-C. You will see some lines of error output, and then BOTH Guard’s and bash’s prompts, instead of one or the other. Wat!

[2] guard(main)> SASHAs-MacBook-Air-2:octopress sasha$ 

Weird! Press any key that sends input. PRY EXPLODES. Press enter and you get a bash prompt back, but now you can’t see anything you type. It’s still getting to bash (try ls <enter>) but some things, like control-L, no longer work.

Lolwut?

Down the rabbit hole

So I started reading about TTY and POSIX signals and using stty. Interesting stuff, particularly the history of our terminals evolving from teletype printers (TTY is short for teletypewriter).

You can also change all sorts of wacky things about your terminal with stty. Try stty -echo (stty echo undoes it). This explains why I couldn’t see my own typing after the Pry explosion: when control was reluctantly handed back to bash, the flags on my terminal weren’t properly reset, including the flag for raw (non-canonical) input processing, which is why it won’t process things like control-L until you hit enter. Running stty sane (or reset) should restore normal settings.
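To see the kind of flag stty is flipping, here is a hedged Python sketch using the standard termios module. It clears the ECHO bit (what stty -echo does) inside a context manager and always restores the saved settings afterwards, which is the cleanup step that seems to have been skipped when Pry exploded. The names without_echo and echo_disabled are hypothetical, and the demo only runs when stdin is a real terminal.

```python
import sys
import termios
from contextlib import contextmanager

LFLAG = 3  # index of the local-mode flags in the list returned by tcgetattr


def without_echo(attrs):
    """Return a copy of tcgetattr-style attrs with ECHO cleared (like `stty -echo`)."""
    new = list(attrs)
    new[LFLAG] = new[LFLAG] & ~termios.ECHO
    return new


@contextmanager
def echo_disabled(fd):
    """Turn off echo on terminal fd, restoring the original settings on exit."""
    saved = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSADRAIN, without_echo(saved))
    try:
        yield
    finally:
        # Restore even if the body raises, so the terminal isn't left "invisible".
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)


if __name__ == "__main__" and sys.stdin.isatty():
    with echo_disabled(sys.stdin.fileno()):
        secret = input("Type something (it won't echo): ")
    print()
    print("You typed", len(secret), "characters")
```

The try/finally is the important design choice: a program that crashes between changing and restoring terminal flags leaves the shell in exactly the state described above.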

I didn’t find all the answers in my reading, but I’m asking significantly better questions:

  • How process groups are being used is unclear. I expected the process group to inherit its process group ID (pgid) from the parent process and to be killed cleanly by passing a negative signal to kill, but that didn’t work. Passing -3 kills Rack but only mortally wounds Guard.

  • One remaining mystery: why is the bash prompt printed to the terminal after the Guard prompt when Guard is still running? Maybe control is passed from Guard to bash and then back to Guard?

  • Pry seems to be getting something unexpected on its stdin, which triggers the explosion. That input may or may not be coming from bash. But how to intercept it?

  • The Readline library defines some bash shortcuts, like mapping ctrl-l to clear, but lets you override them with an .inputrc. So you can do awesome things to your command line, like adjusting how ctrl-w works. There are also vi and emacs editing modes for bash.

Stuff I learned: a little Ruby, Rakefiles, POSIX signals, the history of TTY, more about stdin, stdout, and stderr, stty, working with pids, process group IDs, and the Process module in Ruby. And my first accepted (albeit one-character) pull request of Hacker School, which is certainly the highest learning-to-LOC ratio I can think of.

Turbocharging Octopress

This blog is powered by Octopress, which is basically a set of rake tasks, themes, and other add-ons that generate a blog from Markdown posts. Octopress is in turn powered by Jekyll.

The documentation is generally pretty good, but they didn’t really explain one fundamental thing. It’s pretty simple once you dig into the Rakefile, but here’s a quick explanation if you just want to get up and running.

WTF is going on with the branches?

On GitHub, you will have two branches: source and master. But locally:

$ git branch
* source

Huh. Interesting. Locally, you only have one branch: source. Wat?

Basically, source holds your posts in Markdown and other files before they are transmogrified into HTML. Once you run rake generate, Octopress will generate all the HTML & CSS and put it all in /public.

And when you run rake deploy, Octopress pushes the contents of /public (on local branch source) to the root directory of the remote branch master.

So: each time you finish a post, run BOTH rake deploy to deploy and git push origin source to back up your source files to GitHub.

Sublime Text <3 Markdown

So you’re writing your blog posts in Markdown like a boss. There are a few things you can do to make Sublime Text 2 a lean, mean blogging machine:

Custom themes. For example, MarkdownEditing, a series of custom themes and shortcuts for Markdown. (Hint: installing Package Control first will make this easier.)

Spell check. Sublime Text 2 uses Hunspell for spell checking, the same library used in LibreOffice and Firefox. This process was a bit more convoluted:

  1. Command-shift-p in ST2 to open the Command Palette, where Package Control’s commands live
  2. Choose “Add Repo”. Paste in http://www.sublimetext.com/docs/2/spell_checking.html
  3. Command-shift-p again. Choose “Install Package,” then “Dictionaries”
  4. Find your preferences in Preferences > Package Settings > Markdown Editing > Markdown Settings - User
  5. Add the following to the settings file:
     "spell_check": true,
     "dictionary": "Packages/Language - English/en_US.dic"
    
  6. Restart Sublime Text. Observe red squigglies when you edit Markdown.
  7. Profit

Crashing Heroku (or When Memory Management in Python Really Matters)

David and I paired this week to build a Markov chain generator. It analyzes 15MB of old Hacker News headlines and generates completely new headlines that really sound like they came from HN. You can follow the Twitter bot or check out the code.

Last time, we asked your advice on how to handle all possible trigrams in 15 MB of seed text: a task that took up about 1 GB of memory in our first implementation, rapidly blasting through Heroku’s limit of 512 MB. The culprit? Dictionaries.

Our initial data structure was a dict of dicts. The key was a bigram, or word pair, and the value was a dict of all of the words that followed that bigram in the training text and the number of times each occurred. Wat? Here’s an example:

Our initial data structure
{
  ('how', 'to'): { 'do': 529,
                   'grow': 202,
                   'know': 134,
                   .....
                 },
  ('to', 'do'): { 'the': 1245,
                  'everything': 149,
                  .....
                },
  ...
}

With 15 MB of training text and so much redundancy, this can get out of hand quickly! Every word appears at least 3 times in the matrix, and some appear thousands of times.
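For readers who haven’t built one, here is a minimal Python sketch of how a structure shaped like this might be built and then sampled to generate a headline. It is an illustration under assumptions, not our actual code: the tokenizer is a plain split(), each headline is padded with hypothetical start/end markers, and the three-headline corpus is made up.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentinel tokens marking headline boundaries


def build_model(headlines):
    """Map each bigram (w1, w2) to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for headline in headlines:
        words = [START, START] + headline.lower().split() + [END]
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            model[(w1, w2)][w3] += 1
    return model


def generate(model, rng=random, max_words=30):
    """Walk the chain from the start state, sampling each next word by its count."""
    w1, w2 = START, START
    out = []
    for _ in range(max_words):
        followers = model[(w1, w2)]
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == END:
            break
        out.append(word)
        w1, w2 = w2, word
    return " ".join(out)


headlines = ["how to do everything", "how to grow a startup", "how to do the thing"]
model = build_model(headlines)
print(dict(model[("how", "to")]))  # {'do': 2, 'grow': 1}
print(generate(model, random.Random(0)))
```

Every word shows up as part of a key and again as a follower, and each Counter is a full Python dict, which is where the redundancy described above comes from.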

Possible solutions:

  • Generators for iterating over the headlines. This helped a lot: it got us from 1.3 GB down to about 850 MB.
  • Compression. Tossing out common values, like the count of 1 for the long tail of words that follow a given bigram only once.
  • Creative use of data types, e.g. storing words as ints instead of strings.
  • Lists. They take up less memory than dicts. One variation that we tried (and is in this version of our code) is creating two dicts of lists rather than one dict of dicts.
  • Trade speed for memory. Darius, a Hacker School alum, pointed us to his Stack Overflow answer, where he laid out a few other approaches that mainly reduce memory usage at the cost of speed. David also dug up some recent papers that focus entirely on this problem; researchers are using some very clever algorithms to get the most out of their data structures.
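Here is a hedged Python sketch of how three of those ideas can fit together: a generator that streams headlines from any iterable of lines (such as an open file) instead of holding them all in a list, a vocabulary that maps each word to an int once, and two parallel dicts of lists (follower ids and their counts) in place of a dict of dicts. The exact layout of our two dicts isn’t shown here, so CompactModel and its fields are a hypothetical arrangement, not our code.

```python
from collections import defaultdict


def read_headlines(lines):
    """Yield cleaned headlines one at a time from any iterable of lines (e.g. an open file),
    so the whole corpus never has to sit in memory as one big list."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


class CompactModel:
    """Trigram counts stored as two parallel dicts of int lists (a hypothetical layout)."""

    def __init__(self):
        self.word_ids = {}                  # word -> int, so each distinct string is kept once
        self.words = []                     # int -> word, to turn ids back into text
        self.followers = defaultdict(list)  # (id1, id2) -> [ids of words seen next]
        self.counts = defaultdict(list)     # (id1, id2) -> [counts, parallel to followers]

    def word_id(self, word):
        if word not in self.word_ids:
            self.word_ids[word] = len(self.words)
            self.words.append(word)
        return self.word_ids[word]

    def add(self, headline):
        ids = [self.word_id(w) for w in headline.lower().split()]
        for a, b, c in zip(ids, ids[1:], ids[2:]):
            nexts = self.followers[(a, b)]
            try:
                i = nexts.index(c)  # linear scan: gives up some speed to avoid a dict per bigram
            except ValueError:
                nexts.append(c)
                self.counts[(a, b)].append(1)
            else:
                self.counts[(a, b)][i] += 1

    def next_words(self, w1, w2):
        """Return {word: count} for the words seen after the bigram (w1, w2)."""
        key = (self.word_ids.get(w1), self.word_ids.get(w2))
        ids = self.followers.get(key, [])
        return {self.words[c]: n for c, n in zip(ids, self.counts.get(key, []))}


model = CompactModel()
# With a real corpus this would be read_headlines(open(path)) for some file path.
for headline in read_headlines(["how to do everything\n", "\n", "how to grow a startup\n", "how to do the thing\n"]):
    model.add(headline)
print(model.next_words("how", "to"))  # {'do': 2, 'grow': 1}
```

The list.index lookup is the speed-for-memory trade in miniature: most bigrams have only a handful of followers, so a short scan is cheap, while skipping a per-bigram dict saves its hash-table overhead.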

So where did we leave it?

Darius’ suggestions and the approaches in the literature would certainly help reduce memory usage. But we wanted to maximize learning per unit of time at Hacker School, and we didn’t think implementing these approaches would teach us proportionally more than reading about them. So we shipped it by deploying on a machine with more memory. Follow the Twitter bot here!

Stuff I learned along the way:

Generators, classes and methods in Python, pickle, regex, defaultdict, Tweepy, Buffer’s API, how bad Python’s built-in title-casing methods are, a clever way to handle default arguments, and some fundamental principles of NLP.

Help Us Plug Our Memory Leaks

David and I are running our app on Heroku, but it uses too much memory to run! Got any clever optimization suggestions?

We’re building a Markov chain generator that is trained on a corpus of 15MB of old Hacker News headlines. The app currently indexes the training corpus and constructs a matrix of all the possibilities, then queries that matrix to generate a hopefully entertaining new headline. Preview it here.

However, as we first wrote it, it used 1+ GB of memory when we deployed to Heroku, where the limit is 512 MB! We wrangled it down to 570 MB today and will tackle it again tomorrow morning, but need more ideas.

Some things we’ve tried:

  • Pickling the matrix. Not pickling the matrix. (didn’t help)
  • Storing the words as ints in our matrix rather than strings (helped)
  • Tweaked the data structure (helped, could probably do more with this)
  • We ran out of time, but want to try some sort of data store. Redis?
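Before trying the next idea, it helps to measure each variant the same way. Here’s a hedged sketch that uses Python’s standard tracemalloc module to report how many bytes a builder function allocates and still holds when it returns. build_nested and build_interned are hypothetical toy builders (word strings vs. int ids), and the numbers will vary with the Python version and the corpus.

```python
import tracemalloc
from collections import Counter, defaultdict


def retained_bytes(build, *args):
    """Run build(*args); return (result, bytes allocated during the build that are still alive)."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        result = build(*args)
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, after - before


def build_nested(headlines):
    """Hypothetical baseline: dict of Counters keyed by word-string bigrams."""
    model = defaultdict(Counter)
    for h in headlines:
        w = h.split()
        for a, b, c in zip(w, w[1:], w[2:]):
            model[(a, b)][c] += 1
    return model


def build_interned(headlines):
    """Same shape, but every word is replaced by a small int id first."""
    ids = {}
    model = defaultdict(Counter)
    for h in headlines:
        w = [ids.setdefault(x, len(ids)) for x in h.split()]
        for a, b, c in zip(w, w[1:], w[2:]):
            model[(a, b)][c] += 1
    return model, ids


if __name__ == "__main__":
    corpus = ["how to do everything", "how to grow a startup"] * 1000  # toy corpus
    for build in (build_nested, build_interned):
        _, size = retained_bytes(build, corpus)
        print(f"{build.__name__}: {size:,} bytes")
```

Measuring inside the process like this is more repeatable than watching Heroku’s dashboard, and it isolates the model from everything else the app loads.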

What should we try next? Our code after the jump: