Petyo March 5th
2010

OSS Friday release – Shellshot gem

After Ilya presented the pretty-diff gem, this Friday it is my turn to share another gem we chose to extract from Beanstalk.

Shellshot – Deal With System Commands the Right Way

First of all, let me warn you. Issuing system calls is usually the worst way to deal with a problem. Executable locations vary from platform to platform. Errors are hard to track. Stuck processes are hard to kill. Parameters have to be escaped properly. Chances are someone from the community has already implemented what you need as a gem. If you really need the performance, you can even drop down to C.

Unfortunately, none of the options above were available for certain parts of our application, so we had to resort to system calls. Sigh. What an imperfect world we live in. But let’s try to make lemonade from the lemons we got.

Problem #1: System Calls Are Hard to Cancel on Timeout

So hard, a SystemTimer library had to be incorporated (note – the gem is SystemTimer, not system_timer). This was the first thing we learned the hard way – don’t rely on the standard Timeout library when you do system calls.

Fork? Wait? Waitpid? Exec or System? Kill?! Kill!!

In fact, even executing a command that may need to be killed is not that simple:

pid = fork do
  exec('your command here')
end

Process.wait(pid)

The whole fork/wait gymnastics is not for the fainthearted. The Ruby documentation assumes you have a solid Unix background on the subject, which unfortunately was not the case for us mere experienced-in-web-development guys. At one point it took us some time to figure out how to combat the myriad of zombie processes caused by our unfortunate choice of waitpid instead of wait.
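To make the moving parts a bit more concrete, here is a minimal sketch of the fork, exec, kill and wait sequence using only Ruby's standard library. The helper name run_with_deadline and its return values are made up for this post; this is not Shellshot's actual code, and it polls with a plain deadline instead of using SystemTimer.

```ruby
# Hypothetical helper (not part of Shellshot): run a command in a child
# process and kill it if it has not finished within `seconds`.
def run_with_deadline(seconds, *command)
  pid = fork do
    # Program and arguments are passed separately, so no shell is involved.
    exec(*command)
  end

  deadline = Time.now + seconds
  loop do
    # With WNOHANG, wait returns nil right away while the child is still running.
    finished = Process.wait(pid, Process::WNOHANG)
    return($?.success? ? :ok : :failed) if finished
    break if Time.now >= deadline
    sleep 0.05
  end

  Process.kill('KILL', pid)
  Process.wait(pid) # reap the killed child so it does not linger as a zombie
  :killed
end
```

With this sketch, run_with_deadline(2, 'sleep', '10') gives up after about two seconds, kills the child and returns :killed. The final Process.wait is the part that is easy to forget: without it the killed child stays around as a zombie until the parent exits.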

Piping? Should I Call Mario and Luigi? Anyone Knowing What Context Switching Means?

This problem caused a significant amount of hair loss for yours truly, as it was the most cryptic one. It turned out that there is a subtle difference between

exec('ruby -e "puts 1"')

and

exec('ruby -e "puts 1" > file.log')

In the first case, the command is executed directly. In the second, it is wrapped in sh -c "...". The pid you get back from fork is then that of sh, not of the command (ruby). So if you try killing the process with the returned pid, sometimes the child command performs the so-called context switching and continues its execution. Usually not what you want.

If you need such output redirection, the solution here would be to redefine std{in,out,err} in the fork, before calling exec.
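As a rough illustration of that idea, the sketch below reopens stdout inside the fork and then calls exec with the program and its arguments as separate strings, so no sh wrapper is created and the pid returned by fork belongs to the command itself. The helper name run_redirected is hypothetical and only exists for this example.

```ruby
# Hypothetical helper: run a command with its stdout written to log_path,
# without going through a shell.
def run_redirected(log_path, *command)
  pid = fork do
    # Only the child's stdout is reopened; the parent's stdout is untouched.
    STDOUT.reopen(log_path, 'w')
    exec(*command)
  end

  Process.wait(pid)
  $?.exitstatus
end
```

For example, run_redirected('file.log', 'ruby', '-e', 'puts 1') plays the role of the redirected exec call above, minus the shell in the middle.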

Confused? Bored to Death?

No worries. You don’t need to remember or understand the issues above. Just use shellshot (check the README for documentation and samples). Under the hood it uses SystemTimer to keep the call under control; IO.pipe is used to collect stderr, so it can report meaningful error messages to you and avoid context switching. After a kill, Process.wait is issued to reap any zombie processes.
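If you are curious what collecting stderr through IO.pipe can look like, here is a small sketch built on the standard library alone. It is an assumption about the general technique, not a copy of Shellshot's internals, and the helper name run_capturing_stderr is invented for this example.

```ruby
# Hypothetical helper (not Shellshot's real code): run a command and return
# its exit status together with everything it wrote to stderr.
def run_capturing_stderr(*command)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    STDERR.reopen(writer) # the child's stderr now feeds the pipe
    writer.close
    exec(*command)
  end

  writer.close # the parent must close its copy, or read would never see EOF
  errors = reader.read
  reader.close
  Process.wait(pid)
  [$?.exitstatus, errors]
end
```

The returned pair lets the caller raise an error that actually says what went wrong, instead of a bare non-zero exit status.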

Should I ditch using exec and system on my own in favor of Shellshot?

Not at all! Shellshot is made for dealing with long-running processes which may get stuck or behave in unexpected ways. If your system calls are short and generally safe, this is not something you should worry about.

Stay tuned for more code for free next Friday!