Using zero-downtime restarts

Warning

This feature is highly experimental and is currently only available in git master. It will be released in the 5.1 release cycle.

pump.io has the ability to roll over to new codebases or configurations with no impact on uptime.

Warning

You should schedule maintenance windows even when making use of this feature. To preserve stability, this feature’s error handling is extremely conservative: if something goes wrong, you will need to restart pump.io the “normal” way.

To make use of this feature, first ensure you meet the requirements:

  1. MongoDB as the Databank driver
  2. Two or more cluster workers configured (this is the default)
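
The two requirements can be checked before a restart is attempted. The following is a minimal sketch, not part of pump.io itself. It assumes that the Databank driver is set with a `driver` key and the worker count with a `children` key in pump.io.json. Both key names and the function name are assumptions to verify against your own configuration, and the matching is a crude text search rather than a real JSON parser.

```shell
# check_restart_requirements CONFIG
# Hypothetical helper: crude text match on pump.io.json, not a real JSON parser.
# Assumes the keys "driver" and "children"; verify them against your config.
check_restart_requirements() {
    config=$1
    # Extract the first quoted value of "driver" and the first numeric value of "children".
    driver=$(sed -n 's/.*"driver"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1)
    children=$(sed -n 's/.*"children"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$config" | head -n 1)

    if [ "$driver" != "mongodb" ]; then
        echo "driver is '${driver:-unset}', expected 'mongodb'" >&2
        return 1
    fi
    # An unset worker count is accepted, since the default is said to meet the requirement.
    if [ -n "$children" ] && [ "$children" -lt 2 ]; then
        echo "only $children cluster worker(s) configured, need at least 2" >&2
        return 1
    fi
    return 0
}
```

Call it with the path to your configuration file. A nonzero exit status means one of the two requirements appears not to be met.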

When performing a zero-downtime restart, pump.io will abort if it encounters any of the following errors:

  • A requirement is not met
  • The magic number in the new code doesn’t match the magic number in the old code that was loaded when the master process started. This number is incremented whenever a change would make a zero-downtime restart unsafe, for example:
    • The logic in the master process itself changing
    • Cross-process logic changing, such that a new worker communicating with old workers would cause problems
    • Database changes
  • A new worker died directly after being spawned (e.g. from invalid JSON in pump.io.json)
  • A new worker signaled that it couldn’t bind to the appropriate ports

If a zero-downtime restart fails for either of the last two reasons, the master process will refuse subsequent restart requests and will not respawn any more cluster workers. In this case, you should restart your master process as soon as possible.

Note also that a worker process that doesn’t shut itself down within 30 seconds will be killed. pump.io will also refuse a restart request if a restart is already in progress.

To prepare for the restart, first start a stream of your logs. For example:

$ sudo tail -f /var/log/pump.io/pump.io.log.json | bunyan

This step is very important because pump.io reports any errors to the logfile.

To actually trigger a zero-downtime restart, send SIGUSR2 to the pump.io master process. For example:

$ sudo killall -USR2 node

Warning

Node’s default action upon receiving SIGUSR2 is to terminate. pump.io worker processes override this behavior, but other Node.js programs on your system might not. Take care not to signal any processes you don’t want to kill.
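
One way to avoid signaling unrelated Node.js programs is to send SIGUSR2 to a single process ID instead of using killall. The following is a sketch under that assumption. The function name is hypothetical, and how you find the master process's PID (a PID file, your process manager, or ps) depends on your setup and is not covered here.

```shell
# signal_zero_downtime_restart PID
# Hypothetical helper: sends SIGUSR2 to exactly one process
# instead of to every process named "node".
signal_zero_downtime_restart() {
    pid=$1
    # Reject empty or non-numeric arguments before calling kill.
    case $pid in
        ''|*[!0-9]*)
            echo "usage: signal_zero_downtime_restart PID" >&2
            return 2
            ;;
    esac
    # Signal 0 delivers nothing; it only checks that the process exists and can be signaled.
    if ! kill -s 0 "$pid" 2>/dev/null; then
        echo "no signalable process with PID $pid" >&2
        return 1
    fi
    kill -s USR2 "$pid"
}
```

Run it with the privileges needed to signal the master process, for example from a root shell, and pass the PID of the master process rather than the PID of a worker.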

You should continue to observe your logs until you see a message about the zero-downtime restart being complete.