Using zero-downtime restarts
Since version 5.1, pump.io has the ability to roll over to new codebases or configurations with no impact on uptime.
Warning
You should schedule maintenance windows even when making use of this feature. To preserve stability, this feature’s error handling is extremely conservative, so if something goes wrong, you will need to restart pump.io the “normal” way.
To make use of this feature, first ensure you meet the requirements:
- MongoDB as the Databank driver
- Two or more cluster workers configured (this is the default)
- pump.io 5.1 or later
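Two of the requirements above can be checked from a shell before attempting a restart. The following is a minimal sketch, not part of pump.io itself: the helper names `config_driver` and `version_at_least` are hypothetical, the `sed` expression assumes the `"driver"` key and its value sit on one line of `pump.io.json`, and `sort -V` assumes GNU coreutils.

```shell
# Sketch only: helper names are illustrative and not provided by pump.io.

# config_driver FILE: print the value of the "driver" key in FILE.
# Assumes the key and its value are on a single line (no real JSON parsing).
config_driver() {
    sed -n 's/.*"driver"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# version_at_least HAVE WANT: succeed if version HAVE is >= WANT.
# Relies on GNU sort's -V (version sort) option.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `[ "$(config_driver /path/to/pump.io.json)" = "mongodb" ]` succeeds only if the configured driver is `mongodb` (the path is a placeholder, and the exact driver string should be confirmed against your own configuration), and `version_at_least 5.1.2 5.1` succeeds because 5.1.2 is not older than 5.1.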
When performing a zero-downtime restart, pump.io will abort if it encounters any of the following errors:
- A requirement is not met
- A magic number from the new code doesn’t match the magic number from the old code loaded when the master process started. This number will be incremented for things that would make zero-downtime restarts cause problems; for example:
  - The logic in the master process itself changing
  - Cross-process logic changing, such that a new worker communicating with old workers would cause problems
  - Database changes
- A new worker died directly after being spawned (e.g. from invalid JSON in pump.io.json)
- A new worker signaled that it couldn’t bind to the appropriate ports
If a zero-downtime restart fails for either of the last two reasons, the master process will refuse subsequent restart requests and will not respawn any more cluster workers. In this case, you should restart your master process as soon as possible.
Note also that a worker process that doesn’t shut itself down within 30 seconds will be killed. pump.io will also refuse a restart request if a restart is already in progress.
To prepare for the restart, first start a stream of your logs. For example:
$ sudo tail -f /var/log/pump.io/pump.io.log.json | bunyan
This step is very important as pump.io will report any errors to the logfile.
To actually trigger a zero-downtime restart, send SIGUSR2 to the pump.io master process. For example:
$ sudo killall -USR2 node
Warning
Node’s default action upon receiving SIGUSR2 is to terminate. pump.io worker processes override this behavior, but other Node.js programs on your system might not. Take care not to signal any processes you don’t want to kill.
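One way to avoid signaling unrelated Node.js programs is to send SIGUSR2 to a single PID rather than to every process named node. The sketch below assumes you have already identified the master's PID yourself, for instance by listing Node.js processes with `ps -C node -o pid,ppid,args` and picking the one that the workers list as their parent. The function name `signal_master` is hypothetical and not part of pump.io.

```shell
# Sketch only: signal_master is an illustrative helper, not a pump.io command.

# signal_master PID: send SIGUSR2 to exactly one process.
signal_master() {
    local pid="$1"
    # Refuse anything that is not a plain non-negative integer.
    case "$pid" in
        ''|*[!0-9]*) echo "usage: signal_master PID" >&2; return 2 ;;
    esac
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "cannot signal PID $pid (no such process, or no permission)" >&2
        return 1
    fi
    kill -s USR2 "$pid"
}
```

If pump.io runs under a different user account, the permission check will fail for an unprivileged shell. In that case the final step can be performed directly with `sudo kill -s USR2 PID`, substituting the master's PID.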
You should continue to observe your logs until you see a message about the zero-downtime restart being complete.