Using zero-downtime restarts
==============================================

Since version 5.1, pump.io has the ability to roll over to new
codebases or configurations with no impact on uptime.

.. WARNING:: You should schedule maintenance windows even when making
             use of this feature. To preserve stability, this feature's
             error handling is extremely conservative, so if something
             goes wrong you will need to restart pump.io the "normal"
             way.

To make use of this feature, first ensure you meet the requirements:

1. MongoDB as the Databank driver
2. Two or more cluster workers configured (this is the default)
3. pump.io 5.1 or later
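
As a quick sanity check for the first two requirements, you can print
the relevant settings from your configuration file. The following is a
minimal sketch, not an official tool: ``show_restart_settings`` is a
hypothetical helper name, the default path ``/etc/pump.io.json`` and
the ``children`` key (assumed here to set the number of cluster
workers) are assumptions about your installation, and the ``grep``
pattern only works when each key is on its own line.

::

   # Hypothetical helper: print the "driver" and "children" settings
   # from a pump.io configuration file (assumed path by default).
   show_restart_settings() {
       grep -E '"(driver|children)"[[:space:]]*:' "${1:-/etc/pump.io.json}"
   }

Calling ``show_restart_settings`` with no argument reads the assumed
default path; pass another path as the first argument if your
configuration lives elsewhere. If no worker count is printed, the key
is probably unset and the default applies.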

When performing a zero-downtime restart, pump.io will abort if it
encounters any of the following errors:

* A requirement is not met
* A magic number from the `new` code doesn't match the magic number
  from the `old` code that was loaded when the master process started.
  This number is incremented for changes that would make a
  zero-downtime restart unsafe, for example:

  * The logic in the master process itself changing
  * Cross-process logic changing, such that a new worker communicating
    with old workers would cause problems
  * Database changes

* A new worker died directly after being spawned (e.g. from invalid
  JSON in ``pump.io.json``)
* A new worker signaled that it couldn't bind to the appropriate ports

If a zero-downtime restart fails for either of the last two reasons,
the master process will refuse subsequent restart requests and will
not respawn any more cluster workers. In this case, you should restart
your master process as soon as possible.

Note also that a worker process that doesn't shut itself down within
30 seconds will be killed. In addition, pump.io will refuse a restart
request if a restart is already in progress.

To prepare for the restart, first start a stream of your logs. For
example:

::

   $ sudo tail -f /var/log/pump.io/pump.io.log.json | bunyan

This step is very important because the logfile is where pump.io
reports any errors that occur during the restart.

To actually trigger a zero-downtime restart, send SIGUSR2 to the
pump.io master process. For example, if pump.io is the only Node.js
program running on the system:

::

   $ sudo killall -USR2 node

.. WARNING:: Node's default action upon receiving SIGUSR2 is to
             terminate. pump.io worker processes override this
             behavior, but other Node.js programs on your system might
             not. Take care not to signal any processes you don't want
             to kill.
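
If other Node.js programs share the machine, it may be safer to signal
a single process ID instead of every ``node`` process. The following
is a minimal sketch, not part of pump.io: ``signal_master`` is a
hypothetical helper name, and you have to look up the master's PID
yourself and run the function from a shell that is allowed to signal
that process (for example, a root shell).

::

   # Hypothetical helper: send SIGUSR2 to exactly one process ID.
   signal_master() {
       pid="$1"
       # Refuse anything that is not a plain non-negative integer.
       case "$pid" in
           ''|*[!0-9]*)
               echo "usage: signal_master PID" >&2
               return 2
               ;;
       esac
       # kill -0 delivers no signal; it only checks that the process
       # exists and that we are allowed to signal it.
       if ! kill -0 "$pid" 2>/dev/null; then
           echo "cannot signal process $pid" >&2
           return 1
       fi
       kill -USR2 "$pid"
   }

One way to find a candidate PID is ``pgrep -o -f pump``, which prints
the oldest process whose command line contains ``pump``. That pattern
is only a guess about your setup, so confirm the result with ``ps``
before signaling it.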

You should continue to observe your logs until you see a message about
the zero-downtime restart being complete.