[PDB Tech] PDB Outage 2017-08-24

Job Snijders job at instituut.net
Thu Aug 24 06:10:51 PDT 2017


Dear all,

Earlier today PeeringDB was unavailable due to an unplanned outage.

Start:      2017-08-24T05:36:48Z
End:        2017-08-24T10:42:38Z
Duration:   5 hours and 5 minutes

Background
----------

The physical machine on which the main PeeringDB instance runs suffered
a kernel panic. This in itself should not have been a significant event,
since the machinery is configured to restart after a panic. However, a
misconfiguration in the physical network interface configuration file
prevented the PeeringDB virtual machines from booting.

Unfortunately it took some time to reach the appropriate people who are
able to fix relatively trivial issues like these and bring the service
back online.

No data was lost due to this outage. (Side note: full copies of all
PeeringDB data are distributed to diverse locations every minute. Copies
exist in San Jose, Chicago and Amsterdam.)

Follow up for PeeringDB volunteers:

    - validate 'reboot robustness' more often
    - ensure up-to-date OOB credentials are distributed to appropriate
      volunteers
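
As a rough illustration of one small 'reboot robustness' check, the
sketch below reads the Linux kernel.panic sysctl and reports whether the
host is set to reboot after a panic. The function name is hypothetical
and this is not an existing PeeringDB tool. It also would not have caught
the network interface misconfiguration described above; only an actual
test reboot exercises the full boot path.

```python
from pathlib import Path

# Linux exposes the kernel.panic sysctl here (assumption: a Linux host).
PANIC_SYSCTL = Path("/proc/sys/kernel/panic")


def reboots_after_panic(raw_value):
    """Interpret kernel.panic: 0 means hang forever, nonzero means reboot.

    A positive value is the delay in seconds before rebooting; a negative
    value means reboot immediately.
    """
    return int(raw_value.strip()) != 0


if __name__ == "__main__":
    if PANIC_SYSCTL.exists():
        enabled = reboots_after_panic(PANIC_SYSCTL.read_text())
        print("reboot after panic:", "enabled" if enabled else "DISABLED")
    else:
        print("kernel.panic is not available on this system")
```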

Every cloud has a silver lining
-------------------------------

This service disruption event was used as an opportunity to increase the
resources available to PeeringDB (more CPU & RAM), as well as to upgrade
to a kernel with performance improvements and apply security updates.

Reducing reliance on the main PeeringDB instance
------------------------------------------------

For mission-critical deployments of PeeringDB-related software, I
recommend considering the use of a local copy of the PeeringDB data. An
example Python program can be found here: http://peeringdb.github.io/peeringdb-py/
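
To illustrate the general idea (this is not the peeringdb-py client
linked above, just a standard-library sketch), the snippet below queries
the public REST API for a network record, keeps a copy on disk, and falls
back to that copy when the main instance cannot be reached. The function
names and the cache directory are assumptions made for this example.

```python
import json
import urllib.request
from pathlib import Path

# Public PeeringDB REST endpoint for network records, filtered by ASN.
API_URL = "https://www.peeringdb.com/api/net?asn={asn}"
# Hypothetical location for the local copies.
CACHE_DIR = Path("pdb-cache")


def fetch_net(asn, timeout=10):
    """Fetch the 'net' record(s) for an ASN from the main instance."""
    url = API_URL.format(asn=asn)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def get_net(asn, fetch=fetch_net, cache_dir=CACHE_DIR):
    """Return (data, source); prefer live data, fall back to the cache."""
    cache_file = Path(cache_dir) / f"net-as{asn}.json"
    try:
        data = fetch(asn)
    except (OSError, ValueError):
        # OSError covers network failures (URLError, timeouts);
        # ValueError covers a malformed JSON response.
        if cache_file.exists():
            return json.loads(cache_file.read_text()), "cache"
        raise
    # Refresh the local copy after every successful fetch.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))
    return data, "live"
```

A caller would use get_net(asn) and inspect the second element of the
result to learn whether the answer is live or came from the local copy.
For anything beyond a handful of records, a full local mirror as provided
by the client above is likely the better fit.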

Kind regards,

Job

