Software vs Automation


Posted on Thursday, 3 Dec 2015 by Mina Galić

As always, this started out as a presumptuous tweet:

Of course, this was met with a certain skepticism. So here are some lessons learned from ~9 years of putting applications into production, and being woken up by them falling over.

don’t invent your own configuration format

…not without good reason, anyway. There are too many configuration formats already. Before inventing a new one, please discuss it with your ops people.

version numbers are a contract

This contract extends to the API, the ABI, your configuration, your utilities. You can break the contract between major versions, but please, for the love of Knuth, don’t break the method of determining the version.
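To make that last point concrete, here is a minimal sketch of a version flag whose output shape stays put, plus the parsing that automation typically does on the other end. The program name `exampled`, the `VERSION` value and the "name X.Y.Z" output shape are all assumptions made up for illustration.

```python
import argparse
from typing import Tuple

# Hypothetical version constant; name and value are illustrative only.
VERSION = "2.4.1"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="exampled")
    # argparse's built-in "version" action prints the string and exits with 0.
    parser.add_argument("--version", action="version",
                        version=f"%(prog)s {VERSION}")
    return parser


def parse_version(output: str) -> Tuple[int, int, int]:
    """What automation does on the other side: split 'name X.Y.Z' into integers."""
    _, number = output.strip().rsplit(" ", 1)
    major, minor, patch = (int(part) for part in number.split("."))
    return major, minor, patch
```

If the flag is renamed or the output changes shape between releases, every script built around something like `parse_version` breaks, whatever the version number itself says.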

providing a tool to modify the configuration…

Does the tool need the daemon to be running? Will it know when the configuration is broken? Will it be able to tell me how the configuration is broken?
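A rough sketch of what answering "yes" to the last two questions can look like: a check that runs without the daemon and says how the configuration is broken, not just that it is. The JSON format and the keys `listen_port` and `log_path` are assumptions for the example, not a recommendation.

```python
import json
from typing import List

# Hypothetical schema: key name -> required Python type.
REQUIRED_KEYS = {"listen_port": int, "log_path": str}


def check_config(text: str) -> List[str]:
    """Return human-readable problems; an empty list means the config is valid."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        # Point at the exact place where parsing failed.
        return [f"line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(config, dict):
        return ["top level must be an object"]
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing required key '{key}'")
        elif not isinstance(config[key], expected):
            problems.append(
                f"'{key}' must be {expected.__name__}, "
                f"got {type(config[key]).__name__}"
            )
    return problems
```

A command-line wrapper would print each problem and exit non-zero when the list is not empty, so automation can refuse to deploy a broken file before anything gets restarted.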

use the underlying systems’ facilities

Rather than bootstrapping a daemon and a watchdog, use (smf|systemd|etc…). Use the OS package manager, or your programming environment’s package manager (gem|pip|war|etc). Or your container’s package manager. This makes installation really easy, and makes it an atomic transaction.

Do not invent a new package manager

If you accidentally did anyway, consider providing a way of finding out if the package manager has already run successfully.
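One possible way to do that, sketched under assumptions: write a small marker file as the very last step of a successful install, and give automation a function (or command) that reads it back. The marker path, its JSON layout and both function names are hypothetical.

```python
import json
import os
import tempfile
from typing import Optional


def record_success(marker_path: str, version: str) -> None:
    """Write the marker atomically, as the very last step of an install."""
    directory = os.path.dirname(marker_path) or "."
    # Write to a temporary file in the same directory, then rename it into
    # place, so a crash never leaves a half-written marker behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"installed_version": version}, handle)
    os.replace(tmp_path, marker_path)


def installed_version(marker_path: str) -> Optional[str]:
    """Return the recorded version, or None if no install has completed."""
    try:
        with open(marker_path) as handle:
            return json.load(handle)["installed_version"]
    except (OSError, ValueError, KeyError):
        return None
```

Configuration management can then call something like `installed_version` first and skip the install step when it already reports the wanted version.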

Put packages into repositories

This makes installation actually easy, and dependency management possible within the above-mentioned atomic transaction.

Consider providing your repositories over IPv6

Oh, your package repository should not be (only) GitHub.

If your software needs to scale, it’s easier when it’s stateless

If you need to replicate state, how will you do that to 3 nodes? How about 300 nodes?

These are the basics. Your software is now installed, configured and running! But is it running correctly?

Provide an easy way to get metrics from the software

This isn’t restricted to health metrics of the application; it can be liberally extended to the health of the business.
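As a minimal sketch of "easy": a thread-safe counter registry that renders plain "name value" lines, which almost any monitoring system can scrape or parse. The class, the metric names and the output format are assumptions for illustration, not any particular monitoring tool's API.

```python
import threading
from collections import Counter


class Metrics:
    """A tiny in-process counter registry."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counters = Counter()

    def incr(self, name: str, amount: int = 1) -> None:
        with self._lock:
            self._counters[name] += amount

    def render(self) -> str:
        """Return one 'name value' line per counter, sorted by name."""
        with self._lock:
            items = sorted(self._counters.items())
        return "".join(f"{name} {value}\n" for name, value in items)
```

Usage might look like this, with a business counter sitting right next to an application one:

```python
metrics = Metrics()
metrics.incr("requests_total")
metrics.incr("orders_placed", 3)
print(metrics.render(), end="")
```

Serving the result of `render()` from an HTTP endpoint or writing it to a file on a timer are both reasonable ways to expose it.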

And finally, if you want to be really nice to your admin, not just your admin’s automation software:

Provide debuggable errors in the log

Can someone who didn’t write the software find out why it crashed from the logs? Can they fix it?
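A small sketch of the difference this makes, using Python's standard `logging` module. The logger name, the function and the wording of the hint are made up for the example; the point is that the message names what was being attempted, on which file, what the OS reported, and what to check next.

```python
import logging

logger = logging.getLogger("exampled")


def load_state(path: str) -> bytes:
    """Read the state file, logging an actionable message if that fails."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except OSError as err:
        # Say what was attempted, on which file, and what the OS reported,
        # then re-raise so the failure is not silently swallowed.
        logger.error(
            "cannot read state file %r: %s "
            "(check that the path exists and is readable by the service user)",
            path,
            err.strerror,
        )
        raise
```

Compare that with a bare traceback ending in `OSError`: the person on call at 3am did not write the software, and should not have to read its source to learn which file it wanted.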

Document each of the above

How else do you think they’ll actually discover any of that functionality!?

That’s all folks!