Introducing BDDA, the infrastructure workflow we use for Kubernetes

In Atlassian’s Kubernetes platform team, we’re working to bring the existing Kubernetes clusters under one team’s management: to arrive at standard tooling, build processes, and a workflow for building the Kubernetes clusters that run internal workloads. This post describes the result.

We’ve spent a lot of time looking for a workflow and dev process that lets us apply DevOps concepts while still making sure we do everything as close to the right way as possible.

The core part of this is a workflow that we call BDDA (pronounced like ‘Buddha’), short for “Build-Diff, Deploy-Apply”.

How we arrived at BDDA

As we started consolidating the existing Kubernetes clusters that had been springing up inside Atlassian, we decided that our environments had three goals:

  1. Ease of reprovisioning
  2. Declarative configuration
  3. Automated testing

To explain further:

Ease of reprovisioning

We want to move as far as possible towards having our infrastructure be cattle, not pets. We weren’t sure how far we would be able to make the compute underlying a Kubernetes cluster immutable, but the aim was to head as far down that road as we could. The high-level goal we used to help with this was that it needed to be easy to tear down a cluster and rebuild it from scratch.

Declarative configuration

We believe that the declarative model for config management is a better match for the core Kubernetes concepts – the reconciliation loop at the heart of Kubernetes is built around declaring the desired state of the world and then letting the system bring the actual state into line with it.

We also believe that declarative configuration tools tend to gravitate to solving problems by including the solution in the upstream tools, as opposed to a set of steps inside a private repo somewhere. That said, we do use imperative tools where it makes sense.
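
To make the reconciliation-loop idea concrete, here’s a toy sketch (nothing to do with Kubernetes’ actual controller code): the caller only declares the desired state, and a loop keeps nudging the actual state towards it.

```python
import time

# Toy illustration of a declarative reconciliation loop (not Kubernetes'
# real controller code): we only declare the desired state, and the loop
# converges the actual state onto it one step at a time.

desired_state = {"replicas": 3}
actual_state = {"replicas": 1}

def reconcile(desired, actual):
    """Return the remaining gap and take one step towards closing it."""
    gap = desired["replicas"] - actual["replicas"]
    if gap > 0:
        actual["replicas"] += 1   # pretend we started one more replica
    elif gap < 0:
        actual["replicas"] -= 1   # pretend we stopped one replica
    return gap

while reconcile(desired_state, actual_state) != 0:
    time.sleep(1)  # a real controller would also watch for new desired state
print("converged:", actual_state)
```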

Automated testing

We want to eliminate manual ‘Is this working?’ steps and reduce the chances of breaking changes making it to production. The best way to do that is to take advantage of modern CI/CD and run automated testing as part of our builds, using a dedicated tool. We’re using Atlassian’s Bamboo, which has the nice side effect of giving us a bunch of compliance ticks for free. (How this is handled internally is a whole different story, one that should be told by someone who knows it better.)
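
For a flavour of what ‘automated testing as part of the build’ means here, below is a minimal sketch of a post-deploy smoke test. It assumes kubectl access to the target cluster; the specific check and the node-count threshold are illustrative, not our actual test suite.

```python
import json
import subprocess

def cluster_is_healthy(kubeconfig: str, min_ready_nodes: int = 3) -> bool:
    """Smoke test: ask the API server for nodes and count the Ready ones."""
    out = subprocess.run(
        ["kubectl", "--kubeconfig", kubeconfig, "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    nodes = json.loads(out)["items"]
    ready = sum(
        1
        for node in nodes
        for cond in node["status"]["conditions"]
        if cond["type"] == "Ready" and cond["status"] == "True"
    )
    return ready >= min_ready_nodes

if __name__ == "__main__":
    # Fail the CI stage (non-zero exit) if the cluster looks unhealthy.
    raise SystemExit(0 if cluster_is_healthy("kubeconfig.yaml") else 1)
```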

What is BDDA?

The key insight we had was that, for infrastructure, a ‘build’ can correspond to generating a diff between what is in the config repo and what is running in prod, storing that diff in the repo, and then checking (in some way) that the diff does what’s expected; ‘deployment’ is then applying the config and removing the diff for that environment.
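
For the Terraform-built layers, that split looks roughly like the sketch below. It’s a simplification: the paths, function names, and the ‘store the diff’ step are stand-ins for whatever your CI tool and repo layout provide, but the shape is the same: the build stage produces and records the diff, and the deploy stage applies exactly that diff and then clears it.

```python
import pathlib
import subprocess

# Rough sketch of the Build-Diff / Deploy-Apply split for a Terraform layer.
# Paths and the "store the diff" step are placeholders; in practice the CI
# tool owns where plans are stored and how they move between stages.

def build_diff(layer_dir: str, env: str) -> pathlib.Path:
    """Build stage: generate a diff between the repo and what's running."""
    plan_file = pathlib.Path(layer_dir).resolve() / "diffs" / f"{env}.tfplan"
    plan_file.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["terraform", "plan", f"-var-file={env}.tfvars", f"-out={plan_file}"],
        cwd=layer_dir, check=True,
    )
    # The plan (the 'diff') is then committed/stored so it can be reviewed
    # and tested before anything touches the environment.
    return plan_file

def deploy_apply(layer_dir: str, plan_file: pathlib.Path) -> None:
    """Deploy stage: apply exactly the reviewed diff, then remove it."""
    subprocess.run(["terraform", "apply", str(plan_file)], cwd=layer_dir, check=True)
    plan_file.unlink()  # the diff for this environment is now empty again
```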

There are two big advantages to this approach:

There are, of course, disadvantages too:

How the workflow actually… flows

As outlined in our other post, we break up our infrastructure into three layers:

  1. Base AWS config (VPCs, subnets, and the like) – called the FLAG. This is built using Terraform.
  2. Compute config (everything required to present an unconfigured Kubernetes API) – called the KARR. Also built with Terraform.
  3. Kubernetes config (everything that is applied to the Kubernetes API to make it a KITT cluster) – called the Goliath. What we use for this layer requires a little bit more explanation.

We would prefer to use Helm for the Kubernetes layer, and have a workflow ready to go using helm-diff, but it is currently blocked on Helm issue #1918: it turns out you can use the Tiller from inside the cluster to bypass all the RBAC controls we’ve implemented. So, for now, we’ve built this layer using Ansible’s templating, with kube-diff to generate the diffs. As soon as that issue is resolved, we will switch to Helm.
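
As a rough illustration of the Kubernetes-layer build step, the sketch below diffs a directory of already-rendered manifests against the live cluster and stores the result. It assumes the templating step (Ansible, in our case) has already run, and it uses kubectl diff purely to keep the example self-contained; it isn’t a drop-in for our kube-diff setup, and the paths are made up.

```python
import pathlib
import subprocess

def kube_layer_diff(rendered_dir: str, env: str) -> pathlib.Path:
    """Diff rendered manifests against the live cluster and store the result.

    `rendered_dir` is assumed to already contain the manifests produced by
    the templating step; `kubectl diff` stands in here for whichever diff
    tool you actually use.
    """
    diff_file = pathlib.Path("diffs") / f"{env}.kube.diff"
    diff_file.parent.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        ["kubectl", "diff", "-R", "-f", rendered_dir],
        capture_output=True, text=True,
    )
    if result.returncode > 1:  # 0 = no changes, 1 = changes found, >1 = error
        raise RuntimeError(result.stderr)
    diff_file.write_text(result.stdout)
    return diff_file
```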

The configuration for each of these layers is stored in a separate repo, but a build for any one layer builds the entire layer cake, so we can check for the upstream effects of changes.
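
Here’s a sketch of what ‘building the entire layer cake’ means in practice: every layer gets its diff generated, in order, even when only one repo changed. The layer names match the ones above, the repo paths are invented, and build_diff and kube_layer_diff are the hypothetical helpers from the earlier sketches, so this snippet assumes they’re available.

```python
# Sketch of a full layer-cake build: even if only one repo changed, every
# layer is diffed in order so knock-on effects show up in the layers above.
# Reuses the hypothetical build_diff() and kube_layer_diff() sketched earlier.
LAYERS = [
    ("flag", "base-aws-config"),       # VPCs, subnets, ... (Terraform)
    ("karr", "compute-config"),        # unconfigured Kubernetes API (Terraform)
    ("goliath", "kubernetes-config"),  # everything applied to the API
]

def build_layer_cake(env: str) -> dict:
    """Return the stored diff for every layer, keyed by layer name."""
    diffs = {}
    for name, repo_dir in LAYERS:
        if name == "goliath":
            diffs[name] = kube_layer_diff(f"{repo_dir}/rendered", env)
        else:
            diffs[name] = build_diff(repo_dir, env)
    return diffs
```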

This diagram will hopefully help explain:

The workflow works like this:

  1. A change is pushed to one of the layer repos.
  2. The build stage generates a diff for each layer between the config in the repos and what is running in the target environment, and stores those diffs.
  3. The stored diffs are checked (reviewed, and run through the automated tests) to make sure they do what’s expected.
  4. The deployment stage applies the config and removes the diff for that environment.

Conclusion

So far, we’ve found this workflow to be great for our internal dev speed and completely compatible with our compliance regime, while meeting the original three goals: ease of re-provisioning, declarative config, and automated testing.

I’d be very interested to hear thoughts from other people doing similar (or different) things.

That said, if you would like to do something like this yourself, my recommendations would be:
