Dynamically adding and removing dedicated Concourse workers

Written by Gaël Lambert | Feb 25

Here at Cycloid, we use the open-source pipeline engine Concourse in Kubernetes. Over time, we noticed a problem: every time we wanted to add a new worker (in Concourse, the worker is the entity responsible for executing the scheduler's operations), the process was long and inefficient. We decided to see if we could improve it by contributing to Concourse.

TL;DR

We added a way to dynamically add and remove dedicated Concourse workers.

The context

As things are currently set up in Cycloid, several of our customers share the same instance of Concourse. Each customer is known as a Concourse team, and for each new team, there's a new worker associated with that account.

For security reasons, we encourage our customers to provide their own workers - that way, their data isn't shared with Cycloid and they keep access to their own infra.

The challenge

Until now, the --tsa-team-authorized-keys flag (a Concourse flag) was used to provide the SSH public keys of our customers' workers, which are essential for communication between the worker and the scheduler. It did that job well and prevented customers from connecting to each other's workers, but it wasn't very scalable. It was time-consuming and didn't feel very streamlined, especially when we had to add multiple new teams every day - you need to pass a separate flag for each set of keys you want to authorize.

Challenge 1 - authorized team keys should be provided via a single file
Challenge 2 - Concourse has to restart each time you add a new worker key to one of the authorized key files

Before we looked at these challenges more deeply, this was how we did things:

--tsa-team-authorized-keys=teamA:/teamAfile
--tsa-team-authorized-keys=teamB:/teamBfile
....
--tsa-team-authorized-keys=teamN:/teamNfile
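To make the scaling problem concrete, here is a minimal sketch of how such a flag list might be built. The helper name `team_key_flags` and the input mapping are hypothetical and only for illustration. The flag format is the one shown above.

```python
def team_key_flags(team_key_files):
    """Build one --tsa-team-authorized-keys flag per team.

    team_key_files maps a team name to the path of its authorized keys file.
    """
    return [
        f"--tsa-team-authorized-keys={team}:{path}"
        for team, path in team_key_files.items()
    ]


flags = team_key_flags({"teamA": "/teamAfile", "teamB": "/teamBfile"})
```

Every new team adds another flag to the command line, and changing the command line means restarting the process.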

One awesome solution!

Cycloid loves open source, so it was pretty natural that we started to think of a fix - surely if we were having this problem, other people were too, right?

First, we decided that instead of passing the authorized keys one flag at a time, we would add a new flag. This flag points to a YAML file that contains the authorized keys for each team, more or less following this format:

- team: teamA
  ssh_keys:
  - ssh-rsa keyA

- team: teamB
  ssh_keys:
  - ssh-rsa keyB

This solved the first challenge. The YAML file made dynamic configuration through a Kubernetes config map easier, and it was nice and easy to maintain (as well as being really easy to generate from our Cycloid API). It centralized all the worker keys in one file, but the file was still only loaded once at startup, which meant that every time a new key was added, Concourse had to restart.
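As an illustration of how easy the file is to generate, here is a minimal sketch that renders a team-to-keys mapping in the format shown above. The function name `render_team_keys` and the input shape are assumptions, not the Cycloid API. The sketch also assumes team names and keys are plain strings that need no YAML quoting; a real generator would more likely use a YAML library.

```python
def render_team_keys(team_keys):
    """Render {team: [public keys]} in the team/ssh_keys layout shown above.

    Assumes values are plain scalars that need no YAML quoting or escaping.
    """
    blocks = []
    for team, keys in team_keys.items():
        lines = [f"- team: {team}", "  ssh_keys:"]
        lines += [f"  - {key}" for key in keys]
        blocks.append("\n".join(lines))
    # One blank line between teams, trailing newline at the end of the file.
    return "\n\n".join(blocks) + "\n"


document = render_team_keys({
    "teamA": ["ssh-rsa keyA"],
    "teamB": ["ssh-rsa keyB"],
})
```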

To solve the second challenge, we implemented configuration reload in Concourse, so there was no longer a need for a restart.
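The general idea of a reload without restart can be sketched as follows. This is not the Concourse implementation (Concourse is written in Go); it is a small Python illustration of the pattern on a Unix system. The names `ReloadableKeys`, `parse_team_keys` and `install_reload_handler` are hypothetical, the choice of SIGHUP is an assumption, and the parser is a tiny stand-in that only understands the layout shown earlier.

```python
import signal


def parse_team_keys(text):
    """Stand-in parser for the team/ssh_keys layout; not a general YAML parser."""
    teams, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("- team:"):
            current = line[len("- team:"):].strip()
            teams[current] = []
        elif line.startswith("- ") and current is not None:
            teams[current].append(line[2:].strip())
    return teams


class ReloadableKeys:
    """Keeps the authorized keys in memory and re-reads the file on demand."""

    def __init__(self, path, parse=parse_team_keys):
        self._path = path
        self._parse = parse
        self._keys = {}
        self.reload()

    def reload(self):
        # Parse first, then rebind: if reading or parsing raises,
        # the previous mapping stays in place.
        with open(self._path, encoding="utf-8") as handle:
            new_keys = self._parse(handle.read())
        self._keys = new_keys

    def is_authorized(self, team, key):
        return key in self._keys.get(team, [])


def install_reload_handler(store, signum=signal.SIGHUP):
    """Reload the key file whenever the process receives `signum` (assumed SIGHUP)."""
    def handler(_signum, _frame):
        try:
            store.reload()
        except (OSError, ValueError) as err:
            print(f"reload failed, keeping previous keys: {err}")

    signal.signal(signum, handler)
    return handler
```

With this in place, the process loads the file once at startup, calls `install_reload_handler(store)` from its main thread, and picks up new keys every time the signal arrives.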

So, combining fixes 1 and 2, we were able to automate the process of creating a new team with dedicated workers, and then uploading the new definitions without restarting Concourse.

Today, when a new organization is created on Cycloid SaaS, the process looks like this: using the Cycloid API, we automatically generate the YAML team file. This file is then added to a Kubernetes config map. When the file is updated, a signal is sent to Concourse, which triggers the refresh. Now we enjoy the free time this contribution gives us by drinking beer while watching the automation do the job - basically, the whole point of DevOps thinking, right?
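The "file updated, signal sent" step could look something like the sketch below. How the update is actually detected and which signal is sent are not described here, so the content-hash check, the SIGHUP default, and the names `KeyFileWatcher` and `signal_process` are all assumptions made for illustration.

```python
import hashlib
import os
import signal


class KeyFileWatcher:
    """Calls `notify` whenever the content of `path` changes between checks."""

    def __init__(self, path, notify):
        self._path = path
        self._notify = notify
        self._digest = self._read_digest()

    def _read_digest(self):
        with open(self._path, "rb") as handle:
            return hashlib.sha256(handle.read()).hexdigest()

    def check(self):
        """Return True and notify if the file changed since the last check."""
        digest = self._read_digest()
        if digest == self._digest:
            return False
        self._digest = digest
        self._notify()
        return True


def signal_process(pid, signum=signal.SIGHUP):
    """Return a callback that sends `signum` (assumed SIGHUP) to process `pid`."""
    return lambda: os.kill(pid, signum)
```

A small loop would then build `KeyFileWatcher(path, signal_process(pid))` with the mounted file and the target process ID, and call `check()` at a regular interval.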

If you want to go deeper into the implementation, check out PR#5652 and PR#3936 on GitHub.
