What makes a successful and sustainable platform engineering portal? Is it the level of observability it permits? Is it how many useful things it can do with your Terraform? Is it how customizable and user-friendly it is?
Well, yes, all this and more. This is the fifth article in our series on sustainable platform engineering and we’ve discovered that the sustainability of a portal isn’t always obvious at first sight - but one thing is crystal clear. Building a platform engineering portal is a colossal waste of money and time - and therefore very unsustainable - if no one uses it. You need to make sure the possible dealbreakers are addressed long before you ever make a move or else sustainability will be nothing but a pipe dream.
One way to avoid these dealbreakers is to focus on them before you even think about creating any features. One of the most important of these “dealbreaker defusions” is the one we’ll discuss today - governance and security. In this domain, balance is everything and if your team have so much as an inkling that it’s off-kilter, your platform will be sunk before you can even say “principle of least privilege”.
The overall aim for security and governance in platform engineering is to keep things tight enough to keep ops happy and flexible enough to keep users happy. Too tight, and all of the positive DevX attributes that teams love, like confidence, autonomy, and self-learning, will be completely absent, because ops lock things down so that techs are unable to make any security decisions for themselves. Too loose and, obviously, it will be a free-for-all that risks wasting money on paid-for cloud resources and will almost certainly compromise the security of the business.
In many organizations, however, governance is still a bit of a dumpster fire. There’s a lack of clarity and a sense of randomness about policies and best practices and, as a result, about how infra and resources are managed. It’s also one of the few places where you can have a clear and direct impact on sustainability - clear up the mess, impose order, and start proactively streamlining and optimizing your usage. But when it comes to imposing order, there’s a risk of the extremes we spoke about earlier - and it’s that risk that might undermine the usability of your platform.
There’s a sweet spot, a compromise between the needs of Ops and the experience you offer your users - get it right, and your portal will thrive. Here’s what we think you need to concentrate on:
User experience: excellent user experience, especially when it comes to traditionally sticky areas like governance, can really up your chances of a platform being a success. When you’re enabling specific and fine-grained controls from the Ops end, the way they are presented to the users is key - without an accessible interface that allows people some degree of autonomy and independence about their decisions, tight controls are annoying and undermining. The perfect solution is to pre-program really specific security options but conceal any complexity behind a user-friendly interface. This means that techs can confidently and autonomously make decisions about ops-related topics, but the only options available to them are ones that have already been okayed by those who know.
IaC adoption: good governance is made infinitely easier by the adoption of infra-as-code, facilitating better management and making it possible to create a centre of excellence within your organization. As ever, the better managed a situation is, the more sustainable it is - you cut out waste, streamline processes, and easily keep an eye on all the goings-on. At Cycloid, our governance and security tools are based on Terraform, so choosing the tool you’re going to use to render all of your resources in IaC is an important step in the process. Infra Import automatically generates Terraform that allows you to always apply pre-defined best practices and helps track infrastructure drift. Then, once the IaC is in order, we use InfraPolicies, the Terraform-based implementation of Policy as Code, to provide control over changes to your infra while simultaneously defining validation rules.
Quotas, automatic approvals and role-based permissions: when you have well-controlled governance, it becomes easy to set up appropriate quotas and permissions, even when you have multiple roles or even businesses to deal with. These remove risk and decision fatigue from non-expert members of your team. On a larger scale, they also contribute to sustainability, since you can keep precise and proactive track of resources, which in turn avoids waste and helps you optimize spending. Cycloid does this with Roles and Multiple Organizations. Meanwhile, Quota Management offers unobtrusive governing constraints to limit resource consumption. Your ops team will set resource quotas for private clouds (Nutanix, VMware) in advance and control exactly who can consume what.
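To make the three points above a little more concrete, here is a minimal Python sketch of how pre-approved options, role-based permissions and quotas might combine into an automatic approval check. It is purely illustrative and is not Cycloid's implementation or API: the names `APPROVED_SIZES`, `ROLE_ALLOWED_SIZES`, `Quota`, `Request`, `evaluate` and `submit`, along with the sizes, roles and vCPU figures, are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical catalogue of ops-approved options a portal form could expose.
# Users only ever pick a size name; the vCPU detail stays hidden behind it.
APPROVED_SIZES: Dict[str, int] = {"small": 2, "medium": 4, "large": 8}

# Hypothetical role-based permissions: which sizes each role may request.
ROLE_ALLOWED_SIZES = {
    "developer": {"small", "medium"},
    "ops": {"small", "medium", "large"},
}


@dataclass
class Quota:
    """A per-team vCPU budget, set in advance by ops."""
    limit_vcpus: int
    used_vcpus: int = 0

    def remaining(self) -> int:
        return self.limit_vcpus - self.used_vcpus


@dataclass
class Request:
    """What a user submits through the portal."""
    team: str
    role: str
    size: str
    count: int = 1


def evaluate(request: Request, quotas: Dict[str, Quota]) -> List[str]:
    """Return the list of rule violations; an empty list means auto-approve."""
    violations: List[str] = []

    if request.count < 1:
        violations.append("count must be at least 1")
    if request.size not in APPROVED_SIZES:
        # Nothing else can be checked for an option ops never approved.
        violations.append(f"size '{request.size}' is not a pre-approved option")
        return violations

    if request.size not in ROLE_ALLOWED_SIZES.get(request.role, set()):
        violations.append(f"role '{request.role}' may not request size '{request.size}'")

    quota = quotas.get(request.team)
    needed = APPROVED_SIZES[request.size] * request.count
    if quota is None:
        violations.append(f"team '{request.team}' has no quota defined")
    elif needed > quota.remaining():
        violations.append(
            f"request needs {needed} vCPUs but only {quota.remaining()} remain"
        )
    return violations


def submit(request: Request, quotas: Dict[str, Quota]) -> bool:
    """Approve automatically and book the usage if no rule is violated."""
    if evaluate(request, quotas):
        return False
    quotas[request.team].used_vcpus += APPROVED_SIZES[request.size] * request.count
    return True
```

With an 8 vCPU quota for a team, a developer's request for one medium instance would be approved automatically, while a follow-up request for a large one would be refused, and the returned violations could be shown to the user in plain language. The design choice worth noting is that the rules live in data that ops own, so techs get a fast yes or a clear no without needing to understand the constraints behind it.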
It’s the platform engineering team’s job to make sure you get the governance balance right. It’s something that will need to be carried out with the help of the Ops team and the feedback of everyone involved, but it’s something you should oversee and, crucially, help translate into your portal’s reality. And implementation is where a third-party tool can help - it can provide the buffer and UX-strong link between the tight controls your Ops need and the flexibility techs want. It can offer ways of hiding tight controls and fine-grained options behind an interface that everyone can use, and that’s no small feat.
We hope you now have a clearer view of how governance and sustainability are linked. Governance amps up visibility and observability, and what can be properly seen can be properly controlled. When you can control something, you can ensure it’s working as efficiently and rationally as possible. That’s the essence of sustainability, and if you achieve that on your platform team or the portal you produce, you’ll be in the best possible position to play the long game - for both your business and the planet.