Building cloud platforms on a blank canvas

Most cloud platforms aren’t designed; they’re inherited. They evolve. They’re patched, extended, reworked over time. Ours wasn’t. We started from scratch.

When you’re starting with a blank canvas, everything is a decision, and at the beginning, I didn't even realise how many decisions there would be.

It’s not just “spin up some compute and networking”. I was suddenly responsible for things like security models, access control, monitoring, alerting, scalability, naming conventions, how things get redeployed - the list gets long very quickly! There were also things I never expected to own, like incident management platforms, service portals, and procedures. The tricky part is that the choices you make early on tend to stick.

For me, building a platform for financial institutions added another layer of pressure. It’s not just about making something that works. It has to be secure, auditable, and predictable. At the same time, if it’s painful to use, engineers will work around it, find the loopholes, and that’s where problems can start.

Naming is important

One of my main goals was pretty simple: make the platform easy to use for engineers who aren’t part of the platform team. We started with naming. It's not the most exciting topic, but it turned out to be one of the most important.

When you’re looking at cloud resources, you’re usually just trying to answer a few basic questions:

  • What is this?
  • Where is it running?
  • Is it dev, test, or production?

If you can’t answer those quickly, everything slows down: debugging, support, even just understanding what’s going on. We built our naming conventions around solving that problem. Every resource name tells you the service, the environment, and the region. Nothing clever or cryptic, just consistent and obvious. It sounds small, but it makes a big difference when you’re dealing with hundreds (or thousands) of resources. You don’t need to dig through dashboards or documentation just to figure out what you’re looking at; you can see it straight away.
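To make that concrete, here’s a minimal sketch of what a convention like this can look like when you enforce it in code. The exact pattern and the names in it (payments, uksouth, vnet) are hypothetical, not our actual convention; the point is that the name alone answers all three questions.

```python
import re

# Hypothetical convention: <service>-<environment>-<region>-<resource-type>.
# The segments and their order are illustrative, not our production pattern.
NAME_PATTERN = re.compile(
    r"^(?P<service>[a-z0-9]+)-(?P<env>dev|test|prod)-"
    r"(?P<region>[a-z]+[0-9]*)-(?P<resource>[a-z]+)$"
)

def resource_name(service: str, env: str, region: str, resource: str) -> str:
    """Compose a name like 'payments-prod-uksouth-vnet', rejecting anything off-pattern."""
    name = f"{service}-{env}-{region}-{resource}".lower()
    if NAME_PATTERN.match(name) is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return name

def parse_name(name: str) -> dict:
    """Answer the three questions (what, which env, where) from the name alone."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return match.groupdict()
```

The useful part is the parsing direction: `parse_name("payments-prod-uksouth-vnet")` tells you the service, environment, and region without opening a single dashboard, and the same rule can run as a check in your deployment pipeline so nothing non-conforming ships in the first place.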

Security

The next big area I had to think about right from the start was security. And yeah… it’s probably everyone’s least favourite topic in tech. It’s easy to say “let’s just lock everything down,” but if you go too far, you end up slowing engineering teams down to the point where they avoid the platform altogether, or they look for those loopholes and workarounds, which is exactly what you don’t want. Getting this balance right early on can save a lot of pain later.

That said, security is never a one-and-done thing. It’s always evolving as your platform grows, but putting some solid thinking into it from the beginning makes it much easier to manage over time. A lot of our approach came from seeing how things were done in larger organisations and thinking, “I don’t want us to end up like that”. So we started simple: what does everyone actually need?

The answer was visibility.

Most engineers don’t need full access; they just need to understand what’s there. What’s running, where it is, and how things are connected. That alone solves a lot of day-to-day friction.

We made a conscious decision to default to read access for infrastructure, but with clear boundaries. For example, engineers shouldn’t be able to read sensitive data in client environments without a proper break-glass process. The same goes for logs in higher environments - access to those environments needs to be controlled and approved.

We introduced a baseline “infrastructure reader” level of access that gives visibility into what resources exist and what they’re doing, without exposing anything sensitive. Anything beyond that, especially in higher environments, has to go through a controlled, auditable process using privileged identity management (PIM): a user has to request, and be approved for, higher rights rather than holding them by default. Every step up in access is intentional, time-bound, and traceable. At its core, everything was built around least privilege, but in a way that doesn’t get in the way of people just trying to do their job.
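As a rough illustration of what that baseline can look like, here’s a sketch of an “infrastructure reader” expressed in the shape of an Azure custom role definition (the mention of PIM suggests Azure / Entra ID; on another cloud, the equivalent is a read-only role with data access stripped out). The action strings and scope are placeholders, not a vetted production role.

```python
# Sketch of a custom role, mirroring the fields of Azure's role-definition JSON.
# Illustrative only: a real NotActions list would be longer.
infra_reader_role = {
    "Name": "Infrastructure Reader",  # hypothetical role name
    "Description": "See what exists and what it is doing, without anything sensitive.",
    "Actions": [
        "*/read",  # control-plane read across the board
    ],
    "NotActions": [
        "Microsoft.KeyVault/vaults/secrets/read",  # not even secret metadata
    ],
    # Crucially, no data-plane access: no secret values, no blob contents,
    # no queue messages. Visibility into resources, not into data.
    "DataActions": [],
    "NotDataActions": [],
    "AssignableScopes": [
        "/subscriptions/<subscription-id>",  # placeholder scope
    ],
}
```

Everything above this baseline then becomes an eligible (rather than active) PIM assignment: the engineer requests activation with a justification, someone approves it, the elevated role expires after a set window, and the whole exchange lands in the audit log.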

The iterative process

Finally, and perhaps most importantly: when you’re making every single decision on that blank canvas, you have to be willing to accept when you get it wrong. Not in a “give up and walk away” sense, but in recognising that your first idea is rarely the one that sticks.

At the start of this post, I mentioned that it’s easy to look at an inherited legacy platform and wonder, “Why did they do it like this?” But when you're building from scratch, you quickly realise how those decisions happen. An implementation that made perfect sense in a past role, or one that a cloud provider highly recommends, might completely miss the mark for your current engineers. I’ve lost count of how many things we thought would work perfectly the first time, only to realise they made sense for someone else’s environment, not ours.

That’s just the reality of building from the ground up. It’s an iterative process. You try things, you learn, you adjust, and you move forward. Even with a clean slate, I still made mistakes along the way.

And honestly, that’s where the true value lies. When you’ve seen firsthand what doesn’t work, it strengthens your architecture. You aren’t blindly following trends to meet strict financial compliance requirements; you deeply understand the trade-offs. You know exactly why a particular security model, access control approach, connectivity architecture, or service was chosen, because you’ve tested the alternatives: countless POCs, trial-and-error runs, even a first implementation that you realised, a few weeks in, wouldn’t scale.

You’re not going to get it right the first time, and that’s exactly how it should be. Platforms aren’t designed flawlessly in one go. They’re shaped by the mistakes you’re willing to learn from. We set out to build a platform that was secure, auditable, and genuinely usable for our engineers without them needing to find workarounds. We didn't achieve that on day one, but by embracing the trial and error of the blank canvas, we ensured we were building a solid foundation and not just another inherited mess for the next team to fix.