Digital transformation usually requires organisational redesign, behavioural change and significant technology adoption. As well as using the new generation of digital collaboration tools, moving your technology estate to the Cloud is an essential step for many organisations.
We've discussed Cloud Transformation and Technical Debt in another post. Now that you are committed to moving to on-demand, opex-funded cloud infrastructure, whether it runs a multi-million pound business or supports a start-up aiming to grow, you should be asking yourself "how good is my cloud set-up?"
This can be a hard question to answer. You may have multiple accounts with your cloud service provider; you may be running multiple applications that use multiple cloud services in various combinations. You may have policies defining the access and use of cloud services by your employees and contract staff, but do you know if these policies are being applied or monitored?
More generally, the question itself is vague. What does 'good' mean anyway? How can you define 'good'? These are the sorts of questions that keep leaders awake at night.
We have particular knowledge of Amazon Web Services (AWS), so we will answer the question for that specific domain. Other cloud providers have different ways of approaching the challenge.
AWS is by far the biggest provider of cloud services in the world. They currently offer well over 100 different products and services that you can use for your business, from general compute power and storage capacity, to really specialised services like Blockchain or capabilities for the Internet of Things (IoT). It is an often bewildering array of options that you can self-serve into any combination you like. And it keeps growing, in depth and breadth.
AWS recognised that the question of how good a specific combination of these services was could turn quite complex quite quickly. So they came up with the Well Architected Framework.
In essence, this framework tries to answer the question by breaking the problem down into five key areas, or 'pillars' in the AWS jargon:
- Security
How good is your set-up at protecting information, systems and assets?
- Reliability
How good is your set-up at recovering from infrastructure or service disruptions? Can it grow or shrink to meet demand?
- Performance Efficiency
How good is your set-up at using resources efficiently and at adapting to technological change?
- Cost Optimisation
How good is your set-up at running in the most cost-effective way?
- Operational Excellence
How good is your set-up at running and monitoring your systems?
Your 'set-up' is not simply your infrastructure and code. It is also, crucially, the processes and procedures that you have in place to implement, monitor and adapt.
Your system could be perfect on day one, but things can easily and quickly degrade in any of the above areas if you don't have the mechanisms in place to stay up to date.
The Well Architected Framework takes you through a series of questions that, in effect, force you to take a good, hard look at everything that you are doing, and identify gaps in your set-up when compared with current architectural best practice.
For example, in the Security section it asks "How do you manage credentials and authentication?" A simple question that leads to many more: Do you enforce multi-factor authentication (MFA) on log-in? Do you force people to rotate passwords regularly? Do you have a written identity and access management policy?
As you answer these questions, the gaps easily become apparent, as do the paths to remediation. Enforce MFA. Enforce password rotation. Write down your policies.
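The credential questions above lend themselves to a simple automated check. What follows is a minimal sketch in Python, not part of the AWS tooling. It takes two plain dictionaries, assumed here to have the shape returned by the IAM GetAccountPasswordPolicy and GetAccountSummary API calls, and lists the gaps. The function name `credential_gaps` and the 90-day threshold are our own illustrative choices.

```python
from typing import List, Optional

MAX_PASSWORD_AGE_DAYS = 90  # illustrative threshold, not an AWS recommendation


def credential_gaps(password_policy: Optional[dict], account_summary: dict) -> List[str]:
    """Return human-readable gaps for the credential questions.

    password_policy: assumed shape of the 'PasswordPolicy' dict from IAM
        GetAccountPasswordPolicy, or None if no policy has been set.
    account_summary: assumed shape of the 'SummaryMap' dict from IAM
        GetAccountSummary.
    """
    gaps = []

    # Root-account MFA: the summary reports 1 when enabled, 0 otherwise.
    if account_summary.get("AccountMFAEnabled", 0) != 1:
        gaps.append("Root account does not have MFA enabled")

    # Rough signal only: fewer MFA devices than users means someone lacks MFA.
    if account_summary.get("MFADevices", 0) < account_summary.get("Users", 0):
        gaps.append("Fewer MFA devices than IAM users")

    if password_policy is None:
        gaps.append("No account password policy is defined")
        return gaps

    # A missing or zero MaxPasswordAge is treated as "passwords never expire".
    max_age = password_policy.get("MaxPasswordAge")
    if not max_age:
        gaps.append("Passwords never expire")
    elif max_age > MAX_PASSWORD_AGE_DAYS:
        gaps.append(f"Passwords expire only after {max_age} days")

    return gaps
```

In practice the two dictionaries might be fetched with boto3's IAM client (`get_account_password_policy` and `get_account_summary`). A check like this cannot tell you whether a written policy exists, so it supports the review conversation rather than replacing it.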
When it comes to Reliability, the Well Architected Framework asks: "How do you monitor your technology resources?" Again, a simple question that leads to others: "Are you even monitoring the default metrics?" "Are you sending notifications based on the monitoring?" "Do you perform automated responses on events?" Each forces you to look at that area of your infrastructure. Now, you may decide that even though your services are sending notifications, you are not initiating any automated responses based on events - so a human has to decide what to do. This may be fine in your case, right now. But the review forces you to understand that and decide what levels of risk you are prepared to take. There are often no binary answers, just trade-offs.
A Well Architected Review sounds dry, and it can be, but it is a fascinating and eye-opening piece of work. For one thing, it allows the reviewer a privileged look inside other people's operations. You have to dig around in detail so you learn a lot, both good and bad, about how your clients go about building their technology and making decisions.
Further, your work can be really helpful. We did one Well Architected Review for a food delivery start-up which had ambitions to grow and had doubts about the suitability of their existing set-up. We were able, among other things, to find security vulnerabilities (like databases accessible from the open internet) as well as to provide guidance on how to create separation between development and production infrastructure (have two separate AWS accounts under one Organisation) that was more amenable to expansion and easier to control.
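A finding like "databases accessible from the open internet" usually starts with security group rules. Here is a minimal sketch in Python of how such a scan might look. It assumes the input is a list of dictionaries shaped like the `SecurityGroups` entries returned by the EC2 DescribeSecurityGroups API call. The function name `open_database_rules` and the list of ports are illustrative assumptions, not what we ran for the client.

```python
from typing import List

# Default ports for common database engines (MySQL, PostgreSQL, SQL Server,
# Oracle, MongoDB, Redis). Illustrative only: extend to match what you run.
DATABASE_PORTS = {3306, 5432, 1433, 1521, 27017, 6379}
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_database_rules(security_groups: List[dict]) -> List[str]:
    """List inbound rules that expose a database port to any address."""
    findings = []
    for group in security_groups:
        group_id = group.get("GroupId", "?")
        for rule in group.get("IpPermissions", []):
            cidrs = {r.get("CidrIp") for r in rule.get("IpRanges", [])}
            cidrs |= {r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])}
            if not cidrs & OPEN_CIDRS:
                continue  # rule is restricted to specific address ranges

            protocol = rule.get("IpProtocol")
            if protocol == "-1":
                # "-1" means all protocols, and therefore all ports.
                exposed = sorted(DATABASE_PORTS)
            elif protocol in ("tcp", "udp"):
                low, high = rule.get("FromPort"), rule.get("ToPort")
                if low is None or high is None:
                    continue
                exposed = sorted(p for p in DATABASE_PORTS if low <= p <= high)
            else:
                continue  # e.g. ICMP, where the port fields mean something else

            for port in exposed:
                findings.append(f"{group_id}: port {port} open to the internet")
    return findings
```

An open rule is a prompt for investigation, not proof of exposure: whether the database is actually reachable also depends on subnet routing and whether the instance has a public address.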
But more importantly, perhaps, we were able to offer some peace of mind to the client. Yes, there were some findings that they needed to think about and change, ahead of moving into a growth phase; but they didn't need to rip everything up and start again. That in itself was probably worth the consultancy fee!
When your business, and your AWS footprint, is bigger and more mature, a Well Architected Review will involve more people and take longer. In one case, with a FinTech company, it sometimes took days of email trails and meetings to answer a single question. But the principle is the same and the outcomes can be just as revealing. Of course, in those cases, there then comes the problem of implementing and communicating change, but that is for another article ...
It is worth pointing out that AWS offers a free online tool to run through the Well Architected Review. So you don't necessarily need to hire a consultant who will dissemble about how it needs to be done, and hint at the required black arts. The instructions are all there to follow and it is definitely not rocket science.
What we find is that most people on a client site are already fully occupied with their day job and do not have the mind-space to dedicate to a job like this. It needs to be done in a methodical fashion and within a reasonable amount of time. A further challenge is often that the required cloud expertise is distributed among multiple people in the organisation. So bringing in someone external who has done it before, who can see it through, and who understands both the questions to ask and the level of answer required, can end up being a good investment.