Anomaly Detection: Four steps to avoiding unforeseen cloud costs
Data volumes are simultaneously exploding and migrating, making it difficult for businesses and their CIOs to monitor and secure data across their environments. With so many moving parts, cloud spending can easily run away from a business: as the cloud expands to handle the data it is given, so does its cost. This elasticity is both a blessing and a curse. Flexibility is a vital attribute of a successful business, but it also means that unanticipated cloud costs are common and hard to track.
Anomalous, unexpected costs can crush businesses’ confidence in their budgets and forecasts, impact their burn rate and make it difficult to understand their wider spending patterns. Yet, despite this, many wait until their bill arrives before reacting. To avoid these unpleasant surprises, which can be caused by a repository error, crypto mining incident or simply because servers have been spun up and forgotten about, businesses can turn to Anomaly Detection tools with great success.
Anomaly Detection systems can continuously and autonomously analyze cloud billing data, alerting businesses to anomalies and allowing them to investigate further. The benefits of this are threefold:
- It allows businesses to detect cost spikes early to stop them in their tracks and ensure the first time they hear about them isn’t in a hefty bill.
- It drives cost accountability by tracking spending patterns, making staff more aware of and accountable for the costs of their work, and helping to prevent months of unexpectedly high spend.
- It provides cloud data governance by flagging unwanted activity at every level.
To reap these rewards, there are four key steps businesses need to take to set up an Anomaly Detection system.
- Defining the metric
Businesses first need to define what they’re monitoring, such as cost or usage. They also need to establish how granular they want to go and how often they want the system to observe costs, be that hourly, daily, weekly or monthly. Lastly, they need to set the period over which the evaluation is done, which determines the size of the data set. Shorter periods will draw out subtle volatility that might otherwise go unseen, but can overstate the significance of small changes. Longer periods will provide greater context and rhythm, but won’t be as sensitive to changes.
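To make these choices concrete, they can be written down as a small configuration object. The sketch below is illustrative only: `MetricDefinition` and its field names are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Tuple


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical container for the choices made in this step."""
    metric: str                # what is monitored, e.g. "cost" or "usage"
    group_by: Tuple[str, ...]  # granularity, e.g. per project and service
    frequency: timedelta       # how often costs are observed
    lookback: timedelta        # period over which the evaluation is done

    def sample_count(self) -> int:
        """Number of observations per series in one evaluation period."""
        return self.lookback // self.frequency


daily = MetricDefinition("cost", ("project", "service"),
                         frequency=timedelta(days=1), lookback=timedelta(days=28))
hourly = MetricDefinition("cost", ("project", "service"),
                          frequency=timedelta(hours=1), lookback=timedelta(days=7))

print(daily.sample_count())   # 28
print(hourly.sample_count())  # 168
```

Writing the definition down like this shows how the observation frequency and the evaluation period together determine the size of the data set that the analysis will work with.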
- Data preparation
Next, a system needs to be created that organizes a business’s cloud usage data into a readable structure, for example by aggregating and aligning the source data according to the previously defined scope and observation frequency. While preparing the data, the quality of the source data should also be assessed with questions like "does it update frequently enough for our needs?" and "is the data accurate enough?"
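As a rough illustration of this step, the sketch below aggregates raw billing rows into one aligned daily series per service and runs a simple freshness check. The row layout (ISO timestamp, service name, cost) and the function names are assumptions made for the example, not the export format of any specific cloud provider.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_totals(rows, start, end):
    """Sum cost per service per day, aligned to the range start..end inclusive.

    Days with no billing rows are filled with 0.0 so every series has the
    same length and the same day at each index.
    """
    n_days = (end - start).days + 1
    totals = defaultdict(lambda: [0.0] * n_days)
    for timestamp, service, cost in rows:
        day = datetime.fromisoformat(timestamp).date()
        if start <= day <= end:
            totals[service][(day - start).days] += cost
    return dict(totals)


def is_fresh(rows, now, max_lag):
    """Quality check: has the source data been updated recently enough?"""
    latest = max(datetime.fromisoformat(timestamp) for timestamp, _, _ in rows)
    return now - latest <= max_lag


# Made-up sample rows for illustration.
rows = [
    ("2024-01-01T03:00:00", "compute", 1.5),
    ("2024-01-01T18:00:00", "compute", 2.5),
    ("2024-01-02T00:00:00", "storage", 1.0),
    ("2024-01-03T09:00:00", "compute", 4.0),
]

print(daily_totals(rows, date(2024, 1, 1), date(2024, 1, 3)))
print(is_fresh(rows, datetime(2024, 1, 3, 12), timedelta(hours=6)))
```

Filling missing days with zeros is a design choice for this sketch. In practice a gap could also mean the billing export is late, which is why a freshness check sits alongside the aggregation.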
- Analysis on repeat
Data analysis can be conducted using rules, statistics or modelling, each with the purpose of identifying whether a sample is anomalous, as well as highlighting its impact and duration. The accuracy of this part of the system is crucial, and the analysis should be built around the unique considerations and preferences of each business. To get this right, a feedback loop should be put in place that enables constant refinement of the analysis.
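One possible statistical approach, shown here purely as a minimal sketch, is to compare each observation with the mean and standard deviation of a trailing window (a z-score). The window size, the threshold and the `min_sigma` floor are hypothetical tuning values, and a real system may well use a different method.

```python
from statistics import mean, stdev


def detect_spikes(series, window=7, z_threshold=3.0, min_sigma=1.0):
    """Return (index, impact) pairs for observations far above their baseline.

    The baseline for each point is the `window` observations before it.
    Impact is the amount by which the observation exceeds the baseline mean.
    `min_sigma` is an assumed floor that keeps a perfectly flat baseline
    from turning tiny changes into huge z-scores.
    """
    spikes = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu = mean(baseline)
        sigma = max(stdev(baseline), min_sigma)
        z_score = (series[i] - mu) / sigma
        if z_score > z_threshold:
            spikes.append((i, series[i] - mu))
    return spikes


# Made-up daily costs with one obvious spike on day index 7.
daily_cost = [100, 102, 98, 101, 99, 100, 103, 250, 101]

for index, impact in detect_spikes(daily_cost):
    print(f"day {index}: about {impact:.2f} above the trailing average")
```

This sketch reports impact but not duration, and a flagged spike stays in the baseline for the following days, which dulls sensitivity for a while. The three parameters are the natural place for the feedback loop to make its adjustments.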
- Reporting strategy
Finally, a reporting strategy needs to be defined that draws stakeholders’ attention to the alerts that matter, delivered via email, Slack or other channels, without risking alert fatigue. This is a delicate balance between a system that’s too sensitive and produces a high number of false positives, and one that isn’t sensitive enough and misses anomalies. It’s up to each business to decide what works best for them.
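Two simple levers for managing that balance are a minimum impact below which no alert is sent, and a cooldown that suppresses repeat alerts for the same source. The sketch below assumes both; `AlertGate` and its parameters are hypothetical, and actual delivery via email or Slack is deliberately left out.

```python
from datetime import datetime, timedelta


class AlertGate:
    """Decides whether an anomaly is worth notifying stakeholders about."""

    def __init__(self, min_impact, cooldown):
        self.min_impact = min_impact  # ignore anomalies smaller than this
        self.cooldown = cooldown      # quiet period per key after an alert
        self._last_sent = {}          # key -> time of the last alert sent

    def should_notify(self, key, impact, now):
        if impact < self.min_impact:
            return False  # too small to matter
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # already alerted on this key recently
        self._last_sent[key] = now
        return True


gate = AlertGate(min_impact=50.0, cooldown=timedelta(hours=24))
t0 = datetime(2024, 1, 1, 9, 0)

print(gate.should_notify("compute", 10.0, t0))                        # False
print(gate.should_notify("compute", 150.0, t0))                       # True
print(gate.should_notify("compute", 200.0, t0 + timedelta(hours=1)))  # False
print(gate.should_notify("storage", 80.0, t0 + timedelta(hours=1)))   # True
```

Raising `min_impact` or lengthening `cooldown` reduces noise at the risk of missing or delaying a real anomaly, which is exactly the trade-off each business has to settle for itself.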
After taking these steps, businesses should have a solid Anomaly Detection system in place, but their work doesn’t stop there. To keep up with their evolving cloud data, environments and costs, they will have to constantly maintain and tweak their systems even after they’ve been built. However, those businesses that either build their own or invest in an out-of-the-box Anomaly Detection system will be automatically alerted to cloud cost anomalies, allowing them to spend their valuable, finite resources on rectifying the underlying cause rather than constantly monitoring for issues. This will put them light-years ahead of those that are carrying on with their eyes closed until their next cloud bill arrives.
Matan Bordo is Product Marketing Manager at DoiT. Matan has been with the company since 2020. In this role, Matan works as a product expert and educator, collaborating with DoiT’s engineering, leadership, sales and customer success teams to raise the profile of the company’s product portfolio. Prior to this, Matan worked in venture capital and then in product marketing at superQuery, which was acquired by DoiT.