The next trending topic to come under our examination is the re-emergence of security data lakes – and the role these data lakes play in tackling security.
Telemetry data from your infrastructure is not new, and once would have been stored on your network syslog server. But storing it this way makes any kind of advanced analysis hard (if not impossible), such as looking for the tell-tale signs of malicious activity that might previously have gone undetected.
By storing this data in its raw form in a security data lake instead, you're essentially ensuring that it can be made available to the growing range of AI-based applications that can make use of it, which can significantly boost your cybersecurity effectiveness. It's worth mentioning that this topic links to our previous post on AI Ops, as both are about making the best use of the data you already have by storing it in a form that these AI tools can use.
Data lakes have been powering business intelligence for years – is it finally time to also extend their use to security data?
Data lake features – value add or trap?
In talking about security data lakes, we're essentially talking about the cloud, which leads the discussion towards the challenge of vendor lock-in. The unfortunate reality is that you can't head down a security data lake path by following a vendor-agnostic model. You first need to pick the security vendor you want to work with and then look to utilise all of their capabilities.
In terms of data lakes for security, it comes down to the vendor you choose and the processing power of their cloud. The data analytics, machine learning and additional layers of smarts within each vendor's solution are unique and highly customised. These features are all designed to showcase the sheer processing power of each individual solution.
On the other hand, while these cloud-based solutions offer plenty of smarts and powerful processing, the specific features of each vendor's cloud can lock you in and make switching to another vendor in the future prohibitive. At some point we may see a vendor-independent solution in this space, but right now you do have to accept vendor lock-in. For this reason, you need to think first about your overall infrastructure and the third-party apps and tools you use. Then choose the vendor that best meets your needs there, while ensuring your tools integrate with that vendor's security data lake.
If you do decide to go down a security data lake path, you are essentially building the foundations for advanced cybersecurity capabilities in the years to come, and in particular for real-time security analytics.
While there are tools like AlienVault (now AT&T Cybersecurity) and Splunk that can store data and provide a lot of capability right now, it's all retrospective. The real value in this space lies in moving towards real-time security analytics: analysing data as it is generated, with AI tools piecing events together and making sense of activity so they can make decisions and recommendations as things happen, rather than after the fact.
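To make the difference between retrospective and real-time analysis a little more concrete, here is a minimal Python sketch. It assumes a hypothetical, simplified event record (`Event` with a timestamp, source IP and action) and uses a deliberately simple rule (a set number of failed logins from one IP within a sliding time window) as a stand-in for the far richer AI models a vendor would run. It is not any vendor's API; it only illustrates that a streaming detector raises an alert the moment a threshold is crossed, rather than when someone later queries stored data.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Iterator, Set, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified telemetry record (not a real vendor schema)."""
    timestamp: float  # seconds since epoch
    source_ip: str
    action: str       # e.g. "login_failed" or "login_ok"


def stream_detect(
    events: Iterable[Event], threshold: int, window: float
) -> Iterator[Tuple[float, str]]:
    """Yield (timestamp, source_ip) as soon as an IP reaches `threshold`
    failed logins within `window` seconds. Each IP alerts at most once.
    Assumes events arrive in timestamp order."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    alerted: Set[str] = set()
    for event in events:  # processed one at a time, as they arrive
        if event.action != "login_failed":
            continue
        times = recent[event.source_ip]
        times.append(event.timestamp)
        # Discard failures that have slid out of the time window.
        while times and event.timestamp - times[0] > window:
            times.popleft()
        if len(times) >= threshold and event.source_ip not in alerted:
            alerted.add(event.source_ip)
            yield (event.timestamp, event.source_ip)


if __name__ == "__main__":
    live_feed = [
        Event(0.0, "10.0.0.5", "login_failed"),
        Event(1.0, "10.0.0.5", "login_failed"),
        Event(2.0, "10.0.0.5", "login_failed"),
        Event(100.0, "10.0.0.9", "login_failed"),
    ]
    for ts, ip in stream_detect(live_feed, threshold=3, window=60.0):
        print(f"ALERT at t={ts}: repeated failed logins from {ip}")
```

Here the alert for `10.0.0.5` is emitted while the feed is still being consumed, at the third failure. Running the same function over events pulled back out of storage days later would find the same pattern, but only retrospectively, which is the gap real-time analytics aims to close.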
At this point in time, security automation via data lakes is only available in the cloud. So the first potential roadblock for organisations is whether you're comfortable storing (or even allowed to store) this kind of sensitive telemetry data in the cloud. What data governance restrictions exist in your organisation, and do you know exactly where the data will be stored?
There is a potential future in which on-premises versions of the most popular cloud-based data lakes are offered, albeit with limited functions and capabilities, but for now it's all cloud.
Even with this caveat, the future for security data lakes is starting to look exciting. If we also consider the potential for open, vendor-agnostic cloud-based and on-premises solutions with fewer constraints on migrating data between clouds and vendors, we feel this is something you need to be thinking about in the near future, if not right now.
What’s the score?
For this reason, we’re giving it an MIS score of 4 – Important: You definitely need to know about this – now.
Even with this score, determining whether a data lake is something you actually need right now is not straightforward. As we've mentioned, there are several considerations to take into account, including:
1. While moving to the cloud can mean a smaller upfront investment and a quicker path to ROI, storing your data in a cloud data lake may make changing providers prohibitive in the future.
2. Whether cloud-based storage is even a palatable option for your organisation and its data governance policies.
3. Regulations in your industry or region governing how the data is stored.
In the meantime, when looking at vendors for your security automation, think holistically first about what can be achieved, whether on-prem or in the cloud. Prioritise what your organisation is looking to secure and focus on that before considering what the vendor can offer in terms of data lakes, cloud and AI Ops. While these features still need to be considered as part of your security automation, it's better not to make them the primary reason for selecting a vendor.