Improving the ROI of SIEM, logging and security monitoring

Major security incidents are very rare in most organizations. They are typically a three or even six sigma event, far down the long tail of possible events. Like most other events that fit this profile (e.g. a correlated fall of 10% or more in the share market), they can also have a major impact on an organization, even leading to bankruptcy. As the global financial crisis showed us, these events, called Black Swans by Nassim Taleb, do happen, and often not in the way we think they will. Unlike in financial markets, and although such incidents seem ideal candidates for risk transfer, insurance will not help restore your brand.

This is the reason that most companies invest money in a security department, processes and technology (OK, it is really regulation, but let's say it is partially about prevention, detection and recovery from security incidents too). Security monitoring is often seen by regulators and auditors as a key control in a holistic and effective information security strategy. However, it is difficult for most organizations to prove the Return on Investment (RoI) of security monitoring, and even Risk Reduction on Investment (RRoI) is difficult to quantify. This post covers some simple strategies to improve that visible RoI.

Cheaper centralized storage

Logs are the lifeblood of every security monitoring system. Every system you have generates logs. These logs take storage – in online systems as well as in backups and archives – and this storage costs money. These logs also get overwritten if retention is not configured properly and they are not archived, which means you lose valuable information. Addressing both of these issues is a way to improve RoI.

Log consolidation is not a new field; however, at least anecdotally, no company I have worked at does this well. This is mainly because the organization and its IT systems were never built from the ground up to support it, and even new systems continue to follow what legacy systems do.

The logic is simple – rather than each individual system storing its own logs, have them export those logs to a central log repository. Most systems support standard export mechanisms, e.g. syslog, write to file and ship, write to database and export, or web services. This means the local storage allocated to logs can be significantly reduced.
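
As a rough illustration, assuming a central collector (rsyslog, syslog-ng or a SIEM connector) listening on UDP port 514 at the hypothetical host logs.example.internal, an application can ship its events with nothing more than the Python standard library:

    # Minimal sketch: forward application logs to a central syslog collector
    # instead of keeping them only on local disk. The hostname is hypothetical.
    import logging
    import logging.handlers

    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)

    # SysLogHandler ships each record over the network, so local log storage
    # can be kept to a short buffer.
    handler = logging.handlers.SysLogHandler(address=("logs.example.internal", 514))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)

    logger.info("user=jbloggs action=login result=success")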

However, does this not just shift the volume to the central store? The simple answer is yes. However, when storage is consolidated in a central store there are more ways of reducing costs, the simplest being:
  • Bulk discounts – purchasing larger amounts of central storage rather than paying for smaller amounts per server
  • Suitable for purpose – the storage can be lower performance (and therefore lower cost) than what you need to run your mission critical systems, and you may not need backup and DR for the archives, or at least not to the same extent as for your primary servers
  • Apply intelligence to retention – failed or negative events are useful for a shorter period of time than success or positive events. For example, a large number of failed login attempts may indicate an attack in progress, but this is only useful if you act on it quickly; there is no point keeping these raw events for a long time, although you may keep the metadata or reports for trending. A success event, on the other hand – user Joe Bloggs logged into system Y at time X – is useful for a lot longer when performing forensic analysis. If your logs are centralized it is easier to keep success events for, say, 12 months and failed events for 30 days, or if you cannot be bothered doing that, at least keep everything for 30-90 days and then archive (see the sketch after this list)
  • Easier to keep just the metadata – log consolidation tools such as Splunk and most SIEM tools such as ArcSight and RSA enVision allow you to keep only the metadata or index of the log information; the rest can either be discarded or archived on cheaper storage
  • Cloud storage, done securely – you can easily purchase cloud storage, create an encrypted volume with TrueCrypt and push your archived logs there (also covered in the sketch after this list). You can also benefit from compression, of archives in particular – a 10:1 ratio is a reasonable expectation. Cloud storage can be a lot cheaper than enterprise NAS, SAN or physical disk; e.g. if you store over 50 TB you can pay as little as $37 per TB per month for Amazon S3, or $55 if you want 99.999999999% durability. In comparison, the National Institutes of Health (NIH) gets charged $750 per TB per month. How does this compare to your costs?
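
As a minimal sketch of the tiered retention and cloud archiving described above, the script below assumes a hypothetical layout where the central store writes daily files named success-YYYY-MM-DD.log and failed-YYYY-MM-DD.log under /var/log/central, plus an S3 bucket called example-log-archive; it uses S3 server-side encryption in place of the TrueCrypt volume mentioned above, and relies on an S3 lifecycle rule (not shown) to expire the archives after 12 months:

    # Sketch of tiered retention on a central log store. File layout, bucket
    # name and retention periods are illustrative assumptions.
    import gzip
    import os
    import shutil
    from datetime import datetime, timedelta

    import boto3

    LOG_DIR = "/var/log/central"
    s3 = boto3.client("s3")
    now = datetime.now()

    for name in os.listdir(LOG_DIR):
        path = os.path.join(LOG_DIR, name)
        age = now - datetime.fromtimestamp(os.path.getmtime(path))

        if name.startswith("failed-") and age > timedelta(days=30):
            # Failed/negative events are mainly useful while you can still act on them.
            os.remove(path)
        elif name.startswith("success-") and age > timedelta(days=90):
            # Success events stay useful for forensics: compress (10:1 is a
            # reasonable expectation for text logs) and push to cheaper,
            # encrypted-at-rest storage for the remainder of the 12 months.
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            s3.upload_file(path + ".gz", "example-log-archive", "archive/" + name + ".gz",
                           ExtraArgs={"ServerSideEncryption": "AES256"})
            os.remove(path + ".gz")
            os.remove(path)
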
It is a win-win: you get the centralized logs you want for correlation and monitoring, and IT and business management cut their storage costs. However, make sure you also price the additional network cost of transfer; if you have highly geographically distributed infrastructure it may be cheaper to pay the upfront cost of local collectors which aggregate and filter the raw logs and archive locally, rather than transferring everything across expensive WAN links. Your Riverbeds or other WAN optimization devices should assist greatly, though, as log traffic compresses well. QoS is also invaluable in ensuring logs do not flood your network bandwidth.

On-boarding

I have seen many SIEM projects start so well only to deliver nothing, or very little, a year down the track. One of the main reasons for this is the cost and time taken to on-board systems. Security monitoring has a tipping point – unless you have sufficient coverage of your systems, you have a hard time proving value.

The problem with many SIEM tools is that each log source has to be parsed and mapped into the product's schema before it can be on-boarded, which forces painful prioritizations about which systems to connect first.

The reason I like Splunk is that this is avoided: being effectively a Google for logs, you can point virtually any log at the system without performing any mapping beforehand and simply use its Google-like engine to search through it. This is what makes Splunk such a good log aggregator and log repository, over which you can then sit a SIEM to get the correlations, workflow and alerts.

If you can, for example, on-board all your systems in one month or even one quarter, which is not unreasonable with Splunk, you have a much better chance of being able to demonstrate value with your monitoring project.
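
As a hedged sketch of how quickly un-mapped logs become searchable, the snippet below runs an ad-hoc search over everything already indexed, assuming a Splunk instance at the hypothetical host splunk.example.internal with the default management port 8089 and a read-only service account; the search string and credentials are purely illustrative:

    # Sketch: query Splunk's REST search API over raw, un-mapped logs.
    # Host, account and search string are assumptions for illustration.
    import requests

    SPLUNK = "https://splunk.example.internal:8089"

    resp = requests.post(
        SPLUNK + "/services/search/jobs/export",
        auth=("svc_readonly", "changeme"),   # hypothetical service account
        data={
            "search": 'search index=* "failed login" earliest=-24h | head 100',
            "output_mode": "json",
        },
        verify=False,   # substitute your CA bundle in a real deployment
        stream=True,
    )

    # Stream results back a line at a time as Splunk returns them.
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))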

Smarter correlations

Interestingly, on-boarding systems that you are not actually going to monitor can sometimes be more important than attempting to get 100% coverage. This is because the most valuable correlations are with data such as:

  • HR systems – leavers and movers data is very useful. For example, if you correlate data sent externally from your proxies or email servers (or, even better, your data loss prevention system) with people in the last two weeks of employment, you have a useful data leakage risk reduction mechanism (see the sketch after this list)
  • Change systems – correlating the approved username, system and approved timestamp with login information from the actual systems enables you to detect unauthorized change and provides a deterrent against changes that bypass change controls
  • Anti-malware systems – correlate with firewall and proxy logs to work out whether the malware detected on your endpoints is attempting to dial home or has actually succeeded in becoming part of a botnet
  • CMDB systems – data on the importance and dependencies of systems can be very useful in prioritizing an alert and the remediation action
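
A minimal sketch of the HR leavers correlation, assuming two hypothetical CSV extracts: leavers.csv (user, last_day) from HR and uploads.csv (user, timestamp, bytes_out, destination) from the proxy or DLP tool:

    # Sketch: flag large outbound transfers by users inside their last two
    # weeks of employment. File names, columns and threshold are assumptions.
    import csv
    from datetime import datetime, timedelta

    def load_leavers(path):
        with open(path, newline="") as f:
            return {row["user"]: datetime.fromisoformat(row["last_day"])
                    for row in csv.DictReader(f)}

    def leaver_upload_alerts(leavers_path, uploads_path, min_bytes=50_000_000):
        leavers = load_leavers(leavers_path)
        with open(uploads_path, newline="") as f:
            for row in csv.DictReader(f):
                last_day = leavers.get(row["user"])
                if last_day is None:
                    continue
                when = datetime.fromisoformat(row["timestamp"])
                in_notice_period = last_day - timedelta(days=14) <= when <= last_day
                if in_notice_period and int(row["bytes_out"]) >= min_bytes:
                    yield (row["user"], when, row["destination"], row["bytes_out"])

    for alert in leaver_upload_alerts("leavers.csv", "uploads.csv"):
        print("Possible data leakage by leaver:", alert)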

It is also important to build a baseline of activity, e.g. for the admin users that SSH to the critical database servers: what is the median number of connections, from which IPs, from which devices, from which locations and at which times. Once you have a good baseline and standard deviations it is easier to build value-adding alerts that detect two sigma deviations.
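
A crude sketch of such a baseline, using a mean and standard deviation over illustrative daily connection counts rather than real data:

    # Sketch: baseline daily admin SSH connections to critical database
    # servers and flag two-sigma deviations. The counts are illustrative.
    from statistics import mean, stdev

    daily_counts = [12, 9, 14, 11, 10, 13, 12, 11, 9, 15, 12, 10, 11, 13]

    baseline_mean = mean(daily_counts)
    baseline_sd = stdev(daily_counts)

    def is_anomalous(todays_count, sigmas=2):
        # Alert only when today's count sits outside the two-sigma band.
        return abs(todays_count - baseline_mean) > sigmas * baseline_sd

    print(is_anomalous(12))   # within the baseline -> False
    print(is_anomalous(40))   # well over two sigma -> True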

As I discussed in my metrics post, it is also very important to build a Red, Amber, Green (RAG) measurement for all your alerts and a defined procedure for exactly what will be done about each one and by whom. This should include the steps to investigate and resolve, not just "ah… we will investigate it and escalate". How will you do this? Who will do each step? Break it down so that it is clear.
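
One way to make this concrete is to capture each alert as a small definition with its RAG rating and a named owner for every step; the structure, names and timings below are purely illustrative:

    # Sketch of an alert definition with a RAG rating and an explicit runbook,
    # so "investigate and escalate" becomes named steps with named owners.
    LEAVER_DATA_EXFIL_ALERT = {
        "name": "Large outbound transfer by leaver",
        "rag": "Red",
        "detect": "DLP/proxy upload over 50 MB by a user within 14 days of their last day",
        "runbook": [
            ("SOC analyst", "Confirm the user is on the HR leavers list"),
            ("SOC analyst", "Pull the proxy/DLP events and identify the destination"),
            ("Line manager", "Confirm whether the transfer was business as usual"),
            ("HR / compliance", "Initiate the leaver misconduct process if not"),
        ],
        "target_response": "4 business hours",
    }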

Spending the time to get even one of these done end-to-end each month will enable you to show value from your monitoring quickly.

Resourcing

This is an emotive topic, and there is a strong temptation to outsource security monitoring, buying it in as a service, or to use offshore resources. This is mainly because security monitoring has built a reputation for being a tick-box exercise for the regulators that does not really add any value. Viewed as a commodity, it is easy to make the case for offshoring or outsourcing as a cost reduction. There is also a perception that it is difficult to hire and motivate skilled onshore resources in this area.

All of these can be true. However, hopefully the other steps I have discussed above will assist with showing value, as well as providing a more challenging role for skilled resources. The main problem I have seen with outsourcing monitoring is that when the managed service SOC detects an alert, all they can do is call you or someone else in the organization, so you have not really reduced the resources that actually make a difference. Offshoring can have a similar problem: if you only get resources that can follow a basic set of instructions, you still need someone else to take their call or escalation.

A better approach is to spend the resources on an engineer who can set up an alert end to end; even a small number of these is better than a team that supposedly monitors for everything but does very little with the results. With smart correlations and a Scrooge-like design on what actually requires human attention, you can greatly reduce the resources you need while actually providing a lot of value with monitoring.

As Timothy Ferriss describes in his book The 4-Hour Workweek, your focus needs to be: eliminate, then automate. Elimination can be achieved by focusing on a specific risk you want to detect, e.g. users in the last two weeks of employment taking company confidential information with them; you can then build a specific correlated alert that provides a high degree of signal to noise. Once this is proven to be working you can automate the actions, e.g. send the alert directly to HR or compliance in the local office, who can actually do something about it. A security person does not then need to be involved in operational monitoring at all, and metrics can be monitored and reported to management automatically, proving the value of your monitoring.
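
As a sketch of that last automation step, assuming a hypothetical internal SMTP relay at mail.example.internal and hypothetical mailbox names, the correlated alert can mail HR or compliance directly:

    # Sketch: route a proven correlated alert straight to HR/compliance,
    # with no security analyst in the loop. Hosts and addresses are assumptions.
    import smtplib
    from email.message import EmailMessage

    def notify_hr(user, detail, smtp_host="mail.example.internal"):
        msg = EmailMessage()
        msg["Subject"] = f"Possible data leakage by leaver: {user}"
        msg["From"] = "security-monitoring@example.com"
        msg["To"] = "hr-compliance@example.com"
        msg.set_content(detail + "\nPlease follow the leaver data leakage runbook.")
        with smtplib.SMTP(smtp_host) as smtp:
            smtp.send_message(msg)

    # Called by the correlated alert (e.g. the leaver/DLP sketch above)
    # rather than by a human analyst.
    notify_hr("jbloggs", "jbloggs uploaded 120 MB to an external file sharing site at 2011-06-01 17:32")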

Conclusions

I believe that security monitoring can be an effective detective control and provide a real risk reduction benefit. The key is to do it smartly and to show value in the terms that business and IT management care about: reduced cost and tangible, proven risk reduction.
