Practical lessons learned from implementing a Data Loss Prevention (DLP) program

Last year I had a chance to be involved in implementing a data loss prevention tool (Symantec Data Loss Prevention) and also to evaluate the RSA offering.

This is a post of my reflections on the experience and some lessons learned, which will hopefully benefit anyone needing to do this now or in the future...
The starting point is: why invest in DLP? I would be lying if I said it was not, at least partially, curiosity by the security team about what was the flavour of the year (not unlike cloud computing - more on that in a different post). Of course officially it was justified by being linked to a regulatory/audit issue. My risk assessment (which I doubt was ever read by exec management but is brilliantly gathering virtual dust at the moment) at least said it was worth investing in DLP because:
  • Intelligence and discovery - we did not really know where our most important information was being stored, who was using it and where it was travelling around the network both internally and externally. Our surveys had said it was in email and share drives - so we kind of did know but we should really verify with our new cool toy
  • Data leakage prevention - a "high risk" was identified, in line with fashionable industry trends, that our people were constantly stealing our proprietary information. In theory DLP would help with that
  • Process improvement - we would identify poor processes e.g. a developer sending source code home to work on and compile because they lacked a powerful enough system at work or lacked a laptop, an HR feed being sent externally that we had not secured, long forgotten but sensitive documents being stored on share drives which anyone could access
All of this was of course high probability and high impact, therefore the money must be spent! Also it made a great story of the security team actually doing something, thus hopefully justifying our existence, and a nice tick-box for the regulators.

So we got the money. Luckily the licence was "free" because the heads at Group had been treated to a very nice track day (I mean invested in the long term strategy of an effective security solution). So we just needed some resources (backfill and contractors with some highly dedicated FTE :), VMs and spare SPAN ports.

We made the decision to scope it quite tightly: "just" one location, one internet egress point, one server for storage scanning, no endpoint agents (mainly because this part was not licensed). We were going to get coverage of corporate email through another project (which actually never ended up happening) so we decided to do internet (webmail - which was allowed, FTP, internet access and transfers etc). Also we would be really smart and not use the default policies (because we didn't have much PII or cardholder data that we were aware of) and focus on targeting the highest risk BUs (usual suspects: HR, Finance, Legal, as well as some specific to our company).

The project was quite a typical IT project: at the start it seemed like it was never going to be done in 3 months (the silly date we committed to exec management and the regulator). But amazingly we got it done - of course only the technology (which kind of worked, because we didn't have time or money to do a proper POC or pilot and we were tied to the technology that Group licensed), but no process or people. No surprise really: that's the hard part, which we all know we should do, but it is really quite hard and not as much fun as playing with the new toy.

So I promised some practical lessons learned:

What we did well that you should also do:
  • Build a business case on a risk assessment - and sell it to senior and executive management, get their sponsorship and skin in the game (e.g. tie it to their objectives or their commitment to the regulator). Even if it's all made up, it at least gives you some defence when the eventual "why did we spend money on this, and what do you do anyway?" question comes up
  • Start small and scale up - I'm in two minds about this one. In principle I believe that for all technology projects you should do a POC, a pilot and then a staged roll-out, with clear gating and measurement of benefits and costs before expanding the scope i.e. think strategically, implement tactically. For a project to succeed you should be able to break it down and implement it in phases, with ideally the largest being 3 months. The problem for DLP is that it has a tipping point - it is only effective when you have high coverage of your monitoring points (i.e. email, internet, storage, endpoints). Therefore if you start small you will face the "but I can just use this proxy and get around your monitoring" question etc etc.
  • Tailor your policies and target a small number of high risk BUs - don't believe the hype and marketing. No DLP system currently will work well out of the box. It will not automatically make you PCI, SOX or HIPAA compliant. The majority of the out of the box rules are useless (worse than useless actually, because they will spam you), and the same goes for the response rules and the reporting. Most products are also not yet mature enough to give you a good workflow and a great UI. Also it is a very dark art to build an effective policy that gives you an acceptable error rate on false positives and false negatives (60% CER is what you should aim for at the current maturity and capability of the tools). So again don't boil the ocean - aim for 1 policy every 3 months (that's right, 1; any more is foolish). If you handle credit card information that is a no-brainer and the regex is fairly mature (just make sure you do some googling and get the proper regex, together with bank identifier numbers, to improve accuracy). Unfortunately most products don't yet support a simple way of keeping this up-to-date e.g. RSS feed or web services. PII is also a good starting point - you should hopefully be able to sweet talk your way or use FUD to get a feed from the HR system
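To illustrate the credit card point, here is a minimal Python sketch of what "regex plus bank identifier numbers" could look like. It is not taken from the Symantec or RSA products: the names `CANDIDATE`, `ISSUER_PREFIXES`, `luhn_ok` and `find_card_numbers` are made up for this example, and the prefix list is a tiny illustrative sample (Visa, Mastercard 51-55, Amex), not a maintained BIN list. The idea is to find digit runs of card length, then discard anything that fails the issuer prefix check or the Luhn checksum.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

# Illustrative issuer prefixes only (assumption for this sketch).
# A real deployment would load a maintained BIN list instead.
ISSUER_PREFIXES = ("4", "51", "52", "53", "54", "55", "34", "37")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that look like card numbers and pass both checks."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if digits.startswith(ISSUER_PREFIXES) and luhn_ok(digits):
            hits.append(digits)
    return hits
```

The regex alone would flag every long order number or tracking ID; the two extra checks are what cut down the spam.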
    What we didn't do but wish we had (note this is only theory upon reflection, it may or may not give you a better result but I think it will)
    • Use only index or fingerprint match policies - this is called different things by different technologies but basically it involves taking a hash of an actual document and then finding a match against it based on a % e.g. this document you just tried to email to your gmail account matched 50% of your customer list database. Resist the urge and the vendor advice to use key words, phrases or exact data matches (i.e. where you pull a variable set of records from a structured database such as an HR db) - instead put this into a file or a set of files, index it and search using that. You will get far fewer false positives and make your investigation a lot easier. Proven regex such as credit card numbers is probably the only exception to this rule. DO NOT use other regex, such as for source code
    • Focus on prevention not just detection - one of my axioms is that any process which does not require manual intervention is automatically better. Don't just put in a sensor to gather "intelligence" or just to take a look or because you are scared of false positives stopping business processes. Do it properly: implement the endpoint agent and the network and storage prevention tools, not just the detection. Only implement detection alone if you have another means of automatically acting on it e.g. you have the Juniper Unified Access Control system or equivalent and, based on an alert or log entry from the DLP system, you can write a policy rule to e.g. block a switch port, block at the firewall or proxy, or drop the user session. Learn the lessons of the advantages of IPS over IDS and do it smart. Again start slowly - do 1 policy at a time, test it, pilot it by applying it to a small number of users, then roll it out. Focus your humans on being alerted on a block action and ensuring that it was legitimate - rather than the other way around of detecting only and requiring a human to investigate and action
    • Have a change control process and systems - seems basic but we didn't have the money or time to do it. Make sure you have a non prod, pre-prod and prod system - including the network tap/span. You must be able to effectively test policies and changes, pilot it on a small number of users before you push it to production
    • Only implement a policy when you can do it end to end - i.e. have the people and process. When you define a policy you must be able to document clearly and get approved:
      • what does good look like (i.e. what are we expecting to capture including estimated numbers e.g. 2 people a month sending out client lists via corporate email to webmail which you can check against actual), 
      • what should we not get (e.g. 100 hits a day) and what are we going to do if this happens, 
      • what specifically are we going to do when we have a hit (ideally this is automated as per above), 
      • a full RACI i.e. who is going to do it (ideally this is just monitoring), 
      • what metrics are we going to collect and report on (and how are we going to use this to prove value to senior management), 
      • who should get informed on hits (actual and summary), 
      • who owns this policy, 
      • how often will it be reviewed and by whom (automated via workflow), 
      • what is the source data and all its attributes (what systems, what interfaces, how often updated, how do we update/change it)
    • Endpoints are worth doing - even though it is a pain to deploy yet another agent to every endpoint globally, it is one of the most valuable parts of a DLP system. It gives you all sorts of valuable capabilities: you can block very effectively and efficiently without requiring ICAP servers and all sorts of in-line integration, and you cover transfers to removable media, printing (yes this is a really easy way to steal data), and copying from share drives to other share drives or to the local endpoint storage. It also has a visual end user impact - if you don't want to block at the start you can begin by having users provide reasons for actions that your DLP system detects. Endpoint DLP is so valuable and gives you such easy coverage of email and internet access that I would suggest actually doing this first, before network and storage.
    • Integrate with workflow, ticketing and SIEM - yes it is hard to do and maybe it is a release 3 item (9 months in) but to get and, more importantly, prove value from this investment, you need to integrate with something like SharePoint for the workflow (because currently most of the systems are not mature enough to provide a good workflow or integration out of the box). The same goes for ticketing and sending logs to your central logging system or SIEM.
    • Monitor encrypted traffic - put your network prevention box behind your SSL terminator e.g. Blue Coat proxy, F5. This will mean that you can examine SSL protected content (one of the easiest ways to get around the DLP system). Also, a more advanced task (maybe release 7/8) is integration with your email encryption product (if you have one). PGP Universal gateway, for example, will accept email sent to it by the DLP system, decrypt it with the corporate Additional Decryption Key (ADK) and proxy it back to the DLP system (or to a mailbox that goes past the DLP sensor). Another point in favour of endpoint protection: you can examine any traffic either before the user encrypts it or after it has been decrypted by their browser or other software.
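To make the index/fingerprint matching idea from the first point in this list a bit more concrete, here is a minimal Python sketch. It is an assumption about how such matching can work in principle (hashing overlapping runs of words, often called shingles, and counting how many of the protected document's hashes turn up in the outgoing content), not a description of how Symantec, RSA or any other vendor actually implements it. The function names and the shingle size `k` are hypothetical.

```python
import hashlib
import re


def shingle_hashes(text: str, k: int = 5) -> set[str]:
    """Hash every run of k consecutive words in the text."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than one shingle: hash it as a single unit.
        return {hashlib.sha256(" ".join(words).encode()).hexdigest()}
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode()).hexdigest()
        for i in range(len(words) - k + 1)
    }


def match_percent(index: set[str], outgoing: str, k: int = 5) -> float:
    """Percentage of the protected document's shingles found in outgoing text."""
    if not index:
        return 0.0
    found = index & shingle_hashes(outgoing, k)
    return 100.0 * len(found) / len(index)


# Build the index once from the protected file, then score outgoing content.
customer_list = "alice smith 0412 bob jones 0413 carol white 0414 dave black 0415"
index = shingle_hashes(customer_list, k=3)
score = match_percent(index, "hi mum here is alice smith 0412 bob jones 0413 thanks", k=3)
```

A policy would then fire when the score crosses a threshold you have agreed on, which is much easier to reason about during an investigation than a pile of keyword hits. Note the index only stores hashes, so the DLP box does not need to hold a readable copy of the protected data.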
    That's probably enough rambling on DLP for now. Well done if you got through all that. Overall conclusions: DLP is quickly becoming a default and sexy security investment, if you have the basics in place (very few do) it can add a fair bit of value. Does it help you leapfrog? Maybe if you do it right.


    1. I'm still HIGHLY suspicious of the real value that DLP adds to most environments. The main issue is that in the modern workplace there are just so many conduits to funnel information around that, unless you capture all of them, the benefit is still there but it's limited to the low hanging fruit, i.e. the people who are, say, inadvertently breaking company policy. Now while that is of benefit in itself, I'd argue that particular problem should be solved elsewhere, and probably more easily, e.g. access controls over the data.
      So if you're concerned about the average user on your system, I think there are better ways of approaching the problem, whereas if you're concerned about the technical people targeting information they want to get out, then your DLP has to be so pervasive that not a single method is overlooked, which right now is very difficult indeed.

    2. @sterror I agree with all of that except maybe that access controls are sufficient. One of the big selling points of DLP is that it gives you a control mechanism for everyone who legitimately has access to the data. But yes, could you get similar benefits at lower cost with a good IDS and web app firewall - sure. In fact I would be surprised if the market doesn't mature to the point where someone like Cisco or Juniper, or cloud email providers like Postini or Microsoft, buys or partners with a leading DLP vendor.

      With coverage it is the 80/20 rule though, and it is an arms race and there is a tipping point. People in general will do what is easiest and most convenient. So if you cover corporate email, internet, storage and endpoints (including removable media and printing) over say a 1-2 year project, assuming you are still there and there is management commitment, you will have at least the 80% covered. That's also why I said start with endpoint. You did remind me that I wanted to add SSL termination to the coverage point which I will do now.

    3. Hi RS,

      I've read this a bit late...

      I wish you had taken a look at our (Websense) solution. I won't go all sales-ie on you but some of the technical points you've made are covered quite well.

      I agree with you on the protect vs monitor approach, and definitely on the risk oriented business case - a DLP system is first and foremost a risk assessment and mitigation tool. From my experience, if a DLP project starts as a "cool tech, let's turn everything on and see what we get" project - it has far less chance of success than a "we have a business issue, let's apply this technology to understand and solve it" project. My definition of success is "turned on enforcement on at least part of the policy within 6 months".

      -- Arik

    4. @strerror you are absolutely right if you think of DLP technology as technology that's supposed to stop any and all malicious and unintentional data leaks from the organization.

      However, if you look at it as a tool that:

      1. Mitigates the huge risk you get from the daily activities of non-security-conscious people who use business channels (such as email, web) to communicate with people outside of the organization on a daily basis with abandon and little oversight, if any

      2. Makes it more difficult for the malicious people to get by with simply attaching whatever they want to an email or copying it to a mass storage device or whatnot, making them go out of their way to send it - is useful because it leaves only the industrial spies in the game and it might catch on to them when they try to do it the trivial way

      A smart and dedicated malicious attacker will get the data out. A DLP system will not stop them, and I don't know of a system that will outside of full on 1-on-1 surveillance at all times when they're handling sensitive information. It will make their life an order of magnitude more difficult though.

      -- Arik


