Data Security in the era of AI

By John Reeman | March 18, 2024

Data Security, DLP, DSPM, and AI

Data Loss Prevention (DLP) solutions have been around for nearly two decades. Back in 2006, I remember deploying Vontu, a DLP pioneer later acquired by Symantec, into several large global investment banks, as well as more mainstream banks and retail businesses in the City of London and across Europe (France, Germany, and Spain).

A lot has changed since those days. Big data came along; SIEM, CASB, and other technologies went through the usual hype cycles; markets and technologies consolidated; the cloud went through a massive boom; new devices such as the iPhone arrived; and in the last 12 months, AI has reappeared as the new kid and buzzword on the block. Spoiler alert: it's not that new, with origins going back more than 50 years!

Since then, I've moved countries, founded a software company, become a consultant again, had the opportunity to be a CISO for a global law firm, and am now back full circle to running my own cybersecurity consulting practice for the third time in my life.

In this article, I'm going to focus on data security and why organisations need to rethink how they secure their data, particularly in the modern world, now that the big beast of AI is everywhere.

Data Loss Prevention (DLP)

The traditional approach to securing your data with DLP solutions was to look at:

Data at Rest
Data in Motion
Data in Use

Although DLP was originally meant to detect insider threats, over time external auditors have used it as a yardstick to assess how organisations monitor for data leaking outside the organisation. At a rudimentary level, it targeted email and web usage to understand employees' risky behaviours. While it may have answered the who, how, what, and when questions, it did little to put preventative controls around the data or to secure that data well enough to prevent leakage.

Data Security Posture Management (DSPM)

Being proactive and prepared is an essential approach to safeguarding your data.

DSPM goes beyond DLP to provide five core capabilities:

Data discovery - Automatically find and classify sensitive, regulated, critical, and dark data. Uncover shadow data and build a dynamic inventory of all your data, wherever it resides.
Data access mapping - Track who has access to sensitive data and automatically generate a map of user access. Reduce permissions to least privilege and eliminate overprivileged access.
Data flow tracking - Track data flows and connections across and outside the organisation: cross-border transfers, data residency, data processing and sharing, and secrets in development data.
Data exposure protection - Automate and orchestrate remediation for high-risk data, with alerts and workflows triggered by activity. Remediate data security issues by enforcing controls over your sensitive data.
Posture assessment and reporting - Get in-depth reporting on potential data risks, policy violations, and data security vulnerabilities. Monitor your data through assessment scores and reports to protect against unauthorised exposure.
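To make the data discovery capability more concrete, here is a minimal sketch of pattern-based classification. It is not how any particular DSPM product works: the detector names, the regular expressions, and the `classify` helper are hypothetical, and real tools combine many more detectors with context and machine learning. It shows the basic idea of scanning content, labelling what is found, and validating matches (here with the Luhn checksum for card numbers) to cut down false positives.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors: label -> regex. Real DSPM tools use far richer detection.
PATTERNS = {
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "payment_card": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit, counting from the rightmost digit
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class Finding:
    location: str  # where the data was found (file share, bucket, mailbox...)
    label: str     # which detector matched
    match: str     # the matched value


def classify(location: str, text: str) -> list[Finding]:
    """Scan text and return labelled findings for the given location."""
    findings = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            value = m.group().strip(" -")
            # Card-like digit runs must also pass Luhn to count as a finding
            if label == "payment_card" and not luhn_valid(value):
                continue
            findings.append(Finding(location, label, value))
    return findings


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com, card 4111 1111 1111 1111, ref 1234 5678 9012 3456."
    for f in classify("share://finance/notes.txt", sample):
        print(f.label, f.match)
    # email_address jane.doe@example.com
    # payment_card 4111 1111 1111 1111
```

The second 16-digit reference looks like a card number but fails the Luhn check, so it is skipped; that kind of validation step is what keeps an automated inventory from drowning in noise.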

Not knowing your data, and holding it for excessive periods of time

Today, when asked where their sensitive data is or who has access to it, most organisations simply "don't know." If your organisation conducts identification checks, where are you storing that data, and for how long? Ask yourself whether you really need to keep it for an extended period. Not keeping data longer than necessary could help reduce your risk and overall liability.
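A retention review like this can be partly automated. The sketch below assumes a hypothetical `RETENTION_DAYS` policy table and `Record` type; the periods shown are placeholders, because real retention periods depend on your legal and regulatory obligations. It flags records held longer than their policy allows, and also records with no policy at all, so they are reviewed rather than silently kept.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention policy: record type -> maximum days to keep.
# Placeholder values only; actual periods are a legal/regulatory decision.
RETENTION_DAYS = {
    "identity_check": 365,
    "marketing_contact": 730,
}


@dataclass
class Record:
    record_id: str
    record_type: str
    collected_on: date


def retention_review(records, today):
    """Yield (record_id, reason) for each record that needs action."""
    for r in records:
        limit = RETENTION_DAYS.get(r.record_type)
        if limit is None:
            # No policy defined: surface it instead of keeping it indefinitely
            yield r.record_id, "no retention policy defined"
        elif today - r.collected_on > timedelta(days=limit):
            yield r.record_id, f"held beyond {limit}-day retention period"


if __name__ == "__main__":
    records = [
        Record("a1", "identity_check", date(2022, 1, 10)),
        Record("a2", "identity_check", date(2024, 1, 5)),
        Record("a3", "passport_scan", date(2023, 6, 1)),
    ]
    for record_id, reason in retention_review(records, today=date(2024, 3, 18)):
        print(record_id, reason)
    # a1 held beyond 365-day retention period
    # a3 no retention policy defined
```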

AI and GEN AI

In the last year, generative AI has emerged as a game changer across many industries (legal, tech, government, manufacturing, pharma, retail, and more). It enables machines to create content, imitate human intelligence, and solve complex problems autonomously. To fully harness its potential, organisations must embark on a journey of data preparation and automation, ensuring that their data is governed, labelled, and compliant with ethical and regulatory standards.

The growing volume of data, and how widely it is used, means that organisations can no longer rely on traditional, manual processing methods to manage unstructured data. The only way to manage data in the future will be through automation, and ironically, that means AI.

So, to adapt and innovate, organisations must:

Control what data can be shared, by whom, and with which LLMs or AI applications
Audit and inspect what data is being shared with LLMs and AI applications, based on privacy, sensitivity, regulations, and access
Build out policies for data usage in AI
Enforce those policies, or be alerted when they are breached
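These four controls can be sketched as a policy gate that sits in front of AI tools. This is a minimal illustration under stated assumptions, not any product's API: the `POLICY` table, the role and destination names, and the `AiShareGate` class are hypothetical, and the data labels are assumed to come from an upstream classifier. Every request is audited; in "enforce" mode a request that violates policy is blocked, while in "alert" mode it is allowed but the violation is recorded for review.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical policy: (user role, AI destination) -> data labels that may be shared.
# Role, destination, and label names are illustrative only.
POLICY = {
    ("analyst", "internal-llm"): {"public", "internal"},
    ("analyst", "public-chatbot"): {"public"},
    ("legal", "internal-llm"): {"public", "internal", "client_confidential"},
}


@dataclass
class AiShareGate:
    mode: str = "enforce"  # "enforce" blocks violations; "alert" allows but records them
    audit_log: list = field(default_factory=list)

    def __post_init__(self):
        if self.mode not in ("enforce", "alert"):
            raise ValueError(f"unknown mode: {self.mode}")

    def check(self, user: str, role: str, destination: str, labels: set) -> bool:
        """Audit the request and return True if it may be sent to the AI destination."""
        # Unknown role/destination pairs get an empty allow-list, so nothing is shared
        allowed = POLICY.get((role, destination), set())
        violations = set(labels) - allowed
        # Audit every request, whether or not it breaches policy
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "destination": destination,
            "labels": sorted(labels),
            "violations": sorted(violations),
        })
        return not (violations and self.mode == "enforce")


if __name__ == "__main__":
    gate = AiShareGate()
    print(gate.check("ann", "analyst", "internal-llm", {"internal"}))    # True
    print(gate.check("ann", "analyst", "public-chatbot", {"internal"}))  # False (blocked)
    print(gate.audit_log[-1]["violations"])                              # ['internal']
```

Defaulting unknown role and destination pairs to an empty allow-list means a newly adopted AI tool receives nothing until someone writes a policy for it; starting in "alert" mode is one way to learn how data is actually being used before switching to enforcement.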

Use Cases

Below are just a few simple use cases for data security posture management:

Data discovery and classification - Automatically find, map, and classify your data everywhere
Cross-border transfers - Create policies for GDPR and other data transfer regulations
Cloud migrations - Manage identity data as it moves to the cloud and once it is in the cloud
Compliance and Governance - Meet data privacy and protection regulations
Data Privacy Automation - Proactively manage privacy requests, preferences, regulation timelines, deletion workflows, and remediation, all in one place

Conclusion

Organisations need to become more aware of the data they hold, how sensitive it is, and whether they really need to keep it for X years. Fundamentally, the security triad of confidentiality, integrity, and availability must be upheld.

A new approach is required: one that goes beyond DLP use cases and wraps a bubble around the management of, access to, and changes to data throughout its entire lifecycle. This approach needs to map sensitive data into clusters so that, if your organisation were to suffer a data breach, you could quickly identify where your sensitive data is, rely on effective Know Your Customer (KYC) procedures, and pull together a Notifiable Data Breach Report (NDBR) quickly in the midst of a crisis.
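That breach scenario depends on already knowing which sensitive data sits where and who can reach it. As a rough sketch under assumptions (the `Dataset` inventory, identities, and classifications below are invented for illustration, not drawn from any real tool), a data map can be queried to scope a breach: given a set of compromised identities, it groups the sensitive datasets they could access by classification, which is the kind of summary a breach report needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    location: str
    classification: str
    readers: frozenset  # identities (users or service accounts) with access


# Hypothetical inventory, as a DSPM-style data map might hold it.
INVENTORY = [
    Dataset("client_matters", "sharepoint/legal", "client_confidential",
            frozenset({"alice", "svc-backup"})),
    Dataset("kyc_documents", "s3://kyc-archive", "personal_data",
            frozenset({"bob", "svc-backup"})),
    Dataset("press_releases", "cms/public", "public",
            frozenset({"alice", "bob", "carol"})),
]

SENSITIVE = {"client_confidential", "personal_data"}


def breach_scope(compromised):
    """Group sensitive datasets reachable by compromised identities by classification."""
    compromised = set(compromised)
    scope = {}
    for ds in INVENTORY:
        exposed_via = ds.readers & compromised
        if ds.classification in SENSITIVE and exposed_via:
            scope.setdefault(ds.classification, []).append(
                (ds.name, ds.location, sorted(exposed_via)))
    return scope


if __name__ == "__main__":
    for classification, hits in breach_scope({"svc-backup"}).items():
        for name, location, via in hits:
            print(f"{classification}: {name} at {location} (via {', '.join(via)})")
    # client_confidential: client_matters at sharepoint/legal (via svc-backup)
    # personal_data: kyc_documents at s3://kyc-archive (via svc-backup)
```

A single over-privileged service account exposing both client and KYC data is exactly the situation that least-privilege access mapping aims to catch before a breach rather than after.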

Forward-thinking organisations, and certainly those that hold vast amounts of unstructured sensitive data, such as law firms, banks, and healthcare providers, should start evaluating the risk of their data being exposed through AI tools (ChatGPT, Copilot, Gemini, Hugging Face, and more), third-party sharing, and cloud APIs.

The aim is not to disrupt or stop the use of these technologies, but to mitigate risk and, in doing so, gain a competitive advantage over other market players.
