Machine learning is becoming a strategic perimeter for GDPR compliance
Privacy advocates have placed an unfair stigma on machine learning.
Despite what you may have heard through the mass media, ML is not some fiendish tool for invading people’s privacy. Regardless, now that the European Union’s General Data Protection Regulation has taken effect, there’s even stronger scrutiny of ML applications in target marketing, customer engagement, experience optimization and other use cases that touch personally identifiable information, or PII.
But in fact, ML is becoming a key element in how organizations manage compliance with GDPR and other privacy mandates. The core of ML’s role in GDPR compliance is in its use as a tool for discovering, organizing, curating and controlling enterprise PII assets across complex, distributed application environments.
In recent months, Wikibon has seen a surge in products that incorporate ML for discovery purposes into broader GDPR compliance solution portfolios. This is a key enabler for driving automated processing of data subjects’ requests to grant or deny consents on uses of their PII within complex data environments. It’s also essential for transparent accounting of how their PII is being used and managed, as well as for issuing prompt notifications when that data has been breached.
Here are some noteworthy vendors of PII discovery solutions for GDPR compliance. In the following discussion, we call out the different GDPR use cases and deployment scenarios that each addresses:
- ML for PII discovery in a DevOps pipeline: BigID Inc. uses ML to continuously track changes in PII across production and development environments in the data center or cloud. Its BigOps uses ML to discover, contextualize and catalog PII across all data stores. It plugs into open-source DevOps environments such as Jenkins to automatically monitor changes to PII across the development lifecycle. And it uses ML to compare its data with a suspected pirated database to determine rapidly where there has been a breach that requires prompt notification.
- ML for PII discovery to accelerate “right to be forgotten” processing: Loom Systems uses ML to analyze logs and unstructured machine data for immediate visibility into IT environments. Its Sophie for GDPR has a “find my PII” feature that automates the collection of sensitive log data sets, enabling rapid location and deletion of PII, upon data subject request, under the GDPR “right to be forgotten” mandate.
- ML for PII discovery at the network level: DB Networks uses ML to discover databases containing PII and automatically map how the information is being processed. Its DBN-6300 performs passive scanning on a network terminal access point rather than using active scanning, which can miss undocumented databases. It is available as a physical appliance or as a virtual machine in Open Virtualization Format and supports database management systems including Oracle server, Microsoft SQL Server and SAP Sybase ASE. The virtual machine supports VMware vSwitch, dvSwitch and a software-defined network platform configured to allow network tapping.
- ML for PII discovery for rapid remediation across hybrid clouds: Informatica LLC provides an ML-driven data discovery and remediation solution that helps enterprises to automatically discover new and existing PII and other data assets across hybrid clouds, identify and mask sensitive data, and perform risk analyses to determine effective courses of remediation. It embeds metadata-driven AI to provide data managers with recommendations for automating and accelerating privacy and security workflows. And it integrates with customers’ investments in existing Informatica solutions, including Enterprise Data Catalog, Informatica Data Quality, Axon Data Governance and Secure@Source.
- ML for PII discovery of sensitive data in alphanumeric and pixel-level digital formats: MinerEye uses ML to continuously identify, organize, track and protect PII and other information assets. Its Data Tracker uses ML to sift through enterprise data repositories at a byte level, and even uses computer vision, a form of deep learning, to do so at a pixel level. It can run these scans on archived information at rest or on live data streams in real time. It continually tracks vast amounts of PII, using ML to adapt and cover changes in form and file. It can identify and track sensitive data anywhere within the organization or out in the cloud. It can alert enterprise compliance administrators to suspicious data behavior, especially regarding assets of critical importance.
- ML for PII discovery and rollup in a virtual enterprise data catalog: Waterline Data Inc. uses ML to create a constantly updated virtual view of PII and other data stored in databases and other structured data stores within an organization. Its GDPR Data Management Application builds upon Waterline’s existing Smart Data Catalog, which helps business analysts find, organize and classify data without information technology department involvement. The GDPR-specific application assists data privacy officers and data stewards with issues specific to GDPR and other regulations by automatically identifying regulated subject data along with its contextual use and lineage. Integrated access control mechanisms can impose automatic processes to make data compliant, as well as generate compliance reports and workflows that align with specific GDPR articles. Using ML, the platform can be trained to look for certain types of data, such as a policy or driver’s license number, and discover them across all data sets. The system can assist with risk assessment planning by comparing data types to those covered by GDPR, shortcutting a process that can take weeks in many organizations.
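To make the discovery-and-deletion idea in the list above concrete, here is a minimal Python sketch. It is not any vendor's method: it swaps trained ML classifiers for two hand-written regular expressions, and the pattern set, the function names discover_pii and redact_subject, and the sample log lines are all hypothetical, chosen only for illustration. It shows the basic shape of the task: scan records, catalog where each type of PII appears, then scrub one data subject's value on request.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration only. Commercial products rely on
# trained classifiers and far richer context than these two expressions.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d{1,3}[ -]\d{2,4}[ -]\d{3,4}[ -]\d{3,4}"),
}


def discover_pii(records):
    """Return {pii_type: [(record_index, matched_text), ...]}."""
    catalog = defaultdict(list)
    for index, text in enumerate(records):
        for pii_type, pattern in PII_PATTERNS.items():
            for match in pattern.finditer(text):
                catalog[pii_type].append((index, match.group()))
    return dict(catalog)


def redact_subject(records, subject_value):
    """Replace every literal occurrence of one data subject's value."""
    return [text.replace(subject_value, "[REDACTED]") for text in records]


# Made-up log lines standing in for unstructured machine data.
logs = [
    "2018-06-01 login ok user=anna@example.com",
    "2018-06-01 cache miss key=42",
    "2018-06-02 sms sent to +49 30 1234 5678",
]

catalog = discover_pii(logs)
cleaned = redact_subject(logs, "anna@example.com")
```

In this sketch, catalog records which log line each match came from, which is the kind of inventory a "find my PII" request needs, and cleaned is a copy of the logs with that one subject's email address removed. A real deployment would also have to handle backups, derived data sets and audit trails, which this example ignores.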
For insightful comments on GDPR compliance challenges, check out my interview with leading analytics consultant, speaker and author Bernard Marr at the recent DataWorks 2018 Berlin:
Image: TheDigitalArtist/Pixabay