While the word “data” has been common since the 1940s, managing data’s growth, current use, and regulation is a relatively new frontier.
Governments and enterprises are working hard today to figure out the structures and regulations needed around data collection and use. According to Gartner, by 2023, 65% of the world's population will have its personal data covered by modern privacy regulations.
As a result, growing global data compliance requirements are top of mind for enterprises that do business worldwide. These companies face a unique set of data governance challenges regarding infrastructure and compliance at the local, national, and international levels. Some organizations are choosing to confront these challenges with tools like machine learning (ML) and artificial intelligence (AI) to automate, streamline, and scale compliance.
“The scale of information that every company is bringing in has absolutely gotten massive logarithmic growth. People selling information. Whether that’s appreciated or not, a lot of companies are using information that they didn’t generate, that someone else did and now they have to take ownership of it.”
– From a recent episode of the TWIML AI Podcast
Adam Wood, director of data governance and data quality at a financial services institution (FSI)
“It is pretty impressive just how much has changed in the enterprise machine learning and AI landscape. Thinking back to the conversations I had in late 2019, early 2020, most of the mainstream organizations I was talking to, meaning not the Facebooks and the Googles of the world, had very similar machine learning and AI journeys. If the organization had any experience with machine learning, it was concentrated in some team that was tucked away in a dark corner somewhere that maybe had years of experience building out some niche use case like a fraud model at a credit card company or churn models at a phone company. For the rest of the organizations though, machine learning and AI were much newer ideas. And by the time we got to 2020, if an organization had experience with machine learning, it was largely through investments in what I call lab-types of environments.”
– From a recent Cloudera roundtable event
Sam Charrington, founder and host of the TWIML AI Podcast
As countries introduce privacy laws similar to the European Union's General Data Protection Regulation (GDPR), the way organizations obtain, store, and use data will come under increasing legal scrutiny. A rapidly evolving privacy landscape means organizations must weave compliance into their business strategy and data architecture, which introduces challenges and disruptions for businesses operating on a global scale.
For example, data nationalism means that countries may set different rules depending on where data originates. If data carries certain attributes, it cannot leave the country where it was collected. These rules force global businesses to build and navigate a complex data infrastructure and architecture to stay compliant. Most organizations piece together physical data center locations, hybrid cloud strategies, or a combination of the two. Even so, they are not out of the woods when it comes to data governance challenges at the global level.
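To make the residency rule concrete, here is a minimal sketch of how a pipeline might decide whether a record may leave its country of origin. It assumes a hypothetical rule table that maps each origin country to the attributes that must stay there; the country names, attribute names, and functions (`RESIDENCY_RULES`, `can_transfer`) are illustrative placeholders, not real regulations or a real product API.

```python
from dataclasses import dataclass

# Hypothetical residency rules: for each origin country, the data
# attributes that must not leave it. Placeholder values, not real law.
RESIDENCY_RULES: dict[str, frozenset[str]] = {
    "country_a": frozenset({"national_id", "health_record"}),
    "country_b": frozenset({"financial_account"}),
}


@dataclass(frozen=True)
class Record:
    origin: str                  # country where the data was collected
    attributes: frozenset[str]   # attribute names the record carries


def restricted_attributes(record: Record) -> frozenset[str]:
    """Return the attributes that pin this record to its origin country."""
    return record.attributes & RESIDENCY_RULES.get(record.origin, frozenset())


def can_transfer(record: Record, destination: str) -> bool:
    """A record may move if it stays home or carries no restricted attributes."""
    if destination == record.origin:
        return True
    return not restricted_attributes(record)


if __name__ == "__main__":
    r = Record("country_a", frozenset({"email", "national_id"}))
    print(can_transfer(r, "country_a"))  # True: the data stays in its origin
    print(can_transfer(r, "country_b"))  # False: national_id must stay in country_a
```

Keeping the rules as data rather than code is one way to absorb frequent regulatory changes: updating a country's restrictions means editing the table, not the pipeline logic.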
"There are still a ton of challenges associated with getting machine learning and AI to scale…as the portfolio of deployed models has expanded, we're facing all these new questions about how to best create and manage reliable, scalable, and cost effective infrastructure to support the model life cycle. So questions like on-prem versus cloud versus hybrid clouds still linger, harnessing GPUs for deep learning, and advanced analytics still present significant, both technical and economic challenges for folks…of course, as the hype cycle continues, the expectations placed on data and AI teams have never been higher. And the pressure to get use cases to market remains really, really high."
– From a recent Cloudera roundtable event
Sam Charrington, founder and host of the TWIML AI Podcast
Common data governance challenges for global enterprises:
Setting up a multidisciplinary data team
What used to be known as the unicorn data scientist is now a team of individual specialists with clearly defined roles: data science, machine learning, engineering, and DevOps. Organizations need a multidisciplinary team to maintain, monitor, and regulate data compliance systems.
Infrastructure
Local, national, and regional regulations for obtaining, protecting, and using data often conflict with one another, and juggling them across the globe is difficult. Data governance at this level requires flexibility, agility, and automation that many organizations find hard to achieve.
Organizations also struggle to create and manage reliable, scalable, and cost-effective infrastructure that supports the model life cycle. And while centralizing data is typically the simplest approach, it creates additional challenges for global governance.
Speed
Even when appropriate infrastructure such as hybrid cloud is in place, the enormous quantities of data fed to models make those models increasingly complex. As complexity increases, so does latency, and speed is necessary to maintain customer satisfaction and business operations.
The NVIDIA and Cloudera partnership helps turn data engineering workflows that used to take days into minutes. Running Cloudera Data Platform (CDP) on NVIDIA GPUs delivers more than 5X the performance at half the cost of an equivalent CPU-based system. Running the RAPIDS Accelerator for Apache Spark on NVIDIA-Certified Systems in CDP Private Cloud or Public Cloud pushes performance boundaries, powers use cases faster, and reduces data engineering costs.
Deliver use cases to market
As data science moves forward, new use cases will keep emerging, and the pressure is on to deliver results quickly while remaining compliant.
Even though the goal is to use data, many organizations struggle to balance regulation with usage: they must locate, store, and secure data in ways that still leave it usable for data scientists building datasets and models.
“Governance used to be very inflexible. The regulations are changing more and more. New ones are being added to the table all the time. And the data science world has become incredibly flexible and needs to be moving fast.”
– From a recent episode of the TWIML AI Podcast
Adam Wood, director of data governance and data quality at a financial services institution (FSI)
“So the majority of what we work on right now are ways to automatically detect and catalog sensitive information across the company and across the borders. There are different countries that we do business in and need to make sure every single security and privacy regulation is being followed down to the letter. What we’ve been able to do is bring solutions forward that enable the data science community to understand where information lives, what it means, how to access it, and how to do so responsibly using privacy management and consent management—to make sure anything that we’re using the information for is always in line with the regulations we face.”
– From a recent episode of the TWIML AI Podcast
Adam Wood, director of data governance and data quality at a financial services institution (FSI)
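Automatically detecting and cataloging sensitive information can be illustrated with a deliberately simple sketch. It assumes tables arrive as column samples and uses regular expressions as a stand-in detector; production systems typically add ML classifiers, validation, and many more patterns. The pattern set, the `min_fraction` threshold, and the function names are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical detection patterns. Real scanners use far more patterns,
# plus ML classifiers and validation (checksums, context words).
PATTERNS: dict[str, re.Pattern[str]] = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_column(values: list[str], min_fraction: float = 0.5) -> set[str]:
    """Return labels whose pattern matches at least min_fraction of non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return set()
    labels = set()
    for label, pattern in PATTERNS.items():
        matched = sum(1 for v in non_empty if pattern.search(v))
        if matched / len(non_empty) >= min_fraction:
            labels.add(label)
    return labels


def build_catalog(tables: dict[str, dict[str, list[str]]]) -> dict[tuple[str, str], set[str]]:
    """Map (table, column) to detected sensitive-data labels, skipping clean columns."""
    catalog = {}
    for table, columns in tables.items():
        for column, values in columns.items():
            labels = scan_column(values)
            if labels:
                catalog[(table, column)] = labels
    return catalog


if __name__ == "__main__":
    sample = {
        "customers": {
            "contact": ["a@example.com", "b@example.org", ""],
            "tax_id": ["123-45-6789", "987-65-4321"],
            "notes": ["called twice", "prefers mail"],
        }
    }
    for (table, column), labels in build_catalog(sample).items():
        print(f"{table}.{column}: {sorted(labels)}")
```

A catalog like this gives data scientists a map of where sensitive information lives, which access and consent policies can then be attached to.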
In a recent episode of the TWIML AI Podcast, host Sam Charrington discusses these common problems and possible solutions with Adam Wood, director of data governance and data quality at an FSI. Sam and Adam met during a Cloudera Data Leaders Roundtable this past spring to discuss GPU-accelerated machine learning.
Adam and Sam discuss topics such as:
- Leveraging ML/AI for governance and automation
- Increasing speed with collaboration and reuse
- The future of governance tooling
- What data lineage means for regulation and data scientists
- The importance of scalable and automated processes to adhere to privacy standards
- Merging consent and privacy with the underlying data stores
- Tying together consent management and the data science community
This episode covers many more topics and is an insightful and thought-provoking listen for any organizational leader facing challenges of data governance and regulation on a global scale.
Listen to the full episode here.
“Governance used to get in the way. It can’t do that anymore. When you’re chopping at the revenue stream of your company, you’ve got to make changes.”
– Adam Wood, director of data governance and data quality at a financial services institution