Pillars of Knowledge: Best Practices for Data Governance

Author Chris J. Preimesberger is Editor Emeritus of eWEEK

With hackers now working overtime to expose business data or plant ransomware, data security is the top priority for most IT managers. If data security tops IT concerns, data governance should come second. Data governance is critical for protecting data, and it is also the foundation for data-driven businesses and for maximizing the value of data analytics. Requirements, however, have changed significantly in recent years.

Data governance used to be considered a “nice to have” function within an enterprise, and it didn’t receive serious attention until the volume of business and personal data began to take off with the introduction of smartphones in the mid-2000s. As data collection spiraled out of control, storage costs climbed quickly, and new privacy regulations emerged, IT managers and their data-watching lieutenants realized that governance was becoming a requirement.

Today, modern data governance programs must address not only exponential growth in data volumes but also cloud and hybrid architectures and a greater reliance on advanced analytics. Trends toward self-service data access and real-time applications have made data governance more visible and critical than ever.

But it’s still not easy. Data governance has always required a combination of people, processes, and technology. The good news is that the technology has evolved significantly, offering new advantages to companies that commit to data governance best practices. Where early tools each focused on a single capability, today’s tools address the entire data and analytics lifecycle.

There is now more emphasis on classifying and cataloging data as it’s collected, so that nothing is forgotten or left unsecured. Data governance platforms are much simpler to navigate, and automation has taken most of the drudge work of classification out of human hands. As a result, it’s easier to find people to fill the more routine data governance positions. The most effective data governance software, such as Cloudera Shared Data Experience (SDX), discovers, maps, and classifies an enterprise’s data correctly as each entry comes in.
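To make classification-on-entry concrete, here is a minimal Python sketch of rule-based tagging at ingestion time. It is not how SDX or any particular product works; the tag names (`EMAIL`, `NINE_DIGIT_ID`, `CREDIT_CARD_LIKE`), the regular expressions, the 80% match threshold, and the function names are all illustrative assumptions.

```python
import re
from typing import Dict, List, Set

# Hypothetical classification rules: tag name -> regex a value must fully match.
# Real platforms ship far richer rule sets; these are illustrative only.
RULES: Dict[str, "re.Pattern[str]"] = {
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "NINE_DIGIT_ID": re.compile(r"\d{3}-?\d{2}-?\d{4}"),
    "CREDIT_CARD_LIKE": re.compile(r"\d{4}([ -]?\d{4}){3}"),
}

def classify_column(values: List[str], threshold: float = 0.8) -> Set[str]:
    """Return every tag whose pattern matches at least `threshold` of the non-empty values."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return set()
    tags = set()
    for tag, pattern in RULES.items():
        hits = sum(1 for v in sample if pattern.fullmatch(v))
        if hits / len(sample) >= threshold:
            tags.add(tag)
    return tags

def classify_on_ingest(table: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Tag every column of a newly ingested table so nothing enters the catalog unclassified."""
    return {column: classify_column(values) for column, values in table.items()}

# Example: a small incoming table, keyed by column name.
incoming = {
    "contact": ["ana@example.com", "li@example.org", ""],
    "tax_id": ["123-45-6789", "987-65-4321"],
    "notes": ["call back", "vip"],
}
tags = classify_on_ingest(incoming)
```

Matching on format alone only narrows things down: a nine-digit value might be a tax ID, an account number, or something else entirely, which is why classification also needs context.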

Effective data governance must extend beyond the IT organization. All employees need to be aware of the data they are creating on a daily basis, how to store and access it securely, and how to maintain that same security when working with outside partners, customers, and contractors.

Top Pillars and Best Practices for Instituting a Realistic Data Governance Program

There’s no single right way to institute or advance a data governance program; programs can take many forms. For example, Deutsche Telekom’s Data Intelligence Hub provides a global data marketplace for securely sharing, analyzing, and working with data across diverse industries.

It may seem daunting to overcome the challenges of making data available at scale to users across multiple environments in a safe and compliant manner, but it comes down to several key pillars and related best practices.

  1. Discovery: Find and classify all your data in order to create visibility

All businesses first need to know what data they have and where it is located; this is essential to delivering data-driven insights. Siloed platforms and the absence of centralized data stewardship are common causes of poor data visibility.
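At its simplest, a discovery pass pulls the listings out of each silo into one inventory and flags assets that nobody is accountable for. The sketch below assumes each platform can already export a list of its assets as dictionaries; the platform names, the `path`/`steward` fields, and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Asset:
    source: str             # platform where the asset lives (hypothetical names below)
    path: str               # location within that platform
    steward: Optional[str]  # accountable owner, if one is recorded

def build_inventory(silos: Dict[str, List[dict]]) -> List[Asset]:
    """Flatten per-platform listings into one central inventory."""
    inventory = []
    for source, listings in silos.items():
        for item in listings:
            inventory.append(Asset(source, item["path"], item.get("steward")))
    return inventory

def visibility_gaps(inventory: List[Asset]) -> List[str]:
    """Report assets with no steward, a common symptom of missing centralized stewardship."""
    return [f"{a.source}:{a.path}" for a in inventory if not a.steward]

# Example: two siloed platforms, one of which holds an unowned export.
silos = {
    "crm_db": [{"path": "customers", "steward": "sales-ops"}],
    "file_share": [{"path": "/exports/q3.csv"}],
}
inventory = build_inventory(silos)
gaps = visibility_gaps(inventory)
```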

A 2019 Experian study showed that 95% of organizations see negative impacts from poor data quality, including wasted resources and additional costs, ineffective business initiatives, poor customer experience, and delayed data migration projects. Organizations are facing a data tsunami, with more data being generated than ever, which makes it even harder to discover, catalog, and keep track of all this information. The faster data is cataloged, secured, and made available to the right people, the sooner it can deliver business value.

  2. Classification: Data types are key to the process

Classification of all data types is a crucial step in this process. Without knowing what the data is supposed to represent (for example, are all nine-digit strings Social Security numbers?), we won’t know what constitutes valid data, or who is allowed to see it, in which form, and for how long.
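The nine-digit question shows why format is not meaning. A rule can at least exclude numbers the U.S. Social Security Administration never issues (area 000, 666, or 900 to 999; group 00; serial 0000), but anything that passes is only a candidate. The `plausible_ssn` function below is a hypothetical illustration, not a complete validator.

```python
import re

# Three digits, two digits, four digits, with optional dashes between groups.
NINE_DIGITS = re.compile(r"(\d{3})-?(\d{2})-?(\d{4})")

def plausible_ssn(value: str) -> bool:
    """True if `value` is a nine-digit string that could structurally be an SSN.

    The SSA never issues numbers with area 000, 666, or 900-999,
    group 00, or serial 0000, so those nine-digit strings are ruled out.
    """
    m = NINE_DIGITS.fullmatch(value.strip())
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"
```

A `True` result says only that a value could be an SSN; deciding whether a column actually holds SSNs still needs context such as the column name, the source system, or a data steward's review.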

A top-notch system will include an easy-to-navigate data catalog that provides a single-pane view to administer and discover all data assets. The data is profiled and enhanced with rich metadata—including operational, social, and business context—creating trusted and reusable data assets and making them discoverable.
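A catalog entry that carries operational, social, and business context can be modeled as a simple record plus a search over it. This in-memory sketch is illustrative only; the `CatalogEntry` and `Catalog` names, the fields, and the substring search are assumptions, not the data model of any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class CatalogEntry:
    name: str
    tags: Set[str] = field(default_factory=set)                # classifications, e.g. "PII"
    operational: Dict[str, str] = field(default_factory=dict)  # e.g. refresh schedule
    business: Dict[str, str] = field(default_factory=dict)     # e.g. owner, glossary term
    social: List[str] = field(default_factory=list)            # user comments and ratings

class Catalog:
    """Single place to register and discover data assets (illustrative, in-memory)."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> List[str]:
        """Case-insensitive match of `term` against names, tags, and business metadata."""
        term = term.lower()
        hits = []
        for entry in self._entries.values():
            haystack = [entry.name, *entry.tags, *entry.business.values()]
            if any(term in text.lower() for text in haystack):
                hits.append(entry.name)
        return sorted(hits)

catalog = Catalog()
catalog.register(CatalogEntry(
    "sales.orders", {"FINANCE"},
    operational={"refresh": "hourly"},
    business={"glossary": "Customer order"},
    social=["Trusted source for Q3 reporting"],
))
catalog.register(CatalogEntry("hr.employees", {"PII"}, business={"owner": "HR analytics"}))
```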

  3. Availability: Make the right data available only to the right users

Enterprises must give their users access to the data they need, or the business cannot function correctly. However, accomplishing this means taking into account the complexities of today’s IT infrastructures. 

Enterprise IT managers must be able to manage data accessibility across multiple platforms, environments, and geographies, each of which can come with its own data compliance and privacy regulations — such as the GDPR, CCPA and others. Specific types of data must also be made securely available to people outside the home organization, such as contractors and company partners. 
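For outside users such as contractors and partners, one common pattern is a grant that is scoped to a dataset, limited to certain regions, and expires on its own. The sketch below assumes that pattern; treating jurisdiction as a single region label is a deliberate simplification of what regulations like the GDPR or CCPA actually require, and `Grant` and `is_allowed` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet

@dataclass(frozen=True)
class Grant:
    principal: str           # e.g. an external contractor's account (hypothetical)
    dataset: str             # dataset the grant covers
    regions: FrozenSet[str]  # where access may be exercised, e.g. {"EU"}
    expires: datetime        # access lapses automatically at this instant (timezone-aware)

def is_allowed(grant: Grant, principal: str, dataset: str, region: str, now: datetime) -> bool:
    """Allow access only for the named principal, dataset, and region, and only before expiry."""
    return (
        grant.principal == principal
        and grant.dataset == dataset
        and region in grant.regions
        and now < grant.expires
    )

grant = Grant("contractor-17", "eu_customers", frozenset({"EU"}),
              datetime(2021, 12, 31, tzinfo=timezone.utc))
```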

If you completed discovery as described in Best Practice No. 1, you can use classification to associate data with access policies. The right people then always have access to the data in the right form. As a result, you can widen data access to more users, who can pursue more use cases and drive more insight without having to ask IT every time they need a bit more.
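Once columns carry classification tags, an access policy can be keyed on tags instead of on individual tables, so each user sees data in the form their role permits. This sketch assumes a hypothetical role-per-tag `POLICY` and masks any value the caller is not cleared for; `view_row` and the role names are illustrative.

```python
from typing import Dict, Set

# Hypothetical policy: which roles may see each classification in clear text.
POLICY: Dict[str, Set[str]] = {
    "PII": {"privacy_officer"},
    "FINANCE": {"finance_analyst", "privacy_officer"},
}

def view_row(row: Dict[str, str], column_tags: Dict[str, Set[str]], roles: Set[str]) -> Dict[str, str]:
    """Return the row in the form the caller is entitled to see.

    A column is shown in clear text only if the caller holds an allowed role
    for every tag on it. Tags missing from POLICY allow no roles, so they mask.
    """
    visible = {}
    for column, value in row.items():
        tags = column_tags.get(column, set())
        cleared = all(roles & POLICY.get(tag, set()) for tag in tags)
        visible[column] = value if cleared else "****"
    return visible

row = {"name": "Ana", "ssn": "123-45-6789", "region": "EU"}
column_tags = {"ssn": {"PII"}}
```

Defaulting unknown tags to masked is a design choice: a newly discovered classification stays protected until someone writes a policy for it, rather than leaking until someone notices.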

This all requires a proactive governance strategy. It’s no longer enough to react to regulatory and compliance requirements as they arise. Instead, enterprises must drive business value through consistent and comprehensive access policies.

As organizations look to maximize the value of their informational resources, they are realizing that highly proactive, data-driven business functions also depend on a foundation of well-managed data. The population of data consumers has grown in nearly every organization and now includes more people with less technical skill sets. Consistent data-access policies, data curation, and overall data quality are important for successful results in such an ecosystem.

  4. Security: Protect data throughout the system

DevSecOps staff and IT managers must gain control over the security of all enterprise data, whether it’s in a cloud store, on a laptop in a remote worker’s home office, or in a conventional data center storage array. Traditional data security processes vary by network, by cloud versus on-premises storage, by encryption method, and by authentication type. On top of that, verticals such as health care, financial services, scientific research, the military, and the public sector have their own regulatory requirements, which add extra layers of security. Governance must be maintained across all these disparate layers, and that can be a challenge.

Disconnected security controls can restrict some users from accessing the data they need while inadvertently giving other users access to data that should remain private.
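One way to catch disconnected controls is to compare each platform's actual grants with a single central policy and report the differences in both directions. The data shape here (each dataset mapped to a set of user names) and the `drift` function are hypothetical; real platforms expose ACLs in very different formats that would need normalizing first.

```python
from typing import Dict, Set, Tuple

Grants = Dict[str, Set[str]]  # dataset -> users allowed to read it

def drift(central: Grants, platform: Grants) -> Tuple[Set[Tuple[str, str]], Set[Tuple[str, str]]]:
    """Compare one platform's ACLs with the central policy.

    Returns (over_granted, under_granted) as sets of (dataset, user) pairs:
    over-granted pairs expose data that should stay private, and
    under-granted pairs block users from data they need.
    """
    def pairs(grants: Grants) -> Set[Tuple[str, str]]:
        return {(ds, user) for ds, users in grants.items() for user in users}

    want, have = pairs(central), pairs(platform)
    return have - want, want - have

central = {"payroll": {"hr_lead"}, "sales": {"analyst", "hr_lead"}}
cloud = {"payroll": {"hr_lead", "analyst"}, "sales": {"hr_lead"}}
over, under = drift(central, cloud)
```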

Enterprises are moving away from password-only security toward biometrics (fingerprints, facial recognition) and multi-factor authentication (MFA), which have been shown to substantially reduce breaches. As a result, the biometrics market is estimated to be worth a staggering $49 billion by 2022, and huge investments are being made in new algorithms and systems to improve biometric accuracy. Companies are indeed taking notice.
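Biometric factors depend on device hardware, but one widely used second factor, the time-based one-time password (TOTP) produced by authenticator apps, can be shown with the Python standard library alone. This sketch follows RFC 6238 with HMAC-SHA1 and 30-second steps; the `totp` and `verify` names are illustrative, and production systems should use a vetted library and store secrets securely.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238) built on HMAC-SHA1 HOTP (RFC 4226)."""
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte picks 4 bytes
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def verify(secret: bytes, submitted: str, at: Optional[float] = None,
           step: int = 30, window: int = 1) -> bool:
    """Accept a code from the current step or up to `window` steps either side (clock drift)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + k * step, step, len(submitted)).encode(),
                            submitted.encode())
        for k in range(-window, window + 1)
    )
```

The constant-time `hmac.compare_digest` avoids leaking, through response timing, how many leading digits of a guess were correct.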

Ransomware is the biggest cyber threat to enterprises in 2021, and it remains the most challenging one to fend off. Ransomware usually depends on a human mistake, such as clicking on a dangerous email, image, or other document, and IT cannot control everything humans do.

Nothing, not DevSecOps, data governance, MFA, or biometric security, can stop a human from inadvertently clicking on an email that ends up handing control to a bad actor and endangering the entire organization. That would require governance of the human mind, which is far beyond IT security’s jurisdiction.

Cloudera SDX Makes All Data Secure by Design with Consistent Policies

Cloudera provides secure data governance for hybrid architectures that span both cloud and on-premises data workloads. SDX is a foundational part of the Cloudera Data Platform architecture, unlike the bolt-on approaches to security and governance offered by other vendors.

Independent of the compute and storage layers, SDX provides an integrated set of security and governance technologies built on metadata, delivering persistent context across all analytics and across public and private clouds. Using metadata this way is like following an automated roadmap: it locates the right information far more quickly than scanning large data sets from top to bottom.

Consistent data context simplifies the delivery of data and analytics. SDX uses a secure data-access model that many clients can share: data is classified once, and that context is applied everywhere the data is needed.
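The classify-once, apply-everywhere idea can be illustrated with tag propagation along data lineage: tags assigned where data is first classified flow automatically to every dataset derived from it, so any engine that enforces tag-based policies treats the derived data consistently. This is a generic sketch under that assumption, not a description of how SDX implements it; the dataset names and `propagate_tags` are hypothetical.

```python
from typing import Dict, List, Set

def propagate_tags(base_tags: Dict[str, Set[str]],
                   lineage: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Give each dataset the union of its own tags and the tags of everything upstream.

    base_tags: tags assigned where data was first classified.
    lineage:   dataset -> datasets it was derived from (assumed acyclic).
    """
    resolved: Dict[str, Set[str]] = {}

    def resolve(ds: str) -> Set[str]:
        if ds not in resolved:
            tags = set(base_tags.get(ds, set()))
            for parent in lineage.get(ds, []):
                tags |= resolve(parent)  # inherit every upstream classification
            resolved[ds] = tags
        return resolved[ds]

    for ds in set(base_tags) | set(lineage):
        resolve(ds)
    return resolved

base = {"raw.customers": {"PII"}, "raw.orders": {"FINANCE"}}
lineage = {
    "mart.customer_orders": ["raw.customers", "raw.orders"],
    "report.q3": ["mart.customer_orders"],
}
result = propagate_tags(base, lineage)
```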

SDX is designed to reduce risk and operational costs by delivering consistent data context across deployments. IT can deploy fully secured and governed data lakes faster, giving more users access to more data, without compromise.

In conclusion …

A top-flight data governance strategy and toolset that works with various security processes has long been known to be the most effective and cost-efficient way to keep business data clean, safe, and up-to-date over time. Working from a basis of truth and safety is the only way an enterprise can do business optimally and with as little data-flow pain as possible.

For more information on Security and Governance with Cloudera Shared Data Experience (SDX), watch our demo.

Cloudera Contributors