This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate.
Last week, I had the opportunity not only to attend the Gartner Data & Analytics Summit in Dallas, TX, but also to deliver a talk on “Driving Digital Transformation through Global Data Management”. It was a timely topic, given that data management was a major focus area of the event. Digital transformation, as a term, has been around the block several times over the last couple of decades. However, enterprises today are actively launching digital transformation initiatives across multiple industry verticals. Most of these initiatives aim to enhance the customer experience, boost operational efficiency, or improve the bottom line. The fuel behind these initiatives is primarily data. Despite efforts to gather all relevant data into a data lake or some flavor of it, enterprises realize that data still resides across their on-premises data stores and their multi-cloud footprint, and also arrives from streaming sources such as social feeds, clickstreams and edge devices. The challenge is not only to look across all these data sources but also to manage, govern, secure and analyze that data, and this is the challenge that Global Data Management (GDM) addresses. A company with a solid GDM strategy is better positioned to deliver successful digital transformations.
In spite of revolutionary advancements in data storage and management tools, we are still hearing about the same old problems from enterprises: volume, velocity and variety. A lunch table conversation with another attendee at the event illustrated this well. He said that he works at a large payment processor (name withheld) that deals with over a billion transactions a day. They are currently set up on a data warehouse running on a popular relational database, with a reporting tool generating dashboards against that warehouse. However, the setup is struggling to scale and is proving expensive to manage and grow. They are interested in a big data solution to handle their growing storage needs and to scale up. They also believe that a good stream processing engine, combined with a streaming analytics solution, can handle the velocity of their transactions and deliver actionable intelligence. So far, they have tackled only their structured data. They haven’t even figured out what to do with their unstructured data, but they do understand that a big data solution can address that challenge as well.
It is important to note in the above example that even though a modern data architecture powered by big data solutions can address the immediate needs of that enterprise, they still need to devise a GDM strategy for the following reasons:
- Metadata-driven data cataloging – Look at related data sets across all their data-at-rest stores as well as what is coming through the streams as data-in-motion. For example, if you are launching a Customer 360 initiative, you need to be able to access customer data that exists inside your cloud CRM, legacy systems, operational DBMS, social feeds, clickstreams, branch office data and so on. You need a single-pane view to gain that insight. Vendors are starting to offer solutions that allow metadata tagging of such data assets, and data catalogs or similar data virtualization concepts are likely to gain momentum in this space.
- Data life cycle management – Perform data life cycle management functions on such a diverse set of data sources. This may include, but is not limited to, functions such as data backup and recovery and disaster recovery (DR).
- Data Governance and Security – Govern and secure all the data across multiple data sources. Data governance is an important topic not just for data-at-rest but also for data-in-motion. All data that enters the enterprise goes through a series of transformations, whether for data quality (DQ), data enrichment, or simply conversion into a different format for subsequent consumption. But we live in a world of compliance and regulations. Take GDPR, for example: it is important for enterprises to understand who has access to what data, who changed it, and when it was changed. This level of security, auditability and accountability is necessary and can be delivered through a strong GDM solution.
- Data Intelligence – Now that you can look across multiple data sources with ease, the next step will be to harness the value of your data. Combine the power of looking at historical insights from your data-at-rest and perishable insights from data-in-motion to create actionable intelligence, powered by pattern recognition, complex event processing (CEP) and machine learning models.
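To make the cataloging and governance points above a little more concrete, here is a minimal Python sketch of a metadata-driven catalog. It is an illustration only, not any vendor's product or API: the `Catalog`, `DataAsset` and `AuditEvent` classes, the asset names, the users and the tags are all hypothetical. It assumes assets from different sources are registered with tags, looked up by tag for a single-pane view, and that every change is recorded so you can answer "who changed what, and when".

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataAsset:
    """A hypothetical catalog entry describing one data set."""
    name: str
    source: str   # e.g. "cloud_crm", "clickstream" (illustrative values)
    kind: str     # "at_rest" or "in_motion"
    tags: set = field(default_factory=set)


@dataclass
class AuditEvent:
    """Who did what to which asset, and when."""
    user: str
    action: str
    asset: str
    at: datetime


class Catalog:
    """In-memory stand-in for a metadata catalog with an audit trail."""

    def __init__(self):
        self._assets = {}
        self.audit_log = []

    def _record(self, user, action, asset_name):
        # Every change is timestamped so it can be audited later.
        event = AuditEvent(user, action, asset_name, datetime.now(timezone.utc))
        self.audit_log.append(event)

    def register(self, user, asset):
        self._assets[asset.name] = asset
        self._record(user, "register", asset.name)

    def tag(self, user, asset_name, tag):
        self._assets[asset_name].tags.add(tag)
        self._record(user, f"tag:{tag}", asset_name)

    def find(self, tag):
        # Single-pane lookup: every asset carrying the tag, whatever its source.
        return sorted(a.name for a in self._assets.values() if tag in a.tags)

    def history(self, asset_name):
        # Audit view: (user, action) pairs for one asset, in order.
        return [(e.user, e.action) for e in self.audit_log if e.asset == asset_name]


def build_demo_catalog():
    """Register a few made-up assets spanning at-rest and in-motion sources."""
    catalog = Catalog()
    catalog.register("alice", DataAsset("crm_accounts", "cloud_crm", "at_rest", {"customer"}))
    catalog.register("alice", DataAsset("web_clickstream", "clickstream", "in_motion"))
    catalog.register("alice", DataAsset("ledger_archive", "legacy_dbms", "at_rest", {"finance"}))
    catalog.tag("bob", "web_clickstream", "customer")
    return catalog


if __name__ == "__main__":
    demo = build_demo_catalog()
    print(demo.find("customer"))
    print(demo.history("web_clickstream"))
```

In a Customer 360 scenario, `find("customer")` plays the role of the single-pane view across the CRM and the clickstream, while `history(...)` stands in for the audit trail a regulation such as GDPR would require. A real GDM solution would persist this metadata and enforce access policies, which this sketch deliberately leaves out.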
Plan for Future Uses, Not Just For Today
Analysts are talking about the need for a logical data warehouse (LDW) or a data fabric that can look across all your data sources. This is the same single-pane view described above. The current focus for such an LDW is primarily on data access and subsequent analysis. However, I encourage organizations to think beyond that with a proper GDM strategy and to plan for the other data management functions they will need to support their digital initiatives. 80% of data analysis is being done on only 20% of the data an enterprise holds. However, rather than worrying about how much value lies hidden in the dark data, it is more pertinent to focus on channeling the most critical and relevant data points into your key business decisions at the right time.
There are several data management challenges, including dealing with multiple clouds, getting a common view of security and governance, and managing all data regardless of type or location. The key is to design a future-proof architecture that helps you avoid vendor lock-in. For such an architecture, a lot of Fortune 1000 companies rely on open source technologies. This not only gives you freedom of choice in tools and platforms but also reduces the risk of being stuck with any one vendor for a long time.
So, go ahead. Choose your digital transformation initiatives and lead with confidence. But please make sure that you have a good GDM strategy in place, and that you have chosen an equally good GDM solution, to ensure the success of your digital initiatives.