Cloudera and Accenture demonstrate strength in their relationship with an accelerator called the Smart Data Transition Toolkit for migration of legacy data warehouses into Cloudera Data Platform
Accenture’s Smart Data Transition Toolkit
Data warehousing is the backbone of every data-driven organization, providing mission-critical analytics. Today, modern data warehousing has evolved to meet the intensive demands of the newest analytics required for a business to be data-driven. While the accompanying “data tsunami” may pose a new set of challenges, it also opens up opportunities for a wide variety of high-value business intelligence (BI) and other analytics use cases that most companies are eager to deploy.
Traditional data warehouse vendors may have maturity in data storage, modeling, and high-performance analysis. Yet, these legacy solutions are showing their age and can no longer meet these new demands in a cost-effective manner. The key questions that need to be answered are:
- Do you have workloads you wish would run faster, but you just can’t make it happen without an expensive solution from your existing data warehouse?
- Are you looking for your data warehouse to support the hybrid multi-cloud?
- Are your business users asking for new analytics that just can’t be done, or done efficiently, in your existing data warehouse?
- Are you looking to include log, semi-structured, or sensor data in your analytics?
- Are you looking to be able to scale your data volume to a petabyte or more?
- Do you need to onboard thousands of new analytics users and hundreds of new use cases without impacting performance?
If your existing data warehouse cannot address the questions above, consider a Cloudera Data Platform (CDP) Data Warehouse (CDW) solution. CDW enables IT to deliver a cloud-native, self-service analytic experience for BI analysts that goes from zero to query in minutes. It outperforms other data warehouses on all sizes and types of data, including structured and unstructured, while scaling cost-effectively past petabytes. Running on CDP, CDW is fully integrated with streaming, data engineering, and machine learning analytics. It has a consistent framework that secures and provides governance for all data and metadata on private clouds, multiple public clouds, or hybrid clouds.
Accenture, one of Cloudera’s premier technology partners, looked at this opportunity jointly with Cloudera and built a framework of tools called the Smart Data Transition Toolkit. This toolkit helps customers migrate their legacy data warehouses into CDW by simplifying the movement of data from expensive, inflexible legacy data platforms into CDP.
Accenture’s Smart Data Transition Toolkit – A Deeper Look
Accenture’s Smart Data Transition Toolkit leverages six proprietary accelerators to reduce the cost of CDP migration by as much as forty percent (40%). Each of these accelerators supports multiple legacy systems, including Teradata, Netezza, and Oracle. The Accenture Smart Data Transition Toolkit is also tightly integrated with Cloudera Data Platform for cloud data management and with Cloudera Shared Data Experience (SDX) for secure, self-service analytics.
Below is a description of the various elements of the toolkit (as shown above).
- Pulse helps discover and understand the bottlenecks in existing legacy data warehouses
- Smart Schema Optimizer helps in migrating and creating schemas on CDW by leveraging Hive Metastore. These schemas are created based on their definitions in the existing legacy data warehouses
- Smart Query Convertor converts queries and views so that they are compatible with CDW
- Smart DwH Mover helps in accelerating data warehouse migration
- Smart Data Validator helps in extensive data reconciliation and testing
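To make the schema step more concrete, here is a minimal sketch of the kind of type translation a schema migration involves. It is not the Smart Schema Optimizer's implementation, which is proprietary and not described here. The mapping table, the function names `to_hive_type` and `create_table_ddl`, and the example table are all assumptions made for illustration, and a real migration needs many more rules (precision defaults, dates and time zones, constraints, partitioning).

```python
import re

# Hypothetical, deliberately small mapping from legacy column types to Hive types.
# Types not listed here are passed through unchanged (upper-cased).
LEGACY_TO_HIVE = {
    "VARCHAR2": "STRING",   # Oracle
    "CLOB": "STRING",       # Oracle / Teradata
    "NUMBER": "DECIMAL",    # Oracle; NUMBER without precision needs manual review
    "BYTEINT": "TINYINT",   # Teradata / Netezza
    "INTEGER": "INT",
}


def to_hive_type(legacy_type: str) -> str:
    """Translate one legacy column type, keeping any (precision, scale) suffix."""
    match = re.fullmatch(r"\s*(\w+)\s*(\(.*\))?\s*", legacy_type)
    if match is None:
        raise ValueError(f"unrecognized type: {legacy_type!r}")
    base = match.group(1).upper()
    args = match.group(2) or ""
    hive_base = LEGACY_TO_HIVE.get(base, base)
    # Hive STRING takes no length argument, so drop it.
    if hive_base == "STRING":
        args = ""
    return f"{hive_base}{args}"


def create_table_ddl(table: str, columns: list) -> str:
    """Build a Hive CREATE TABLE statement from (column name, legacy type) pairs."""
    cols = ",\n  ".join(f"`{name}` {to_hive_type(ltype)}" for name, ltype in columns)
    return f"CREATE TABLE IF NOT EXISTS `{table}` (\n  {cols}\n) STORED AS PARQUET"


if __name__ == "__main__":
    # Example legacy definition (made up for this sketch).
    print(create_table_ddl("claims", [
        ("claim_id", "NUMBER(18,0)"),
        ("member_name", "VARCHAR2(200)"),
    ]))
```

Running a statement like the generated one against Hive or Impala is what registers the table definition in the Hive Metastore.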
Here is the flow of events during a migration that leverages tools from the Smart Data Transition Toolkit.
Accenture’s Smart Data Transition Toolkit Integration with Cloudera Data Platform (CDP) Data Warehouse
Let’s take a look at how Accenture’s Smart Data Transition Toolkit is integrated with CDW. In the initial phase, Accenture has built an integration with CDW to migrate legacy data warehouses such as Netezza, Teradata, and Oracle. If other legacy EDWs need to be migrated, it’s easy to incorporate them into Accenture’s Smart Data Transition Toolkit as a source for migration into CDW.
CDW provides the flexibility to store your data anywhere, either in the cloud or on-premise. This flexibility also gives you a variety of options for storing the data you migrate from legacy EDWs. If you choose to run CDW on a public cloud infrastructure, you can store data in either Amazon S3 or ADLS, depending on the chosen cloud provider. If you choose to run CDW on-premise, you can store your data either on HDFS or in the Ozone object store, which is built for on-premise deployments.
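The storage choices described above can be summarized in a small lookup. The sketch below is only an illustration: the deployment labels `public_cloud` and `on_premise` and the function name `storage_scheme` are made up for this example, and the URI schemes are the usual Hadoop-compatible filesystem schemes for each store (with `abfs://` assuming ADLS Gen2), not something the toolkit is documented to expose.

```python
# Storage options named in the text, keyed by where CDW runs.
# Keys are hypothetical labels; values map each store to its usual
# Hadoop-compatible filesystem URI scheme.
STORAGE_BY_DEPLOYMENT = {
    "public_cloud": {"Amazon S3": "s3a://", "ADLS": "abfs://"},
    "on_premise": {"HDFS": "hdfs://", "Ozone": "ofs://"},
}


def storage_scheme(deployment: str, store: str) -> str:
    """Return the URI scheme for a store, rejecting combinations the text does not list."""
    options = STORAGE_BY_DEPLOYMENT.get(deployment)
    if options is None:
        raise ValueError(f"unknown deployment: {deployment!r}")
    if store not in options:
        raise ValueError(
            f"{store!r} is not listed for {deployment!r}; choose one of {sorted(options)}"
        )
    return options[store]


if __name__ == "__main__":
    print(storage_scheme("public_cloud", "Amazon S3"))
    print(storage_scheme("on_premise", "Ozone"))
```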
The data from your existing data warehouse is migrated to the storage option you choose, and all the metadata is migrated into the SDX (Shared Data Experience) layer of Cloudera Data Platform. Once the data is on Cloudera Data Platform, customers have the flexibility to deploy CDW on either a public cloud or a private cloud to meet all use case requirements. CDW is a managed data warehouse service that runs Cloudera’s powerful engines (Impala, Hive LLAP) on a containerized architecture to let you meet SLAs, onboard new use cases easily, and minimize costs.
Some of the key benefits of Accenture’s Smart Data Transition Toolkit on Cloudera Data Platform Data Warehouse are as follows:
- Migration of legacy EDW into CDW
- Consideration of both data & metadata in the migration
- Easy UI based migration with native integrations
- Provides flexibility for customers to choose either Hive or Impala as the SQL engine
- Tight integration with SDX (Shared Data Experience)
- Supports all deployment options (Public Cloud, Private Cloud, Multi-Cloud and Hybrid)
- Validation of results for consistency checks
- Supports both Data Warehouse Experience & Data Warehouse with Data Hub Clusters on Cloudera Data Platform.
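As an illustration of what validating results for consistency can look like, here is a minimal sketch that compares a source table and its migrated copy by row count and by an order-independent checksum. It is not the Smart Data Validator's method, which is not described here. The function names `table_fingerprint` and `reconcile` and the sample rows are hypothetical, and the sketch assumes both systems render values to identical text, which real reconciliations have to enforce for numbers, dates, and NULLs.

```python
import hashlib
from typing import Dict, Iterable, Sequence, Tuple


def table_fingerprint(rows: Iterable[Sequence[object]]) -> Tuple[int, str]:
    """Return (row count, order-independent checksum) for a collection of rows."""
    count = 0
    combined = 0
    for row in rows:
        # Normalize every value to text and join with a separator unlikely to
        # appear in data. Note: None and "" are treated as equal in this sketch.
        encoded = "\x1f".join("" if v is None else str(v) for v in row).encode("utf-8")
        digest = hashlib.sha256(encoded).digest()
        # Summing digests modulo 2**256 ignores row order but still reflects duplicates.
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{combined:064x}"


def reconcile(source_rows, target_rows) -> Dict[str, bool]:
    """Compare source and target fingerprints."""
    src_count, src_sum = table_fingerprint(source_rows)
    tgt_count, tgt_sum = table_fingerprint(target_rows)
    return {
        "row_count_match": src_count == tgt_count,
        "checksum_match": src_sum == tgt_sum,
    }


if __name__ == "__main__":
    legacy = [(1, "Ann", "A"), (2, "Bo", "B")]
    migrated = [(2, "Bo", "B"), (1, "Ann", "A")]  # same rows, different order
    print(reconcile(legacy, migrated))
```

In practice the per-row hashing would usually be pushed down into SQL on each side so that only the aggregate values cross the network.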
Case Study: Accenture’s Experience on Legacy Data Warehouse Migration into Cloudera with a Health Insurance Company
Business Problem & Background
The client decided to migrate away from their relational database-centric Enterprise Data Warehouse as an ingestion and data processing platform after the maintenance costs, limited flexibility, and growth of the RDBMS platform became unsustainable with the increased complexity of the client’s data footprint. A modern data and NoSQL-based ecosystem, when integrated with elements of the existing RDBMS data warehouse platform, provided the client with the scale and flexibility to meet the organization’s hunger for data, data-based analytics, and more integrated views of their members:
- Internal analysis showed that over 80% of the processing time in the EDW platform was spent on data ingestion and preparation tasks – these functions can be migrated to a modern data platform at substantially reduced cost.
- Due to the high storage cost in the legacy EDW solution, 100% source data capture proved cost-prohibitive – this led to continuing and costly change cycles to load incremental source updates as business requirements changed.
- The legacy platform could support daily load cycles at best, which did not meet business demands for faster data availability in critical use cases.
Solution
- On-Premise Cloudera deployment
- Separated Big Data cluster from other programs for Data Science / Discovery to isolate workloads
- Migration of historical data from EDW Platform
- Mainframe CDC using IBM InfoSphere Data Replication (IIDR)
- Relational CDC using Oracle GoldenGate
- Ingested over 2,000 source system objects
- Complex security views configuration supporting regulatory and internal access controls
- Leveraged delivery accelerators as well as a Data Quality framework customized by the client
Outcomes
- The centralized, complete views of verified, data-quality-validated source system data within the Data Fabric helped the client streamline both security and data integration efforts across their internal application footprint
- The program leveraged changed-data capture (CDC) components for mainframe and relational systems to capture source system updates in near real-time
- Data updates supported batch and near-real-time use cases as required by the business timeline – one use case provided end-to-end data availability from the source in as little as a few seconds
- The program enabled ingestion of over 80% of the original EDW source loads in the first year, including over 1,200 table objects just for the EDW migration scope and 500+ tables for additional program value not supported by the EDW
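For readers unfamiliar with changed-data capture, the sketch below shows the general idea of applying a stream of captured changes to a target table keyed by primary key. It is a simplified illustration only: the event shape (`op`, `key`, `row`) and the function name `apply_changes` are assumptions for this example and do not reflect the record formats of IBM's or Oracle's CDC products or the client's actual pipeline.

```python
from typing import Any, Dict, Iterable


def apply_changes(target: Dict[Any, dict], events: Iterable[dict]) -> Dict[Any, dict]:
    """Apply insert/update/delete events, in order, to an in-memory table keyed by primary key."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Merge changed columns over any existing row so partial updates work.
            target[key] = {**target.get(key, {}), **event["row"]}
        elif op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target


if __name__ == "__main__":
    members = {1: {"name": "Ann", "plan": "A"}}
    captured = [
        {"op": "update", "key": 1, "row": {"plan": "B"}},
        {"op": "insert", "key": 2, "row": {"name": "Bo", "plan": "A"}},
    ]
    print(apply_changes(members, captured))
```

Because only the changed rows are shipped and applied, the target can stay close to the source without waiting for a full daily reload.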
The Cloudera and Accenture technology alliance combines Accenture’s deep industry experience, analytics skills, and global delivery with Cloudera Data Platform (CDP) to increase enterprise-wide data visibility, reduce data management costs, manage risk, and address compliance requirements. Together, Cloudera and Accenture provide a complete solution for transforming data into clear and actionable insights. We deliver proven technology on-premise or in the cloud, globally. Clients benefit from seamless and rapid delivery of use cases that combines the expertise and scale of both companies. If you have any challenges managing your legacy data warehouses, the Cloudera-Accenture technology partnership can help solve them and get your analytics up and running on a modern cloud-native platform – CDP Data Warehouse.
To learn more about CDP & the Smart Data Transition Toolkit:
Nandhini NR, Cloudera Practice Lead, ATCI
Rajeev John, Product Owner, SDTT
Aniruddha Ray, Data Capability and Innovation Lead, ATCI
Copyright © 2021 Accenture. All rights reserved. Accenture and its logo are trademarks of Accenture.
This document is produced by consultants at Accenture as general guidance. It is not intended to provide specific advice on your circumstances. If you require advice or further details on any matters referred to, please contact your Accenture representative.
This document makes descriptive reference to trademarks that may be owned by others. The use of such trademarks herein is not an assertion of ownership of such trademarks by Accenture and is not intended to represent or imply the existence of an association between Accenture and the lawful owners of such trademarks. No sponsorship, endorsement, or approval of this content by the owners of such trademarks is intended, expressed, or implied.
Accenture provides the information on an “as-is” basis without representation or warranty and accepts no liability for any action or failure to act taken in response to the information contained or referenced in this publication.