Service Management Group (SMG) offers an easy-to-use experience management (XM) platform that combines end-to-end customer and employee experience management software with hands-on professional services to deliver actionable insights and help brands get smarter about their customers. The XM platform, smg360, helps customers across verticals, including restaurants, retail, and healthcare, drive changes that boost loyalty and improve business outcomes.
With data at the heart of its business, SMG has for many years pursued the most cutting-edge data management technologies. As SMG continued to innovate, the scale, variety and velocity of its data exposed the limits of its legacy warehouse environment. Given the prohibitive cost of scaling that environment, a new business focus on data science, and the need to use public cloud services to support future growth and its capability roadmap, SMG decided to migrate from the legacy data warehouse to Cloudera’s solution using Hive LLAP.
The Case for a New Data Warehouse
Cloudera’s Hive LLAP scales linearly at a fraction of the cost, on commodity hardware, in either public or private clouds. Thanks to the flexibility of Hive, new data formats can be added to LLAP easily. Moreover, LLAP drastically reduces traditional Hive overhead when executing SQL, enabling near real-time queries and ad-hoc analytics. LLAP operates on open columnar data formats such as ORC, which are also used by data science tools such as Spark, so AI and data science workloads can run seamlessly on the same datasets.
With increasingly stringent privacy laws such as the General Data Protection Regulation (GDPR), SMG has strict requirements for handling Personally Identifiable Information (PII) on behalf of its customers. Here, Apache Ranger in conjunction with Hive LLAP provides fine-grained authorization, access controls, and centralized auditing in a multi-tenant environment.
The Offload Journey
SMG’s engineering team held multiple deep-dive sessions with Cloudera’s Hive LLAP developers during the initial phase of evaluating candidate technologies for its business needs. These sessions helped the team identify the key architectural elements of Hive LLAP, such as caching, that would support its use cases and performance requirements. Several key ingredients made this transition successful.
Data-driven Proof of Concept
SMG’s team provided Cloudera with both the data and the workloads for the peak usage hour on its legacy system (after appropriate sanitization, so no sensitive data was exposed). This was critical to the PoC in three ways. First, it ensured an apples-to-apples comparison of results between the legacy system and LLAP. Second, it allowed performance or scale bottlenecks to be identified quickly. Finally, it allowed Cloudera to test and QA any new features before releasing them to SMG.
Specifically, the peak-hour simulation showed that 95% of the legacy data warehouse queries could run on Hive LLAP with minor tweaks. Because Hive LLAP sustained 30 Queries per Second (QPS) right off the bat, SMG was confident it could support the required concurrency with just a few optimizations.
The PoC established that Cloudera was well positioned to meet SMG’s scale requirements with little impact on query performance. Cloudera’s team also identified further performance improvement opportunities and, over the next few months, worked with SMG to scale linearly to 200 QPS while improving query performance by up to 2x. Scalability is key as data volumes at SMG grow. The following graph shows how Hive LLAP scaled linearly as QPS increased.
Detailed 18-Month Transition Plan
Experienced platform owners know that any critical, large-scale system transition requires substantial effort, time, and quite a few iterations. This transition was no different, because the system being migrated was one of SMG’s most critical business systems. Extreme care had to be taken to avoid any business disruption during the transition. Therefore, after the successful PoC, SMG developed an 18-month transition plan and built a parallel system on Cloudera Hive LLAP to run alongside the existing platform while workloads were migrated step by step.
Co-Development Partnership
As is common when a complex enterprise adopts a new product, SMG’s implementation of Hive LLAP required capabilities that no other customer had needed before. Close partnership, open communication, and Cloudera’s ability to address these requirements played a critical role in the success of the transition plan. The partnership also included Oalva, SMG’s Hadoop technology service provider and a proud partner and reseller of Cloudera solutions. Oalva brought years of big data, data warehouse and Hadoop expertise to the table, and advised SMG on best practices drawn from many Hadoop implementations across a variety of disciplines.
SMG Has Completely Offloaded All of smg360 to Cloudera
Today, SMG has offloaded 100% of its queries and reports to Cloudera’s platform. These include very complex, ad-hoc queries with sub-second response times, running at an average rate of 50 QPS and a peak rate of 90 QPS. A typical query involves 20+ column aggregations, multi-step filtering, and complex hierarchical rollups. Most importantly, the transition was completed with 99.9% uptime and minimal customer impact.
Establishing an Affordable Disaster Recovery Site
The cost-effectiveness of Cloudera’s platform enabled SMG to build a Disaster Recovery (DR) environment, which it had not previously built because of the cost of scaling the legacy data warehouse. This allowed SMG to create an active-active setup. The environment is shared among multiple customers, and each site has enough spare capacity built in that a single site can handle all traffic in the event of a disaster. This ensures a consistent customer experience and improved reliability across all of SMG’s services.
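The "single site handles all traffic" requirement translates into simple capacity arithmetic. The sketch below is illustrative only: the two-site layout, the even split of load between healthy sites, and the function names are assumptions, not details of SMG’s actual deployment. It uses the 90 QPS peak figure quoted above.

```python
def required_site_capacity(peak_qps: float, num_sites: int, sites_lost: int = 1) -> float:
    """QPS each site must sustain so the surviving sites can absorb peak traffic.

    Assumption (hypothetical): load is spread evenly across all surviving sites.
    """
    survivors = num_sites - sites_lost
    if survivors < 1:
        raise ValueError("at least one site must survive")
    return peak_qps / survivors


def headroom_fraction(peak_qps: float, num_sites: int, sites_lost: int = 1) -> float:
    """Spare capacity each site needs, relative to its normal (all-sites-up) share."""
    normal_share = peak_qps / num_sites
    return required_site_capacity(peak_qps, num_sites, sites_lost) / normal_share - 1


if __name__ == "__main__":
    peak = 90.0  # peak QPS quoted for smg360
    # Assumed two-site active-active layout: each site normally serves 45 QPS
    # but must be able to serve the full 90 QPS alone, i.e. 100% headroom.
    print(required_site_capacity(peak, num_sites=2))  # 90.0
    print(headroom_fraction(peak, num_sites=2))       # 1.0
```

With only two sites, each one must be provisioned at roughly double its everyday load, which is why the cost of the underlying platform largely determines whether an active-active DR setup is affordable.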
Data Science Services
Because the legacy data warehouse used a proprietary data format, data scientists had to extract data from it before they could run any AI/ML algorithms. On Cloudera’s platform, SMG data scientists have fast and easy access to the data they need for a host of functions, particularly predictive analytics, because ingested data can now be used simultaneously for ad-hoc analytics and for AI/ML tools. Today SMG applies data science far more extensively, on both structured and unstructured data. New use cases are regularly identified and developed now that the data is unlocked and available in an easily consumable form.
New Frontiers Offered by Cloudera Data Platform (CDP)
Looking to the future, SMG is building out its platform to continue capitalizing on the latest technological advancements. The team is excited about building its future roadmap in both private and public clouds. With the ability to separate storage from compute, smg360 will be able to support highly skilled power users and let them slice and dice their data as needed.
Cloudera’s most recent offering, Cloudera Data Platform (CDP), offers a cloud-based, consumption-driven model available via public cloud providers as well as in on-premises data centers. CDP enables new features such as auto-scaling, auto-suspend and resume, and seamless data sharing via Cloudera’s Shared Data Experience (SDX).
SMG realized a 2.5x reduction in total cost of ownership (TCO) by migrating to Cloudera.
To learn more about the impact that SMG’s migration had on its business, check out the customer success story.
Additional Resources:
- TDWI Best Practices Report – The Modernization of the Data Warehouse
- How to migrate your data warehouse to Cloudera with Smartoffload