The Corner Office is pressing their direct reports across the company to “Move To The Cloud” to increase agility and reduce costs. And next to those legacy ERP, HCM, SCM and CRM systems, that mysterious elephant in the room – that “Big Data” platform running in the data center that is driving much of the company’s analytics and BI – looks like a great potential candidate.
Perhaps one of the most significant advances in data technology has been the advent of “Big Data” platforms. Their ability to ingest, process, and store massive data volumes — zettabytes of structured and unstructured data of high variety, volume, and velocity, married together — has yielded remarkable results. These platforms have truly made what was not long ago considered impossible, possible.
Historically, these highly specialized platforms were deployed on-prem in private data centers to ensure greater control, security, and compliance. These platforms yield insights into very complex business challenges and are very much worth the investment. They have accelerated advancements in medical and energy R&D, in what and how we drive on the road, in how we socialize, shop, and study, and in how and what we watch on TV and personal devices.
Considering the potential security risks and the gravitational pull of “if it isn’t broken, don’t fix it!”, a deeper cloud vs. on-prem cost/benefit analysis raises more questions about moving these complex systems to the cloud:
- Is moving this particular operation to the cloud the right option right now?
- What about hybrid? What about multi-cloud?
- How can we mitigate security and compliance risk?
- If we get on one cloud provider and find ourselves locked in – how do we get out or move; would it even be possible?
- What technologies will change or be available 3 years from now?
- Will building on open-source remain our safest option?
- If “Cloud Only” is our only option, is that really an option?
- How do we maintain visibility into all data and systems for security/compliance?
In the hyper-drive to “Move To The Cloud”, software vendors and Cloud Service Providers (CSPs) see these big data clusters as fantastic prospects for generating big revenue. Due mainly to their large storage and compute requirements as well as related software and integration complexities, they see these Big Data Elephants as Big Dollars in their pockets if they can move these elephants out of the data center and into the cloud.
There are now tens of thousands of instances of these Big Data platforms running in production around the world today, and the number is increasing every year. Many are increasingly deployed outside of traditional data centers in hosted, “cloud” environments. And from a pure TCO and ROI perspective, these systems are often just as big: their cost and return footprints match their scale.
These successful Big Data platforms draw from a large number of open-source projects and commercial software components designed for zettabyte scale, then configured into secure, reliable operations that typically run on highly sensitive or regulated data. They are not plug-and-play SaaS applications. They required major investment by the early developers of these technologies to develop, build, tune, and stabilize into a productive state. These platforms represent far more than just “Hadoop”.
The only constant, however, is change. Valuable lessons and results have been obtained, and technologies have evolved. For those engaged the longest, the lessons run the deepest and are reflected in their solutions. Later arrivals to the market had the benefit of learning from those pioneers, proposing other ways to achieve similar results with “easier, simpler” tools or methods, with varying degrees of success.
Over time, use cases expanded beyond the original EDW and Data Lake functions to support increasing demands from the business. More sources, data, and functionality were added to these platforms, expanding their value but adding to the complexity, such as:
- Streaming data ingestion
- Streaming data analytics
- Data science & engineering
- Machine learning-based process optimization
As these needs evolved and demands grew, even the most agile IT groups were challenged to keep up with increasing business requirements while maintaining and refining what was already there and working. The result was often LOBs branching off from IT and creating their own “shadow IT” systems with CSPs to solve their more time-sensitive needs that IT just couldn’t get to “in time”. Now it’s not just one elephant in the room; it’s a herd of elephants in the building. And outside, in the cloud. More distributed data silos and more proprietary systems, creating more lock-in.
Adding to the challenge: original SME teams may have changed or left, raising support concerns. Vendors come and go. New priorities emerge. New technologies evolve, raising the question of whether what was the right way then is still the right way now. As a result, Big Data administrators pursued running “in the cloud” to reduce costs, simplify workloads, and gain greater flexibility. Many have found that some of the available options are easier or faster for some tasks, but not necessarily cheaper. And security profiles shift. In some cases, that can be an acceptable trade-off for increased business agility.
But the “elephant in the room” is NOT ‘Hadoop’. Nor is it a proprietary cloud-based version of a SQL data warehouse, Apache Spark, or Apache Kafka offered by specialized cloud-based software providers claiming they can deliver an equivalent Big Data platform experience – in the cloud or otherwise. No single tool or component can deliver successful enterprise-grade results. Big Data is an ecosystem as well as a philosophy. Successful Big Data platforms today inherently span both on-premise and cloud ecosystems, and are thus “hybrid” by default.
The real “elephant in the room” is the hyperscale CSPs. These CSPs are NOT the primary drivers of development; they facilitate access to what has been developed. While that does have its own intrinsic value, everything must still be acquired, provisioned, constructed, maintained, and supported – using their available components, many of which are “open source”, such as Hadoop, Spark, or Kafka – but on their cloud domain, locking you into their ecosystem. And THEN their storage and compute services and charges apply.
But then the costs start running out of control. As a result, the “Move to the Cloud” mantra of recent years is evolving to a mantra of “Reduce Cloud Costs” by many of those corner offices who made the early moves.
Some valuable lessons have been learned from moving to the cloud, and have raised new questions in the quest to help reduce cloud costs. For example:
- Would it be easier, faster, more secure, and cost-effective to move some of these processes back on-premise?
- Can “Private Cloud” work for us as a viable option?
- What options are available that marry both modalities in support of a “hybrid” form factor – but maintain a single control plane?
Many vendors help you get to the Cloud and then lock you in. We have learned from working with thousands of customers that most enterprise customers are architecting for Hybrid.
If you have discovered that this may be the case for you, there is good news. There is an easier, faster, more secure way of adapting for this likely future. And if any of those Big Data environments are running any version of either Hortonworks or Cloudera in those ecosystems, you have some very easy, fast, secure options that other administrators may not have.
Whether you want to move your in-house platform to:
- Any of the major CSPs – or a combination of them;
- An alternate hybrid “private cloud” model adjacent to what you already have, to support expansion flexibility;
- Right where you’ve got it – converted to an easier, faster, more reliable, and secure configuration that keeps your future options open; or
- An additional or new, more adaptable platform built from the ground up –
we can help.
For over 10 years, we have built and helped deliver very successful solutions and outcomes for these types of challenges for the largest and most successful companies. At Cloudera, we have the longest history, the deepest resource expertise, and the most modern, performant, reliable, and scalable enterprise data cloud options available to make it happen. Here is how we could do it for you in 5 easy, fast, secure steps. The path is unique for every customer, but this is the path we would follow with you:
Step 1: Inventory & Evaluation of active workloads for candidacy to move to cloud
- Identify the existing workloads
- Assess the cloud suitability of each workload
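As an illustration of what a Step 1 inventory can feed into, workloads can be scored against even a simple rubric. The criteria, weights, and workload names below are made-up assumptions for demonstration, not a Cloudera-prescribed model; a real assessment would draw on actual metrics (job history, storage audits, SLAs):

```python
# Hypothetical sketch: scoring inventoried workloads for cloud suitability.
# All criteria and weights are illustrative assumptions.

def cloud_suitability(workload):
    """Return a 0-100 suitability score for a workload dict."""
    score = 30  # neutral baseline
    # Bursty, elastic workloads benefit most from cloud compute.
    if workload.get("bursty"):
        score += 40
    # Loosely coupled jobs migrate with less re-engineering.
    if not workload.get("tightly_coupled"):
        score += 30
    # Heavily regulated data raises compliance hurdles.
    if workload.get("regulated_data"):
        score -= 20
    # Very large always-on datasets are costly to move and store.
    if workload.get("dataset_tb", 0) > 500:
        score -= 15
    return max(0, min(100, score))

inventory = [
    {"name": "nightly_etl", "bursty": True, "tightly_coupled": False,
     "regulated_data": False, "dataset_tb": 40},
    {"name": "fraud_scoring", "bursty": False, "tightly_coupled": True,
     "regulated_data": True, "dataset_tb": 900},
]

for w in inventory:
    print(w["name"], cloud_suitability(w))  # → nightly_etl 100, fraud_scoring 0
```

Even a crude score like this helps separate obvious cloud candidates from workloads that deserve a closer look before any migration decision.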
Step 2: Recommendation of the best candidates to cost-effectively retain or migrate
- OpEx savings and probable ROI once migrated
- A small, low-cost way of “Getting Started”
- Identification of the risk factors
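The OpEx/ROI side of this step boils down to simple break-even math. The figures below are fabricated assumptions purely for illustration:

```python
# Illustrative sketch: rough break-even math for a migration candidate.
# All dollar figures are made-up assumptions for demonstration only.

def months_to_break_even(onprem_monthly, cloud_monthly, migration_cost):
    """Months until cumulative cloud savings cover the one-time migration cost.
    Returns None if the cloud option is not cheaper month over month."""
    monthly_savings = onprem_monthly - cloud_monthly
    if monthly_savings <= 0:
        return None  # no OpEx savings; the migration never pays back
    # Ceiling division: a partial month still counts as a full billing month.
    return -(-migration_cost // monthly_savings)

# Assumed figures: $50k/mo on-prem, $35k/mo cloud, $180k migration effort.
print(months_to_break_even(50_000, 35_000, 180_000))  # → 12
```

A workload with no positive monthly savings (the `None` case) is exactly the kind of candidate Step 2 would recommend retaining rather than migrating.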
Step 3: Security Requirements Assessment
- E.g., HDFS ACLs, metadata authorization hooks, and dependencies that may lead to risks, exposures, or failures later on
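One small check a security assessment like this might automate is flagging world-accessible paths in `hdfs dfs -getfacl` output. The sample output and the helper below are illustrative, not a complete audit:

```python
# Minimal sketch: flag paths whose HDFS ACL grants any permission to "other".
# The sample getfacl output below is fabricated for illustration.

def world_accessible(getfacl_output):
    """Return True if the 'other' ACL entry grants any permission."""
    for line in getfacl_output.splitlines():
        line = line.strip()
        if line.startswith("other::"):
            perms = line.split("::", 1)[1]
            return perms != "---"
    return False

sample = """\
# file: /data/warehouse
# owner: hive
# group: hadoop
user::rwx
group::r-x
other::r-x"""

print(world_accessible(sample))  # → True
```

Running checks like this across the namespace before migration surfaces exposures that are far cheaper to fix on-prem than to discover in the cloud.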
Step 4: Cloudera Data Platform Option Recommendation & Trial/Set up
- Which new Cloudera form factors make the most sense, and which workloads are the easiest, fastest, and lowest-risk to start with
Step 5: Workload Migration & Performance Optimization
- Fine-tuning recently migrated workloads for optimal PSR
- Fine-tuning existing cloud/on-prem workloads
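One way to keep this fine-tuning honest is a simple performance gate comparing job runtimes before and after migration. The percentile method, threshold, and runtime figures below are assumptions for demonstration:

```python
# Illustrative sketch: a post-migration performance regression gate.
# The 10% budget and sample runtimes are assumed values, not a standard.

def p95(samples):
    """95th percentile via the nearest-rank method on a sorted copy."""
    s = sorted(samples)
    idx = max(0, int(round(0.95 * len(s))) - 1)
    return s[idx]

def within_budget(baseline_runs, migrated_runs, max_regression=1.10):
    """True if migrated p95 runtime stays within 10% of the baseline p95."""
    return p95(migrated_runs) <= p95(baseline_runs) * max_regression

baseline = [610, 595, 630, 605, 640, 600, 615, 620, 598, 612]  # seconds
migrated = [620, 601, 655, 610, 633, 608, 619, 641, 604, 615]

print(within_budget(baseline, migrated))  # → True
```

Gating on a tail percentile rather than the average catches the slow outlier runs that users actually notice after a migration.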
There it is: 5 simple steps to finding the right solution options, or at the very least to making a more informed decision on the way forward. Or simply try it on your own right now. For us, it’s an easy, secure, fast-as-you’re-comfortable-running pace that can happen in a matter of weeks, if not days. From start to finish. Let’s Talk!