Driving Agility and Scalability through Smart Data

Last year presented business and organizational challenges that hadn’t been seen in a century, and the troubling fact is that those challenges spread pain and gain unequally across industry segments. While brick-and-mortar retail was crushed by mandated store closures, digital commerce retailers realized ten years of digital sales penetration in only three months. In 2020, a McKinsey study reported that “Industry 4.0 Industrial innovations are expected to create up to $3.7 trillion in value by 2025,” but “that value won’t be spread evenly. It’s already clear that a small number of organizations are running away with the first-mover advantage.”

Why is this? Pages can be written on this topic, from proof-of-concept scale-up planning and execution to the organizational changes needed for successful digital transformation. Cloudera sees success in terms of two very simple outputs: enterprise agility and enterprise scalability. Our view is simpler still: to achieve these high-return outputs, you need to focus on data, because business insights, decisions, and prescriptive and preventative recommendations start and end with data.

Let’s start at the place where much of Industry 4.0’s data is generated: at the Edge. In the last five years, there has been meaningful investment in both Edge hardware compute power and software analytical capabilities. Real-time and time series data is growing 50% faster than static data forms, and streaming analytics is projected to grow at a 34% CAGR. Streaming data systems are a relatively new addition to enterprise data systems and have evolved to fill business-critical roles. Given how quickly they have developed, it’s no surprise that tooling for streaming systems has not yet matured to the level of the more traditional batch systems. Contrast this with the skills honed over decades for gaining access, building data warehouses, performing ETL, and creating reports and applications using structured query language (SQL).

Benefits of Streaming Data for Business Owners

If business agility and scalability are the goals, it helps to understand the characteristics of streaming data from the Edge from the Line of Business perspective.

Unpredictable Data Volume and Flow

Data streaming from the Edge has the volume of a fire hose, but the key difference is that the rate of flow is not uniform. As businesses move more and more towards real-time data movement instead of hourly or daily batches, data bursts become more visible and less predictable, for two main reasons:

  • Once the hourly/daily batch windows are removed, there’s nothing left that aggregates and averages out lows and peaks. If there is a data burst lasting for five minutes, followed by a calm period of another five minutes, the data flow system has to deliver the expected performance throughout both periods without wasting resources. A batch system ingesting data every hour would have averaged out these bursts.
  • Moving to real-time data flows is an opportunity to connect new streaming data sources that did not fit the previous batch model to the data lifecycle. These new sources increase the amount of data that a data flow system has to process, and more often than not they send data over unreliable network connections, with each network outage resulting in its own data burst.
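The averaging effect described in the first point can be made concrete with a small sketch. The numbers below are purely illustrative assumptions (five-minute bursts of 10,000 events per minute alternating with five-minute calm periods of 500 events per minute); they are not measurements from any real system.

```python
# Hypothetical per-minute event rates, chosen only for illustration.
BURST_RATE = 10_000  # events per minute during a burst (assumed)
CALM_RATE = 500      # events per minute during a calm period (assumed)


def minute_counts(minutes=60, period=5):
    """Synthetic per-minute event counts that alternate burst and calm periods."""
    return [
        BURST_RATE if (m // period) % 2 == 0 else CALM_RATE
        for m in range(minutes)
    ]


def peak_to_average(counts):
    """Ratio of the busiest minute to the mean minute."""
    return max(counts) / (sum(counts) / len(counts))


counts = minute_counts()

# What an hourly batch job sees: one total, i.e. one smooth average rate.
hourly_average = sum(counts) / len(counts)

# What a real-time flow sees: every individual minute, including the peaks.
busiest_minute = max(counts)

print(f"hourly batch view: {hourly_average:.0f} events/minute on average")
print(f"real-time view:    {busiest_minute} events/minute at peak")
print(f"peak-to-average:   {peak_to_average(counts):.2f}x")
```

Under these assumptions, the hourly batch only ever sees the average rate, while a real-time flow has to absorb peaks at nearly twice that rate and then sit mostly idle during the calm periods.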

What does this mean for a Line of Business owner? Either they have to build a rigid architecture sized for the largest expected data surge, or build a system that is elastic and scalable. The business’s dilemma is balancing the need for high-performance data processing with the associated compute costs. Building the rigid system runs counter to the goals of maximizing agility and scalability.
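As a rough sketch of this trade-off, the snippet below compares the compute a rigid system would reserve against what an idealized elastic system would use for the same bursty load. Everything here is an assumption made for illustration: the load profile, the capacity of one processing unit, the 20% headroom, and the simplification that the elastic system rescales instantly every minute. Real autoscaling has start-up lag and minimum sizes, so the actual gap would be smaller.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def units_needed(events_per_minute, unit_capacity, headroom_pct):
    """Processing units required for a given rate, including safety headroom."""
    return ceil_div(events_per_minute * (100 + headroom_pct), 100 * unit_capacity)


def rigid_unit_minutes(load, unit_capacity, headroom_pct=20):
    """Rigid design: size for the largest surge and keep that size all the time."""
    return units_needed(max(load), unit_capacity, headroom_pct) * len(load)


def elastic_unit_minutes(load, unit_capacity, headroom_pct=20):
    """Idealized elastic design: resize every minute to match the current rate."""
    return sum(units_needed(rate, unit_capacity, headroom_pct) for rate in load)


# Hypothetical hour of load: 5-minute bursts alternating with 5-minute calm periods.
load = ([10_000] * 5 + [500] * 5) * 6
UNIT_CAPACITY = 1_000  # events per minute one unit can process (assumed)

rigid = rigid_unit_minutes(load, UNIT_CAPACITY)
elastic = elastic_unit_minutes(load, UNIT_CAPACITY)

print(f"rigid:   {rigid} unit-minutes")
print(f"elastic: {elastic} unit-minutes")
```

With these made-up numbers, the rigid design reserves 720 unit-minutes for the hour while the idealized elastic design uses 390. The exact figures matter less than the shape of the result: the burstier the flow, the more capacity a peak-sized system leaves idle.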

Organizational Access

With the move towards streaming data, and with line of business owners and their teams wanting faster access to data, centralized data teams struggle to keep up with the ever-growing list of data flows that business users want to implement. No one wants to wait three weeks on a Jira ticket for a report to be developed. By that time, the once-important event might have passed, and even bigger enterprise insights are needed. Consider an unstable distillation column: immediate insight might have allowed the process to be stabilized, but now the process is out of control and producing unprofitable output.

In the typical manufacturing enterprise, only a small team has the core skills needed to gain access to streams of data and create value from them. This data engineering skill set typically consists of Java or Scala programming skills paired with deep DevOps acumen. A rare breed. The result is that streaming data tends to be “locked away” from everyone but a select few, and the data engineering team is overworked and backlogged.

Democratization of Data

One solution to these business limitations is data democratization: a self-serve paradigm and culture that lets an ever-growing internal audience get the data they need to add value to the business. They no longer need to ask a small subset of the organization to provide them with information. Instead, they have the tooling, systems, and capabilities to get the data themselves. Data democratization has been a topic of conversation for the last few years, but mostly centered around data warehousing and data lakes. It directly contributes to agility and, if built properly, scalability.

Available Solutions 

So what is the solution to the characteristics of data-in-motion and the organizational limitations that prevent people from accessing data? Cloudera has tackled the challenge with two fundamental approaches.

Cloudera’s first decision was to continue to invest in the industry-leading Cloudera DataFlow because of the following attributes:

  • Easy for your data team to use 
  • Promotes trust in the data that is collected  
  • Provides trusted ingestion of your Edge data without interruptions 

Cloudera’s second decision was to turn to tried-and-true tools, based on well-known SQL, that enable and encourage data democratization. To do just this, Cloudera acquired Eventador, an Austin-based company that specialized in simplifying access to streaming data.

SQLStream Builder enables analysts and similar personas to access real-time streaming data with simple SQL-like tools. It solves a fundamental problem for key personas (developers, data analysts, and data scientists) who need immediate access to such data but cannot get it because of organizational limitations and complex code implementations. They no longer have to depend on skilled Java or Scala developers to write special programs to gain access to such data streams.

What does all this mean for those in business leadership roles? 

As a business leader, controlling shared capital costs originating from your IT, Operations, R&D, or Supply Chain groups is a challenge. Understanding the capital needs of your IT group will help you assess the dynamic, changing environment in which digital transformation has placed the business. Scalable architecture and workflows control both variable and capital costs by right-sizing capacity while ensuring that surges in data volume can be handled at any time. Enabling common data access, or data democratization, within your organization promotes agility and lowers overhead costs: it opens data to a greater number of skilled workers, such as developers, data analysts, and data scientists, who can be placed within the business group, focused on its needs, and respond at your speed and on your clock.

Read more about David’s perspective and passion in promoting the digitization of industrial enterprises or learn more about Cloudera DataFlow.

