Live data streaming offers businesses exciting new opportunities to transform the way they operate, using real-time insights to drive better decision-making and improve operational efficiency.
To find out more about how live streaming data might impact the sector, I sat down for a chat with Dinesh Chandrasekhar, Head of Product Marketing in Cloudera’s data-in-motion Business Unit.
Hi Dinesh, thank you for joining us for today’s Q&A. To start off, what are the advantages of a forward-looking data-in-motion strategy?
Data-in-motion is predominantly about streaming data. Enterprises typically look at data in one of two ways.
The first is data at rest, for example in a data lake, a data warehouse, or cloud storage. Enterprises run analytics on this data, and that analysis is predominantly about what has already happened or about how to prevent something from happening in the future.
Data-in-motion, on the other hand, is data constantly coming into an enterprise or into the cloud, with no finite end to it. In a financial services context, this could be trades or transactional data.
When data is in motion at that kind of pace and volume, it can consist of hundreds of thousands of data points. Having a data-in-motion strategy means that a business has a plan to capture data in real time, understand what the data means to the company, and decide how to respond to it.
Being able to understand the implications and act on them is the most important part of a forward-looking data-in-motion strategy. A streaming data implementation is only as good as a company’s ability to harness the value of the data and react to it in real time.
A good example of this is fraud detection. Credit card fraud is a big problem in the financial services industry and can mean significant financial losses. Real-time streaming data allows businesses to add context to data points to better understand their meaning.
For example, if a credit card was used in the United States and shortly afterward the same card was used in Spain to withdraw the same amount, these two events in isolation could appear legitimate.
In each case, the card is used for its intended purpose and the amounts are relatively small. However, in the context of time and geography, these two events point to a pattern of fraud.
With real-time streaming data, this context is available instantly, so the second, fraudulent transaction can be blocked immediately.
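To make the time-and-geography idea concrete, here is a minimal Python sketch of a so-called "impossible travel" check: it remembers the last accepted transaction for each card and blocks a new one if reaching the new location would require travelling faster than a plausible speed. The event fields, the class and function names, and the 900 km/h threshold are hypothetical and chosen for illustration; a production fraud system would combine many more signals.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0
# Hypothetical threshold: roughly the cruising speed of a passenger jet.
MAX_PLAUSIBLE_SPEED_KMH = 900.0


@dataclass
class CardEvent:
    card_id: str
    timestamp: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


class ImpossibleTravelDetector:
    """Remembers the last accepted event per card and flags any new event that
    would require travelling faster than max_speed_kmh since then."""

    def __init__(self, max_speed_kmh=MAX_PLAUSIBLE_SPEED_KMH):
        self.max_speed_kmh = max_speed_kmh
        self.last_accepted = {}  # card_id -> CardEvent

    def process(self, event):
        """Return True if the event should be blocked."""
        previous = self.last_accepted.get(event.card_id)
        if previous is not None:
            # Floor elapsed time at one second to avoid dividing by zero.
            elapsed_s = max((event.timestamp - previous.timestamp).total_seconds(), 1.0)
            distance_km = haversine_km(previous.lat, previous.lon, event.lat, event.lon)
            if distance_km / (elapsed_s / 3600) > self.max_speed_kmh:
                return True  # blocked events do not replace the last accepted one
        self.last_accepted[event.card_id] = event
        return False


if __name__ == "__main__":
    detector = ImpossibleTravelDetector()
    t0 = datetime(2024, 1, 1, 12, 0)
    print(detector.process(CardEvent("card-1", t0, 40.7128, -74.0060)))  # New York: False
    print(detector.process(CardEvent("card-1", t0 + timedelta(minutes=30), 40.4168, -3.7038)))  # Madrid: True
```

Only accepted events update the per-card state, so a blocked transaction in Spain does not become the reference point for the card's next legitimate use in the United States.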
From a business standpoint, companies can save money as well as improve operational efficiency. For example, a bank can get real-time data on ATM performance and be alerted when a machine is low on cash or not working correctly.
These small problems frustrate customers and can lead them to consider changing banks. By spotting potential issues in real time, financial firms can pre-empt these problems and improve customer service and satisfaction.
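The ATM example can be sketched the same way. The hedged Python example below consumes a stream of hypothetical ATM readings and raises an alert the first time a machine runs low on cash or reports a fault, then stays quiet for that condition until a later reading shows it has cleared. The reading fields, the "ok" status value, and the 5,000 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold, in currency units.
LOW_CASH_THRESHOLD = 5_000


@dataclass
class AtmReading:
    atm_id: str
    cash_remaining: int
    status: str  # hypothetical values: "ok", or anything else meaning a fault


def atm_alerts(readings, low_cash_threshold=LOW_CASH_THRESHOLD):
    """Yield (atm_id, condition) the first time a condition appears for an ATM.

    A condition is re-armed once a later reading shows it has cleared, so a
    machine that stays low on cash produces one alert rather than one per reading.
    """
    active = set()  # (atm_id, condition) pairs that have already alerted
    for reading in readings:
        conditions = {
            "low_cash": reading.cash_remaining < low_cash_threshold,
            "fault": reading.status != "ok",
        }
        for name, triggered in conditions.items():
            key = (reading.atm_id, name)
            if triggered and key not in active:
                active.add(key)
                yield key
            elif not triggered:
                active.discard(key)
```

Because `atm_alerts` is a generator, it can sit directly on an unbounded stream of readings and emit alerts as they happen, rather than waiting for a batch to complete.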
What are some of the biggest challenges businesses face in leveraging streaming data insights and how can they overcome these?
With real-time streaming data, the first three hurdles are the classic three Vs: Volume, Velocity, and Variety.
In terms of volume, businesses deal with hundreds of thousands of endpoints, such as sensors in ATMs. A financial services firm or bank may have thousands of ATMs across the country, which means a huge number of data points constantly feeding back information.
Businesses need to be able to ingest huge volumes of data from these data points as well as handle, process, and store this vast amount of data.
Then they need to move on to data preparation, so that they not only ingest the data but also make it processable. Enriching the data can be daunting for businesses, given the volume of data they are ingesting.
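As an illustration of what preparation and enrichment can involve, the Python sketch below parses raw records, sets aside any that are malformed, and enriches the valid ones with reference data about each ATM. The JSON field names and the `ATM_REFERENCE` lookup table are hypothetical; in a real pipeline the reference data might come from a database or cache, and each record would be handled as it arrives rather than in a list.

```python
import json

# Hypothetical reference data used to add context to raw telemetry.
ATM_REFERENCE = {
    "atm-1": {"branch": "Downtown", "region": "North"},
    "atm-2": {"branch": "Airport", "region": "South"},
}


def prepare(raw_lines, reference=ATM_REFERENCE):
    """Parse, validate, and enrich raw JSON records.

    Returns (enriched, rejected): enriched dicts ready for processing, and the
    raw lines that could not be parsed or were missing required fields.
    """
    enriched, rejected = [], []
    for line in raw_lines:
        try:
            record = json.loads(line)
            atm_id = record["atm_id"]
            cash = int(record["cash_remaining"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            rejected.append(line)  # route malformed input aside instead of failing
            continue
        # Unknown ATMs are kept but marked, so they can be investigated later.
        context = reference.get(atm_id, {"branch": "unknown", "region": "unknown"})
        enriched.append({"atm_id": atm_id, "cash_remaining": cash, **context})
    return enriched, rejected
```

Keeping rejected records rather than dropping them silently is a design choice: at high volume, a rising rejection rate is often the first sign that an upstream source has changed its format.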
Having too much data to process in a timely manner is another big challenge, because the true value of data lies in processing it in real time and responding accordingly. If you cannot respond to data in real time, it becomes useless.
The next key factor is velocity, the speed at which data comes through. It matters at an operational level, and businesses need a streaming analytics platform that can understand the data, handle the volume, and manage the variety of formats coming in as well.
Beyond volume, velocity, and variety, the two biggest challenges for businesses around streaming analytics are security and governance. Organizations need to handle both transparently, because attacks on data can happen at any point in the data-in-motion journey.
Security needs to be treated as mission-critical, and data security also needs to be a core part of a business’s strategic approach. The governance aspect is perhaps even more important: businesses need to be able to understand where their data comes from.
Data lineage, personally identifiable information (PII), and metadata all fall under a broad data governance banner, which is critically important in determining what needs to be protected and mapped out.
Once organizations are processing data in real time, users need access to it in real time as well.
Finally, having the right platforms and skill sets within the business to process data and get the right insights in real time is critical to harnessing the power of data to drive meaningful change.
Can you talk about some of the technology that helps make managing live streaming data possible?
Cloudera DataFlow offers Edge-to-cloud streaming data processing.
This type of end-to-end data processing that starts at the Edge and ends in the cloud is made possible by using Apache NiFi.
NiFi is software from the Apache Software Foundation designed to automate the flow of data between systems in an organization. With MiNiFi, a lightweight agent that runs at the Edge, working alongside NiFi, businesses can collect data at the Edge, bring it into their organization, and leverage messaging capabilities to scale up to large volumes.
With hundreds of thousands of data points, endpoints, or inputs, companies today face a deluge of data. To handle it and distribute it in real time to all the other applications that need it, a solution like Apache Kafka can help.
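The core idea behind distributing data through a log-based broker such as Kafka is that each record is appended once, and each consuming application (a consumer group, in Kafka's terms) tracks its own read position, or offset, so every application sees every record at its own pace. The toy Python class below models only that idea in memory. It is not Kafka's client API, and it deliberately ignores partitions, replication, and persistence; the class name and the group names in the usage example are hypothetical.

```python
from collections import defaultdict


class InMemoryTopic:
    """Toy append-only log with a separate read offset per consumer group.

    Every group independently reads every record, which is a simplified model
    of Kafka-style fan-out; it is not Kafka's API.
    """

    def __init__(self):
        self._log = []
        self._offsets = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._log.append(record)
        return len(self._log) - 1

    def poll(self, group, max_records=100):
        """Return up to max_records unread records for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    topic = InMemoryTopic()
    for i in range(3):
        topic.publish({"txn": i})
    # Two hypothetical applications consume the same records independently.
    print(topic.poll("fraud-detection"))
    print(topic.poll("customer-analytics", max_records=2))
```

Because consumers only advance their own offsets, adding a new downstream application does not slow down or disturb the existing ones, which is what makes this pattern suitable for fanning a single stream out to many consumers.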
Finally, a stream processing and analytics solution like Apache Flink can read data from Kafka in real time, detect complex events and patterns, and correlate them to help provide insights for businesses and decision-makers.
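Stream processors such as Flink commonly evaluate patterns over windows of time, grouped by a key such as a card number. The plain-Python sketch below shows the idea with a keyed sliding-window count, for example how many transactions a card has made in the last 60 seconds, which a rule can then compare against a threshold. It illustrates the concept only: it is not Flink's API, the class name and the threshold are hypothetical, and it assumes that events for each key arrive in timestamp order.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts events per key within a trailing time window.

    A plain-Python illustration of a keyed sliding window; it is not Flink's API
    and assumes timestamps for each key arrive in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = defaultdict(deque)  # key -> timestamps still in the window

    def add(self, key, ts):
        """Record an event for key at time ts (seconds); return the count in (ts - window, ts]."""
        window = self._timestamps[key]
        window.append(ts)
        # Evict timestamps that have fallen out of the trailing window.
        while window and window[0] <= ts - self.window_seconds:
            window.popleft()
        return len(window)


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=60)
    MAX_TXNS_PER_MINUTE = 3  # hypothetical rule threshold
    for ts in [0, 5, 10, 15, 200]:
        count = counter.add("card-1", ts)
        if count > MAX_TXNS_PER_MINUTE:
            print(f"card-1: {count} transactions in the last minute at t={ts}")
```

Real stream processors add what this sketch leaves out, such as handling out-of-order events, distributing keyed state across machines, and recovering that state after failures.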
That combination of MiNiFi, NiFi, Kafka, and Flink is what makes a true data-in-motion platform, giving companies the ability to ingest, scale, and process data in real time.
CDP (Cloudera Data Platform) is at the heart of our enterprise data strategy. It makes it possible for businesses to run complex data workflows across any environment, which is what makes it truly differentiating, and it can extend streaming analytics capabilities into any cloud environment.
To find out more about Cloudera’s data-in-motion philosophy, you can download a copy of A Blueprint for Enterprise-wide Streaming Data Architecture.
Stay tuned for Part II of our Q&A with Dinesh, in which we dive deeper into how live streaming data and technology are helping businesses within the financial services sector.