Live data streaming offers businesses exciting new opportunities to transform the way they operate, using real-time insights to drive better decision-making and improve operational efficiency.
To find out more about how live streaming data might impact the financial services sector, I sat down for a chat with Dinesh Chandrasekhar, Head of Product Marketing in Cloudera’s Data-in-Motion Business Unit. If you missed Part I of our Q&A, you can catch up on it here.
In Part II of our Q&A, Dinesh looks at how businesses can leverage technologies like Apache Flink and Apache NiFi to achieve low-latency processing of high-volume, high-velocity data.
Hello Dinesh, thank you for joining us for Part II of our Q&A on streaming data. Can you talk a bit about how businesses best use Flink within a streaming architecture, and what it is about the solution that enables low-latency processing of high-volume streaming data?
Within the architecture, Flink is a stream processing engine, which means it can process many different sets of streams, translating to millions and millions of data inputs coming in from a variety of sources.
All these inputs streamed into an enterprise can be processed by a real-time streaming solution like Flink. If a business has a database and needs to find out what stock was traded, or which stock had the most trades in a particular time frame, that is relatively simple to process because the data points are well defined. But when the data is more complicated and unbounded, how do businesses understand trends and patterns?
With a stream processing engine like Flink, they can define logical time windows, chunks of time of perhaps five seconds, and start analyzing the data that arrives within each of those windows.
Say a particular stock is suddenly being traded extremely widely because somebody got a tip, or the company is about to be bought or sold. There would be a flurry of transactions on the stock, which would in turn affect its price. The exchange would have to intervene and would likely trigger a circuit breaker.
How does that happen in near real time? With the way systems are set up, if an exchange lets it go even for a few minutes, it can go completely unchecked, with huge financial implications.
But if an exchange is able to process data in real time and detect an unnatural pattern, such as this stock being traded at extremely high volumes in a way that affects its price, it can trigger a halt immediately, preventing further disruption or manipulation. That is something a solution like Flink can do in the background.
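To make that concrete, here is a minimal Flink (Java) sketch of the kind of windowed check Dinesh describes, modelled on Flink's own windowed word-count example: trades are counted per symbol in tumbling five-second windows, and any symbol whose count crosses a threshold is flagged. The trade stream, the symbols, and the 10,000-trades-per-window threshold are assumptions for illustration, not an exchange's actual surveillance logic.

```java
// A minimal sketch: count trades per symbol in tumbling five-second windows and
// flag any symbol whose count exceeds a hypothetical threshold.
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TradeSpikeAlert {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Each element is (symbol, 1L), i.e. one trade. In production this stream
        // would come from a connector such as Kafka rather than fromElements.
        DataStream<Tuple2<String, Long>> trades = env.fromElements(
                Tuple2.of("ACME", 1L), Tuple2.of("ACME", 1L), Tuple2.of("XYZ", 1L));

        trades
                .keyBy(t -> t.f0)                                          // one logical stream per stock symbol
                .window(TumblingProcessingTimeWindows.of(Time.seconds(5))) // the five-second "chunks of time"
                .sum(1)                                                    // trades per symbol per window
                .filter(t -> t.f1 > 10_000L)                               // hypothetical spike threshold
                .print();                                                  // in practice: alert or trip a circuit breaker

        env.execute("trade-spike-alert");
    }
}
```

In a real deployment the fromElements source would be replaced with a streaming connector and print() with a sink that notifies the surveillance system.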
Flink can also run in the background defining patterns and correlating two different events. In Part I we talked about the credit card example; there, Flink could use context around geography and time to block a potentially fraudulent transaction immediately.
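Again purely as illustration, here is a hedged sketch of one way such a geography-and-time check could be expressed in Flink: a KeyedProcessFunction keeps each card's last transaction in keyed state and emits an alert when a second transaction arrives from a different country within ten minutes. The Txn record, its fields, and the ten-minute rule are assumptions for the example rather than any bank's real rules.

```java
// A minimal sketch of stateful, per-card fraud checking in Flink.
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class GeoFraudJob {

    /** Illustrative card transaction (assumed fields). */
    public static class Txn {
        public String cardId;
        public String country;
        public long timestampMillis;
    }

    /** Flags a second transaction on the same card from a different country within ten minutes. */
    public static class GeoFraudCheck extends KeyedProcessFunction<String, Txn, String> {

        private static final long TEN_MINUTES_MS = 10 * 60 * 1000L;   // hypothetical rule

        private transient ValueState<Txn> lastTxn;                    // previous transaction for this card

        @Override
        public void open(Configuration parameters) {
            lastTxn = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("last-txn", Txn.class));
        }

        @Override
        public void processElement(Txn txn, Context ctx, Collector<String> out) throws Exception {
            Txn previous = lastTxn.value();
            if (previous != null
                    && !previous.country.equals(txn.country)
                    && txn.timestampMillis - previous.timestampMillis < TEN_MINUTES_MS) {
                // Same card, different country, within ten minutes: emit an alert so the
                // downstream system can block the transaction before it settles.
                out.collect("Possible fraud on card " + txn.cardId
                        + ": " + previous.country + " then " + txn.country);
            }
            lastTxn.update(txn);
        }
    }
}
```

Wired in with something like transactions.keyBy(t -> t.cardId).process(new GeoFraudJob.GeoFraudCheck()), every card gets its own slice of state, which is what keeps the check fast even across millions of cards.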
This is something Flink does extremely well, and the key term here is low latency: the minimum time it takes to process an event and respond to it.
Companies laden with high-latency processes can lose millions of dollars, so low-latency processing helps catch these types of events promptly. That is a key advantage for businesses in leveraging Flink: alerting them that these events are happening, and preventing potentially disruptive ones, is crucial in a fast-moving sector such as financial services.
This is critical in high-volume scenarios too, because processing large volumes of complex data is not easy, and that is where a streaming analytics solution like Cloudera DataFlow, which can leverage Flink, can help.
Can you talk to us a bit about the benefits of NiFi for financial services businesses?
One of the things you notice in the financial services space is the tremendous amount of data that businesses deal with in their day-to-day financial transactions.
Banks especially deal with a broad range of data, such as bank-to-bank transfers, customer and international transfers, deposits, withdrawals, credit applications, and so on. All of these happen continuously and repeatedly on a daily basis, amounting to petabytes’ worth of information.
This requires massive amounts of data ingestion, messaging, and processing within a data-in-motion context. One of the primary challenges banks and financial institutions face is the data ingestion aspect of this: how to incorporate the data they collect into their architecture.
From a data ingestion standpoint, NiFi is designed for exactly this purpose. It was originally built primarily to ingest high volumes of data, and over the years it has evolved to become even more powerful.
The library of 300+ NiFi processors has evolved too, and over the past few years NiFi has become even better at collecting data from a wide variety of sources. It can now push data into an organization like a firehose, at huge volumes and high speeds.
The primary benefit of NiFi is that you can collect massive amounts of data and move that data at speed, in a timely manner. Secondly, NiFi has a lightweight version, an agent called MiNiFi, which collects and processes data at the edge so that not all of it needs to be sent back into the organization for immediate processing.
When NiFi and MiNiFi are deployed in combination at the edge, businesses are able to collect data at the source with minimal latency and no data loss. The edge can be meaningful in the financial services world: it could be an ATM kiosk, a bank branch, or a loan processor’s computer.
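As a rough, hypothetical illustration of edge data entering a flow, the Java sketch below posts a single ATM event to a NiFi ListenHTTP processor. The host, port, base path, and payload are assumptions for the example; in a real branch or ATM deployment the collection side would more likely be a MiNiFi agent shipping data on to NiFi.

```java
// A minimal sketch of an edge application handing one record to a NiFi flow
// via an assumed ListenHTTP endpoint (host, port, and payload are hypothetical).
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EdgeIngestClient {
    public static void main(String[] args) throws Exception {
        String record = "{\"atmId\":\"ATM-042\",\"event\":\"withdrawal\",\"amount\":200}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://nifi-edge-host:8081/contentListener")) // assumed ListenHTTP endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(record))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A 2xx status indicates the record was accepted into the flow as a flow file.
        System.out.println("NiFi responded with HTTP " + response.statusCode());
    }
}
```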
The third advantage of NiFi is its unique ability to connect with hundreds of data sources and edge endpoints, allowing organizations to push edge data into any cloud destination, including AWS, Google Cloud, or Azure, or into any on-premises data warehouse or data lake. The ability to process data from anywhere makes it truly pluggable and easily scalable.
As mentioned, one of the key challenges for financial services companies is managing vast volumes of data and data sets at scale, and NiFi and MiNiFi give companies this capability, along with the ability to do it at speed.
Watch how you can set up Stream Processing with Apache Flink on CDP.
To find out more about Cloudera’s real-time streaming data offerings, visit here.
Stay tuned for Part III, the final installment of our Q&A with Dinesh, as he expands on technologies like Flink, NiFi, and Kafka and shares how Cloudera is supporting businesses in integrating these solutions to leverage new opportunities within the financial services sector.