This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate.
As viable as data marketplaces seem, they have yet to gain traction. Some marketplaces exist for industry and public data sets, such as D&B Data Exchange and AWS Public Datasets. However, the vision of ecosystems sharing enterprise data sets has yet to materialize. More than any technical reason that might be cited, the most prominent reason is trust. Public data sets, such as weather, stock, and census data, can be verified, qualified, and consumed fairly easily without any larger downstream impact. But when key business decisions or actionable intelligence must be derived from enterprise data provided by a third-party company, trust becomes a key concern. Can a biopharmaceutical company conclude, without repercussions, that a drug is safe for human consumption based on research data purchased on a marketplace? Can a car company model its autonomous vehicle intelligence on driving data acquired through a marketplace? Trust is critical in each of these scenarios. Every enterprise wants to know the credibility of the source, the reliability of the data, and the overall provenance of the data, so that it can identify any tampering or quality issues.
Enter blockchain. While blockchain is emerging as a viable technology in certain industry verticals, such as transportation/logistics, manufacturing, and retail, it is still far from mainstream in other industries. However, as the interest and hype persist, the evolution of the technology into a stronger variant is inevitable. Even in its current state, the promise of blockchain technology is that data is immutable, auditable, and completely traceable. The key benefit of using blockchain is trust. Permissioned blockchains within certain ecosystems can lead the way to new digital data marketplaces where enterprises can freely trade data and trust its sources. Ecosystem-driven blockchains are bound to happen, and data marketplaces are a fantastic use case for them. Imagine an ecosystem of car manufacturers, OEM parts vendors, insurance companies, the DMV, and others all participating in one blockchain. The data exchanged within that blockchain is of extreme value to every participant in the ecosystem, and the data remains contained within that ecosystem as well.
This is easier said than done, though. There are still a few hurdles to overcome with this model:
- Speed – Because latency and transaction commit times are still high in today’s blockchain technology, loading large data sets onto the blockchain is not ideal. The data might instead reside in a separate data store, with only the metadata associated with the data set recorded on the blockchain. That, in turn, raises the concern that the off-chain data itself could be tampered with.
- Ethical boundaries still exist around data sets related to specific industries. Take healthcare: even if patient-identifiable data is anonymized, how much of that data can be shared, and with whom? Big Pharma is notorious for preying on such information and raising the price of drugs for certain pockets. What if pharma companies could purchase this data easily on the marketplace? So, even with permissioned blockchains, how do you control access to data within a defined ecosystem of participants? And if you start controlling it, does the system become centralized, thereby defeating the whole point of the blockchain?
- Incentivization – How do we incentivize ecosystem participants to provide data for the betterment of the ecosystem? If it is purely for monetization, the competitive nature of the participating companies within the ecosystem itself can be a hindrance.
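To make the first hurdle concrete, one common way to address the tampering concern is to record a cryptographic hash of the off-chain data set alongside its metadata, so that any consumer can re-hash the data they receive and compare. The following is only a minimal sketch of that idea, not how any particular marketplace works: `InMemoryLedger`, `DatasetRecord`, and the sample data set and provider names are all hypothetical, and the in-memory, hash-linked list merely stands in for a real permissioned blockchain.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata kept on the ledger; the data set itself stays off-chain."""
    dataset_id: str
    provider: str
    content_hash: str      # fingerprint of the off-chain data
    prev_record_hash: str  # link to the previous record

    def record_hash(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return sha256_hex(payload)


class InMemoryLedger:
    """Hypothetical stand-in for a blockchain: an append-only, hash-linked list."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._records = []  # list of DatasetRecord, oldest first

    def register(self, dataset_id: str, provider: str, data: bytes) -> DatasetRecord:
        """Append a metadata record containing the hash of the data set."""
        prev = self._records[-1].record_hash() if self._records else self.GENESIS
        record = DatasetRecord(dataset_id, provider, sha256_hex(data), prev)
        self._records.append(record)
        return record

    def verify(self, dataset_id: str, data: bytes) -> bool:
        """Check that `data` matches the most recent hash registered for `dataset_id`."""
        for record in reversed(self._records):
            if record.dataset_id == dataset_id:
                return hmac.compare_digest(record.content_hash, sha256_hex(data))
        raise KeyError(dataset_id)

    def chain_is_intact(self) -> bool:
        """Check that every record still points at the hash of its predecessor."""
        prev = self.GENESIS
        for record in self._records:
            if record.prev_record_hash != prev:
                return False
            prev = record.record_hash()
        return True


# Usage: the provider registers a data set; a buyer later verifies what they received.
ledger = InMemoryLedger()
original = b"vehicle_id,speed_kmh\n1,42\n2,57\n"
ledger.register("driving-data-sample", "example-provider", original)

print(ledger.verify("driving-data-sample", original))                  # True
print(ledger.verify("driving-data-sample", original + b"3,999\n"))     # False
```

Note that this only makes tampering detectable; it does not prevent it, and it says nothing about whether the data was accurate when it was first registered.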
We are stepping into the next era of digitization, and blockchain (or perhaps its next evolutionary phase) will be a defining technology for enabling data exchanges across marketplaces within ecosystems. There are already blockchain-powered marketplaces, like Ocean Protocol or the IOTA data market, that enable you to connect to live sensors across the world and receive real-time streaming data from those sensors for a subscription price. More such solutions will emerge, tailored to specific verticals in order to gain adoption and limit the challenges.
In my third and final post in this blog series, I will get into the technical details of such an implementation and what it takes for an enterprise to set one up or even participate in one. In the meantime, please check out this webinar from Trimble Enterprise Transportation Services, where they explain how they have streamlined their logistics operations and boosted efficiency with multiple blockchains that are powered by ML models and data streamed through Apache NiFi.