COVID-19 vaccines were developed in record time. One of the main reasons for the accelerated development was the quick exchange of data between academia, healthcare institutions, government agencies, and nonprofit entities.
“COVID research is a great example of where sharing data and having large quantities of data to analyze would be beneficial to us all,” said Renee Dvir, solutions engineering manager at Cloudera.
Without a commitment to openly shared data, the New York Times asserted in February 2021, coronavirus vaccines would have taken much longer to develop. A Chinese lab announced the discovery of the virus in January 2020 and released the genome sequence to the public a few days later, enabling labs around the world to start working on vaccines. The first vaccine became available in December 2020.
Collaboration between multiple entities produced a positive outcome for the development of COVID vaccines, but organizations may not always be so willing—or able—to share data.
There are a few reasons for that. In the private sector, data is viewed as a way to gain a competitive edge, so companies must weigh the public good against their own interests. When sharing data, organizations also have to wrestle with questions such as, “How will data be used once it is fed into artificial intelligence (AI) and machine learning (ML) engines?” and “How can it be kept safe from manipulation?”
Regulations can complicate sharing, especially when laws on data privacy and security differ from one jurisdiction to another. This makes it harder for organizations to work toward a common goal, Dvir said. “The result of this is, as we see, the bigger companies that naturally have more data can get more data-driven insights and move faster while the smaller research initiatives have less to go on.”
With COVID, organizations were able to overcome these data-sharing challenges. Thanks to the Bermuda Principles agreement of 1996, a mechanism was in place for sharing human genome data within 24 hours of generation. As different researchers made discoveries about the virus and its effect on humans, they shared their data.
Sharing for the Public Good
Most data collected today resides in the private sector, noted a 2018 Harvard Business Review article. “Typically ensconced in corporate databases, and tightly held in order to maintain competitive advantage, this data contains tremendous possible insights and avenues for policy innovation,” wrote the authors Stefaan G. Verhulst and Andrew Young. “But because the analytical expertise brought to bear on it is narrow, and limited by private ownership and access restrictions, its vast potential often goes untapped.”
Efforts are under way to break the natural tendency of corporations to hoard their data. For instance, the National Institutes of Health (NIH) in 2014 launched the Accelerating Medicines Partnership (AMP) to encourage collaboration in developing treatments and cures.
“AMP partners share a common goal of increasing the number of new diagnostics and therapies for patients and reducing the time and cost of developing them by jointly identifying and validating promising biological targets for each of these disease areas,” according to the NIH.
When it comes to sharing data for the public good, healthcare is an obvious choice because of the generally accepted view that public health is a shared responsibility. But data exchanges can also serve the public in other contexts. Think of research focused on education, space exploration, and infrastructure enhancements.
“Data is everywhere and data is part of everything. There are currently more devices connected, collecting and sending data than there are humans. So just imagine how much more insight and useful information companies can unlock if this data is collected, stored and analyzed correctly,” Dvir said.
One data-sharing approach involves so-called collaboratives between private corporations, government, academia, and nonprofits. An example is Facebook’s Disaster Maps initiative, which shares data with partner organizations such as the Red Cross, UNICEF, and the World Food Program.
In another use case, Waze is sharing traffic data with more than 60 cities to improve urban planning and ease congestion. “Private data sets often contain a wealth of information that can enable more-accurate modeling of public services and help guide service delivery in a targeted, evidence-based manner,” Verhulst and Young wrote in the Harvard Business Review.
But Is It Safe?
Even in a humanitarian context, the sharing of data raises some questions. One issue is how much users should be told about the ways in which their data is applied. When using digital platforms for commerce, information, banking, and healthcare, people willingly share private information without necessarily understanding its ultimate use.
Laws such as the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) address this by putting strict controls around what data can be collected, how it can be stored, and for how long.
Still, some risks remain. Once sent out into the world, data can be changed, manipulated and misused.
A Forbes article raised concerns about Facebook’s Disaster Maps, suggesting governments could use the captured data for Orwellian purposes. “Once governments become accustomed to their partners mapping civil society in motion, it will only be a matter of time until Facebook finds itself under court order to provide real-time maps for far more nefarious purposes,” author Kalev Leetaru wrote.
Of course, organizations aren’t helpless in trying to safeguard the data they share. “There are several best practices, such as employing data anonymization and utilizing encrypted communication channels, which can bolster efforts to safeguard sensitive information while still extracting important insights. Firms can conduct governance audits, refine data sharing agreements, and construct guardrails,” said Lloyd Danzig, chairman and founder of the International Consortium for the Ethical Development of Artificial Intelligence.
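As a rough illustration of the anonymization practice Danzig mentions, the following Python sketch shows one way a record might be prepared before it is shared. Everything in it is an assumption made for illustration: the field names, the sample record, and the `SECRET_KEY` placeholder are hypothetical, and keyed hashing is strictly a form of pseudonymization rather than full anonymization, so it would not by itself satisfy any particular regulation.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data owner; it is never shared with the data.
SECRET_KEY = b"replace-with-a-randomly-generated-key"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def prepare_for_sharing(record: dict) -> dict:
    """Build a reduced copy of the record; fields not listed here are dropped."""
    return {
        "patient_ref": pseudonymize(record["patient_id"]),
        "age_band": generalize_age(record["age"]),
        "diagnosis": record["diagnosis"],
    }


# Illustrative record with made-up values.
record = {"patient_id": "A-1001", "name": "Jane Doe", "age": 34, "diagnosis": "J12.82"}
shared = prepare_for_sharing(record)
```

The same identifier always maps to the same reference, so partners can still link records across data sets without seeing the original ID, while the name is dropped and the exact age is replaced by a band.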
A level of trust is necessary when organizations share data, but so is accountability. For sharing between multiple parties to be trusted, the process needs tracking, visibility, and security.
As Cloudera’s Dvir put it: “When it comes to privacy and data manipulation, trust isn’t enough. Entities also need to uphold local compliance regulations. Data platforms in use for sensitive data need to provide governance and security methods to meet each organization’s requirements and enable getting the maximum out of existing data without compromise or misuse.”
Shared Data Experience (SDX), a critical part of Cloudera Data Platform (CDP), supports data exchange between multiple parties, Dvir said, using a number of management, security, compliance, and audit controls to protect the data and the process of sharing it.
Safeguards built into a platform are essential. Still, each organization in a data-sharing arrangement must commit to an ethical approach to ensure a free, honest, mutually beneficial process. Anything else is probably self-defeating. “Having safeguards in place is necessary not only for compliance but also gives freedom to collect more data and get more value out of it,” Dvir said.