Have you ever considered how much data people generate in a single day? Every web document, scanned document, email, social media post, and media download adds to the total. One estimate states that “on average, people will produce 463 exabytes of data per day by 2025.”
Now consider that the federal government has approximately 2.8 million civilian employees and the Department of Defense has another 2 million active-duty personnel, Guardsmen, and Reservists. Add that to the nearly 19 million employees in state and local agencies and you have a combined agency population greater than the number of residents in Florida, the third-largest state by population. Each of these employees generates volumes of data on a daily basis. And this doesn’t even touch on the data generated by citizen services interfaces, or on machine- or device-generated data such as video feeds, sensors, and communications data. The list could go on and on.
With these massive volumes of data, it’s common for agencies and enterprises to identify the data that is readily accessible and essential to mission success and prioritize it for analytics. They essentially shine a light on the data that is most available and perceived as relevant to decision-making outcomes, while an unquantifiable amount of data stays in the “dark,” unused or unknown.
What is dark data?
Gartner defines dark data as “The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” Some examples include employee records, internal and external communications, photo, video, and audio files, IoT sensor data, and streamed data.
By 2025, it’s estimated that the amount of data created, consumed, and stored will reach 180 zettabytes, with up to 90% of that unstructured and nearly all of it unused for decision making. This dark data resides everywhere in the enterprise, siloed in multiple data repositories, from laptops and mobile devices to data lakes and applications.
The purpose of this blog isn’t to emphasize the cyber risk of dark data but to spotlight its broader implications. Still, the risk is worth noting: because dark data is considered unusable or is not prioritized for analytics, it may be stored in less secure repositories, forgotten about or unmanaged, and left vulnerable to a data breach.
In the past, accessing and processing this data was too time-consuming and costly for analytics at scale. In the new age of hybrid data platforms, finding, understanding, and utilizing dark data is possible and has huge implications for government applications, especially as AI and machine learning take hold within the enterprise:
- Agency Operations: Much of dark data is produced in day-to-day operations and has the potential to provide deep insight into how to improve operational efficiency for the public sector workforce while reducing costs for citizen services.
- IoT Insights: While data from IoT devices and sensors is regularly utilized for real-time alarms and control, understanding and analyzing IoT data opens the door to predictive use cases such as condition-based monitoring for aircraft maintenance, or optimization of emissions and water quality controls to avert future environmental catastrophes.
- Citizen 360: Having a complete picture of citizens and their interactions with government agencies has wide-ranging positive outcomes, from reducing fraud, waste, and abuse to rooting out bias and providing more accurate and timely services.
Making dark data actionable
Organizations that succeed in optimizing the process of discovering, classifying, and leveraging their dark data to feed AI and machine learning models are well poised to reduce risk and unlock valuable insights that drive operational efficiencies. Enabling a solution to move previously untapped data to an analytics platform provides a broader and far more accurate view of data across the entire enterprise.
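To make the discover-and-classify step concrete, here is a minimal sketch of how a team might inventory a file share for dark data candidates. It is only an illustration under stated assumptions: the `CATEGORIES` mapping, the function names, and the idea of using "not modified in a year" as a proxy for unused data are all hypothetical choices and do not describe any particular product's behavior.

```python
import os
import time
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to a coarse data category.
CATEGORIES = {
    ".csv": "structured",
    ".json": "semi-structured",
    ".log": "machine-generated",
    ".txt": "unstructured text",
    ".pdf": "document",
    ".jpg": "media",
    ".mp4": "media",
}


def classify(path: Path) -> str:
    """Classify a file by extension; anything unrecognized is 'unknown'."""
    return CATEGORIES.get(path.suffix.lower(), "unknown")


def find_dark_candidates(root, max_idle_days=365, now=None):
    """Walk `root` and return (path, category, idle_days) for stale files.

    Assumption: a file whose modification time is older than
    `max_idle_days` is treated as a dark data candidate.
    """
    now = time.time() if now is None else now
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            idle_days = (now - path.stat().st_mtime) / 86400
            if idle_days > max_idle_days:
                candidates.append((path, classify(path), idle_days))
    return candidates


def summarize(candidates):
    """Count dark data candidates per category."""
    return Counter(category for _path, category, _idle in candidates)
```

Calling `summarize(find_dark_candidates("/some/share"))` (the path is a placeholder) would give a per-category count of stale files, which is one simple way to decide what to review before moving anything to an analytics platform. A real inventory would also need to cover databases, object stores, and streams, and would likely classify by content and sensitivity instead of by file extension.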
According to a Forbes analyst, “If IT can provide a unified data architecture that serves as an integrated layer connecting data endpoints and processes, it can make mission-critical data more discoverable, pervasive and reusable across all environments of an organization, including hybrid and multi-cloud environments.”
CDOs need to get a handle on their dark data now, as their organizations continue to gather increasing amounts of information every day. To help federal, state, and local government agencies minimize risk and leverage this “dark data” as a strategic asset, Cloudera Government Solutions provides universal data distribution and data-in-motion capabilities to discover, classify, and move all data (including dark data) anywhere, to any application.
Learn how Cloudera Professional Services provides the building blocks to optimize your data value and deploy all modern data architectures.