Can Machine Learning Protect Your Digital Identity?

by Ronda Swaney

Posted in Business | August 23, 2018 | 3 min read

This post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be valid.

A digital identity is a collection of information about a person, business, or device that exists online. Just as your eye color and height help identify you in the real world, digital identifiers can do the same in cyberspace. Your username and password combinations, and answers to security questions, provide identity clues in the digital arena.

The Value of Identity

In a connected world, the value of digital identity is obvious, which is why stealing it is big business. Research firm Cybersecurity Ventures projects that cybercrime damages will reach six trillion dollars annually by 2021. Cybercrime is a broad term that encompasses everything from coordinated attacks on a business (DDoS, malware, ransomware, or phishing) to selling valuable information about individuals (bank log-in credentials, credit card numbers, or social security numbers).

Businesses try to protect themselves by learning about the types of attacks, asking users to be suspicious of unusual contacts, and bolstering their cybersecurity operations. In response, criminals simply refine their attacks or figure out ways around the barriers that businesses put in place.

How Machine Learning Tracks Identity

Personal identification has progressed over the years in order to make it harder for bad actors to gain unauthorized access. It has evolved from simple usernames and passwords to incorporating tokens, two-factor authentication, and, most recently, facial recognition.

Machine learning plays a role in the most recent iteration of identity protection. Machine learning looks for patterns in data, such as statistical regularities and averages, and uses them to make predictions. Deep learning is a subset of machine learning that trains a computer to act more like the human brain, allowing the computer to learn by example as a human might. Deep learning was key in the advancement of facial recognition. Images were used to train machines to find face landmarks, such as the eyes, nose, and mouth. Machines then turned that knowledge into math: distance between the eyes, length of the nose, width of the mouth, etc. In essence, each person has their own “face equation” derived from that math. Devices like phones now use that information to identify authorized users by their faces.
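To make the "face equation" idea concrete, here is a minimal Python sketch. It assumes landmark coordinates have already been found by some detector; the landmark names, the sample coordinates, and the matching tolerance are all hypothetical, and production systems use far richer learned representations than three distances.

```python
import math

def face_signature(landmarks):
    """Turn landmark points (x, y) into scale-normalized measurements."""
    left_eye, right_eye = landmarks["left_eye"], landmarks["right_eye"]
    eye_distance = math.dist(left_eye, right_eye)
    # Midpoint between the eyes, used as the top of the nose.
    eye_mid = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    nose_length = math.dist(eye_mid, landmarks["nose_tip"])
    mouth_width = math.dist(landmarks["mouth_left"], landmarks["mouth_right"])
    # Dividing by eye distance keeps the signature independent of image size.
    return (nose_length / eye_distance, mouth_width / eye_distance)

def same_face(sig_a, sig_b, tolerance=0.05):
    """Treat two signatures as a match if they are close (tolerance is an assumption)."""
    return math.dist(sig_a, sig_b) <= tolerance

# Hypothetical landmark positions in pixels for an enrolled user.
enrolled = {
    "left_eye": (100, 120), "right_eye": (160, 120), "nose_tip": (130, 165),
    "mouth_left": (108, 195), "mouth_right": (152, 195),
}
# The same face photographed at twice the resolution.
closer_photo = {name: (2 * x, 2 * y) for name, (x, y) in enrolled.items()}
# A different face with a wider mouth.
stranger = dict(enrolled, mouth_left=(100, 195), mouth_right=(160, 195))
```

With these sample values, `same_face(face_signature(enrolled), face_signature(closer_photo))` is true because the ratios are unchanged by scale, while the comparison against `stranger` fails.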

Online behavior is another aspect of digital identity. How fast do you type? Do you regularly mistype your passwords? Which websites do you typically visit? How big is your vocabulary? What sentence structures do you use? Machine learning looks for behavior patterns that are unique to an individual. Just as your face is unique to you, so is your online behavior.
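One simple way to picture behavior-based identity is a baseline-and-deviation check. The sketch below uses typing speed only, with made-up sample values and an assumed threshold of three standard deviations; it illustrates the idea rather than any particular product's method.

```python
from statistics import mean, stdev

def build_profile(typing_speeds_wpm):
    """Summarize a user's past typing speeds (words per minute)."""
    return {"mean": mean(typing_speeds_wpm), "stdev": stdev(typing_speeds_wpm)}

def is_out_of_character(profile, observed_wpm, threshold=3.0):
    """Flag a session whose typing speed is far from the user's norm."""
    if profile["stdev"] == 0:
        # No variation in the history: any difference counts as unusual.
        return observed_wpm != profile["mean"]
    z_score = abs(observed_wpm - profile["mean"]) / profile["stdev"]
    return z_score > threshold

# Hypothetical history of one user's typing speeds.
profile = build_profile([62, 65, 61, 64, 63, 66, 60])
```

Here a session at 64 words per minute falls within the user's normal range, while a session at 95 words per minute would be flagged. A real system would combine many such signals rather than rely on one.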

Using Deep Learning to Fight Fraud

Fraud hits the insurance sector particularly hard. This sector has long used statistical data models to aid in this fight, but those tools mostly help discover fraud after the fact, and they depend on special investigation units to perform much of the work. Machine learning provides a proactive method to discover patterns of insurance fraud before fraud occurs. With machine learning, agents can process claims much faster and more accurately than they could by hand. A data platform for machine learning and predictive modeling can use rules-based flags on streaming data to catch fraudulent or invalid claims. As claims data flows into the system, real-time notices alert special investigation claims analysts. With this information, they prioritize claims investigations based on the highest likelihood of fraud.
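The flag-and-prioritize flow described above can be sketched in a few lines of Python. The claim fields, the three rules, and their weights are hypothetical placeholders; a real platform would derive likelihoods from trained models and run on a streaming engine rather than an in-memory queue.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Claim:
    claim_id: str
    amount: float
    days_since_policy_start: int
    prior_claims: int

# (flag name, rule, weight). Rules and weights are illustrative assumptions.
RULES = [
    ("large_amount", lambda c: c.amount > 20_000, 0.40),
    ("new_policy", lambda c: c.days_since_policy_start < 30, 0.35),
    ("frequent_claimant", lambda c: c.prior_claims >= 3, 0.25),
]

def score_claim(claim):
    """Return a rough fraud score and the names of the rules that fired."""
    fired = [(name, weight) for name, rule, weight in RULES if rule(claim)]
    return sum(weight for _, weight in fired), [name for name, _ in fired]

def prioritize(claim_stream):
    """Score claims as they arrive, then yield flagged ones, highest score first."""
    queue = []
    for order, claim in enumerate(claim_stream):
        score, flags = score_claim(claim)
        if flags:
            # Negative score turns Python's min-heap into a max-heap;
            # arrival order breaks ties.
            heapq.heappush(queue, (-score, order, claim.claim_id, flags))
    while queue:
        neg_score, _, claim_id, flags = heapq.heappop(queue)
        yield claim_id, -neg_score, flags

claims = [
    Claim("A", 5_000, 400, 0),    # no rule fires
    Claim("B", 25_000, 10, 0),    # large amount on a new policy
    Claim("C", 3_000, 200, 4),    # frequent claimant
    Claim("D", 30_000, 500, 5),   # large amount, frequent claimant
]
```

Iterating over `prioritize(claims)` yields claims B, D, and C in that order, and claim A never reaches an analyst because no rule fired.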

The finance sector faces fraud challenges, too, whether through breaches that steal customer data, easy-to-hide criminal transactions, or real-time payments that demand much faster risk analysis. Again, the industry has turned to machine learning for help. The sheer volume of data that must be analyzed exacerbates the problem. Storage limits can require companies to archive historical data, limiting its availability. That means a day’s trading data can’t undergo risk analysis until the business day is over. This delay creates an unacceptable risk exposure for both money laundering and rogue trading. By using an appropriately equipped data platform, finance companies can accelerate their speed to analytics and extend their data retention. They can analyze data from the current workday and keep it highly available for years to come.

There will always be criminals looking to circumvent security measures. But as machine learning gets better at tracking fraud patterns and out-of-character behaviors, businesses will get better at stopping fraud before it happens.


Ronda Swaney

Freelance author and journalist @RondaSwaney
