Apache Spot (Incubating): Fighting Cyber Threats via an Open Data Model

Categories: Hadoop Platform Security & Cybersecurity Use Case

Last week, the open source Open Network Insight (ONI) project, now called Spot, was accepted into the ASF Incubator. Here are the highlights of its open data model approach and initial use cases.

One of the biggest challenges organizations face today in combating cyber threats is collecting and normalizing data from numerous security event data sources (often up to thousands of them) to build the required analytics. This process often leaves those analytics dependent on specific threat-detection technologies, which removes the flexibility and agility needed to keep up with increasingly frequent and complex attacks. Technology lock-in is therefore a common byproduct of today’s status quo: adding new technologies (or replacing existing ones) is extremely costly because of these downstream analytic dependencies.

Apache Spot (incubating), formerly known as Open Network Insight (ONI), is an open source project for cybersecurity analytics, based on the Cloudera platform and originally developed by Intel Corp. engineers. It is premised on a community-driven open data model that addresses the need for agility by decoupling analytics from the specific technologies that supply the data, so that those analytics can keep pace with constantly evolving use cases. In this blog post, you’ll get an overview of Spot’s open data model and initial use cases.

Spot’s Open Data Model Strategy

Today, Spot’s primary use case is network traffic analysis for network flows (NetFlow, sFlow, and so on), DNS, and proxy. Essentially, Spot enables identification of threats through anomalous event detection using both supervised and unsupervised machine learning. However, Spot’s open data model strategy aims to extend Spot’s existing capabilities to unlock a broader set of cybersecurity use cases than are currently supported.

The availability of an open data model, which can be applied “on-read” or “on-write”, in batch or stream, will allow for the separation of security analytics from the specific data sources on which they are built. This “separation of duties” enables the Spot community to build analytics that are independent of specific technologies and provide the flexibility to change underlying data sources, without impacting the analytics. This approach will also give security vendors the opportunity to build additional products on top of the open data model to drive new revenue streams.
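To make the separation concrete, here is a minimal sketch of applying a common model "on-read". It is an illustration only, not Spot's actual data model or code: the common field names (`event_time`, `src_ip`, `dst_ip`, `bytes_out`), the raw field names, and every function below are hypothetical.

```python
# Hypothetical common schema; these field names are illustrative assumptions,
# not the fields defined by Spot's open data model.
COMMON_FIELDS = ("event_time", "src_ip", "dst_ip", "bytes_out")


def normalize_netflow(rec):
    """Map a raw flow record (assumed field names) onto the common schema."""
    return {
        "event_time": rec["tstart"],
        "src_ip": rec["sip"],
        "dst_ip": rec["dip"],
        "bytes_out": int(rec["ibyt"]),
    }


def normalize_proxy(rec):
    """Map a raw proxy log record (assumed field names) onto the common schema."""
    return {
        "event_time": rec["timestamp"],
        "src_ip": rec["clientip"],
        "dst_ip": rec["serverip"],
        "bytes_out": int(rec["csbytes"]),
    }


# One normalizer per data source; adding a source means adding an entry here.
NORMALIZERS = {"netflow": normalize_netflow, "proxy": normalize_proxy}


def read_events(source, raw_records):
    """Apply the common model on read: raw records in, common-schema events out."""
    normalize = NORMALIZERS[source]
    for rec in raw_records:
        yield normalize(rec)


def bytes_out_by_source(events):
    """An analytic written only against the common schema, not any raw format."""
    totals = {}
    for event in events:
        totals[event["src_ip"]] = totals.get(event["src_ip"], 0) + event["bytes_out"]
    return totals
```

In this sketch, swapping or adding a data source touches only `NORMALIZERS`; `bytes_out_by_source` never sees a source-specific field, which is the decoupling the paragraph above describes. An "on-write" variant would run the same normalizers at ingest time and store the normalized events instead.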

For example, to support this broader set of use cases, Spot could be extended to collect and analyze other common event-oriented data sources analyzed for cyber threats, including but not limited to the following log types:

  • Active Directory/identity management
  • User/entity behavior analysis
  • Endpoint/asset management
  • Proxy
  • Web server
  • Operating system
  • Firewall
  • Intrusion prevention/detection
  • Data loss prevention
  • Network meta/session and PCAP files

You can learn technical details about Spot’s open data model (and supported data formats) here.

Use Cases for Spot, Today

Even before such community-driven extensions arrive, Spot today is unique in its ability to address the following cybersecurity use cases, which legacy technologies do not address effectively:

Detection of Known and Unknown Threats via ML and Advanced Analytic Modeling

Current technologies are limited in the analytics they can apply to detect threats. These limitations stem from the inability to collect all the data sources needed to effectively identify threats (structured, unstructured, and so on), and from the inability to process the massive volumes of data involved.

Legacy technologies are typically limited to rules-based and signature detection; they are somewhat effective at detecting known threats but struggle with new threats. Spot addresses these gaps through its ability to collect any data type in any volume. In addition to the various analytic frameworks that are currently supported (including machine learning), Spot unlocks a whole new class of analytics that can scale to today’s demands. The topic model for detecting anomalous network traffic is one example of where the Spot platform excels.
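As a rough illustration of scoring network events by how unexpected they are, the sketch below turns each flow into a discrete "word" and ranks flows by how improbable that word is for its source IP. This is a deliberately simplified stand-in: it uses smoothed per-IP word frequencies rather than a topic model, and it is not Spot's implementation. The flow fields (`src_ip`, `dst_port`, `bytes`), the word encoding, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def flow_to_word(flow):
    """Discretize a flow into a 'word': destination port plus a coarse size bucket."""
    size_bucket = int(math.log10(max(flow["bytes"], 1)))
    return f"{flow['dst_port']}_{size_bucket}"


def score_flows(flows, alpha=1.0):
    """Return (probability, flow) pairs, least probable (most suspicious) first."""
    words_by_ip = defaultdict(Counter)
    vocabulary = set()
    for flow in flows:
        word = flow_to_word(flow)
        words_by_ip[flow["src_ip"]][word] += 1
        vocabulary.add(word)

    scored = []
    for flow in flows:
        counts = words_by_ip[flow["src_ip"]]
        total = sum(counts.values())
        # Laplace-smoothed estimate of P(word | source IP).
        prob = (counts[flow_to_word(flow)] + alpha) / (total + alpha * len(vocabulary))
        scored.append((prob, flow))
    # Sort on the probability only, so flow dicts are never compared.
    return sorted(scored, key=lambda pair: pair[0])
```

A topic model would replace the per-IP frequency estimate with probabilities learned from shared traffic patterns across many hosts, but the output has the same shape: a ranked list in which the lowest-probability events are surfaced for an analyst to review.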

Reduction of Mean Time to Incident Detection & Resolution (MTTR)

One of the challenges organizations face today is the inability to detect threats early enough to minimize adverse impacts. This stems partly from the limited analytics discussed previously, and partly from the fact that investigative queries often take hours or days to return results.

Legacy technologies can’t offer a central data store for facilitating such investigations due to their inability to store and serve the massive amounts of data involved. This limitation cripples incident investigations and results in MTTRs of many weeks and months; meanwhile, the adverse impacts of the breach are magnified, thus making the threat harder to eradicate. Spot addresses these gaps by providing the capability for a central data store that houses all the data needed to facilitate an investigation, returning investigative query results in seconds and minutes (vs. hours and days). Therefore, Spot can effectively reduce incident MTTR and reduce adverse impacts of a breach.

Threat Hunting

It has become necessary for organizations to “hunt” for active threats as traditional, passive threat-detection approaches are not sufficient. “Hunting” involves performing ad-hoc searches and queries over vast amounts of data representing many weeks or months of events, as well as applying ad-hoc algorithms to detect the proverbial needle in the haystack. Traditional systems do not perform well for these types of activities as the query results sometimes take hours and days to be retrieved. These traditional systems also lack the analytic flexibility to construct the necessary algorithms and logic. In contrast, Spot addresses these gaps in the same ways it addresses others: by providing a central data store with the required analytic frameworks that scale to the required workloads.

Join the Mission

The Spot project already includes contributors from Cloudera, Intel, eBay, Webroot, Jask, Cybraics, Cloudwick, and Endgame, but for success in its mission, more contributors will be needed. With its open data model approach, Spot has the potential to become a touchpoint project that galvanizes the developer community to help combat the serious (and constantly mutating) cybersecurity threat faced by virtually every organization, and connected consumer, on the planet. We’re confident that ASF governance would go a long way toward making that vision a reality.

For more background about the motivation behind Spot, read this.

Mark Grover is a Software Engineer working on the Spark team at Cloudera, and a committer to Spot.

Morris Hicks is a Systems Engineer at Cloudera, and a committer to Spot.


8 responses on “Apache Spot (Incubating): Fighting Cyber Threats via an Open Data Model”

  1. Chinmay

    Apache Metron, which is already in incubation, promises the same thing for cybersecurity analytics. How is Spot different from that?

    1. Justin Kestelyn Post author

      There is some overlap with Metron, yes. However, Spot’s open data model sets it apart.

  2. David Sunny Kondru

    Hi Team,
    We would like to contribute to this programme. Please let us know how to approach and whom to…
    Thanks
    David

  3. Nathan Segerlind

    This post is from September.
    When will the open data model show up in the incubator?

  4. Praveen kumar

    Is there a process document for installing Apache Spot on the Cloudera platform?
    Thanks in advance
    Praveen