Hadoop Security is the latest book from Cloudera engineers in the Hadoop ecosystem books canon.
We are thrilled to announce the availability of the early release of Hadoop Security, a new book about security in the Apache Hadoop ecosystem published by O’Reilly Media. The early release contains two chapters on System Architecture and Securing Data Ingest and is available in O’Reilly’s catalog and in Safari Books.
The goal of the book is to serve the experienced security architect that has been tasked with integrating Hadoop into a larger enterprise security context. System and application administrators also benefit from a thorough treatment of the risks inherent in deploying Hadoop in production and the associated how and why of Hadoop security.
As Hadoop continues to mature and become ever more widely adopted, material must become specialized for the security architects tasked with ensuring new applications meet corporate and regulatory policies. While it is up to operations staff to deploy and maintain the system, they won’t be responsible for determining what policies their systems must adhere to. Hadoop is mature enough that dedicated security professionals need a reference to navigate the complexities of security on such a massive scale. Additionally, security professionals must be able to keep up with the array of activity in the Hadoop security landscape as exemplified by new projects like Apache Sentry (incubating) and cross-project initiatives such as Project Rhino.
Security architects aren’t interested in how to write a MapReduce job or how HDFS splits files into data blocks, they care about where data is going and who will be able to access it. Their focus is on putting into practice the policies and standards necessary to keep their data secure. As more corporations turn to Hadoop to store and process their most valuable data, the risks with a potential breach of those systems increases exponentially. Without a thorough treatment of the subject, organizations will delay deployments or resort to siloed systems that increase capital and operating costs.
The first chapter available is on the System Architecture where Hadoop is deployed. It goes into the different options for deployment: in-house, cloud, and managed. The chapter also covers how major components of the Hadoop stack get laid out physically from both a server perspective and a network perspective. It gives a security architect the necessary background to put the overall security architecture of a Hadoop deployment into context.
The second available chapter is on Securing Data Ingest it covers the basics of Confidentiality, Integrity, and Availability (CIA) and applies them to feeding your cluster with data from external systems. In particular, the two most common data ingest tools, Apache Flume and Apache Sqoop, are evaluated for their support of CIA. The chapter details the motivation for securing your ingest pipeline as well as providing ample information and examples on how to configure these tools for your specific needs. The chapter also puts the security of your Hadoop data ingest flow into the broader context of your enterprise architecture.
We encourage you to take a look and get involved early. Security is a complex topic and it never hurts to get a jump start on it. We’re also eagerly awaiting feedback. We would never have come this far without the help of some extremely kind reviewers. You can also expect more chapters to come in the coming months. We’ll continue to provide summaries on this blog as we release new content so you know what to expect.
Ben Spivey is a Solutions Architect at Cloudera, and Joey Echeverria is a Software Engineer at Cloudera.