Cloudera Developer Blog · Security Posts
Post written by Cloudera Software Engineer Aaron T. Myers.
Apache Hadoop has supported user authorization for some time. The Hadoop Distributed File System (HDFS) has a permissions model similar to Unix to control file and directory access, and MapReduce has access control lists (ACLs) per job queue to control which users may submit jobs. These authorization schemes allow Hadoop users and administrators to specify exactly who may access Hadoop’s resources. However, until recently, these mechanisms relied on a fundamentally insecure method of identifying the user who is interacting with Hadoop. That is, Hadoop had no way of performing reliable authentication. This limitation meant that any authorization system built on top of Hadoop, while helpful for preventing accidental access, could do nothing to stop malicious users from accessing other users’ data.
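To make the gap concrete, here is a minimal Python sketch of a Unix-style permission check that trusts whatever user name the client reports. It is not Hadoop code: `FileStatus`, `may_read`, and `handle_read_request` are hypothetical names invented for illustration, and the sketch only assumes that the service accepts the claimed identity without verifying it.

```python
from dataclasses import dataclass

READ = 0o4  # read bit within an owner/group/other permission triplet


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for a file's owner, group, and mode bits."""
    owner: str
    group: str
    mode: int  # Unix-style octal mode, e.g. 0o640


def may_read(status: FileStatus, user: str, groups: set) -> bool:
    """Unix-style authorization: select owner, group, or other bits, then test read."""
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7
    elif status.group in groups:
        bits = (status.mode >> 3) & 0o7
    else:
        bits = status.mode & 0o7
    return bool(bits & READ)


def handle_read_request(status: FileStatus, claimed_user: str, claimed_groups: set) -> bool:
    # Assumption for illustration: there is no authentication step, so the
    # identity is taken exactly as the client states it.
    return may_read(status, claimed_user, claimed_groups)


payroll = FileStatus(owner="alice", group="finance", mode=0o600)

# A well-behaved client reports its real identity and is refused.
honest = handle_read_request(payroll, "mallory", set())

# The same person simply claims to be the owner and is let in.
spoofed = handle_read_request(payroll, "alice", set())

print(honest, spoofed)  # prints: False True
```

The authorization logic itself is correct in both calls; the failure is that nothing ties the claimed name to the person making the request, which is exactly what authentication is meant to provide.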
Prior to the availability of Hadoop’s security features, the only way an organization could meet the requirement for data access protection was to run multiple distinct Hadoop clusters and to segregate the groups with network access to each cluster. This approach is obviously costly, but, more importantly, it limits the flexibility an organization has with respect to data storage options. One of the inherent powers of Hadoop is the ability to store and correlate all of an organization’s data. This is impossible if one must a priori relegate data to multiple distinct clusters based on security requirements. Furthermore, because of some organizations’ internal security policies, certain types of data could not be stored in Hadoop at all.
Today’s Hadoop World talk comes from Owen O’Malley and covers two of the biggest challenges facing Hadoop: security and API compatibility.
Over the past several months, Yahoo! has been leading the charge in both areas. This work will enable wider use of Hadoop within Yahoo! as well as lower the barrier for new users – particularly those working with sensitive data. A big thanks to Yahoo! and everyone else in the community helping out.