One of the more confusing topics in Hadoop is how authorization and authentication work in the system. The first thing to recognize is the subtle but important distinction between the two, so let’s define these terms first:
Authentication is the process of determining whether someone is who they claim to be.
Authorization is the function of specifying access rights to resources.
In simpler terms, authentication establishes who you are, and authorization determines what you are allowed to do.
What are Kerberos and SPNEGO?
Kerberos is an authentication protocol that provides mutual authentication and single sign-on capabilities.
SPNEGO (Simple and Protected GSS-API Negotiation Mechanism) is a mechanism for negotiating authentication protocols between peers; one notable application of this is Kerberos authentication over HTTP.
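To make the HTTP case concrete, here is a minimal sketch of the header exchange used for SPNEGO over HTTP: the server answers an unauthenticated request with a 401 and `WWW-Authenticate: Negotiate`, and the client retries with `Authorization: Negotiate <base64 token>`. The function names and the `validate_token` callback are hypothetical, introduced only for illustration; the callback stands in for the real Kerberos/GSS-API check, which this sketch does not implement.

```python
import base64

NEGOTIATE = "Negotiate"


def challenge_response():
    """Status and headers sent when the client has not authenticated yet."""
    return 401, {"WWW-Authenticate": NEGOTIATE}, None


def parse_negotiate_header(value):
    """Return the raw token bytes from an Authorization header, or None."""
    if not value:
        return None
    scheme, _, encoded = value.partition(" ")
    encoded = encoded.strip()
    if scheme.lower() != NEGOTIATE.lower() or not encoded:
        return None
    try:
        return base64.b64decode(encoded, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None


def handle_request(headers, validate_token):
    """Return (status, response_headers, principal) for one request.

    validate_token is a hypothetical callback: it receives the token bytes
    and returns the authenticated principal name, or None on failure.
    """
    token = parse_negotiate_header(headers.get("Authorization"))
    if token is None:
        return challenge_response()
    principal = validate_token(token)
    if principal is None:
        return challenge_response()
    return 200, {}, principal
```

A real deployment would delegate the token check to a Kerberos library rather than a callback like this one; the sketch only shows where that step sits in the request flow.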
What is Alfredo?
Alfredo is an open source Java library providing support for Kerberos HTTP SPNEGO authentication.
Post written by Cloudera Software Engineer Aaron T. Myers.
Apache Hadoop has had methods of doing user authorization for some time. The Hadoop Distributed File System (HDFS) has a permissions model similar to Unix to control file and directory access, and MapReduce has access control lists (ACLs) per job queue to control which users may submit jobs. These authorization schemes allow Hadoop users and administrators to specify exactly who may access Hadoop’s resources.
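As a rough illustration of what a Unix-style permissions model means in practice, the sketch below checks a requested access against owner, group, and other permission bits. It is not HDFS code: `FileStatus`, `is_allowed`, and the sample users are hypothetical names chosen for this example, and details such as superuser bypass are left out.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileStatus:
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o640


def is_allowed(status, user, user_groups, wanted):
    """Check `wanted` bits against the one class that applies to `user`."""
    if user == status.owner:
        bits = (status.mode >> 6) & 7  # owner bits
    elif status.group in user_groups:
        bits = (status.mode >> 3) & 7  # group bits
    else:
        bits = status.mode & 7         # other bits
    return (bits & wanted) == wanted


report = FileStatus(owner="alice", group="analysts", mode=0o640)
print(is_allowed(report, "alice", {"analysts"}, WRITE))  # owner may write
print(is_allowed(report, "bob", {"analysts"}, WRITE))    # group is read-only
print(is_allowed(report, "carol", set(), READ))          # others have no access
```

Only one class is consulted per request, so an owner whose owner bits deny access is refused even if the group bits would allow it.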
Today’s Hadoop World talk comes from Owen O’Malley and covers two of the biggest challenges facing Hadoop: security and API compatibility.
Over the past several months, Yahoo! has been leading the charge in both areas. This work will enable wider use of Hadoop within Yahoo! as well as lower the barrier for new users – particularly those working with sensitive data. A big thanks to Yahoo! and everyone else in the community helping out.