Phil Langdale is a software engineer at Cloudera and the technical lead for Cloudera’s SCM Express product.
What is SCM Express?
As powerful and useful as Apache Hadoop is, anyone who has set up a cluster from scratch is well aware of how challenging it can be: every machine has to have the right packages installed and correctly configured so that they can all work together, and if something goes wrong in that process, it can be even harder to nail down the problem.
Understandably, this can be a serious barrier to adoption. After all, how can you appreciate how great Apache Hadoop is if you never managed to get it set up in the first place? At Cloudera, we didn’t want to see anyone stopped before they could even start.
The Service and Configuration Manager (SCM) is a new part of Cloudera Management Suite in Cloudera Enterprise 3.5 that allows administrators to manage their Hadoop installations from a central console with just a few clicks of a mouse. It makes it easy to create and modify service installations and ensures that all the machines in a cluster are correctly and consistently configured.
This already provides a significant benefit to Hadoop administrators; however, we wanted to take things even further and provide a tool that allows someone with no Hadoop experience to start up a fully functional cluster in a matter of minutes. We also wanted it to be freely available so that anyone can experience why Hadoop is so great.
So, SCM Express was born: from a single 500K download, you can bring up a Hadoop cluster of up to 50 nodes without editing a single configuration file, or even having to know what a Hadoop configuration file looks like. Additionally, the cluster can be tuned to reflect the hardware on which it is running, so that it’s not just functional, but useful. We’ve codified many of the best practices and recommendations from our Solutions Architects so that the services you deploy can immediately benefit from their experience and insights.
And it’s not just Hadoop; SCM Express can also install and manage Apache ZooKeeper, Apache HBase and Hue. Hadoop is an ecosystem and not just a single product, so SCM Express lets you experience some of the breadth of that ecosystem.
How it works
When you download SCM Express for the first time, you get a small self-executing installer that will go through the process of installing the SCM Server. It sets up a package repository that’s appropriate for your Linux distribution and then installs the SCM Server from there. This will also allow you to download updates, just as you would for anything else installed on the machine.
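As a rough illustration of what "appropriate for your Linux distribution" involves, the sketch below maps a distribution identifier to the package tool whose repository format would need to be configured. This is not the SCM Express installer's actual logic: the mapping table, the function names, and the use of os-release style `ID=` lines as input are all assumptions made for the example.

```python
# Hypothetical mapping from a distribution ID to its package tool.
# The real installer may detect and support distributions differently.
PACKAGE_TOOL_BY_DISTRO = {
    "rhel": "yum",
    "centos": "yum",
    "debian": "apt",
    "ubuntu": "apt",
    "sles": "zypper",
}


def distro_id(os_release_text):
    """Return the lowercase value of the ID= line, or None if absent."""
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ID":
            # Values may be quoted, e.g. ID="centos".
            return value.strip().strip("\"'").lower()
    return None


def package_tool_for(os_release_text):
    """Return the package tool for the described distribution, or None."""
    return PACKAGE_TOOL_BY_DISTRO.get(distro_id(os_release_text))
```

Returning None for an unrecognized distribution leaves it to the caller to stop with a clear message rather than guess at a repository format.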
Once the server is up and running, it provides a web-based user interface that walks through the process of identifying the hosts that you want in the cluster and then installing the necessary Apache Hadoop packages on them. In this way, you don’t need to do any manual work on those machines. As with the SCM Server, we install CDH (Cloudera’s Distribution Including Apache Hadoop, which is a packaged and tested distribution of open source Apache Hadoop and Apache ecosystem components) from our package repository, so that it too can be easily updated. The whole installation process is package based, so it’s easy to maintain in the long term.
After the cluster hosts have been identified and CDH is installed on them, SCM will create the services you select. At this time, it evaluates the physical characteristics of the hosts to decide which ones are best suited for which roles (for example, which one should run the HDFS NameNode or the MapReduce JobTracker). It also factors the size of the cluster into these calculations (for a small cluster, it makes sense to run the NameNode and JobTracker on the same machine, but for a large one, they should be separated). It will also use these physical characteristics to inform the configuration of the created services (the Java heap size should reflect the amount of physical RAM in the machine, and the number of mappers and reducers should reflect the number of CPU cores).
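To make the idea concrete, here is a minimal sketch of this kind of hardware-aware placement and tuning. It is not SCM's actual algorithm: the `Host` type, the 10-host "small cluster" threshold, the quarter-of-RAM heap rule, and the slot counts are all hypothetical values chosen only to illustrate the reasoning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    ram_gb: int
    cores: int


# Hypothetical cutoff below which master roles share one machine.
SMALL_CLUSTER_MAX_HOSTS = 10


def assign_master_roles(hosts):
    """Place the NameNode and JobTracker on the best-equipped hosts."""
    if not hosts:
        raise ValueError("need at least one host")
    # Rank by RAM first, then by core count, largest first.
    ranked = sorted(hosts, key=lambda h: (h.ram_gb, h.cores), reverse=True)
    namenode = ranked[0]
    if len(hosts) <= SMALL_CLUSTER_MAX_HOSTS:
        jobtracker = ranked[0]  # small cluster: co-locate the masters
    else:
        jobtracker = ranked[1]  # large cluster: separate them
    return {"NameNode": namenode.name, "JobTracker": jobtracker.name}


def suggest_worker_settings(host):
    """Derive illustrative worker settings from a host's RAM and cores."""
    # Hypothetical rule: a quarter of physical RAM, at least 1 GB.
    heap_mb = max(1024, host.ram_gb * 1024 // 4)
    map_slots = max(1, host.cores)
    reduce_slots = max(1, host.cores // 2)
    return {
        "heap_mb": heap_mb,
        "map_slots": map_slots,
        "reduce_slots": reduce_slots,
    }
```

For a three-host cluster, both master roles land on the host with the most RAM; with twelve hosts, the JobTracker moves to the second-ranked host.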
Once the services are created, SCM goes through the process of bringing them up for the first time. This isn’t always a simple matter of just starting processes; you have to format an HDFS filesystem before you can use it, for example.
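The difference between a first start and an ordinary start can be sketched as a plan that inserts one-time setup steps ahead of the start step. The table of setup actions and the function name below are hypothetical; the HDFS format step is the only one taken from the description above, and this is not how SCM itself represents the process.

```python
# Hypothetical table of one-time setup actions, keyed by service name.
ONE_TIME_SETUP = {"hdfs": "format the HDFS filesystem"}


def first_run_plan(services, already_initialized=()):
    """Return the ordered steps needed to bring the services up.

    Services listed in already_initialized skip their one-time setup,
    so a later restart never repeats a step such as formatting HDFS.
    """
    done = set(already_initialized)
    plan = []
    for service in services:
        setup = ONE_TIME_SETUP.get(service)
        if setup is not None and service not in done:
            plan.append(setup)
        plan.append(f"start {service}")
    return plan
```

Tracking which services have already been initialized matters because formatting a filesystem that holds data would destroy it, so the setup step must run exactly once.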
When all that is done, your services are running and ready to go, and you’re also ready to appreciate the benefits that SCM Express provides in helping you maintain your newly deployed Hadoop cluster.
If you think you’re ready to take the plunge and upgrade to Cloudera Enterprise, it’s easy to switch over from SCM Express to full SCM; all your data and configuration carry over in place.
We’re really proud that we’re able to offer SCM Express to the world. Apache Hadoop is an incredibly powerful tool for solving all sorts of problems and answering all kinds of questions from all your data, and now anyone can install it and experience it for themselves.