Cloudera Developer Blog · Cloudera Manager Posts
Some things for which we are thankful, the 2013 edition (not listed in order):
1. The entire Apache Hadoop community for its constant and hard work to Make the Platform Better,
2. Cloudera’s users, customers, and partners for their continual and helpful feedback to help guide us through #1,
Cloudera Manager 4.7 added support for managing Cloudera Search 1.0. Thus Cloudera Manager users can easily deploy all components of Cloudera Search (including Apache Solr) and manage all related services, just like every other service included in CDH (Cloudera’s distribution of Apache Hadoop and related projects).
In this how-to, you will learn the steps involved in adding Cloudera Search to a Cloudera Enterprise (CDH + Cloudera Manager) cluster.
Installing the SOLR Parcel
In our example, the cluster uses a CDH 4.4 parcel and is running Apache ZooKeeper, HDFS, and Apache HBase services. (Parcels are a really useful way to deploy new software and do painless upgrades via Cloudera Manager.)
We are pleased to announce the beta release of Cloudera Enterprise 5 (CDH 5 and Cloudera Manager 5). This release has both Cloudera Impala and Cloudera Search integrated into CDH. It also includes many new features and updated component versions including the ones below:
I’ve always held a strong bias that education is most effective when the student learns by doing. As a developer of technical curricula, my goal is to have training participants engage with real and relevant problems as much as possible through hands-on exercises. The high rate at which Apache Hadoop is changing, both as a technology and as an ecosystem, makes developing Cloudera training courses not only demanding but also seriously fun and rewarding.
I recently undertook the challenge of upgrading the Cloudera Administrator Training for Apache Hadoop. I more than quadrupled the amount of hands-on exercises from the previous version, adding a full day to the course. At four days, it’s now the most thorough training for Hadoop administrators and truly the best way to start building expertise.
While developing the course, I collaborated with some of the most knowledgeable Hadoop administrators I could find, including Eric Sammer, Amandeep Khurana, Kathleen Ting, Romain Rigaux, and many other smart folks at Cloudera. The upgrades to the curriculum and exercises are based on best practices used to resolve our customers’ biggest problems. These insights resonate throughout the course, including the determination that administrators should learn installation, configuration, maintenance, monitoring, and troubleshooting using the standard Hadoop tools. Although we certainly hope that Hadoop users take advantage of Cloudera Manager to simplify and streamline many of these tasks, we believe that every good administrator needs to first take a look under the hood and tinker with Hadoop’s guts. There’s no replacement for get-your-hands-dirty experience to achieve expertise.
Cluster in the Cloud
Cloudera Manager 4.7 is an update to Cloudera Manager 4 and contains a number of bug fixes and usability improvements. Furthermore, we have introduced new features such as:
StackIQ takes a “software defined infrastructure” approach to provision and manage cluster infrastructure that sits below Big Data platforms such as Apache Hadoop. In the guest post below, StackIQ co-founder and VP Engineering Greg Bruno explains how to install Cloudera Standard on top of StackIQ’s management system so they can work together.
The hardware used for this deployment is a small cluster: one node (i.e. one server) for the StackIQ Cluster Manager and four nodes as backend/data nodes. Each node has two disks and all nodes are connected via 1Gb Ethernet on a Private Network. The Cluster Manager node is also connected to a Public Network using its second NIC. (StackIQ Cluster Manager is used in similar deployments between two nodes and 4,000+ nodes in size.)
Step 1. Install StackIQ Cluster Manager
The following guest post is re-published here courtesy of Gerd König, a System Engineer with YMC AG. Thanks, Gerd!
Cloudera Manager is a great tool to orchestrate your CDH-based Apache Hadoop cluster. You can use it from cluster installation, deploying configurations, restarting daemons to monitoring each cluster component. Starting with version 4.6, the manager supports the integration of Cloudera Search, which is currently in Beta state. In this post I’ll show you the required steps to set up a Hadoop cluster via Cloudera Manager and how to integrate Cloudera Search.
Furthermore, I want to introduce a small level of automation using the tool Ansible, to avoid performing the same manual steps again and again – thereby being able to replay them whenever you want, or to just use it for any upcoming cluster installations.
Cloudera’s new Parcels installation format has been released, and I’m excited to highlight just how useful (and mind-blowingly cool) it is to system administrators and anyone responsible for maintaining a CDH cluster.
If you haven’t read about or played with Parcels, they make components of the distribution significantly easier to manage, install, and upgrade. The new Parcel distribution format works with Cloudera Manager 4.5 and later. When you perform installations and upgrades using Parcels, you get access to new Cloudera Manager features such as:
The following guest post, from Mike Pittaro of Dell’s Cloud Software Solutions team, describes his team’s use of the Dell Crowbar tool in conjunction with the Cloudera Manager API to automate cluster provisioning. Thanks, Mike!
Deploying, managing, and operating Apache Hadoop clusters can be complex at all levels of the stack, from the hardware on up. To hide this complexity and reduce deployment time, since 2011, Dell has been using Dell Crowbar in conjunction with Cloudera Manager to deploy the Dell | Cloudera Solution for Apache Hadoop for joint customers.
Cloudera Manager does a great job of deploying and managing the Hadoop layers of a cluster, but it depends on an operating system to be in place first. Meanwhile, to complement those capabilities, Dell Crowbar is a complete automated operations platform, designed to deploy layers of infrastructure on bare-metal servers and all the way up the stack.
This is a great day for technical end-users – developers, admins, analysts, and data scientists alike. Starting now, Cloudera complements its traditional mailing lists with a new, feature-rich community forums intended for users of Cloudera’s Platform for Big Data! (Login using your existing credentials or click the link to register.)
Although mailing lists have long been a standard for user interaction, and will undoubtedly continue to be, they have flaws. For example, they lack structure or taxonomy, which makes consumption difficult. Search functionality is often less than stellar and users are unable to build reputations that span an appreciable period of time. For these reasons, although they’re easy to create and manage, mailing lists inherently limit access to knowledge and hence limit adoption.
The new service brings key additions to the conversation: functionality, search, structure and scalability. It is now considerably easier to ask questions, find answers (or questions to answer), follow and share threads, and create a visible and sustainable reputation in the community. And for Cloudera customers, there’s a bonus: your questions will be escalated as bonafide support cases under certain circumstances (see below).