Cloudera Developer Blog · Ops And DevOps Posts
Oozie’s new HA qualities help cluster operators sleep well at night. Here’s how it works.
One of the big new features in CDH 5 for Apache Oozie is High Availability (HA). In designing this feature, the Oozie team at Cloudera had two main goals: 1) Don’t change the API or usage patterns, and 2) the user shouldn’t even have to know that HA is enabled. In other words, we wanted Oozie HA to be as easy and transparent as possible.
Understanding how checkpointing works in HDFS can make the difference between a healthy cluster or a failing one.
Checkpointing is an essential part of maintaining and persisting filesystem metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and is an important indicator of overall cluster health. However, checkpointing can also be a source of confusion for operators of Apache Hadoop clusters.
Cloudera’s own enterprise data hub is yielding great results for providing world-class customer support.
Here at Cloudera, we are constantly pushing the envelope to give our customers world-class support. One of the cornerstones of this effort is the Cloudera Support Interface (CSI), which we’ve described in prior blog posts (here and here). Through CSI, our support team is able to quickly reason about a customer’s environment, search for information related to a case currently being worked, and much more.
This FAQ contains answers to the most frequently asked questions about the architecture and configuration choices involved.
In December 2013, Cloudera and Amazon Web Services (AWS) announced a partnership to support Cloudera Enterprise on AWS infrastructure. Along with this announcement, we released a Deployment Reference Architecture Whitepaper. In this post, you’ll get answers to the most frequently asked questions about the architecture and the configuration choices that have been highlighted in that whitepaper.
More and more customers are using automation/configuration management frameworks alongside Cloudera Manager.
As Apache Hadoop clusters continue to grow in size, complexity, and business importance as the foundational infrastructure for an Enterprise Data Hub, the use cases for a robust and mature management console expand.
Learn the new features and enhancements in Cloudera Manager 5, including support for YARN, management of third-party apps and frameworks, and more.
The response to the Oct. 2013 release of Cloudera Enterprise 5 Beta has been overwhelming, and Cloudera is busily working closely with several customers to incorporate their feedback.
Get an overview of the available mechanisms for backing up data stored in Apache HBase, and how to restore that data in the event of various data recovery/failover scenarios
With increased adoption and integration of HBase into critical business systems, many enterprises need to protect this important business asset by building out robust backup and disaster recovery (BDR) strategies for their HBase clusters. As daunting as it may sound to quickly and easily backup and restore potentially petabytes of data, HBase and the Apache Hadoop ecosystem provide many built-in mechanisms to accomplish just that.
StackIQ takes a “software defined infrastructure” approach to provision and manage cluster infrastructure that sits below Big Data platforms such as Apache Hadoop. In the guest post below, StackIQ co-founder and VP Engineering Greg Bruno explains how to install Cloudera Enterprise on top of StackIQ’s management system so they can work together.
The hardware used for this deployment is a small cluster: one node (i.e. one server) for the StackIQ Cluster Manager and four nodes as backend/data nodes. Each node has two disks and all nodes are connected via 1Gb Ethernet on a Private Network. The Cluster Manager node is also connected to a Public Network using its second NIC. (StackIQ Cluster Manager is used in similar deployments between two nodes and 4,000+ nodes in size.)
The following guest post is re-published here courtesy of Gerd König, a System Engineer with YMC AG. Thanks, Gerd!
Cloudera Manager is a great tool to orchestrate your CDH-based Apache Hadoop cluster. You can use it from cluster installation, deploying configurations, restarting daemons to monitoring each cluster component. Starting with version 4.6, the manager supports the integration of Cloudera Search, which is currently in Beta state. In this post I’ll show you the required steps to set up a Hadoop cluster via Cloudera Manager and how to integrate Cloudera Search.
At Cloudera, we believe that Cloudera Manager is the best way to install, configure, manage, and monitor your Apache Hadoop stack. Of course, most users prefer not to take our word for it — they want to know how Cloudera Manager works under the covers, first.
In this post, I’ll explain some of its inner workings.