Category Archives: Cloud

Apache Impala (incubating) vs. Amazon Redshift: S3 Integration, Elasticity, Agility, and Cost-Performance Benefits on AWS

Categories: Cloud, Impala, Performance

As measured across multiple dimensions (see analysis below), Impala provides a better cloud-native experience than Redshift for a number of common use cases.

Impala 2.6 adds read/write support for Amazon S3, enabling cloud capabilities such as direct querying of data in S3, elastic scaling of compute, and seamless data portability and flexibility that are unique among cloud-based analytic databases. With more and more users looking to deploy and run in public-cloud environments,

Read More

Apache Impala (Incubating) on Amazon: Performance and Cost Considerations for S3 vs. EBS

Categories: Cloud, Impala, Performance

The benchmark testing results detailed below can help you make an informed decision about AWS storage options for Impala.

In a recent post, you learned how Impala 2.6 on S3 delivers cloud-native features unmatched by other analytic databases in the cloud. With support for reading and writing data on Amazon S3, Impala provides cloud capabilities such as direct querying of data in S3, elastic scaling of compute, and seamless data portability and flexibility not found in other cloud-based analytic databases,

Read More

Analytics and BI on Amazon S3 with Apache Impala (Incubating)

Categories: Cloud, Impala, Ops and DevOps, Performance

Thanks to new optimizations for running Impala on Amazon S3, doubling cluster size on AWS doubles multi-user performance while keeping total workload cost roughly the same.

With public-cloud deployments becoming increasingly popular, Cloudera is continuing to build out the capabilities of its platform to take full advantage of the cost-effective and flexible nature of the cloud. The current release of Cloudera’s platform (5.8) takes a major step forward in that area: Impala 2.6 can now store and query data directly in the Amazon S3 object store.
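The cost claim in the summary above comes down to simple arithmetic: if a cluster bills per node-hour and throughput scales linearly with node count, twice the nodes finish the same workload in half the time for the same total spend. The sketch below illustrates that reasoning only. The node counts, hourly rate, runtime, and function names are hypothetical and not taken from the benchmark, and real scaling is rarely perfectly linear.

```python
def workload_cost(nodes, hourly_rate_per_node, runtime_hours):
    """Total cost = nodes x price per node-hour x hours the cluster runs."""
    return nodes * hourly_rate_per_node * runtime_hours


def scaled_runtime(base_runtime_hours, base_nodes, nodes):
    """Runtime under an assumed ideal linear speedup with node count."""
    return base_runtime_hours * base_nodes / nodes


# Hypothetical figures for illustration only.
HOURLY_RATE = 1.00          # assumed dollars per node-hour
BASE_NODES = 10             # assumed starting cluster size
BASE_RUNTIME_HOURS = 4.0    # assumed time to finish the workload

base_cost = workload_cost(BASE_NODES, HOURLY_RATE, BASE_RUNTIME_HOURS)

doubled_nodes = 2 * BASE_NODES
doubled_runtime = scaled_runtime(BASE_RUNTIME_HOURS, BASE_NODES, doubled_nodes)
doubled_cost = workload_cost(doubled_nodes, HOURLY_RATE, doubled_runtime)

print(f"{BASE_NODES} nodes: {BASE_RUNTIME_HOURS} h, ${base_cost:.2f}")
print(f"{doubled_nodes} nodes: {doubled_runtime} h, ${doubled_cost:.2f}")
```

Under these assumptions both configurations cost the same, and the larger cluster finishes in half the time. Any sub-linear scaling would push the cost of the larger cluster above the baseline.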

Read More

Cloudera Navigator Optimizer Graduates from Beta, is Now Generally Available

Categories: Cloud, Cloudera Navigator, Hadoop

This new release includes, among other things, support for “slicing and dicing” workloads by user/application/report, workload breakdown by similar queries, and alerts for Apache Hive and Apache Impala (incubating) best practices.

Cloudera Navigator Optimizer enables database architects and database administrators (DBAs) to gain an in-depth understanding of the SQL workloads running in their data warehouse environments or on Apache Hadoop. Navigator Optimizer makes planning offload projects more predictable by assessing risk and reducing development costs.

Read More

What’s New in Cloudera Director 2.1?

Categories: CDH, Cloud, Cloudera Manager, Hadoop

This new release contains, among other things, support for usage-based billing, deployments to Microsoft Azure, and deployments across providers or regions.

Cloudera Director reflects Cloudera’s commitment to providing a simple and reliable way to deploy, scale, and manage Apache Hadoop in the cloud of your choice. Cloudera Director enables you to deploy production-ready clusters for big data applications and successfully run workloads in the cloud. With Cloudera Director 2.1,

Read More