Category Archives: Performance

Apache Impala (Incubating) vs. Amazon Redshift: S3 Integration, Elasticity, Agility, and Cost-Performance Benefits on AWS

Categories: Cloud Impala Performance

As measured across multiple dimensions (see analysis below), Impala provides a better cloud-native experience than Redshift for a number of common use cases.

Impala 2.6 brings read/write support for Amazon S3. This provides cloud capabilities that are unique among cloud-based analytic databases, such as direct querying of data in S3, elastic scaling of compute, and seamless data portability and flexibility. With more and more users looking to deploy and run in public-cloud environments, …


Apache Impala (Incubating) on Amazon: Performance and Cost Considerations for S3 vs. EBS

Categories: Cloud Impala Performance

The benchmark testing results detailed below can help you make an informed decision about AWS storage options for Impala.

In a recent post, you learned how Impala 2.6 on S3 delivers cloud-native features unmatched by other analytic databases in the cloud. With support for reading and writing data on Amazon S3, Impala provides cloud capabilities such as direct querying of data in S3, elastic scaling of compute, and seamless data portability and flexibility not found in other cloud-based analytic databases, such as Amazon Redshift.


Resolving Java Lock Contention in Apache Solr: A Performance-Analysis Detective Story

Categories: Performance Search Testing

This case study is an instructive example of how performance analysis is a multifaceted process that often leads in surprising directions.

Apache Solr Near Real Time (NRT) Search allows Solr users to search documents indexed just seconds ago. It’s a critical feature in many real-time analytics applications. As Solr indexes more and more documents in near real time, end users’ performance expectations keep rising.

However, …


Analytics and BI on Amazon S3 with Apache Impala (Incubating)

Categories: Cloud Impala Ops and DevOps Performance

Thanks to new optimizations for running Impala on Amazon S3, doubling cluster size on AWS doubles multi-user performance while keeping total workload cost roughly the same.

With public-cloud deployments becoming increasingly popular, Cloudera is continuing to build out the capabilities of its platform to best take advantage of the cost-effective and flexible nature of the cloud. The current release of Cloudera’s platform (5.8) includes a major step forward in that area with Impala 2.6 able to store and query data directly from the Amazon S3 object store.


New Study: Evaluating Apache HBase Performance on Modern Storage Media

Categories: Guest Hardware HBase Performance

This new study by Intel software engineers is the first to analyze the performance impact of running Apache HBase on various modern storage technologies.

As more “fast” storage technologies (such as SSD and NVMe SSD) emerge, organizations with big data use cases want to make better use of them to achieve higher throughput and lower latency. Until now, however, no detailed analyses have been published on the true significance of that performance boost, or on how best to mix fast and “slow” …
