Category Archives: Guest

Building a Data Science Portfolio: Storytelling with Data

Categories: Data Science, Guest

The following post by Vik Paruchuri, founder of data science learning platform Dataquest, offers detailed, instructive insight into the data science workflow (regardless of the tech stack involved, although in this case it uses Python). We republish it here for your convenience.

Data science companies are increasingly looking at portfolios when making hiring decisions. One reason is that a portfolio is the best way to judge someone’s real-world skills.


New Study: Evaluating Apache HBase Performance on Modern Storage Media

Categories: Guest, Hardware, HBase, Performance

This new study by Intel software engineers is the first to analyze the performance impact of running Apache HBase on various modern storage technologies.

As more “fast” storage technologies (such as SSD and NVMe SSD) emerge, organizations with big data use cases want to make better use of them to achieve better throughput and latency. But to this point, there have been no detailed analyses published about the true significance of that performance boost,


How-to: Process and Index Medical Images with Apache Hadoop and Apache Solr

Categories: CDH, Guest, Search, Use Case, ZooKeeper

Thanks to Karthik Vadla, Abhi Basu, and Monica Martinez-Canales of Intel Corp. for the following guest post about using CDH for cost-effective processing/indexing of DICOM (medical) images.

Medical imaging has rapidly become the best non-invasive method to evaluate a patient and determine whether a medical condition exists. Imaging is used to assist in the diagnosis of a condition and, in most cases, is the first step of the journey through the modern medical system.


How-to: Build a Prediction Engine using Spark, Kudu, and Impala

Categories: Guest, Impala, Kudu, Spark

Thanks to Richard Williamson of Silicon Valley Data Science for allowing us to republish the following post about his sample application based on Apache Spark, Apache Kudu (incubating), and Apache Impala (incubating).

Why should your infrastructure maintain a linear growth pattern when your business scales up and down during the day based on natural human cycles? There is an obvious need to maintain a steady baseline infrastructure to keep the lights on for your business,


The Barclays Data Science Hackathon: Using Apache Spark and Scala for Rapid Prototyping

Categories: Guest, Spark, Use Case

In this guest post, members of the Barclays Advanced Data Analytics Team describe the results of an offsite hackathon to develop a recommendation system using Apache Spark.

In the depths of the cold, wet British winter, the Advanced Data Analytics team from Barclays escaped to a villa on Lanzarote, Canary Islands, for a week to collaboratively solve a key business problem: how to design a better customer experience. We framed the problem in the context of using customer shopping behavior data to build a personalized recommender system.
