Thanks to Richard Williamson of Silicon Valley Data Science for allowing us to republish the following post about his sample application based on Apache Spark, Apache Kudu (incubating), and Apache Impala (incubating).
Why should your infrastructure maintain a linear growth pattern when your business scales up and down during the day based on natural human cycles? There is an obvious need to maintain a steady baseline infrastructure to keep the lights on for your business,
In this guest post, members of the Barclays Advanced Data Analytics Team describe the results of an offsite hackathon to develop a recommendation system using Apache Spark.
In the depths of the cold, wet British winter, the Advanced Data Analytics team from Barclays escaped to a villa on Lanzarote, Canary Islands, for a week to collaboratively solve a key business problem: how to design a better customer experience. We framed the problem in the context of using customer shopping behavior data to build a personalized recommender system.
In this post, engineers from Wargaming.net, the online game developer and publisher, describe the design of their real-time recommendation engine built on CDH.
The scope of activities at Wargaming.net extends far beyond the development of games. We work on dozens of internal projects simultaneously, and our Data-driven Real-time Rules Engine (DDRRE) is among the most ambitious.
DDRRE is a system that analyzes large amounts of data in real time,
Recently, GoDataDriven installed a Cloudera cluster on Microsoft Azure. This two-part blog post, written by Alexander Bij and Tünde Alkemade and republished with permission, provides information about the use case, the implemented design, and the installation.
In the first post, we discussed the use case, the design, and some basic information about Microsoft Azure. We showed several options for installing Cloudera on Azure and the best practices we observed when installing a distributed system there.
In this guest post, Deenar Toraskar, founder of risk-analytics solution provider Think Reactive and a contributor to Spark, describes why new requirements for agile, self-service, and VaR reporting help make the case for building out new analytic infrastructure on the Apache Hadoop ecosystem.
As described previously in this post, Value at Risk (VaR) is a popular risk measure used for risk management,