One of the principal features used in analytic databases is table partitioning. This feature is so frequently used because of its ability to significantly reduce query latency by allowing the execution engine to skip reading data that is not necessary for the query. For example, consider a table of events partitioned on the event time using calendar day granularity. If the table contained 2 years of events and a user wanted to find the events for a given 7-day window,
Apache Hadoop’s security was designed and implemented around 2009, and has been stabilizing since then. However, because this area is poorly documented, it is hard to understand and to debug when problems arise. Delegation tokens were designed as an authentication method and are widely used in the Hadoop ecosystem. This blog post introduces the concept of Hadoop Delegation Tokens in the context of Hadoop Distributed File System (HDFS) and Hadoop Key Management Server (KMS),
Tools like Apache Spark bring scale to machine learning, and Cloudera Data Science Workbench brings Spark to data scientists. What happens when a data scientist wants to burst into the cloud to forge models at scale? Cloudera Altus, that’s what.
We’ve heard it a hundred times: big data is here, software is free and open,
Today, we’re really excited to announce the latest innovation from Cloudera and Informatica’s partnership. Companies are increasingly moving their data operations into the cloud. With both companies focusing on helping customers derive business insights from vast amounts of data, our new joint offering will dramatically simplify the use of cloud-native infrastructure for big data analytics.
Last May, Cloudera announced Cloudera Altus, a new platform-as-a-service (PaaS) offering in the cloud for big data analytics,
Modeling EHR Data in Healthcare
In this case study, we take a look at modeling electronic health record (EHR) data with deep learning and Deeplearning4j (DL4J). We draw inspiration from recent research showing that carefully designed neural network architectures can learn effectively from the complex, messy data collected in EHRs. Specifically, we describe how to train a long short-term memory recurrent neural network (LSTM RNN) to predict in-hospital mortality among patients hospitalized in the intensive care unit (ICU).