Data analytics is increasingly being brought to bear to treat human disease, but as more and more health data is stored in computer databases, one significant challenge is how to perform analyses across these disparate databases. In this post I take a look at the Observational Health Data Sciences and Informatics (or OHDSI, pronounced “Odyssey”) program that was formed to address this challenge, and which today accounts for 1.26 billion patient records collectively stored across 64 databases in 17 countries.
Apache Hadoop’s security was designed and implemented around 2009 and has been stabilizing ever since. However, because this area is poorly documented, problems are hard to understand and debug when they arise. Delegation tokens are widely used in the Hadoop ecosystem as an authentication method. This blog post introduces the concept of Hadoop Delegation Tokens in the context of Hadoop Distributed File System (HDFS) and Hadoop Key Management Server (KMS),
Five years ago, Cloudera shared with the world our plan to transfer the lessons from decades of relational database research to the Apache Hadoop platform via a new SQL engine, Apache Impala, the first and fastest open source MPP SQL engine for Hadoop. Impala enabled SQL users to operate on vast amounts of data in open formats, originally stored on HDFS (with Apache Kudu, Amazon S3, and Microsoft ADLS now also native storage options),
Each year in early November, my inbox fills up with people asking for advice about certification. Some are reflecting on their careers and looking to move on or move up; others have given themselves or their managers the goal of getting certified this year. They wake up one morning in early November and realize the clock is ticking.
The first thing they ask for is a discount, of course. Beyond that, they want to know what a certification is going to do for them more generally,
Azure Data Lake Store (ADLS) is a highly scalable cloud-based data store that is designed for collecting, storing and analyzing large amounts of data, and is ideal for enterprise-grade applications. Data can originate from almost any source, such as Internet applications and mobile devices; it is stored securely and durably, while being highly available in any geographic region. ADLS is performance-tuned for big data analytics and can be easily accessed from many components of the Apache Hadoop ecosystem,