Guest blog post written by Adir Mashiach
In this post, I’ll talk about the problem of Hive tables with many small partitions and files, and describe my solution in detail.
A little background
In my organization, we keep a lot of our data in HDFS. Most of it is raw data, but a significant amount is the final product of many data enrichment processes.