Recent improvements to Apache Hadoop’s native backup utility, now shipping in CDH, make the backup process much faster. DistCp is a popular tool in Apache Hadoop for periodically backing up data across and within clusters. (Each run of DistCp in the backup process is referred to as a backup cycle.) Its popularity has grown […]