HDFS Snapshot Best Practices

Introduction

The snapshots feature of the Apache Hadoop Distributed File System (HDFS) enables you to capture point-in-time copies of the file system and protect your important data against corruption caused by user or application errors.  This feature is available in all versions of Cloudera Data Platform (CDP), Cloudera Distribution for Hadoop (CDH) and Hortonworks Data Platform (HDP). Whether you have been using snapshots for a while or are contemplating their use, this blog gives you the insights and techniques to make the most of them.  

Using snapshots to protect data is efficient for a few reasons. First, snapshot creation is instantaneous regardless of the size and depth of the directory subtree. Second, snapshots capture the block list and file size for a specified subtree without creating extra copies of blocks on the file system. The HDFS snapshot feature is specifically designed to make snapshot creation, as well as access to and modification of the current files and directories in the file system, very efficient.  Creating a snapshot only adds a snapshot record to the snapshottable directory.  Accessing a current file or directory does not require processing any snapshot records, so there is no additional overhead. Modifying a current file/directory that is also in a snapshot requires adding a modification record for each input path.  The trade-off is that some other operations, such as computing snapshot diffs, can be very expensive. In the next couple of sections of this blog, we first look at the complexity of various operations, and then we highlight the best practices that help mitigate the overhead of these operations. 
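As a concrete illustration, the basic snapshot workflow uses the standard HDFS CLI; the directory and snapshot names below are hypothetical:

```shell
# Mark a directory as snapshottable (administrator operation).
hdfs dfsadmin -allowSnapshot /data/project1

# Take a point-in-time snapshot named s0; this only adds a snapshot record,
# so it completes instantly regardless of the subtree size.
hdfs dfs -createSnapshot /data/project1 s0

# Snapshots appear under the read-only .snapshot subdirectory.
hdfs dfs -ls /data/project1/.snapshot
```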

Typical Snapshot Operations

Let’s look at the time complexity, or overhead, of different operations on snapshotted files and directories. For simplicity, we assume the number of modifications (m) for each file/directory is the same across a snapshottable directory subtree, where the modifications for a file/directory are the records generated by the changes (e.g. set permission, create a file/directory, rename, etc.) to that file/directory.

1- Taking a snapshot always takes the same amount of effort: it only creates a record of the snapshottable directory and its state at that time. The overhead is independent of the directory structure, and we denote the time overhead as O(1).

2- Accessing a file or a directory in the current state is the same as without taking any snapshots.  The snapshots add zero overhead compared to the non-snapshot access.

3- Modifying a file or a directory in the current state adds only constant overhead on top of the non-snapshot access: a modification record is added in the filesystem tree for the modified path.

4- Accessing a file or a directory in a particular snapshot is also efficient – it has to traverse the snapshot records from the snapshottable directory down to the desired file/directory and reconstruct the snapshot state from the modification records.  The access imposes an overhead of O(d*m), where 

   d – the depth from the snapshotted directory to the desired file/directory 

   m – the number of modifications captured from the current state to the given snapshot.
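For example, a file’s state in a given snapshot can be read through its .snapshot path (the path and snapshot names here are illustrative):

```shell
# Current state of the file: no snapshot records are processed.
hdfs dfs -cat /data/project1/logs/app.log

# The same file as it existed when snapshot s0 was taken; reconstructing this
# view traverses the snapshot records along the path, costing O(d*m).
hdfs dfs -cat /data/project1/.snapshot/s0/logs/app.log
```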

5- Deleting a snapshot requires traversing the entire subtree and, for each file or directory, performing a binary search for the to-be-deleted snapshot.  It also collects the blocks to be deleted as a result of the operation.  This results in an overhead of O(b + n log(m)), where 

   b – the number of blocks to be collected, 

   n – the number of files/directories under the snapshot diff path 

   m – the number of modifications captured from the current state to the to-be-deleted snapshot.

Note that deleting a snapshot only performs log(m) operations for binary searching the to-be-deleted snapshot but not for reconstructing it.

  • When n is large, the delete snapshot operation may take a long time to complete.  Also, the operation holds the namesystem write lock.  All other operations are blocked until it completes.
  • When b is large, the delete snapshot operation may require a large amount of memory for collecting the blocks.
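Deletion itself is a single command (hypothetical names below); keep in mind from the discussion above that it holds the namesystem write lock while it runs:

```shell
# Delete snapshot s0; the NameNode traverses the subtree, binary-searches the
# snapshot records, and collects blocks that are no longer referenced.
hdfs dfs -deleteSnapshot /data/project1 s0
```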

6- Computing the snapshot diff between a newer and an older snapshot has to reconstruct the newer snapshot state for each file and directory under the snapshot diff path. Then the process has to compute the diff between the newer and the older snapshot.  This imposes an overhead of O(n*(m+s)), where 

   n – the number of files and directories under the snapshot diff path, 

   m – the number of modifications captured from the current state to the newer snapshot 

   s – the number of snapshots between the newer and the older snapshots.  

  • When n*(m+s) is a large number, the snapshot diff operation may take a long time to complete.  Also, the operation holds the namesystem read lock.  All the other write operations are blocked until it completes.
  • When n is large, the snapshot diff operation may require a large amount of memory for storing the diff.
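A snapshot diff is computed with the snapshotDiff command (directory and snapshot names are hypothetical):

```shell
# Report the changes between older snapshot s0 and newer snapshot s1.
# Output lines are prefixed with M (modified), + (created), - (deleted)
# and R (renamed).
hdfs snapshotDiff /data/project1 s0 s1
```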

We summarize the operations in the table below:

Operation | Overhead | Remarks
Taking a snapshot | O(1) | Adding a snapshot record
Accessing a file/directory in the current state | No additional overhead from snapshots | –
Modifying a file/directory in the current state | Adding a modification for each input path | –
Accessing a file/directory in a particular snapshot | O(d*m) | d – the depth; m – the #modifications
Deleting a snapshot | O(b + n log(m)) | b – the #blocks collected; n – the #files/directories; m – the #modifications
Computing snapshot diff | O(n(m+s)) | n – the #files/directories; m – the #modifications; s – the #snapshots in between

We provide best practice guidelines in the next section.

Best Practices to Avoid Pitfalls

Now that you are fully aware of the impact that operations on snapshotted files and directories have, here are some key tips and tricks to help you get the most benefit from your HDFS snapshot usage.

  • Don’t create snapshots at the root directory
    • Reason:
      • The root directory includes everything in the file system, including the tmp and the trash directories.  If snapshots are created at the root directory, the snapshots may contain many unwanted files.  Since these files are in some of the snapshots, they will not be deleted until those snapshots are deleted.
      • The snapshot policies would have to be uniform across the entire file system.  Some projects may require more frequent snapshots than others, but creating snapshots at the root directory forces everything to have the same snapshot policy.  Also, different projects may have different timing for deleting their own snapshots.  As a result, it is easy to end up with out-of-order snapshot deletions, which may lead to a complicated restructuring of the internal data; see the recommendation on deleting snapshots from the oldest to the newest below.
      • A single snapshot diff computation may take a long time since the number of operations is O(n(m+s)) as discussed in the previous section.
    • Recommended approach: Create snapshots at the project directories and the user directories.
  • Avoid taking very frequent snapshots
    • Reason: When taking snapshots too frequently, the snapshots may capture many unwanted transient files, such as tmp files or files in the trash.  These transient files occupy space until the corresponding snapshots are deleted.  The modifications for these files also increase the running time of certain snapshot operations, as discussed in the previous section.
    • Recommended approach: Take snapshots only when required, for example only after jobs/workloads have completed (in order to avoid capturing tmp files), and delete the unneeded snapshots.
  • Avoid running snapshot diff when the delta is very large (multiple days/weeks/months of changes or containing more than 1 million changes)
    • Reason: As discussed in the previous section, computing snapshot diff requires O(n(m+s)) operations.  In this case, s is large.  The snapshot diff computation may take a long time.
    • Recommended approach: Compute snapshot diff when the delta is small.
  • Avoid running snapshot diff for the snapshots that are far apart (e.g. diff between two snapshots taken a month apart). In such situations the diff is likely to be very large.
    • Reason: As discussed in the previous section, computing snapshot diff requires O(n(m+s)) operations.  In this case, m is large, so the snapshot diff computation may take a long time.  Also, snapshot diff is usually used for backup or for synchronizing directories across clusters.  It is better to run the backup or synchronization against newly created snapshots, so that only the newly created files/directories need to be processed.
    • Recommended approach: compute snapshot diff for the newly created snapshots.
  • Avoid running snapshot diff at the snapshottable directory
    • Reason: Computing for the entire snapshottable directory may include unwanted files such as files in tmp or trash directories.  Also, since computing snapshot diff requires O(n(m+s)) operations, it may take a long time when there are many files/directories under the snapshottable directory.  
    • Recommended approach: Make sure that the configuration setting dfs.namenode.snapshotdiff.allow.snap-root-descendant is enabled (the default is true). It is available in all versions of CDP, CDH and HDP.  Then, divide a single diff computation on the snapshottable directory into several subtree computations, and compute snapshot diffs only for the required subtrees.  Note that rename operations across subtrees will appear as delete-and-create in subtree snapshot diffs; see the example below.
Example: Suppose we have the following operations.

  1. Take snapshot s0 at /
  2. Rename /foo/bar/file to /sub/file
  3. Take snapshot s1 at /

When running diff at /, it will show the rename operation:

Difference between snapshot s0 and snapshot s1 under directory /:
M ./foo/bar

R ./foo/bar/file -> ./sub/file

M ./sub

When running diff at subtrees /foo and /sub, it will show the rename operation as delete-and-create:

Difference between snapshot s0 and snapshot s1 under directory /sub:

M .

+ ./file

Difference between snapshot s0 and snapshot s1 under directory /foo:

M ./bar

- ./bar/file
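The outputs above would come from invocations along these lines:

```shell
# Diff over the whole snapshottable directory: the rename is reported as R.
hdfs snapshotDiff / s0 s1

# Diffs restricted to subtrees: the same rename shows up as a create (+)
# in /sub and a delete (-) in /foo.
hdfs snapshotDiff /sub s0 s1
hdfs snapshotDiff /foo s0 s1
```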

 

  • When deleting multiple snapshots, delete from the oldest to the newest.
    • Reason: Deleting snapshots in a random order may lead to a complicated restructuring of the internal data.  Although the known bugs (e.g. HDFS-9406, HDFS-13101, HDFS-15313, HDFS-16972 and HDFS-16975) are already fixed, deleting snapshots from the oldest to the newest is the recommended approach.
    • Recommended approach: To determine the snapshot creation order, use the hdfs lsSnapshot <snapshotDir> command, and then sort the output by the snapshot ID.  If snapshot A is created before snapshot B, the snapshot ID of A is smaller than the snapshot ID of B. The following is the output format of lsSnapshot: <permission> <replication> <owner> <group> <length> <modification_time> <snapshot_id> <deletion_status> <path>
  • When the oldest snapshot in the file system is no longer needed, delete it immediately.
    • Reason: Deleting a snapshot in the middle may not free up resources, since the files/directories in the deleted snapshot may also belong to one or more earlier snapshots.  In addition, it is known that deleting the oldest snapshot in the file system will not cause data loss.  Therefore, when the oldest snapshot is no longer needed, delete it immediately to free up space.
    • Recommended approach: See the previous recommendation for how to determine the snapshot creation order.
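As a small sketch, the ordering step can be scripted.  The sample lines below are hypothetical lsSnapshot output, and we assume paths contain no spaces, so the snapshot ID is always the third-to-last whitespace-separated field:

```python
# Sort (hypothetical) "hdfs lsSnapshot" output lines by snapshot ID so that
# snapshots can be deleted from the oldest to the newest.

def sort_by_snapshot_id(lines):
    # Each line: <permission> <replication> <owner> <group> <length>
    #            <modification_time> <snapshot_id> <deletion_status> <path>
    # Assuming the path has no spaces, the snapshot ID is the 3rd-from-last field.
    return sorted(lines, key=lambda line: int(line.split()[-3]))

sample = [
    "drwxr-xr-x 0 hdfs supergroup 0 2023-05-02 10:30 2 ACTIVE /data/project1/.snapshot/s1",
    "drwxr-xr-x 0 hdfs supergroup 0 2023-05-01 09:00 1 ACTIVE /data/project1/.snapshot/s0",
]
ordered = sort_by_snapshot_id(sample)
# ordered[0] is the snapshot with the smallest ID, i.e. the oldest (s0).
```

Deleting the snapshots in the resulting order then proceeds from the oldest to the newest.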

Summary

In this blog, we have explored the HDFS snapshot feature, how it works, and the overhead that various file operations in snapshotted directories incur. To help you get started, we also highlighted several best practices and recommendations for working with snapshots to draw out their benefits with minimal overhead. 

For more information about using HDFS snapshots, please read the Cloudera documentation on the subject. Our Professional Services, Support and Engineering teams are available to share their knowledge and expertise with you to implement snapshots effectively. Please reach out to your Cloudera account team or get in touch with us here.

Tsz Sze
Principal Engineer I