No Data Loss and No Service Interruption – HDF to CFM Rolling Migration

The blog “Migrating Apache NiFi Flows from HDF to CFM with Zero Downtime” detailed how many common NiFi dataflows can be migrated when the Hortonworks DataFlow (HDF) and Cloudera Flow Management (CFM) clusters are running side-by-side. But what if you lack the resources to run multiple NiFi clusters concurrently? Not a problem. Rolling migration, in which you decommission your HDF NiFi nodes and recycle them for use in a CFM NiFi cluster, is an alternative way to migrate these NiFi dataflows with the same no data loss/no service interruption advantages as a side-by-side migration.

Overview

An HDF to CFM rolling migration assumes the following:

  • The HDF NiFi cluster has enough nodes for the procedure. Nodes will be removed from HDF and converted to CFM cluster nodes in stages. As a result, the data processing workload of the HDF cluster needs to be handled temporarily by the subset of nodes still connected to the cluster.
  • The ZooKeeper service supporting the HDF NiFi cluster is not included in the migration. There is a version change in ZooKeeper from HDF to CFM, so a new ZooKeeper service must be configured.

In this article, we cover the rolling migration of a four-node HDF NiFi cluster running the three use cases from “Migrating Apache NiFi Flows from HDF to CFM with Zero Downtime”.

HDF to CFM rolling migration

In this example, it has been confirmed that two nodes can adequately process the flows during a brief scheduled migration period. As such, the overall migration plan is to:

  • Decommission two of the HDF nodes, so the dataflows can continue to run on the remaining two nodes in the HDF cluster
  • Repurpose those two decommissioned HDF nodes to create a new two-node CFM NiFi cluster
  • Migrate the HDF dataflows to run in CFM
  • Decommission the two remaining HDF nodes and add them to CFM, resulting in the dataflows running on a four-node CFM NiFi cluster

Step 1: Decommission First Set of NiFi Nodes in HDF

Determine the first set of HDF NiFi nodes to use for the CFM NiFi cluster. Ensure there are no critical services co-located on the chosen HDF nodes, because these nodes will be completely removed from the HDF cluster before they are used in the CFM cluster.

The HDF cluster has Nodes 3, 4, 5 and 6, and the decision is to decommission Nodes 3 and 4 first. Connect to the NiFi UI (via Node 5 or 6) and open the NiFi Cluster window from the Global Menu. Perform the following actions on Node 3:

1. Select Disconnect

Node 3 is disconnected from the cluster.

Alternatively, use the NiFi REST API to perform the disconnect. For example:

# more disconnect_node.json
{
"node": {"nodeId": "3317d62e-b104-4a69-992c-41552e07485c", "status": "DISCONNECTING"}
}
# curl -H "Content-type: application/json" -XPUT http://hostname-000005:9090/nifi-api/controller/cluster/nodes/3317d62e-b104-4a69-992c-41552e07485c -d @disconnect_node.json -vv 

2. Select Offload

This will stop and terminate all processors on the disconnected node (Node 3) and rebalance flowfiles to the connected nodes in the cluster (Nodes 4, 5 and 6).

Alternatively, use the NiFi REST API to perform the offload. For example:

# more offload_node.json
{
"node": {"nodeId": "3317d62e-b104-4a69-992c-41552e07485c", "status": "OFFLOADING"}
}
# curl -H "Content-type: application/json" -XPUT http://hostname-000005:9090/nifi-api/controller/cluster/nodes/3317d62e-b104-4a69-992c-41552e07485c -d @offload_node.json -vv 

3. Select Delete

Node 3 is deleted from the cluster.

Alternatively, use the NiFi REST API to perform the delete. For example:

# curl -H "Content-type: application/json" -XDELETE http://hostname-000005:9090/nifi-api/controller/cluster/nodes/3317d62e-b104-4a69-992c-41552e07485c -vv 

Disconnect, Offload and Delete NiFi Cluster Node

Repeat the steps to disconnect, offload and delete Node 4.
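
The three REST calls above can be chained per node. The following is a minimal shell sketch rather than a definitive procedure. It assumes an unsecured NiFi endpoint like the one in the examples. The function names, the NIFI_API variable, the 5-second polling interval and the 60-attempt limit are all hypothetical. It also assumes that a GET on the same node URL returns the node's current status, and that the node reports DISCONNECTED and OFFLOADED once each step completes, so verify both against your NiFi version first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: names, variables and timings are assumptions.
NIFI_API="${NIFI_API:-http://hostname-000005:9090/nifi-api}"

# Print the request body for a node status change (DISCONNECTING or OFFLOADING).
node_payload() {
  printf '{"node": {"nodeId": "%s", "status": "%s"}}' "$1" "$2"
}

# Read a node entity (JSON) on stdin and print the value of its "status" field.
node_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Ask the cluster to move node $1 to status $2.
set_node_status() {
  curl -sf -H "Content-type: application/json" -X PUT \
    "$NIFI_API/controller/cluster/nodes/$1" -d "$(node_payload "$1" "$2")" > /dev/null
}

# Poll until node $1 reports status $2; give up after 60 attempts.
wait_for_status() {
  local tries=0
  until [ "$(curl -sf "$NIFI_API/controller/cluster/nodes/$1" | node_status)" = "$2" ]; do
    tries=$((tries + 1))
    [ "$tries" -ge 60 ] && return 1
    sleep 5
  done
}

# Disconnect, offload and delete node $1, waiting for each step to finish.
decommission_node() {
  set_node_status "$1" DISCONNECTING || return 1
  wait_for_status "$1" DISCONNECTED || return 1
  set_node_status "$1" OFFLOADING || return 1
  wait_for_status "$1" OFFLOADED || return 1
  curl -sf -X DELETE "$NIFI_API/controller/cluster/nodes/$1" > /dev/null
}
```

With these helpers, calling decommission_node 3317d62e-b104-4a69-992c-41552e07485c would run all three steps for Node 3, and a second call with Node 4's ID would cover the other node. Waiting for the offload to finish before deleting avoids removing a node that still holds flowfiles.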

The HDF NiFi cluster now has two nodes (Nodes 5 and 6):

HDF to CFM rolling migration

To complete the decommissioning process, for each node deleted from the HDF NiFi Cluster (Nodes 3 and 4):

  1. Stop and delete NiFi via Ambari
  2. Stop the Ambari agent with the command: service ambari-agent stop
  3. Delete the host from the Ambari cluster

Stop/Delete NiFi and Delete Host

Note: As shown in the video, other components may also need to be stopped in order to delete the host.
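
The Ambari steps can also be scripted instead of clicked through. The sketch below assumes Ambari's REST API is reachable with basic authentication and that the NiFi component is registered as NIFI_MASTER. The URL, cluster name, credentials and all function and variable names are hypothetical placeholders. Confirm the component name on your own cluster before using anything like this.

```shell
#!/usr/bin/env bash
# Hypothetical values: replace with your Ambari URL, cluster name and credentials.
AMBARI_URL="${AMBARI_URL:-http://ambari-host:8080}"
AMBARI_CLUSTER="${AMBARI_CLUSTER:-hdf_cluster}"
AMBARI_AUTH="${AMBARI_AUTH:-admin:admin}"

# Print the Ambari API URL for host $1, or for component $2 on that host.
host_url() {
  printf '%s/api/v1/clusters/%s/hosts/%s' "$AMBARI_URL" "$AMBARI_CLUSTER" "$1"
  if [ -n "${2:-}" ]; then
    printf '/host_components/%s' "$2"
  fi
}

# Stop component $2 on host $1 by requesting the INSTALLED state.
stop_component() {
  curl -sf -u "$AMBARI_AUTH" -H "X-Requested-By: ambari" -X PUT \
    -d '{"RequestInfo": {"context": "Stop component"}, "Body": {"HostRoles": {"state": "INSTALLED"}}}' \
    "$(host_url "$1" "$2")"
}

# Delete component $2 on host $1, or the host itself when $2 is omitted.
delete_resource() {
  curl -sf -u "$AMBARI_AUTH" -H "X-Requested-By: ambari" -X DELETE \
    "$(host_url "$1" "${2:-}")"
}
```

A possible order per host is stop_component, then delete_resource with the component name, then stopping the Ambari agent on the host itself, and finally delete_resource with only the host name. The stop request is asynchronous, so wait for it to complete before issuing the delete.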

Step 2: Confirm Flows on HDF NiFi Cluster

Verify the two-node HDF NiFi cluster is properly handling its data processing responsibilities.
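
One quick check is to confirm that both remaining nodes are still connected. This is a sketch under assumptions: it relies on NiFi's /flow/cluster/summary endpoint returning a connectedNodeCount field, and the helper names and host are hypothetical. It does not replace reviewing queue sizes and bulletins in the NiFi UI.

```shell
#!/usr/bin/env bash
# Hypothetical endpoint, matching the unsecured host used in the earlier examples.
NIFI_API="${NIFI_API:-http://hostname-000005:9090/nifi-api}"

# Read a cluster summary (JSON) on stdin and print connectedNodeCount.
connected_node_count() {
  sed -n 's/.*"connectedNodeCount"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed only if exactly $1 nodes are connected to the cluster.
check_connected_nodes() {
  [ "$(curl -sf "$NIFI_API/flow/cluster/summary" | connected_node_count)" = "$1" ]
}
```

For the two-node cluster in this article, check_connected_nodes 2 would succeed only while both Nodes 5 and 6 are connected.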

HDF to CFM rolling migration

Step 3: Recycle the First Set of HDF Nodes for CFM

Prerequisites:

  • Clean each host and perform any desired maintenance, such as OS upgrades or hardware changes.
  • Confirm the CDP Private Cloud Base cluster has been prepared for CFM deployment.  Specifically, in Cloudera Manager, make sure the CFM parcel has been downloaded, distributed and activated.  Additionally, verify the NiFi CSD has been added to the Cloudera Manager host. Finally, add any dependent services that NiFi requires such as ZooKeeper.

With the former HDF nodes cleaned and prepped:

  1. Add them as hosts to Cloudera Manager
  2. Add the hosts to the CDP Cluster
  3. Add the NiFi service, selecting Nodes 3 and 4 for the NiFi nodes

Add Nodes to Cloudera Manager, Add Nodes to Cluster, Install NiFi

Step 4: Perform the Migration of HDF flows to CFM

Migrate the HDF flows to the CFM NiFi cluster. Refer to “Migrating Apache NiFi Flows from HDF to CFM with Zero Downtime” for details.

Step 5: Confirm Flows on CFM NiFi Cluster

Verify the two-node CFM NiFi cluster is properly handling its data processing responsibilities.

HDF to CFM rolling migration

Step 6: Decommission the Remaining HDF NiFi Nodes

Confirm all of the source processors in the HDF NiFi cluster have been stopped and all the data has been drained out. It is now safe to stop and delete the NiFi service from the remaining HDF nodes (Nodes 5 and 6). Stop the Ambari agent on each node and delete both hosts from the Ambari cluster. Clean and prep the hosts.
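
The drain check at the start of this step can be scripted before anything is stopped or deleted. The sketch below assumes NiFi's /flow/status endpoint reports a flowFilesQueued count for the whole flow. The helper names and host are hypothetical. It only covers queued data, so source processors still need to be confirmed as stopped separately.

```shell
#!/usr/bin/env bash
# Hypothetical endpoint: one of the remaining HDF nodes.
NIFI_API="${NIFI_API:-http://hostname-000005:9090/nifi-api}"

# Read a controller status (JSON) on stdin and print flowFilesQueued.
flowfiles_queued() {
  sed -n 's/.*"flowFilesQueued"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed only when no flowfiles remain queued in the flow.
is_drained() {
  [ "$(curl -sf "$NIFI_API/flow/status" | flowfiles_queued)" = "0" ]
}
```

A guard such as running is_drained and proceeding only on success keeps the NiFi service from being deleted while data is still queued.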

Step 7: Recycle the Second Set of HDF Nodes in CFM

Repeat the steps described earlier to add Nodes 5 and 6 to Cloudera Manager and then to the CDP cluster. These two hosts can now be added to the CFM NiFi cluster.

With the former HDF nodes cleaned and prepped:

  1. On the NiFi service page, select “Add Role Instances” from the Actions menu
  2. Select Nodes 5 and 6 as the hosts on which to install NiFi
  3. Start NiFi on both hosts for them to join the CFM cluster

Add Role Instances to NiFi and Start New NiFi Nodes

The original four-node HDF cluster and its flows have now been fully migrated to a new four-node CFM cluster.

HDF to CFM rolling migration

Conclusion

There has never been a better time to upgrade from HDF to CFM and get the latest NiFi release. Rolling migration is a great option to consider if your organization wants to upgrade but has resource constraints. Recycling your hardware this way is a cost-effective solution that can be implemented with minimal disruption to your mission-critical dataflows.

Excited to learn more? Cloudera Professional Services is here to help implement and optimize your data-driven use cases and chart a successful migration path from HDF to CFM.

Andrew Lim
Software Engineer