Summer’s here and perhaps you’ve planned a trip abroad. Allow me to share my top travel tip: save a photo of your passport to your smartphone before you leave. Make sure it’s backed up to the cloud, and send a copy to someone you trust, just in case. That way, if you ever lose your passport or it gets stolen, you can still access a copy, which will greatly simplify getting a replacement so you can continue your travels. Because: no passport, no travel.
As more and more organisations rely on their data to give them the business insight they need, they face very much the same problem: should they lose their data, they lose their ability to gain business value from it. No data, no insight. Fundamentally, the solution is similar to the one for the passport: have a backup.
In its simplest form, Murphy’s Law states that anything that can go wrong, will go wrong. The law does not discriminate, and just as it applies to queues (the one you’re in always moves slowest), it applies to the data in your data centres. Hardware will fail, humans will make errors, and you will suffer a cyber attack. One way or another, you will lose data; the chances of it happening are a staggering 1 in 3.
A case in point is Amazon’s inadvertent shutdown of a larger-than-intended number of servers in its S3 service. A simple typo brought the internet to its knees, and the relatively short outage (just over four hours) cost the combined S&P 500 organisations $150m.
Consequential losses from data disasters range anywhere from $50k to $5m and include lost productivity, lost customer confidence, and simply having to rekey or recreate data. Of the companies that suffer a complete, catastrophic data loss, 1 in 20 never recover.
The chances of, and challenges around, data loss will only grow. Data volumes continue to expand, as does the number of sources creating the data (think edge devices). Enterprises deploy hybrid infrastructures spanning data centres on-premises and in the cloud, with not only processing but also data moving between them as needed. Lastly, national and international regulations like GDPR and CCPA compel organisations to protect their data and prevent its loss, or run the risk of large fines for non-compliance.
On the face of it, safeguarding against data loss is as straightforward as making a copy in a safe location. However, in a world of vast data volumes and non-zero storage costs, it becomes a business decision to determine which data is expendable in case of loss and which is critical for business continuity. Furthermore, internal as well as external regulations stipulate rules around data accessibility and visibility; as data is copied, so must these rules be, so that security and governance can be restored alongside the data if need be.
What looked so straightforward initially becomes far more complex on big data platforms. Data is held in different clusters that may sit on different infrastructures and in various stores like HDFS and Hive. There are considerations about where data should be replicated to (a backup data centre or the public cloud) and whether it should be encrypted at rest and in motion. Should the resulting copy be active, or treated as a standby? What impact does the replication have on the network bandwidth between your clusters? And crucially: replicating data alone is not enough; how are you synchronising identity and access policies, or safeguarding data lineage?
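To make these considerations concrete, here is a minimal sketch of what a replication policy might need to capture. The field names, values, and structure below are illustrative assumptions only, not an actual Cloudera BDR or Data Lifecycle Manager configuration format:

```yaml
# Hypothetical replication policy sketch -- illustrative only.
replication_policy:
  name: finance-hive-dr
  source:
    cluster: prod-dc1
    service: hive                     # could equally be a set of HDFS paths
    database: finance
  destination:
    cluster: dr-cloud-west            # backup data centre or public cloud
  schedule: "0 2 * * *"               # run nightly, off-peak
  bandwidth_limit_mb: 100             # throttle impact on inter-cluster links
  encryption:
    in_transit: true                  # encrypt data in motion
    at_rest: true                     # encrypt data at the destination
  mode: standby                       # active copy vs. passive standby
  replicate_metadata: true            # access policies, lineage, governance
```

Every line of such a policy corresponds to one of the questions above: destination, schedule, bandwidth, encryption, active versus standby, and whether security and governance metadata travel with the data.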
All these deliberations make a manual, DIY approach to replicating big data unfeasible. Time to development is long, you yourself become responsible for future-proofing, and unless you’re an expert, the user experience is likely to fall short.
Cloudera provides the perfect data backup and disaster recovery solution. Enterprise production strength and ready to go, it provides everything you need to safeguard your critical data assets, including associated data security and governance policies. Supporting on-premises as well as cloud replication to a host of providers, a single interface gives you complete control and insight for any clusters and data sources you need to manage.
Whichever Cloudera platform you are running, backup and disaster recovery of the data you use to get your critical business insight is part and parcel of it. For CDH distributions, BDR is available as part of Cloudera Manager. For HDP, Data Lifecycle Manager 1.5 is now available, bringing long anticipated new capabilities like Hive ACID table replication for guaranteed consistency, advanced conflict resolution for seamless handling of overlapping Hive and HDFS replications, and performance improvements through better handling of file updates in case of HDFS replications.
Make sure your Cloudera platform is part of your wider enterprise disaster recovery plan; our experts are happy to guide you. And while you’re planning for the unexpected: make sure you have a backup of your passport for when you go travelling!