Using a Big Data Platform to Unlock the Potential of Cloud Object Storage

Cloud object storage must offer data professionals more than a place to store information. To make the most of these vessels, your business must be able to find, use, and exploit information quickly, and that's why the right big data platform is crucial.

This post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be valid.

Companies continue to push more and more data to the cloud. Within a single business, different departments are increasingly using different storage for different use cases and running various workloads on a range of platforms. It’s the modern CIO’s new nightmare.

Cloud object storage services, such as Amazon S3 (Simple Storage Service), Microsoft WASB (Windows Azure Storage Blob), and GCS (Google Cloud Storage), offer highly available storage that big data computing engines can access. Crucially, given fluctuating demand for resources, they do so at an affordable cost. The approach has proved popular: almost three-quarters (72 percent) of data operations professionals use some form of cloud object storage today.

Yet there is an underlying issue that businesses must address if they are to make the most of these cloud structures: how can cloud object stores offer data professionals more than just another form of data storage? The answer lies in being able to find, use, and exploit information quickly, which is why the right big data platform is crucial. It can enable the creation of a single data “fabric” for the company, which accelerates delivery of insights by automating ingestion, curation, discovery, preparation, and integration across multiple data and object storage silos.

Using the Right Data Platform to Employ Storage More Effectively

As Ovum analyst Tony Baer notes, cloud object storage is relatively cheap, highly scalable, and increasingly accessible. The problem, however, is that too many firms treat cloud storage as an information dumping ground. Data is stored without enough consideration for analytical uses or data protection, a concern that has grown more pressing with legislation such as the General Data Protection Regulation (GDPR). This both limits the data's usefulness and increases risk.

Effective use of big data requires a combination of cloud storage, processing power, and effective applications. Consider, for example, a data scientist working for a major European car manufacturer. This data specialist wants to try a new model and is keen to take advantage of the large pool of graphics processing units (GPUs) available in the cloud.

The scientist has the data to run the model, but the information is held in various stores. To really make the most of the cloud, the scientist wants to pool GPU resources into one gigantic computing resource, a “Big Data Fabric,” where the workload can run across multiple cloud storage platforms simultaneously from a single safe, secure, scalable, and compliant platform.

This is where the right data platform plays a crucial role. With pooled resources, the scientist can use an engine, such as a machine-learning or deep-learning framework like TensorFlow or Apache Spark, to test the model in a secure, sandboxed environment. The scientist can use the data to train and test the model and, once the business case is proven, take the model back from the cloud to an on-premises platform if needed.

Layering Up to Keep Data Safe, Secure, and Exploitable

A strong big data platform that can deliver on this must support the following critical functional layers:

First, at the bottom layer rests a series of storage vessels, where the actual data resides, either cloud-based or on-premises.

Next, across these object stores runs a computing layer that helps the business pull together data from its various storage resources. Specialists, such as the data scientist in the use case above, can run workloads across these disparate resources.

To make these workloads run effectively and securely, two further layers are crucial:

  • An operations and orchestration layer. This layer should provide an integrated view, like a single pane of glass, from which your professionals can deploy workloads across multiple providers, using whichever cloud object store best suits business requirements.
  • A consistent security and governance layer. Your big data platform must help your business ensure that its cloud storage is used only in verified geographic locations, because the GDPR places tight constraints on the processing of data. CIOs can use the security measures in this type of data platform to geotag data, limiting access to information to specific individuals and regions. Such consistency and security mean your professionals know their data is safe and ready to be exploited.

With these layers in place, your business avoids being caught, as Tony Baer suggests can sometimes happen, between a rock and a hard place when trying to account for its data.
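
The geotagging idea in the security and governance layer can be sketched in a few lines. The example below is a minimal, hypothetical Python illustration, not any platform's actual API. The `GeoTag` and `AccessPolicy` classes, their methods, and the region names are assumptions made for the sake of the example. Access is denied by default. A read is allowed only when the user has been granted the dataset and the workload runs in one of the dataset's approved regions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class GeoTag:
    """Hypothetical geotag: the regions where a dataset may be processed."""
    dataset: str
    allowed_regions: frozenset[str]


@dataclass
class AccessPolicy:
    """Hypothetical policy combining geotags with per-dataset user grants."""
    tags: dict[str, GeoTag] = field(default_factory=dict)
    readers: dict[str, set[str]] = field(default_factory=dict)

    def tag(self, dataset: str, regions: set[str]) -> None:
        # Record the approved processing regions for a dataset.
        self.tags[dataset] = GeoTag(dataset, frozenset(regions))

    def grant(self, dataset: str, user: str) -> None:
        # Allow a specific individual to read the dataset.
        self.readers.setdefault(dataset, set()).add(user)

    def can_access(self, user: str, dataset: str, region: str) -> bool:
        # Untagged data is denied by default.
        geotag = self.tags.get(dataset)
        if geotag is None:
            return False
        # Both conditions must hold: the user is granted, and the region is approved.
        return user in self.readers.get(dataset, set()) and region in geotag.allowed_regions


if __name__ == "__main__":
    policy = AccessPolicy()
    # Illustrative region names; any region identifiers would work.
    policy.tag("vehicle_telemetry", {"eu-west-1", "eu-central-1"})
    policy.grant("vehicle_telemetry", "data_scientist")
    print(policy.can_access("data_scientist", "vehicle_telemetry", "eu-west-1"))  # True
    print(policy.can_access("data_scientist", "vehicle_telemetry", "us-east-1"))  # False
```

Denying untagged data by default is a deliberate choice in this sketch. Data whose approved processing locations are unknown is treated as off-limits until someone tags it.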

Finding a Platform Partner to Create Game-Changing Insights

The right platform partner will help you take advantage of your cloud object stores. Look for a firm that is flexible about provision. While no one can doubt the popularity of the cloud, your business requirements for on-demand IT are likely to change quickly and regularly, says analyst firm Gartner. It’s important that your firm does not lock itself into a single provider.

By finding a partner that allows you to rely on a range of providers, your business can adapt its usage of cloud object stores as requirements change. Flexibility should extend beyond on-demand cloud use cases, too. While more and more data science will run in the cloud, your business may wish to derive insights from various stores, including on-premises storage.

So, look for a data platform that is compatible with a broad collection of cloud object stores. The platform should include a set of built-in connectors that give your business and its employees native integration with various cloud object storage services, such as Amazon S3, Microsoft WASB, and GCS.

These connectors give your business the power to access storage services through your data platform. Such connectivity produces a best-of-both-worlds scenario: employees can work with data where it lies in a storage service, while using analytics tools such as Apache Hive and Spark to run analytics and develop game-changing insights.
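
In Hadoop-based platforms, each built-in connector is typically addressed by a URI scheme:

  • `s3a://` for the S3A connector to Amazon S3
  • `wasb://` (or `wasbs://` over TLS) for Azure Blob Storage
  • `gs://` for the Google Cloud Storage connector

This is what lets an engine such as Spark read data from different stores through the same API. The sketch below is a plain-Python illustration of that routing, assuming those scheme conventions. The `describe_location` helper and the example bucket, container, and account names are hypothetical, and the code does not call any real connector.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit

# URI schemes used by the Hadoop connectors for each object store.
CONNECTORS = {
    "s3a": "Amazon S3 (S3A connector)",
    "wasb": "Azure Blob Storage (WASB connector)",
    "wasbs": "Azure Blob Storage (WASB connector, TLS)",
    "gs": "Google Cloud Storage (GCS connector)",
}


@dataclass(frozen=True)
class StoreLocation:
    connector: str
    container: str  # S3/GCS bucket, or Azure container
    path: str


def describe_location(uri: str) -> StoreLocation:
    """Hypothetical helper: map an object-store URI to the connector that serves it."""
    parts = urlsplit(uri)
    connector = CONNECTORS.get(parts.scheme)
    if connector is None:
        raise ValueError(f"no connector registered for scheme {parts.scheme!r}")
    if parts.scheme in ("wasb", "wasbs"):
        # Azure form: wasb[s]://<container>@<account>.blob.core.windows.net/<path>
        container, _, _account_host = parts.netloc.partition("@")
    else:
        # S3A and GCS form: s3a://<bucket>/<path> and gs://<bucket>/<path>
        container = parts.netloc
    return StoreLocation(connector, container, parts.path.lstrip("/"))


if __name__ == "__main__":
    for uri in [
        "s3a://sales-data/2018/orders.parquet",
        "wasbs://telemetry@examplestore.blob.core.windows.net/cars/run1.csv",
        "gs://ml-training/images/",
    ]:
        print(describe_location(uri))
```

Keeping the scheme-to-connector mapping in one table means that supporting another store is a one-line change in this sketch. This mirrors the provider flexibility recommended above.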

Making the Right Choices Now

The amount of time it takes your firm to glean insight from data can greatly affect your business's competitiveness. The right big data platform will help your professionals run a range of workloads across your cloud object storage. Make the right choices now and the result will be game-changing insight that powers new business models in the future.

Mark Samuels
Freelance Journalist, Copywriter and Consultant