Two of the more painful things in your everyday life as an analyst or SQL worker are not getting easy access to data when you need it, and not having useful, easy-to-use tools that stay out of your way. As one of my dear customers, a data worker in Pharma, said to me: “I really don’t care about bells and whistles, I just want to get my task done.” This simple statement captures the essence of almost 10 years of SQL development with modern data warehousing, and it sets the tone for where SQL tools are now heading. Less is more: a simple flow through the data worker’s A-Z tasks, and the ability to integrate easily with subsequent workflows, e.g. to collaborate and share. That’s it.
Ease of use, seamless integration, and “less coding” are what modern data and SQL workers ask for every day. Often their workflow starts with a simple copy-paste of someone else’s working code, followed by a series of iterative modifications, preferably as few as possible. It is less about creating things from scratch and more about augmenting what is already there – building upon what is already working! Then rinse and repeat. It is important to note, however, that this trend comes from increasing pressure on time to value and tighter SLAs, not from a lack of creativity or skills!
This focus on task completion has also been the product philosophy of Cloudera’s SQL Workbench (also known as HUE) for years. We have evolved with our users, from serving early Hadoop hackers who needed quick access to data in the Data Lake, to offering a much more sophisticated SQL tool. We have done so through smart integration and abstractions aimed at easing the backend complexity. Our product mission is to help the SQL worker achieve their common tasks in the least intrusive, yet most efficient and helpful way. We want to expedite the A-Z workflow – with intent and focus – so that the user can “get the task done”.
Our mission is to strive for a frictionless SQL experience. The four main pillars of our SQL Tool Design Philosophy are:
- Find and understand data – with confidence
- Design and iterate over queries – quickly and easily
- Optimize and troubleshoot – with intelligence
- Collaborate or automate – seamlessly
Intelligent Data Navigation and Discovery
Cloudera’s SQL Workbench helps you find the right tables faster and allows you to sample table data within the tool. If you have data in some other database and want to correlate it with data in your Data Cloud, you can also easily upload CSV files or connect to another database for import. HUE will automatically infer the schema and generate the DDL for a suggested table, which you can then edit to your preference. In some of our Cloudera platform deployment options we also have built-in, popularity-based data guidance. By seeing which data sets are more commonly used, users immediately gain confidence about which tables or views are more likely to be the right data to work with.
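To make the idea of a suggested table a little more concrete, here is a minimal sketch of how a schema could be inferred from an uploaded CSV file. This is not HUE’s actual importer: the function names `infer_type` and `suggest_ddl`, the sample size, and the three-type mapping are all assumptions made purely for illustration.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest SQL type that fits every non-empty sample value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"
    # Try the narrower type first; fall back to STRING if nothing fits.
    for sql_type, caster in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                caster(v)
            return sql_type
        except ValueError:
            continue
    return "STRING"


def suggest_ddl(table_name, csv_text, sample_rows=100):
    """Build a CREATE TABLE suggestion from the header and a sample of rows.

    Hypothetical helper for illustration only, not part of HUE.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = []
    for i, name in enumerate(header):
        col_values = [row[i] for row in rows if i < len(row)]
        columns.append(f"  `{name.strip()}` {infer_type(col_values)}")
    return f"CREATE TABLE `{table_name}` (\n" + ",\n".join(columns) + "\n)"


if __name__ == "__main__":
    sample = "id,amount,region\n1,9.5,EMEA\n2,12,APAC\n"
    print(suggest_ddl("sales", sample))
```

The point of the sketch is the workflow rather than the details: the tool proposes a table definition from a sample of the data, and the user stays in control by editing the suggestion before anything is created.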
Efficient Query Design
SQL workers rarely start from scratch. With Cloudera’s SQL Workbench you have access to saved queries, shared queries, and your own query history instantly at login. You can easily find somewhere to start from and kick-start your task at hand. Our syntax parser is one of the best in the SQL tool market, according to the HUE users among our customers: it auto-completes tables, SQL commands, and popular joins, while also making sure the SQL syntax is compliant with HiveQL and Impala SQL. If the user needs extra help, we have built in full language guides that can be accessed from within the tool, so the user is not forced to switch context and go looking for resources on the web.
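For readers curious what prefix-based auto-completion boils down to, here is a deliberately simplified sketch. HUE’s real parser is grammar-aware and far richer than this; the `complete` function and the candidate list below are hypothetical and only illustrate the basic lookup.

```python
from bisect import bisect_left


def complete(prefix, candidates, limit=5):
    """Return up to `limit` candidates that start with `prefix`, ignoring case.

    Hypothetical helper for illustration only, not HUE's implementation.
    """
    ordered = sorted(candidates, key=str.lower)
    keys = [c.lower() for c in ordered]
    p = prefix.lower()
    # Binary search for the first entry that could match the prefix.
    start = bisect_left(keys, p)
    matches = []
    for key, original in zip(keys[start:], ordered[start:]):
        if not key.startswith(p):
            break
        matches.append(original)
        if len(matches) == limit:
            break
    return matches


if __name__ == "__main__":
    names = ["SELECT", "SET", "SHOW", "sales", "sales_2020"]
    print(complete("sa", names))
```

A real editor would also rank suggestions by context and popularity, as described above, rather than alphabetically.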
Optimization as you go
In some flavors of Cloudera’s platform, you can configure hints on query design as you go, which show the user how to write more efficient (i.e. less resource-consuming or shorter-running) queries. There is also excellent tooling for introspecting a query in case it does not perform the way you expected it to – following the query parser steps and DAG of various queries, as well as comparing two queries side by side – but more about that in a future blog post.
Having this kind of deep, complete insight into how already-executed queries perform is essential to an SQL worker’s ability to self-serve optimization and thereby reduce the impact on resource consumption and SLAs. This is also an area where we continuously invest. Imagine a future where you can actually see the resource or cloud cost impact of a query you are about to run, and get advice on how to make it more cost-efficient.
Seamless Collaboration – more important than ever
After an SQL worker has created a query, it is usually not just about running it and viewing some results. More often than not, it is a preliminary step before scheduling the workload to run on a regular basis. Even more commonly, the user wants to share the query or result table with a larger data-consuming audience – be it uploading the query to GitHub, sending it to a coworker over Slack or email, or saving it for reuse among SQL workers across the organization.
HUE comes with a variety of collaboration options: you can download query files so they can be uploaded to git for project versioning, and you can share queries among users. HUE also comes with a simple form of pre-visualization of results, and lets you download result sets as CSV or PDF files for local exploration or further insight sharing.
Many also seek to share the result table further via an enterprise BI tool (e.g. Qlik or Tableau). It therefore makes sense to provide a seamless transition from the context of HUE to Cloudera’s new, built-in Data Visualization tool. Cloudera Data Visualization allows dashboarding across all the data compute options within Cloudera’s Data Platform (e.g. correlating structured and unstructured data side by side, while displaying the results of a production ML model and using them to help filter results). The frictionless transition from SQL editing in HUE to visualization in Data Visualization is in the works (a tech preview feature at the moment) and we’ll share more about this later. Basically, you configure HUE to export the query context to a Cloudera Data Visualization instance when the visualization icon button is pressed. The user can then continue with the powerful features of Data Visualization to generate a dashboard from the result table of the query designed in HUE. We see this feature, and many more to come, as a bridge-builder and an accelerator that lets different personas in an organization fuel each other and collaborate towards faster insights for the business overall.
Cloudera is on a mission to further expedite each step of the SQL worker’s workflow, and every transition involved in that process. HUE is a rich tool with lots of helpful functionality and many optimizations already built in. But there is plenty more to come! It is the most popular SQL editing tool on the market today for modern data warehousing and is used by millions of users worldwide on a daily basis. Get started using HUE in a Cloudera Data Platform Private Cloud 60-day trial.