Streamlining Generative AI Deployment with New Accelerators

Overcoming the challenges of developing production-ready Generative AI with four new ready-to-deploy Accelerators for ML Projects (AMPs)

The journey from a great idea for a Generative AI use case to deploying it in a production environment often resembles navigating a maze. Every turn presents new challenges—whether it’s technical hurdles, security concerns, or shifting priorities—that can stall progress or even force you to start over. 

Cloudera recognizes the struggles that many enterprises face when setting out on this path, which is why we started building Accelerators for ML Projects (AMPs). AMPs are fully built-out ML prototypes that can be deployed with a single click directly from Cloudera Machine Learning. AMPs enable data scientists to go from an idea to a fully working ML use case in a fraction of the time. By providing pre-built workflows, best practices, and integration with enterprise-grade tools, AMPs eliminate much of the complexity involved in building and deploying machine learning models.

In line with our ongoing commitment to supporting ML practitioners, Cloudera is thrilled to announce the release of four new Accelerators! These cutting-edge tools focus on trending topics in generative AI, helping enterprises unlock innovation and accelerate the development of impactful solutions.

Fine Tuning Studio

Fine tuning has become an important method for creating specialized large language models (LLMs). Because LLMs are trained on enormous, broad text corpora drawn largely from the internet, they are generalists capable of doing many different things well. However, for them to truly excel at specific tasks, such as code generation or translation for rare dialects, they need to be further trained (tuned) on a smaller, more focused dataset for that task. This process allows the model to refine its understanding and adapt its outputs to the nuances of the specific task, making it more accurate and efficient in that domain.
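The core idea of fine tuning, continuing to train already-trained weights on a smaller task-specific dataset, can be seen in miniature below. This is a toy sketch, not how the Fine Tuning Studio works: a two-parameter linear model stands in for an LLM, and both the "general" and "specialized" datasets are made up for illustration.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(w, b, data, lr, steps):
    """Plain gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy "pretraining": a broad, generic dataset following y = 2x.
general = [(k / 10, 2 * k / 10) for k in range(-10, 11)]
w, b = train(0.0, 0.0, general, lr=0.1, steps=500)

# Toy "fine tuning": start from the pretrained weights and continue training
# on a small specialized dataset whose pattern differs (y = 2x + 1).
specialized = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
before = mse(w, b, specialized)
w_ft, b_ft = train(w, b, specialized, lr=0.1, steps=200)
after = mse(w_ft, b_ft, specialized)
print(f"task loss before: {before:.3f}, after: {after:.3f}")
```

The fine-tuned weights start from what pretraining learned rather than from zero, which is the same reason real fine tuning needs far less task data than training from scratch.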

The Fine Tuning Studio is a Cloudera-developed AMP that provides users with an all-encompassing application and “ecosystem” for managing, fine tuning, and evaluating LLMs. The application is a launcher that helps users organize and dispatch other Cloudera Machine Learning workloads (primarily through the Jobs feature) that are configured specifically for LLM training and evaluation tasks.
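The launcher pattern described above can be sketched as a registry of workload specifications that are submitted in dependency order, so an evaluation job is submitted only after the training job it depends on. Everything here is hypothetical: JobSpec, Launcher, fake_submit, and the model and dataset names are illustrative, and fake_submit stands in for a real call into Cloudera Machine Learning Jobs, whose API is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class JobSpec:
    """Hypothetical description of one training or evaluation workload."""
    name: str
    kind: str  # "train" or "evaluate"
    base_model: str
    dataset: str
    depends_on: List[str] = field(default_factory=list)

class Launcher:
    """Hypothetical registry that submits job specs in dependency order.

    `submit` is a stand-in for whatever actually runs the workload.
    """

    def __init__(self, submit: Callable[[JobSpec], str]):
        self.submit = submit
        self.jobs: Dict[str, JobSpec] = {}

    def register(self, job: JobSpec) -> None:
        if job.kind not in ("train", "evaluate"):
            raise ValueError(f"unknown job kind: {job.kind}")
        self.jobs[job.name] = job

    def dispatch_all(self) -> List[str]:
        """Submit every job after the jobs it depends on; return run IDs in submit order."""
        done: Dict[str, str] = {}

        def visit(name: str, stack: frozenset) -> None:
            if name in done:
                return
            if name in stack:
                raise ValueError(f"dependency cycle at {name}")
            for dep in self.jobs[name].depends_on:
                visit(dep, stack | {name})
            done[name] = self.submit(self.jobs[name])

        for name in self.jobs:
            visit(name, frozenset())
        return list(done.values())

# Usage with a fake submitter that only records what would be launched.
submitted: List[str] = []

def fake_submit(job: JobSpec) -> str:
    submitted.append(job.name)
    return f"run-{len(submitted)}"

launcher = Launcher(fake_submit)
launcher.register(JobSpec("eval-adapter", "evaluate", "base-llm", "holdout.jsonl",
                          depends_on=["train-adapter"]))
launcher.register(JobSpec("train-adapter", "train", "base-llm", "train.jsonl"))
run_ids = launcher.dispatch_all()
print(run_ids, submitted)
```

Although the evaluation job is registered first, the training job is submitted first because the evaluation lists it in depends_on. This sketch only orders submissions; waiting for a run to finish before starting the next is left to the real job system.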

RAG with Knowledge Graph

Retrieval Augmented Generation (RAG) has become one of the default methods for adding context to an LLM's responses. This application architecture uses prompt engineering and vector stores to supply an LLM with new information at inference time. However, RAG applications are far from perfect, which has prompted innovations such as integrating knowledge graphs, which structure data into interconnected entities and relationships. This addition can improve retrieval accuracy, contextual relevance, reasoning capabilities, and domain-specific understanding, elevating the overall effectiveness of RAG systems.
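The retrieval and prompt-assembly steps of RAG can be sketched in a few lines. This minimal version assumes word-count vectors as a stand-in for a real embedding model and a Python list as a stand-in for a vector database; embed, VectorStore, and build_prompt are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (a real system would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorStore:
    """Tiny in-memory vector store holding (text, vector) pairs."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> List[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def build_prompt(question: str, context: List[str]) -> str:
    """Prompt engineering step: place the retrieved passages in front of the question."""
    passages = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{passages}\n"
        f"Question: {question}\nAnswer:"
    )

store = VectorStore()
store.add("Transformers use self-attention to model token interactions.")
store.add("Knowledge graphs store entities and the relationships between them.")
store.add("Gradient descent updates weights in the direction that lowers the loss.")

question = "How do knowledge graphs represent relationships?"
context = store.search(question, k=1)
prompt = build_prompt(question, context)
print(prompt)  # this string is what would be sent to the LLM at inference time
```

The LLM itself is unchanged; only the prompt it receives carries the new information, which is what makes RAG cheaper to update than retraining.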

RAG with Knowledge Graph demonstrates how integrating knowledge graphs can enhance RAG performance, using a solution designed for retrieving academic research papers. The solution ingests significant AI/ML papers from arXiv into Neo4j, which serves as both the knowledge graph and the vector store. For the LLM, we used Meta-Llama-3.1-8B-Instruct, which can be run either remotely or locally. To highlight the improvements that knowledge graphs bring to RAG, the UI compares the results with and without a knowledge graph.
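The knowledge-graph step can be sketched as a graph walk that starts from the documents returned by vector search and adds the relationships around them to the LLM's context. This is an illustrative in-memory version over a hypothetical graph of placeholder nodes (paper:A, author:X, and so on); it does not use Neo4j or its query language.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical miniature knowledge graph: node -> [(relation, neighbor), ...].
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "paper:A": [("CITES", "paper:B"), ("AUTHORED_BY", "author:X")],
    "paper:B": [("AUTHORED_BY", "author:Y")],
    "paper:C": [("CITES", "paper:A")],
}

def graph_facts(seeds: List[str], hops: int = 1) -> List[str]:
    """Breadth-first walk over outgoing edges from the seed nodes, returning
    each traversed edge as a 'subject RELATION object' string, up to `hops` edges away."""
    facts: List[str] = []
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue
        for relation, neighbor in GRAPH.get(node, []):
            facts.append(f"{node} {relation} {neighbor}")
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return facts

# Suppose vector search returned only paper:A for the user's question.
vector_hits = ["paper:A"]
plain_context = list(vector_hits)  # RAG without the knowledge graph
kg_context = vector_hits + graph_facts(vector_hits, hops=2)  # RAG with it
print(plain_context)
print(kg_context)
```

Here kg_context carries the citation and authorship facts around paper:A, which plain_context lacks, mirroring the with/without comparison in the AMP's UI. The walk follows outgoing edges only, so paper:C, which cites paper:A, is not reached; a real graph query would need to decide explicitly which relationship directions to follow.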

PromptBrew by Vertav

Prompting is one of the biggest factors in Generative AI success, and yet many AI developers struggle to write good prompts. This gap in prompt engineering skills often leads to suboptimal results, because the effectiveness of generative AI models depends heavily on how well they are guided by instructions. Crafting precise, clear, and contextually appropriate prompts is crucial for getting the most out of a model. Without well-designed prompts, even the most advanced models can produce irrelevant, ambiguous, or low-quality outputs.

PromptBrew provides AI-powered assistance to help developers craft high-performing, reliable prompts with ease. Whether you’re starting with a specific project goal or a draft prompt, PromptBrew guides you through a streamlined process, offering suggestions and optimizations to refine your prompts. By generating multiple candidate prompts and recommending enhancements, it helps tailor your inputs for the best possible outcomes. These optimized prompts can then be integrated directly into your project workflow, improving performance and accuracy in generative AI applications.
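The generate-then-select workflow described above can be sketched as expanding a draft goal into several candidate prompts and keeping the one that scores best. This is not PromptBrew's method: the templates, candidate_prompts, pick_best, and especially toy_score are hypothetical, and a realistic scorer would run each candidate against a model on a few labeled examples.

```python
from typing import Callable, List

# Hypothetical rewrite templates; each turns a rough goal into a more explicit prompt.
TEMPLATES = [
    "{goal}",
    "You are an expert assistant. {goal}",
    "You are an expert assistant. {goal}\nRespond in {fmt}.",
    "You are an expert assistant. {goal}\nThink through the task step by step, then respond in {fmt}.",
]

def candidate_prompts(goal: str, fmt: str) -> List[str]:
    """Expand one draft goal into several candidate prompts (unused placeholders are ignored)."""
    return [t.format(goal=goal.strip(), fmt=fmt) for t in TEMPLATES]

def pick_best(candidates: List[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate according to the supplied scorer."""
    return max(candidates, key=score)

def toy_score(prompt: str) -> float:
    # Hypothetical heuristic: reward an explicit output format, slightly penalize length.
    has_format = "respond in" in prompt.lower()
    return (1.0 if has_format else 0.0) - len(prompt) / 1000

cands = candidate_prompts("Summarize the customer complaint below.", "three bullet points")
best = pick_best(cands, toy_score)
print(best)
```

Keeping the scorer as a parameter separates how candidates are produced from how they are judged, so the toy heuristic can be swapped for a real evaluation without touching the rest.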

Chat with your Documents  

This AMP showcases how to build a chatbot using an open-source, pre-trained, instruction-following Large Language Model (LLM). The chatbot’s responses are improved by providing it with context from an internal knowledge base, created from documents uploaded by users. This context is retrieved through semantic search, powered by an open-source vector database.

Compared with the original LLM Chatbot Augmented with Enterprise Data AMP, this version adds new features such as user document ingestion, automatic question generation, and result streaming. It also uses LlamaIndex to implement the RAG pipeline.
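User document ingestion usually starts by splitting each uploaded document into overlapping chunks that are embedded and stored individually. LlamaIndex ships its own parsers for this; the plain-Python function below is a hypothetical stand-in that splits on whitespace-separated words rather than model tokens.

```python
from typing import List

def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split a document into overlapping windows of words before embedding.

    Overlap keeps text that straddles a chunk boundary retrievable from
    either neighboring chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the document
    return chunks

doc = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(doc, chunk_size=4, overlap=1)
print(chunks)
```

With chunk_size=4 and overlap=1, the ten-word document yields three chunks, each sharing one word with its neighbor. Larger overlaps improve boundary recall at the cost of storing and embedding more redundant text.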

Jacob Bengtson