LLM Architecture: RAG Implementation and Design Patterns

by Dr. Phil Winder, CEO


When: Wed Apr 24, 2024 at 16:30 +0100

Retrieval-augmented generation (RAG) has emerged as one of the best ways to incorporate private or niche knowledge into an LLM. But when designing RAG solutions for production use cases, a wide range of architectural questions arise, from simple ones like where to store the embeddings to more technical problems like how to continuously improve retrieval performance.

This presentation investigates several common production-ready architectures for RAG and discusses the pros and cons of each. By the end of this talk you will be able to help design RAG-augmented LLM architectures that best fit your use case.

Although all of our talks have beginner-friendly introductions, this presentation will discuss architectural components at a high level. Therefore, it would be beneficial if you already have an understanding of machine learning and architectural development.
