How to Build AI Agents for Data Analytics: A Practical Guide with Python

by Dr. Phil Winder , CEO

Tired of static dashboards that never quite answer the real question you want to ask?

AI language models, popularised by ChatGPT, have gained notoriety for being able to understand queries with human-levels of quality. By plugging this bots into your data sources you can finally get all the answers you desire, by simply asking. (I still say please and thank you!)

In this article, I explore what these so-called “AI Agents” are (not the greatest name), how they work their magic, and the fun things you can do once connected to your data. We’ll dive into some example code and architectures and show off some (hopefully) working demos.

Download Slides Download Demo Python Code

AI Agent Fundamentals

AI Agent architecture showing MDP loop with ML model

What is an AI Agent?

An AI agent is a while loop with a large language model in the middle. For each part of a word (a token) the model predicts what the next token should be. Agents are guided by prompts (a form of meta-programming) to achieve specific tasks, like write an SQL query.

This is formalised as a Markov decision process (MDP), an idea taken from reinforcement learning (RL), without a reward. It comprises of an agent (the ML model in this case) and an environment (the text being streamed in and out). During RL training, the model receives a reward to update the model, but during inference the model is frozen.

How AI Agents Work

Flowchart showing how AI agents process queries and take actions

AI agents work by looping around, predicting whether the next turn is actionable or not. By actionable I mean does the agent think it can answer the user’s query yet. If no, then predict what the agent needs to do in order to solve the task at hand. If yes, then respond to the user.

This is the heart of the agentic loop and each AI assistant encodes this process differently.

Agents are often connected to external tools that allow it to take more actions, like connecting ChatGPT to BigQuery. These tools often operate within a small scope and at a low level. For example, read file, edit file, call a REST API, etc.

But because Agents are powered by LLMs and are therefore statistical ML models, they only work correctly < 100% of the time. AI coding assistants in particular can end up compounding small errors to totally mess up a project.

Code Demo #1: Basic Agent

Python code example of basic AI agent implementation

For the demos I chose to use the Pydantic AI framework. It’s lightweight and easy to understand. But the recommendation is that in most cases frameworks like this are good for demos or POCs, but for production develop your own wrappers to have greater control and avoid dependencies.

This basic example consists of:

  • System prompt (instructions)
  • Tool definition with docstring (crucial for LLM to understand when to use)
  • Docstring = prompt engineering opportunity
  • Memory/history needed for context awareness

AI Agents for Analytics

Download Demo Python Code

The main demo consists of code that generates natural language SQL queries for BigQuery that allows your user to query AI agents for business analytics data. This is best walked through, so please view the video to learn more.

The Problem

Ever since Google released GA4, I’ve always struggled to get plots of really basic information. Simple questions should provide a simple answer.

In the real world, marketing teams need flexibility without technical hurdles like GA4. Agents are one potential way to deliver a user interface that is both flexible and simple. Agents probably won’t replace dashboards, but they are a very useful addition.

Implementation Details

I used a public GA4 demo dataset from Google with anonymized (but real) e-commerce transactions. I loaded this into my BigQuery account and configured access. See the README to learn more.

I then passed in a view of the database schema in the system prompt. This is crucial for the agent to understand what columns it can use in the BigQuery calls. I also needed to do a bit of prompt engineering on the Agent’s BigQuery tool to ensure it was creating BigQuery-specific SQL queries.

You could also add more tools to inspect this information dynamically, but since each LLM + tool call takes approximately 1s, the latency of those round trips adds up quickly.

With this setup, you can call a wide variety of example queries:

  1. Time period coverage
  2. Total revenue calculations
  3. Sales counts
  4. Group by operations (monthly revenue)
  5. Traffic source analysis
  6. Comparative analysis (Dec vs Nov)
  7. Plotting capabilities with Plotly

Hints and Tips

Here are a selection of notes that I captured during development:

  • Include table/column schemas in context
  • Specify BigQuery-specific syntax
  • Handle timestamp formatting issues
  • Guide toward correct table patterns (daily tables with wildcards)
  • Most engineering effort goes to robustness
  • Different requirements for POC vs production
  • Self-correction helps but not perfect

Future Direction

If you are thinking about pushing a project like this into production then I would suggest the following.

Start looking for opportunities to segment the solution into specialized sub-agents. For example, a business analyst agent and a data engineering agent. Use a supervisory agent to coordinate these and report back to the user. This helps with testing and scaling but may also be overly complex for simple projects.

I don’t think dashboards are going away, so you might consider incorporating dashboards into the agents. Maybe the agents could develop dashboards for your users? But for ad-hoc queries that aren’t repeated, some kind of text + visual output is ideal.

Conclusion

AI agents are a new UX paradigm that is gaining traction throughout the business, but the end game is unclear. It’s likely that they are going to be used in conjunction with traditional technologies, like business intelligence and analytics dashboards.

Most of the work in building AI agents is building in resilience and plumbing. Knowledge comes from external enterprise data sources which is easy to add but hard to perfect. Reliability is the key challenge here.

The demos I provided are very simple and don’t consider any production challenges or add any tests for robustness. If you are interested in a commercial deliverable, then please consider our bespoke AI agent development services.

More articles

AI-Native Transformation: How AI is Driving Organisational Change

Learn how AI-native transformation impacts businesses, drives change, and how to prepare your organisation for future waves of innovation.

Read more

AI Strategy for CEOs: Aligning Tech with Business Goals

Phil Winder interviews Charles Humble discussing the state of AI strategy in business. Learn how to align the use of AI with your business goals.

Read more
}