AI Confidence: Evaluating AI Models with Synthetic Data

by Dr. Phil Winder, CEO

When: Wed Nov 12, 2025 at 16:30 +0100

About This Talk

Building AI-powered applications is exciting, right up until you need to evaluate whether your model actually works. How do you test an AI system that gives different answers every time? How do you gather enough test data to be confident in your results? This one-hour practical webinar will take you from uncertain to confident about AI evaluation.

In this demo-heavy session, you’ll discover why traditional software testing doesn’t work for AI systems, why you need to adopt a statistical mindset, and how to get comfortable with non-determinism. We’ll walk through real-world scenarios using a brewery operations email classification system to demonstrate practical evaluation techniques you can apply immediately.

Key Topics:

  • The Evaluation Mindset Shift: Why assert response == expected fails with AI, and what to do instead (see the first sketch after this list)
  • Building Effective Benchmarks: Creating test datasets that actually measure what matters for your application
  • Synthetic Data Generation: Using AI to generate realistic test data when real examples are scarce, privacy-sensitive, or don’t cover edge cases (second sketch below)
  • Practical Evaluation with Promptfoo: Hands-on demonstration of systematic testing with multiple test cases, custom assertions, and performance tracking (the third sketch below shows the same workflow in plain Python)
  • Live Demonstrations: You’ll see working code and real examples throughout, including:
    • Generating synthetic test emails using local LLMs
    • Running systematic evaluations with visual results
    • Comparing model performance with concrete metrics
    • Handling the non-deterministic nature of AI outputs
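
To make the first topic concrete, here is a minimal Python sketch of the statistical mindset, assuming Ollama’s Python client and a local qwen3:1.7b model. The prompt, the labels, and the two-item CASES list (a toy benchmark) are illustrative assumptions, not code from the talk. Instead of asserting one exact match, we sample each case several times and assert that aggregate accuracy clears a threshold:

    import ollama

    # Illustrative labelled cases for the brewery email classifier
    # (a toy benchmark; a real suite would be much larger).
    CASES = [
        ("The mash tun pump is leaking again, please send maintenance.", "maintenance"),
        ("Can we get 20 extra kegs of the IPA for Friday?", "order"),
    ]

    def classify_email(body: str) -> str:
        """Ask a local model for a one-word label. Prompt wording is illustrative."""
        response = ollama.chat(
            model="qwen3:1.7b",
            messages=[{
                "role": "user",
                "content": "Classify this brewery email as 'order', 'maintenance' "
                           "or 'other'. Reply with the single word only.\n\n" + body,
            }],
        )
        # Small reasoning models may emit thinking before the answer;
        # keep only the last non-empty line.
        lines = [l for l in response["message"]["content"].splitlines() if l.strip()]
        return lines[-1].strip().lower() if lines else ""

    # The mindset shift: sample repeatedly and assert an aggregate accuracy
    # threshold, rather than `assert response == expected` once.
    RUNS_PER_CASE = 5
    correct = total = 0
    for body, expected in CASES:
        for _ in range(RUNS_PER_CASE):
            total += 1
            correct += classify_email(body) == expected

    accuracy = correct / total
    print(f"accuracy = {accuracy:.0%}")
    assert accuracy >= 0.8, "model fell below the 80% accuracy bar"

The 80% bar and five runs per case are arbitrary here; choosing both deliberately, based on how much variance you can tolerate, is exactly the statistical judgment the session covers.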
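The synthetic-data topic can be sketched the same way: ask a local model to invent test emails when real ones are scarce or privacy-sensitive. Again this assumes Ollama’s Python client; the prompt and the JSON shape are illustrative:

    import json
    import ollama

    PROMPT = (
        "You are generating test data for a brewery operations email "
        "classifier. Write one realistic email a brewery might receive about "
        "an equipment maintenance problem. Return JSON with keys 'subject' "
        "and 'body' and nothing else."
    )

    # format='json' nudges the model to emit parseable output; in practice
    # you would still validate and retry on malformed responses.
    response = ollama.chat(
        model="qwen3:1.7b",
        messages=[{"role": "user", "content": PROMPT}],
        format="json",
    )

    email = json.loads(response["message"]["content"])
    print(email["subject"])
    print(email["body"])

Loop this with varied prompts (different categories, tones, edge cases) and you get a benchmark covering situations your real inbox never has.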
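Promptfoo itself is driven from a declarative config file rather than Python, so the following is not Promptfoo’s API; it is a language-consistent sketch of the same workflow: a shared test suite, a pass/fail check, and a concrete metric per model. The model names and cases are assumptions:

    import ollama

    # Shared suite of (email body, expected label) pairs.
    CASES = [
        ("Invoice #442 for last month's hop delivery is attached.", "finance"),
        ("Fermenter 3 temperature alarm went off overnight.", "maintenance"),
        ("We'd like to book a taproom tour for 15 people.", "other"),
    ]

    MODELS = ["qwen3:1.7b", "llama3.2:1b"]  # assumed to be pulled locally

    def label(model: str, body: str) -> str:
        response = ollama.chat(
            model=model,
            messages=[{"role": "user", "content":
                       "Classify this brewery email as 'finance', 'maintenance' "
                       "or 'other'. One word only.\n\n" + body}],
        )
        lines = [l for l in response["message"]["content"].splitlines() if l.strip()]
        return lines[-1].strip().lower() if lines else ""

    # A concrete, comparable metric per model: accuracy over the suite.
    for model in MODELS:
        correct = sum(label(model, body) == expected for body, expected in CASES)
        print(f"{model}: {correct}/{len(CASES)} correct")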

Technical Requirements: All demonstrations use open-source tools (Ollama, Promptfoo) running locally. We’ll focus on small, fast models (qwen3:1.7b) that anyone can run on standard hardware.
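
If you want to follow along, the only setup is downloading the model in advance. A one-liner with the Ollama Python client (equivalent to ollama pull qwen3:1.7b on the command line):

    import ollama

    # One-time download of the small model used throughout the demos.
    ollama.pull("qwen3:1.7b")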

Duration: 60 minutes (40 min presentation + 20 min Q&A)

Format: Live demonstration with real code examples.
