Evaluating LLMs in Databricks with RAGAS and MLFlow
TL;DR
Evaluating LLMs is essential for ensuring they perform accurately and align with safety standards. This talk covers two frameworks for LLM evaluation: RAGAS and MLFlow. We’ll explore practical applications of both, including a live demo that walks through setting up an evaluation pipeline, monitoring results, and refining metrics.
Session Details
As Large Language Models (LLMs) become increasingly embedded in real-world applications, evaluating their performance and alignment with safety standards has never been more critical. This session delves into the importance of robust evaluation strategies for LLMs and introduces two powerful frameworks: RAGAS and MLFlow.
Attendees will gain insights into how these frameworks can be leveraged to design comprehensive evaluation pipelines tailored to specific use cases. We’ll discuss their core features, strengths, and practical implementations, guiding you from setup to execution. The session includes a live demonstration showcasing the end-to-end process of configuring an evaluation pipeline, monitoring performance metrics, identifying areas for improvement, and refining the model based on evaluation outcomes.
Whether you're developing GenAI-driven applications or looking to optimize existing systems, this talk will equip you with actionable strategies to ensure your models are both effective and aligned with safety and ethical guidelines. Join us to elevate your LLM evaluation practices with hands-on tools and real-world examples.
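To give a flavor of what the demo covers, here is a minimal sketch of a RAGAS-plus-MLFlow evaluation loop. This is illustrative rather than the exact demo code: it assumes the `ragas`, `datasets`, and `mlflow` packages are installed, the sample question and contexts are hypothetical, RAGAS dataset column names vary between versions, and RAGAS metrics call a judge LLM under the hood, so a model/API key must be configured separately.

```python
# Minimal sketch: score RAG outputs with RAGAS, log results to MLFlow.
# Assumes ragas, datasets, and mlflow are installed and a judge LLM is configured.
import mlflow
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Hypothetical evaluation records: questions, model answers, and the
# retrieved contexts the answers were supposed to be grounded in.
eval_data = Dataset.from_dict({
    "question": ["What is MLFlow used for?"],
    "answer": ["MLFlow tracks experiments, models, and evaluation runs."],
    "contexts": [["MLFlow is an open-source platform for managing the ML lifecycle."]],
})

# Score the responses with RAGAS metrics (each metric prompts the judge LLM).
result = evaluate(eval_data, metrics=[faithfulness, answer_relevancy])
scores = result.to_pandas()

# Log aggregate scores to MLFlow so they appear in the Databricks
# experiment UI alongside other runs.
with mlflow.start_run(run_name="ragas-eval"):
    for metric in ["faithfulness", "answer_relevancy"]:
        mlflow.log_metric(metric, float(scores[metric].mean()))
```

In the live demo we extend this pattern end to end: comparing runs across prompt and model variants, watching the metrics in the experiment tracker, and feeding the results back into refinement.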
3 things you'll get out of this session
Show attendees the importance of evaluating LLMs
Showcase the two industry-standard frameworks for evaluating GenAI
Communicate the common pitfalls of evaluating GenAI models vs. traditional ML models