🎲 Your best Bet: Effortless MLOps with dbt
2025
TL;DR
Do you find it takes too long to deploy your ML models responsibly in production? Say goodbye to complex integrations of containerized training and inference, preprocessing pipelines, model registries, monitoring frameworks, feature stores, and prediction stores. As this talk demonstrates, all of these concepts can be implemented easily within dbt using only Python and SQL, the two languages data scientists truly love to write in.
Session Details
In just a few short years, dbt has taken the data engineering world by storm, becoming the de facto standard for data transformation pipelines. Its enduring strength is SQL, a ubiquitous language shared by data analysts, scientists, and engineers.
However, there are limits to what SQL can achieve. Since the introduction of Python models in version 1.3, dbt pipelines have become significantly more expressive, to the point that you can implement and orchestrate entire batch machine learning pipelines. By combining dbt-core, classic SQL, and the richness of the Python ecosystem, you can finally build a machine learning workflow that is accessible to everyone on your team.
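For readers new to dbt, a Python model is a `.py` file in the `models/` directory that defines a function `model(dbt, session)`. dbt calls it, lets it declare dependencies through `dbt.ref()` and settings through `dbt.config()`, and materializes the returned DataFrame as a table. The sketch below illustrates that contract only. It assumes `dbt.ref()` yields a pandas DataFrame (the actual type depends on the adapter, for example a Snowpark DataFrame on Snowflake), and the file name, the upstream model `stg_matches`, and its columns `home_team_goal`/`away_team_goal` are hypothetical. It is not the pipeline presented in the talk.

```python
# models/match_outcomes.py  (hypothetical file name)
import pandas as pd


def model(dbt, session):
    # Set the materialization from inside the model, the Python counterpart of {{ config() }}.
    dbt.config(materialized="table")

    # dbt.ref() resolves the upstream model and records the dependency in the DAG.
    # Assumption: a pandas DataFrame comes back here; other adapters may require a
    # conversion first (e.g. Snowpark's .to_pandas()).
    matches: pd.DataFrame = dbt.ref("stg_matches")

    # Derive the label a classifier could learn: home win (H), draw (D) or away win (A).
    goal_diff = matches["home_team_goal"] - matches["away_team_goal"]
    outcomes = matches.assign(
        outcome=goal_diff.apply(lambda d: "H" if d > 0 else ("D" if d == 0 else "A"))
    )

    # dbt writes whatever DataFrame is returned to the warehouse.
    return outcomes
```

Because the function only touches `dbt.ref()` and `dbt.config()`, one option (a testing practice, not a dbt feature) is to exercise it locally by passing a small stand-in object that serves pandas DataFrames, without any warehouse connection.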
In this talk, we will cover the dos and don'ts of using Python models in dbt, drawn from real-life professional experience and illustrated through an example machine learning pipeline that runs daily and aims to beat the football odds offered by bookmakers, using the [European Soccer Dataset](https://www.kaggle.com/datasets/hugomathien/soccer). By the end of this session, you will have a firm grasp of the key design patterns for a successful machine learning project in dbt.
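To make "beating the odds" concrete: decimal odds `o` imply a probability of `1/o`, and because bookmakers build in a margin, the implied probabilities for home win, draw, and away win sum to more than 1. A bet has positive expected value when the model's probability `p` satisfies `p * o > 1`. The sketch below is a plain-Python illustration of that arithmetic with made-up numbers; the function names are hypothetical, the proportional margin removal is just one of several normalisation methods, and none of it is claimed to be the decision rule used in the talk.

```python
from __future__ import annotations


def implied_probabilities(odds: dict[str, float]) -> dict[str, float]:
    """Turn decimal odds into implied probabilities with the bookmaker margin removed.

    Raw implied probabilities (1 / odds) sum to more than 1 (the "overround");
    dividing each by that sum rescales them so they sum to 1.
    """
    raw = {outcome: 1.0 / o for outcome, o in odds.items()}
    overround = sum(raw.values())
    return {outcome: p / overround for outcome, p in raw.items()}


def expected_value(model_prob: float, decimal_odds: float, stake: float = 1.0) -> float:
    """Expected profit: win (odds - 1) * stake with probability p, lose the stake otherwise."""
    return model_prob * (decimal_odds - 1.0) * stake - (1.0 - model_prob) * stake


def value_bets(model_probs: dict[str, float], odds: dict[str, float]) -> list[str]:
    """Outcomes where the model's probability makes the bet positive in expectation."""
    return [outcome for outcome, o in odds.items() if expected_value(model_probs[outcome], o) > 0]


odds = {"H": 2.0, "D": 3.5, "A": 3.8}
model_probs = {"H": 0.45, "D": 0.30, "A": 0.25}
print(value_bets(model_probs, odds))  # → ['D']
```

Here only the draw qualifies, since 0.30 × 3.5 = 1.05 exceeds 1, while the home and away bets have negative expected value.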
**Prerequisites**:
- Required: Python, SQL, a basic understanding of ML
- Optional: dbt, Jinja, MLOps concepts
3 things you'll get out of this session
Key takeaways for attendees:
- How dbt Python models work under the hood
- How dbt enables a simple ML pipeline design that is accessible to data scientists with a less technical background
- A deeper knowledge of dbt features for building design patterns that support every step of your MLOps journey