22-25 April 2026

Metadata Driven Pipelines with Python and Databricks

2022

TL;DR

In this demo-filled session we'll take a journey from data pipelines built with notebooks to scalable, metadata-driven pipelines built with Python functions.

Session Details

In this demo-filled session we'll take a journey, starting with some simple data transforms in a notebook.

- We'll look at what a Spark transformation function actually does, and how to build our own transformation functions.
- We'll see how combining generic functions with some metadata lets us perform common data engineering tasks, such as data cleansing and validation, with less code and in a more testable way.
- Finally, we'll see just how simple it is to deploy these functions into Databricks and get them to production.

Attendees should come away with some ideas for building more scalable, metadata-driven data pipelines using Databricks and Spark.
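To give a flavour of the idea, here is a minimal sketch of generic functions driven by metadata. It is not the session's own code: plain Python lists of dicts stand in for a Spark DataFrame so that it runs without a cluster, and every name in it (`trim_column`, `drop_nulls`, `RULES`, `PIPELINE_METADATA`, `run_pipeline`) is hypothetical.

```python
from typing import Callable, Dict, List

# Assumption: a "row" is a plain dict; in Spark this would be a DataFrame.
Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]


def trim_column(column: str) -> Transform:
    """Build a transform that strips whitespace from string values in `column`."""
    def _apply(rows: List[Row]) -> List[Row]:
        result = []
        for row in rows:
            value = row.get(column)
            if isinstance(value, str):
                value = value.strip()
            result.append({**row, column: value})
        return result
    return _apply


def drop_nulls(column: str) -> Transform:
    """Build a transform that removes rows where `column` is missing or None."""
    def _apply(rows: List[Row]) -> List[Row]:
        return [row for row in rows if row.get(column) is not None]
    return _apply


# Hypothetical registry: maps a rule name used in metadata to a generic function.
RULES: Dict[str, Callable[[str], Transform]] = {
    "trim": trim_column,
    "drop_nulls": drop_nulls,
}

# Hypothetical metadata: in practice this might live in a config file or table.
PIPELINE_METADATA = [
    {"rule": "drop_nulls", "column": "name"},
    {"rule": "trim", "column": "name"},
]


def run_pipeline(rows: List[Row], metadata: List[Dict[str, str]]) -> List[Row]:
    """Apply each metadata-described step in order."""
    for step in metadata:
        transform = RULES[step["rule"]](step["column"])
        rows = transform(rows)
    return rows


raw = [
    {"id": 1, "name": "  Ada "},
    {"id": 2, "name": None},
    {"id": 3, "name": "Grace"},
]
cleaned = run_pipeline(raw, PIPELINE_METADATA)
```

Adding a new cleansing step then means adding a metadata entry rather than writing new pipeline code, and each generic function can be unit tested on its own. In PySpark, functions that take and return a DataFrame can be chained in a similar way with `DataFrame.transform`.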

3 things you'll get out of this session