Schema Madness, handling (incompatible) schema changes in incoming JSON files with Databricks
2022
TL;DR
Databricks has a schema evolution feature that can automatically handle schema changes. This seems amazing in theory, but I've run into quite a few practical gotchas. This is a demo-rich, notes-from-the-field session where I show you which problems I ran into and how to fix them.
Session Details
Databricks has a schema evolution feature that can automatically handle schema changes.
This seems amazing in theory, but I've run into quite a few practical gotchas.
This is a demo-rich, notes-from-the-field session where I show you which problems I ran into and how to fix them.
3 things you'll get out of this session
Speakers
André Kamman's previous sessions
FinOps, how data engineers get their cloud cost under control
Managing cloud cost is no longer a "management approves the budget" type of thing. Cloud engineers need to architect their solutions in such a way that cost can be kept under control. This is not a one-time thing: monitoring, automatic downsizing and refactoring are all part of the yearly tasks of any cloud team. We'll discuss theory, techniques, best practices and lessons learned.
Generate test data quick, easy and lots of it with the Databricks Labs Data Generator
We're not supposed to use production data in dev, right? But generating proper test data is not easy, and it gets even harder when you need quite a lot of it. I generate terabytes of it without much trouble. Let me show you how!
Keynote by The Community
Ben and Rob have found some wonderful folk to actually do the important parts of the community keynote, on the theme of "How to be a nonpassive member of the data community".
Building your first Metadata Driven Azure Data Factory
Let's unleash the true power of ADF: its ability to dynamically inject metadata almost anywhere. No complicated frameworks in this session; I'll show you some simple but very powerful examples.
Looking under the hood of the parquet format
Understanding how the Parquet format works helps you understand why it can retrieve your data fast, or perhaps why you struggle to get the desired performance out of your design.