Azure Databricks and Spark SQL (Python)

Course Description

I’m Malvik Vaghadia, a Data Engineer and Architect with nearly 15 years of professional experience. I'm also a recognised Databricks Champion, an honour given to a small global community for deep platform expertise and contribution to the wider ecosystem. I’ve worked on multiple large-scale lakehouse implementations and consulted for enterprise clients. As an instructor, I’ve taught 200,000+ students worldwide and hold a 4.6+ instructor rating.

Since its launch, this course has become one of Udemy’s best-sellers in the Databricks category, and this new version (Sept 2025) has been completely rebuilt with 17 hours of brand-new content.

Why Learn Databricks

Databricks is recognised as a Leader in the Gartner Magic Quadrant for Data & AI platforms. It has become the go-to lakehouse platform for modern data engineering, enabling organisations to build, orchestrate, and optimise pipelines at scale. By mastering Databricks, you’ll be learning one of the most in-demand skills in today’s data landscape.

Course Delivery Style

This course is designed with the right balance of theory, hands-on coding, and practical projects. Every concept is explained clearly, then demonstrated live in Databricks, and reinforced with a multi-phase, end-to-end project that you’ll build step by step. You’ll also get all course notebooks as downloadable materials, containing the full code, step-by-step documentation, and extra resources so you can follow along easily.

Curriculum Highlights

Four-part course project: End-to-end NYC Taxi project and further pipeline builds across multiple parts as you develop your knowledge.
Foundations: What data engineering is, why Databricks, the Spark architecture, PySpark, and the Lakehouse.
Azure setup: Account creation, resources, role-based access control, naming conventions, and cost management.
Databricks setup: Creating and configuring a workspace, navigating the UI, and handling personal email restrictions.
Databricks notebooks and workspace: Markdown, comments, organising objects, mixing languages, and notebook tips.
Databricks compute: Clusters, DBU pricing, runtimes, serverless vs all-purpose compute, instance pools, and SQL warehouses.
Spark SQL (Python): Writing Spark SQL code using both SQL syntax and DataFrame APIs, reading/writing different file formats, defining schemas, and managing tables and views.
PySpark transformations: Column operations, functions, filtering, sorting, joining, aggregations, pivots, and conditional logic.
Medallion architecture: Bronze, Silver, and Gold layers explained and implemented.
Delta Lake: Transaction log, schema enforcement and evolution, time travel, and DML operations (MERGE, UPDATE, DELETE).
Workflows and jobs: Passing parameters, handling failures, concurrency, conditional tasks, and monitoring.
Git & local development: VS Code setup, linking with GitHub, repos, and workflow best practices.
Functions and modularization: Creating and importing Python modules, UDFs, and project structuring.
Unity Catalog & governance: Metastores, securable objects, workspace roles, external locations, and permissions.
Streaming & Lakeflow pipelines: Structured Streaming concepts, Auto Loader, watermarking, triggers, and the new Lakeflow (DLT) pipeline model.
Performance: Lazy evaluation, explain plans, caching, shuffles, broadcast joins, partitioning, Z-ORDER, and Liquid Clustering.
Automation & CI/CD: Programmatic interaction with Databricks, CLI demo, and high-level CI/CD overview.

By the end of the course, you’ll have both the knowledge and confidence to design, build, and optimise production-grade data pipelines on Databricks.