The modern data stack promises modular components that can be combined into an easy-to-use data platform. This article demonstrates an example data platform built on Snowflake, dbt, and Prefect. It shows how to organize and orchestrate flows written by different teams, and how to deploy the entire project to a Kubernetes cluster on AWS.
Table of contents:
· Snowflake configuration
∘ Creating database credentials
∘ SQLAlchemy connection
∘ Using the connection to load raw data (Extract & Load)
∘ Turning the extract & load script into a Prefect flow
· dbt configuration
· Deploying your flows to a remote Kubernetes cluster on AWS EKS
∘ 1. Building a custom Docker image
∘ 2. Pushing the image to ECR
∘ 3. Creating a demo Kubernetes cluster on AWS EKS
∘ 4. Deploying a Prefect Kubernetes agent
∘ Changing the run configuration in your flows to KubernetesRun
∘ Cleaning up AWS resources that are no longer needed
· Building a repeatable CI/CD process