How can I get started with deployments?
Discourse - getting started recipes incl. AWS, GCP and Azure specific setup
With the General Availability release of Prefect 2.0, we modified the approach to creating deployments.
This approach is:
more intuitive - you can create your deployments from the UI and CLI,
compatible with CI/CD frameworks,
and more comprehensive, allowing you to use storage blocks not only for the flow code, but also for your custom modules and files, such as Python modules, configuration files, SQL scripts and dbt models!
For a detailed description, see the documentation.
How to deploy all flows from a project at once? (both locally and from CI/CD)
This Discourse topic includes plenty of practical deployment CLI commands:
Each of the following GitHub repositories contains a file that allows you to easily create deployments for all projects:
Manually from any terminal - using deploy_flows.bash
From CI/CD - using the GitHub Actions workflows in the .github directory
Repository for AWS ECS
Repository for AWS EKS
Example bash scripts to create deployments for all flows
DEFAULT STORAGE & INFRASTRUCTURE: locally stored flow code + Local Process; -a stands for --apply; no upload is happening because no re…
Docs
Blog posts
Getting started repositories with sample project structure
Repository templates
Repositories with examples
How to use the deployment CLI?
In all the examples below, we assume that an agent has been started with the work queue name dev:
prefect agent start -q dev
What does prefect deployment build do?
The build step is needed to:
Access the flow object and extract information from it, such as:
parameter schema
parameter defaults
flow entrypoint
Upload the entire workflow directory to a configurable storage location (the default is a local file system that does not perform any movement of files - instead, it only stores the locat…
| Platform | Storage Block | Infrastructure Block | End Result | CLI Build Command for a hello.py flow with flow function hello |
| --- | --- | --- | --- | --- |
| Local/VM | N/A | N/A | Local storage and a local process on the same machine from which you created the deployment | prefect deployment build hello.py:hello -n implicit -q dev |
| Local/VM | N/A | N/A | Local storage and a local process on the same machine from which you created the deployment, but with a version and with the output YAML manifest stored under the given file name in the deploy directory | p… |
Can I re-upload my code without recreating a deployment?
Yes, but only if the parameter schema for the flow remains unchanged!
This is why the manifest file exists - to detect changes that could break a deployment.
Why does the build step need to evaluate the flow object?
To extract parameter schema and import paths for the entrypoint.
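For illustration, here is a minimal sketch of a flow whose object the build step would evaluate; the file name hello.py and the parameters are assumptions matching the CLI examples in this topic:

```python
# hello.py -- hypothetical flow used by the CLI examples in this topic
from prefect import flow


@flow
def hello(name: str = "world", greeting: str = "Hello"):
    # `prefect deployment build hello.py:hello ...` evaluates this object to
    # capture the parameter schema, the parameter defaults, and the entrypoint
    print(f"{greeting}, {name}!")
```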
Can the apply step be executed from an environment that doesn't have access to the flow code?
Yes, e.g., from a CI/CD tool or build server, as long as the YAML file has already been generated.
Do I need to keep and version-control the YAML file?
Storing the YAML file in your project repository with flows is entirely optional. If you want to, you can version-control those YAML files as a scaffold of deployment information for each flow.
One common use case we anticipate is using the YAML files as build artifacts of the CI/CD process, and relying purely on blocks and CLI as a way to continuously send deployment metadata to the Prefect backend.
How can I build a CI/CD pipeline for my flow deployments?
Here are some repository templates and resources to help you build a CI/CD pipeline for your flow deployments.
Prefect 2.0 and Cloud 2.0 recipes
Repository templates
Repositories with code examples
Self-hosted Prefect 2.0 recipes
Creating a CI/CD pipeline for a self-hosted Orion instance is harder, since you would likely want to deploy it to private infrastructure, which prevents the use of hosted tools such as GitHub Actions.
You may explore:
custom GitHub Actions runners d…
Can I specify deployments using Python code rather than CLI?
Yes, that’s possible by using the prefect.deployments.Deployment class.
```python
from flows.healthcheck import healthcheck
from prefect.deployments import Deployment
from prefect.filesystems import S3

deployment = Deployment.build_from_flow(
    flow=healthcheck,
    name="pythonic_s3",
    description="hi!",
    version="v1",
    work_queue_name="dev",
    tags=["myproject"],
    storage=S3.load("dev"),
    infra_overrides=dict(env={"PREFECT_LOGGING_LEVEL": "DEBUG"}),
    output="your.yaml",
)

if __name__ == "__main__":
    # register the deployment with the Prefect API
    deployment.apply()
```
What is the relationship between an agent, work queue and a deployment?
The following diagram helps illustrate the relationships between different objects, including the underlying schema (at the time of writing).
[image]
Note that the manifest_path is marked red since this argument is no longer used, and it’s maintained in the schema only for backward compatibility with Prefect ≤ 2.0.3.
Deployments and work queues
Deployments and work queues have a one-to-many relationship.
Any given deployment can have only one work queue name assigned (e.g. -q dev …
I’m getting an error “File system … could not be created. You are likely missing a Python module required to use the given storage protocol” - how to solve that issue?
There are several ways you can solve that problem:
Install the required file-system subpackage in your environment, e.g.:
pip install s3fs
pip install gcsfs
pip install adlfs
Include it in an environment variable called EXTRA_PIP_PACKAGES - this works only for the DockerContainer block - docs (see the sketch after this list)
Build a custom image with those dependencies - you can use this GitHub Actions workflow template to build and push such a custom DockerHub image in a single click:
You can configure and trigger the image bu…
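As a minimal sketch of the EXTRA_PIP_PACKAGES option mentioned above (the block name, image tag, and package are assumptions):

```python
from prefect.infrastructure import DockerContainer

# the official Prefect image installs anything listed in EXTRA_PIP_PACKAGES at
# container startup, so the storage-protocol dependency is available at runtime
docker_block = DockerContainer(
    image="prefecthq/prefect:2-python3.10",  # example image tag
    env={"EXTRA_PIP_PACKAGES": "s3fs"},
)
docker_block.save("dev", overwrite=True)
```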
What happens when an agent picks up a deployed flow run from a work queue?
It loads the deployment's infrastructure block; if no block is associated with this deployment, the agent uses a default infrastructure configuration that can be set on a per-agent basis
It executes the command python -m prefect.engine $FLOW_RUN_ID within this infrastructure
It then loads the deployment’s storage block, which has the ability to download the flow’s definition along with any supporting files into a temporary directory from which execution proceeds
What is the single source of truth for my deployment definition?
TL;DR: it's the state of the flow deployment on the API backend, not the YAML file, Python declaration, or CLI command. Those are only used to create or modify a deployment; the API representation is the single source of truth.
Deployment is an API-first concept; therefore, the API, rather than the YAML manifest or CLI command, is the single source of truth.
This means that anytime you modify any attribute of deployment from the UI, CLI, or via an API call (incl. changing default parameters or schedules, storage, and infrastructure blocks), any of those actions trigger a change in the corresponding deployment attribute.
Relying purely on YAML for that can only serve as a single source of truth if the YAML file is the only…
How can I run my flow in a Docker container?
Prefect 2.0
In Prefect 2.0, deployments are extremely flexible and allow you to use any type of infrastructure, from a local process or a Docker container to a job running on a remote Kubernetes cluster.
Here is the syntax to build and create dockerized flow run deployments:
Build step:
```bash
prefect deployment build path/to/flow_script.py:flow_name \
    --name deployment_name --tag dev -sb storage_block_type/storage_block_name \
    -ib infrastructure_block_type/infrastructure_block_name
```
In practic…
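From Python, a comparable dockerized deployment can be sketched with the Deployment class; this is a non-authoritative example assuming an S3 storage block and a DockerContainer infrastructure block, both saved under the name dev, and the healthcheck flow from the earlier example:

```python
from flows.healthcheck import healthcheck  # hypothetical flow module
from prefect.deployments import Deployment
from prefect.filesystems import S3
from prefect.infrastructure import DockerContainer

# build and register a deployment whose runs execute in a Docker container
deployment = Deployment.build_from_flow(
    flow=healthcheck,
    name="docker-example",
    work_queue_name="dev",
    storage=S3.load("dev"),                      # assumed storage block
    infrastructure=DockerContainer.load("dev"),  # assumed infrastructure block
)
deployment.apply()
```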
How to build deployments with flow code and dependencies being baked into a Docker image?
After we introduced the new way of deploying flows in the Prefect 2.0 General Availability release, many users requested a setup where they don't need a separation between storage and infrastructure blocks and can instead bake all flow code and module dependencies into a single Docker image.
With this PR, we introduced:
--apply flag on the build CLI and a corresponding apply kwarg on Deployment.build_from_flow, allowing you to build and apply the deployment in one step (see the sketch after this list)
automatic detection of d…
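For instance, a minimal sketch of the apply kwarg (the flow module and names are assumptions):

```python
from flows.healthcheck import healthcheck  # hypothetical flow module
from prefect.deployments import Deployment

# build the deployment and register it with the API in a single call
Deployment.build_from_flow(
    flow=healthcheck,
    name="one-step",
    work_queue_name="dev",
    apply=True,
)
```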
How can I delete all deployments to clear my workspace?
Some users like to operate Terraform-style and have each CI/CD workflow delete all existing flow deployments and create new ones.
Here is how you can delete all deployments existing in your workspace (note: this is irreversible):
```python
import asyncio
from prefect.client import get_client


async def remove_all_deployments():
    async with get_client() as client:
        deployments = await client.read_deployments()
        for deployment in deployments:
            print(f"Deleting deployment: {deployment.name}")
            # irreversible: permanently removes the deployment from the workspace
            await client.delete_deployment(deployment.id)


if __name__ == "__main__":
    asyncio.run(remove_all_deployments())
```
How can I toggle the schedule off for my flow deployment? (pause a schedule)
There are several ways to do it.
UI
[image]
API call
Prefect Cloud
```python
import requests

API_KEY = "pnu_xxxx"


def set_schedule_inactive(
    deployment_id: str,
    base_url="https://api.prefect.cloud/api/accounts/c5276cbb-62a2-4501-b64a-74d3d900d781/workspaces/aaeffa0e-13fa-460e-a1f9-79b53c05ab36",
):
    return requests.post(
        url=f"{base_url}/deployments/{deployment_id}/set_schedule_inactive",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
```
Do I need to schedule my flows, or can I run those based on events?
How to run my flows from AWS Lambda?
How to deploy my flow as a continuous real-time streaming service to AWS?
How can I orchestrate multiple deployments in a flow of flows (orchestrator pattern)?
Currently, there is no special task for that. You would need to:
retrieve the deployment ID corresponding to the flow you try to trigger, e.g. using prefect deployment ls
[image]
use the Orion client as follows:
```python
import asyncio
from prefect.client import get_client


async def main():
    async with get_client() as client:
        depl_id = "074db2e5-229a-460e-85ad-fca31b379fd2"
        response = await client.create_flow_run_from_deployment(depl_id)
        print(response)


if __name__ == "__main__":
    asyncio.run(main())
```
How to implement a manual approval?
In certain scenarios, you may want to continue processing your flow only once you’ve checked the status of previous processing.
Concrete scenarios
This typically involves semi-manual processes that combine automation with a human component, e.g.:
Run automated data extraction, transformation, and processing; then automatically inform some stakeholders about the status and let them know this data/process is ready for a manual quality check; once this person manually gives …
Scheduling FAQ
If I toggle the schedule off for my deployment, can I still trigger ad-hoc runs?
Yes! Your run will still be executed.
The toggle is only used for scheduling. This means that even if you set the schedule as inactive, the deployment still exists and can be triggered ad-hoc from the UI, CLI, run_deployment Python function or via an API call.
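For example, here is a minimal sketch using run_deployment, available in recent Prefect 2 releases; the deployment name healthcheck/dev is an assumption:

```python
from prefect.deployments import run_deployment

# create an ad-hoc flow run from an existing deployment, referenced as
# "<flow-name>/<deployment-name>"; the name below is hypothetical
flow_run = run_deployment(name="healthcheck/dev")
print(flow_run.id)
```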
[image]
The UI makes it easy to distinguish between scheduled runs and ad-hoc runs of a deployment, thanks to the auto-scheduled tag, which is applied only to scheduled runs:
[image]